[Feature] Leader Rotation Scoring #4797

Open
GheisMohammadi opened this issue Nov 11, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@GheisMohammadi
Contributor

Scoring each validator based on its performance before assigning it as the next leader is an excellent way to maintain network stability and ensure that the leader role goes to a reliable node. Here are several approaches you could take to create a performance-based scoring system for selecting the next leader:

1. Average Block Production Time

  • Metric: Track the time taken by each validator to produce blocks during its previous leadership periods, calculating an average block production time.
  • Evaluation: Define a threshold or target time. Validators with a block production time below this threshold can receive a higher score. Validators that regularly produce blocks more slowly might receive a penalty, indicating they are less suitable for leadership.
  • Implementation: Store block production timestamps for each leader and compute a rolling average. Add weights if recent blocks are more indicative of the validator’s current state than older ones.
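As a minimal sketch (the function names and the alpha weight are illustrative, not from the Harmony codebase), a recency-weighted rolling average can be maintained with an exponential moving average:

```go
package main

import "fmt"

// emaUpdate folds a new block production time (seconds) into a rolling
// exponential moving average; alpha weights recent samples more heavily.
func emaUpdate(avg, sample, alpha float64) float64 {
	return alpha*sample + (1-alpha)*avg
}

// productionScore maps an average production time to a score in (0, 1]:
// at or below the target the score is 1.0, decaying as the average grows.
func productionScore(avg, target float64) float64 {
	if avg <= target {
		return 1.0
	}
	return target / avg
}

func main() {
	avg := 2.0 // seed with the target block time to avoid penalizing new validators
	for _, t := range []float64{2.0, 2.5, 4.0} {
		avg = emaUpdate(avg, t, 0.5)
	}
	fmt.Printf("avg=%.3f score=%.3f\n", avg, productionScore(avg, 2.0)) // avg=3.125 score=0.640
}
```

Seeding the average with the target block time keeps validators with little history from being scored as outliers.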

2. Block Finality Rate

  • Metric: Measure how quickly and consistently a validator’s blocks are finalized. A leader whose blocks see fewer rejections or reorgs is performing stably.
  • Evaluation: Calculate the finality rate by dividing the number of blocks accepted into the main chain by the total blocks produced. Validators with a higher finality rate (closer to 100%) receive a higher score.
  • Implementation: Track and update the finality rate periodically, possibly at the end of each epoch, and penalize validators that have lower rates or have blocks frequently rejected.
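A finality-rate tracker could be as simple as the following sketch (the type and its fields are hypothetical, not existing code):

```go
package main

import "fmt"

// FinalityStats tracks per-validator block outcomes over an epoch.
type FinalityStats struct {
	Produced  int // total blocks proposed while leader
	Finalized int // blocks accepted into the canonical chain
}

// Rate returns the fraction of produced blocks that were finalized,
// or 0 for validators with no history yet.
func (s FinalityStats) Rate() float64 {
	if s.Produced == 0 {
		return 0
	}
	return float64(s.Finalized) / float64(s.Produced)
}

func main() {
	s := FinalityStats{Produced: 20, Finalized: 19}
	fmt.Printf("finality rate: %.2f\n", s.Rate()) // finality rate: 0.95
}
```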

3. Network Latency and Response Time

  • Metric: Measure the latency between when a leader sends PREPARE messages and when the network responds, as well as the response times for other key consensus messages.
  • Evaluation: Nodes with lower latencies are likely better suited for the leader role. Average the latency and response times over the last several blocks or epochs.
  • Implementation: During consensus rounds, capture timestamp differences for each validator’s responses to consensus messages. Calculate average latencies and rank nodes accordingly.

4. Historical Uptime and Availability

  • Metric: Track each validator’s uptime and availability over previous epochs, assigning higher scores to validators with high availability.
  • Evaluation: Set a threshold for minimum availability (e.g., 99% uptime). Validators that meet or exceed this threshold consistently rank higher for leadership.
  • Implementation: Continuously monitor each validator’s uptime by tracking missed blocks or messages during each consensus round.
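The uptime check might look like this sketch, where the threshold value and the tracking fields are assumptions rather than existing code:

```go
package main

import "fmt"

// Uptime tracks consensus-round participation for one validator.
type Uptime struct {
	Rounds int // rounds the validator was expected to participate in
	Missed int // rounds with no signature or message from the validator
}

// Ratio returns the fraction of rounds the validator participated in.
func (u Uptime) Ratio() float64 {
	if u.Rounds == 0 {
		return 0
	}
	return float64(u.Rounds-u.Missed) / float64(u.Rounds)
}

// Eligible reports whether the validator meets a minimum availability bar.
func (u Uptime) Eligible(threshold float64) bool {
	return u.Ratio() >= threshold
}

func main() {
	u := Uptime{Rounds: 1000, Missed: 5}
	fmt.Printf("uptime=%.3f eligible=%v\n", u.Ratio(), u.Eligible(0.99)) // uptime=0.995 eligible=true
}
```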

5. Penalty for Dropped Messages and Skipped Blocks

  • Metric: Track the number of consensus messages each validator drops or the number of blocks it skips during its leadership period.
  • Evaluation: Each missed message or skipped block decreases the validator’s score slightly. Validators that skip blocks often or frequently miss consensus messages are less suitable for leadership.
  • Implementation: Keep a rolling count of dropped messages and missed blocks for each validator. Apply penalties directly to the validator’s score for each dropped message or skipped block.
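The per-event penalties could be applied as in this sketch; the penalty weights are illustrative and would need tuning against real network data:

```go
package main

import "fmt"

// applyPenalties deducts a small fixed amount per dropped message or
// skipped block, flooring the score at zero.
func applyPenalties(score float64, droppedMsgs, skippedBlocks int) float64 {
	const msgPenalty, blockPenalty = 0.01, 0.05 // illustrative weights
	score -= float64(droppedMsgs)*msgPenalty + float64(skippedBlocks)*blockPenalty
	if score < 0 {
		return 0
	}
	return score
}

func main() {
	// 3 dropped messages and 2 skipped blocks: 1.0 - 0.03 - 0.10
	fmt.Printf("%.2f\n", applyPenalties(1.0, 3, 2)) // 0.87
}
```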

6. Weighted Score for Recent Performance

  • Metric: Combine recent performance metrics (e.g., block production time, latency, finality rate) with a heavier weight on the most recent data.
  • Evaluation: Give more recent performance a higher weight, as it reflects the validator's current reliability. This approach makes the score adaptive to changes in validator performance, helping adjust leader rotations if a node’s performance declines.
  • Implementation: Use a weighted scoring formula, for example: 0.5 * recent performance + 0.3 * mid-term performance + 0.2 * historical performance.
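The weighted blend above can be expressed directly, assuming each window's performance has already been normalized to [0, 1]:

```go
package main

import "fmt"

// weightedScore blends performance over three time windows, weighting
// recent behavior most heavily; the weights sum to 1.0.
func weightedScore(recent, midTerm, historical float64) float64 {
	return 0.5*recent + 0.3*midTerm + 0.2*historical
}

func main() {
	// A validator that recently improved scores higher than its history alone suggests.
	fmt.Printf("%.2f\n", weightedScore(0.9, 0.7, 0.5)) // 0.76
}
```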

7. Penalty for Rejected or Orphaned Blocks

  • Metric: Track blocks proposed by each validator that were eventually rejected or orphaned by the network.
  • Evaluation: Each rejected block reduces the validator’s score, while blocks included in the main chain positively impact the score. Validators with fewer rejected blocks should rank higher for leadership.
  • Implementation: Maintain a count of accepted and rejected blocks. Each validator's score is adjusted based on this ratio over time.
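An incremental per-block adjustment, clamped to [0, 1], might look like this sketch (the reward and penalty sizes are illustrative assumptions):

```go
package main

import "fmt"

// adjustForBlock nudges a validator's score per block outcome: accepted
// blocks add a small reward, rejected or orphaned blocks subtract a
// larger penalty, and the score stays clamped to [0, 1].
func adjustForBlock(score float64, accepted bool) float64 {
	if accepted {
		score += 0.01
	} else {
		score -= 0.05
	}
	if score > 1 {
		return 1
	}
	if score < 0 {
		return 0
	}
	return score
}

func main() {
	score := 0.5
	for _, ok := range []bool{true, true, false, true} {
		score = adjustForBlock(score, ok)
	}
	fmt.Printf("%.2f\n", score) // 0.48
}
```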

8. Adaptive Scoring with Self-Healing

  • Metric: Instead of statically scoring, create a self-healing scoring model where validators that improve over time regain their score.
  • Evaluation: Validators that were previously penalized (e.g., for missing blocks) can regain their score by maintaining a good track record over a set period or number of blocks.
  • Implementation: Apply a penalty decay where penalized scores gradually improve if the validator performs consistently without additional issues.
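Penalty decay can be modeled as a multiplicative shrink per clean epoch; the decay factor here is an illustrative assumption:

```go
package main

import "fmt"

// decayPenalty lets an accumulated penalty heal over time: each clean
// epoch (no new faults) shrinks the outstanding penalty by decayFactor.
func decayPenalty(penalty, decayFactor float64, cleanEpochs int) float64 {
	for i := 0; i < cleanEpochs; i++ {
		penalty *= decayFactor
	}
	return penalty
}

func main() {
	// A 0.4 penalty halving every clean epoch leaves 0.05 after 3 epochs.
	fmt.Printf("%.2f\n", decayPenalty(0.4, 0.5, 3)) // 0.05
}
```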

Example Scoring Formula
To implement a composite score, you might combine these metrics in a formula, for example:

Score = (W1 · BlockProductionScore) + (W2 · FinalityRate) + (W3 · Uptime) − (W4 · RejectedBlocks)

where W1, W2, W3, and W4 are weights that balance the impact of each factor based on its importance in maintaining a reliable leader. Note that raw block production time must first be normalized and inverted into a score (faster production yields a higher value); adding the raw time directly would reward slow producers.
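Expressed in code (the weights and the assumption that every input is pre-normalized to [0, 1] are illustrative):

```go
package main

import "fmt"

// compositeScore combines the normalized metrics from the formula above.
// productionScore is assumed to be already inverted, so that faster block
// production scores higher.
func compositeScore(productionScore, finalityRate, uptime, rejectedRatio float64) float64 {
	const w1, w2, w3, w4 = 0.3, 0.3, 0.3, 0.1 // illustrative weights
	return w1*productionScore + w2*finalityRate + w3*uptime - w4*rejectedRatio
}

func main() {
	// A fast, reliable validator with a 5% rejection ratio.
	fmt.Printf("%.3f\n", compositeScore(0.9, 0.95, 1.0, 0.05)) // 0.850
}
```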

Leader Selection Based on Score
Threshold: Set a minimum score threshold for leader eligibility. Only validators with scores above this threshold are eligible.
Dynamic Rotation: Instead of a fixed rotation, select the validator with the highest score as the leader and only rotate if performance drops below a threshold.
This scoring system could be integrated into the rotateLeader function, allowing it to evaluate each candidate’s score before designating the next leader.
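The dynamic-rotation policy reduces to a simple predicate; the threshold value used here is an illustrative assumption:

```go
package main

import "fmt"

// shouldRotate implements the dynamic policy described above: keep the
// current leader while its score stays at or above the eligibility threshold.
func shouldRotate(currentLeaderScore, threshold float64) bool {
	return currentLeaderScore < threshold
}

func main() {
	fmt.Println(shouldRotate(0.82, 0.75)) // healthy leader: false (no rotation)
	fmt.Println(shouldRotate(0.60, 0.75)) // degraded leader: true (rotate)
}
```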

Validator Scoring Integration
To integrate a scoring system into the rotateLeader function, we can add a scoring mechanism that evaluates each validator based on a few key metrics. We’ll define and calculate scores for each metric in a new function, calculateLeaderScore, which can then be used within rotateLeader to assess whether a validator is eligible for leadership. Here's how we could set it up:

func (consensus *Consensus) rotateLeader(epoch *big.Int, defaultKey *bls.PublicKeyWrapper) *bls.PublicKeyWrapper {
    // Existing setup logic...
    ...
    // Fetch committee members
    ...
    // Calculate the best candidate by score
    var bestCandidate *bls.PublicKeyWrapper
    highestScore := -1.0
    for i := range committee.Slots {
        slot := committee.Slots[i]
        // calculateLeaderScore is the proposed scoring function described above.
        candidateScore := consensus.calculateLeaderScore(slot.EcdsaAddress, epoch, committee.Slots, blocksCountAliveness)
        if candidateScore > highestScore {
            highestScore = candidateScore
            // The slot's serialized BLS key must be converted back to a
            // *bls.PublicKeyWrapper to match the return type; wrapBLSKey is
            // a hypothetical helper standing in for that conversion.
            bestCandidate = wrapBLSKey(slot.BLSPublicKey)
        }
    }
    if bestCandidate != nil {
        return bestCandidate
    }
    return defaultKey
}

In this modified rotateLeader:

  • Each candidate's score is calculated based on their recent performance metrics.
  • The candidate with the highest score becomes the next leader. If no candidate meets the criteria, the function defaults to the current leader.
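Pulling the pieces together, calculateLeaderScore might look like the following standalone sketch. The ValidatorMetrics type and every weight are assumptions for illustration; none of this exists in the codebase yet:

```go
package main

import "fmt"

// ValidatorMetrics is a hypothetical snapshot of the per-validator data
// the scoring system would maintain; the fields mirror the metrics
// discussed above and do not exist in the current codebase.
type ValidatorMetrics struct {
	ProductionScore float64 // normalized; faster production -> closer to 1
	FinalityRate    float64 // finalized / produced blocks
	Uptime          float64 // participation ratio over recent epochs
	RejectedRatio   float64 // rejected or orphaned / produced blocks
	Penalty         float64 // decaying penalty from dropped messages, skipped blocks
}

// calculateLeaderScore sketches the scoring function referenced in
// rotateLeader: a weighted blend of the metrics minus outstanding
// penalties, floored at zero.
func calculateLeaderScore(m ValidatorMetrics) float64 {
	score := 0.3*m.ProductionScore + 0.3*m.FinalityRate + 0.3*m.Uptime - 0.1*m.RejectedRatio
	score -= m.Penalty
	if score < 0 {
		return 0
	}
	return score
}

func main() {
	m := ValidatorMetrics{ProductionScore: 0.9, FinalityRate: 0.95, Uptime: 1.0, RejectedRatio: 0.05, Penalty: 0.1}
	fmt.Printf("%.3f\n", calculateLeaderScore(m)) // 0.750
}
```

In the real integration the function would take the validator's address and look these metrics up from consensus state rather than receiving them as a struct.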

Fine-tuning Scoring and Testing

1. Tuning Weights and Thresholds: Adjust weights and thresholds based on real data to achieve optimal balance between stability and performance.

2. Testing: We need to test this rotation mechanism, especially with different leader performance scenarios, to ensure the leader rotation remains robust and responsive.

@GheisMohammadi GheisMohammadi added the enhancement New feature or request label Nov 11, 2024