
dsmr: implement chunk rate limiter #1922

Merged · 16 commits from tsachi/ratelimiter-simplified into main · Feb 13, 2025

Conversation

@tsachiherman (Contributor) commented Feb 10, 2025

What?

This PR adds chunk rate limiting.

How?

The rules now contain one additional method:

GetMaxAccumulatedProducerChunkWeight() uint64

The storage has an additional function, CheckRateLimit, that checks whether adding a given chunk would breach the rate limit threshold, as follows:

  • For each producer, we keep an accumulator of the total pending chunk weight.
  • If adding the new chunk would push this accumulator past the maximum allowed weight, an error is returned (see the sketch below).
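A minimal, self-contained sketch of that accumulator logic. The names here (RateLimiter, string producer IDs, the Add/Remove helpers) are illustrative assumptions rather than the PR's actual types; only the check-before-exceeding behavior mirrors the description above.

package main

import (
	"errors"
	"fmt"
)

var ErrRateLimitExceeded = errors.New("producer chunk rate limit exceeded")

// RateLimiter models the per-producer pending-weight accumulator.
type RateLimiter struct {
	maxWeight uint64            // from GetMaxAccumulatedProducerChunkWeight()
	pending   map[string]uint64 // producer ID -> accumulated pending chunk weight
}

// CheckRateLimit reports whether adding a chunk of the given weight for the
// given producer would breach the threshold. It does not mutate state.
func (r *RateLimiter) CheckRateLimit(producer string, weight uint64) error {
	if r.pending[producer]+weight > r.maxWeight {
		return fmt.Errorf("%w: producer %q", ErrRateLimitExceeded, producer)
	}
	return nil
}

// Add records a chunk's weight once the chunk is actually stored.
func (r *RateLimiter) Add(producer string, weight uint64) {
	r.pending[producer] += weight
}

// Remove releases weight when a pending chunk expires or is accepted.
func (r *RateLimiter) Remove(producer string, weight uint64) {
	if r.pending[producer] <= weight {
		delete(r.pending, producer)
		return
	}
	r.pending[producer] -= weight
}

func main() {
	rl := &RateLimiter{maxWeight: 100, pending: map[string]uint64{}}
	fmt.Println(rl.CheckRateLimit("node1", 60)) // <nil>
	rl.Add("node1", 60)
	fmt.Println(rl.CheckRateLimit("node1", 60)) // error: 120 > 100
}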

Where?

The new CheckRateLimit is called from two places:

  • During chunk building, right after the chunk is created, the new chunk is validated. This ensures that an honest node won't generate a chunk that exceeds the rate limit threshold.
  • In ChunkSignatureRequestVerifier, a call was added to ensure that inbound requests can't add a chunk that would exceed the limit.

Note

This is a simplified version of #1916, which was time-window aware.

@tsachiherman tsachiherman self-assigned this Feb 10, 2025
@tsachiherman tsachiherman marked this pull request as ready for review February 10, 2025 19:57
@tsachiherman tsachiherman linked an issue Feb 10, 2025 that may be closed by this pull request
x/dsmr/p2p.go Outdated
Comment on lines 138 to 143
if err := c.storage.CheckRateLimit(chunk); err != nil {
	return &common.AppError{
		Code:    p2p.ErrUnexpected.Code,
		Message: err.Error(),
	}
}
Collaborator

Can we add this above c.storage.GetChunkBytes(...)?

It's preferred to perform the lightest checks first, so that if a request is going to fail, we minimize the resources used to fail it. This isn't a critical difference, since a malicious peer can inspect our code and craft a response that will fail at a specific check, but it's good practice.
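A self-contained toy of the fail-fast ordering being suggested; the function names are illustrative, not the handler's real code.

package main

import (
	"errors"
	"fmt"
)

// handle runs the cheap in-memory check before the expensive lookup, so a
// request that is going to fail does so before touching storage.
func handle(cheapCheck, expensiveLookup func() error) error {
	if err := cheapCheck(); err != nil {
		return err // fail fast: no storage read performed
	}
	return expensiveLookup()
}

func main() {
	err := handle(
		func() error { return errors.New("rate limit exceeded") },
		func() error { fmt.Println("storage read"); return nil },
	)
	fmt.Println(err) // "storage read" never prints
}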

Collaborator

(Unrelated to this PR) I think the above check for whether the bytes are present is unnecessary.

Since we ignore the result of the query, the only case where it changes our handling is if we hit an unexpected database error (i.e. not database.ErrNotFound). I think we're just wasting a potential database read here.

Comment on lines 471 to 479
// find the chunk that corresponds to this expiry in the chunks slice.
chunkIndex := -1
for i, chunk := range chunks {
	if chunk.Expiry == chunkExpiry {
		// found.
		chunkIndex = i
		break
	}
}
Collaborator

Can we use the slices.Index or slices.IndexFunc helpers here? They do the same search for an equivalent value.
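For reference, a runnable sketch of the suggestion using the standard library's slices.IndexFunc (Go 1.21+); the chunk type here is a minimal stand-in for the real one.

package main

import (
	"fmt"
	"slices"
)

// Minimal stand-in for the real chunk type; only Expiry matters here.
type chunk struct {
	Expiry int64
}

func main() {
	chunks := []chunk{{Expiry: 0}, {Expiry: 50}, {Expiry: 100}}
	chunkExpiry := int64(50)

	// Same semantics as the manual loop: index of the first match, or -1.
	chunkIndex := slices.IndexFunc(chunks, func(c chunk) bool {
		return c.Expiry == chunkExpiry
	})
	fmt.Println(chunkIndex) // 1
}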

require.NoError(storage.AddLocalChunkWithCert(chunk, nil))
}

if testCase.minTime != 0 {
Collaborator

Adding this check creates the possibility of a test case where a field is accidentally ignored, i.e. minTime = 0 but a non-zero number of accepted chunks, which would then be silently skipped.

Could we remove this possibility by dropping the check and relying on the fact that SetMin(0, nil) is a no-op that should be safe to call?
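A toy model of that suggestion, with hypothetical names (the real SetMin signature may differ): the zero-value call changes nothing, so the guard can be dropped.

package main

import "fmt"

// store models the minimum-time tracking; the fields are illustrative.
type store struct{ min int64 }

// SetMin updates the minimum accepted time. With min == 0 and no accepted
// chunks it leaves the store untouched, so it is safe to call unguarded.
func (s *store) SetMin(min int64, accepted []int64) {
	if min == 0 && len(accepted) == 0 {
		return
	}
	s.min = min
}

func main() {
	s := &store{}
	s.SetMin(0, nil)   // unconditional call: harmless no-op
	fmt.Println(s.min) // 0
}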

Comment on lines +435 to +443
{
	name:                 "success - accepted block clear previous limit",
	expiryTimes:          []int64{0, 50, 100},
	newChunkExpiry:       150,
	weightLimit:          chunkSize * 3,
	expectedErr:          nil,
	minTime:              10,
	acceptedChunksExpiry: []int64{50},
},
Collaborator

Should accepting a block clear it from the limit within a window?

I think it should still count against the rate limit because it was consumed within that window.

Contributor Author

It's just a matter of framing, right? Just think of this as the size of unaccepted chunks per producer.
Can you think of any clear advantage to maintaining a separate list of accepted-but-not-yet-expired chunks?

Collaborator

Yup, I think it's preferable that accepting a chunk does not allow the same validator to consume the resources within that window of time, but this can be changed without a network upgrade, so I think this is fine for now.

If we do this over a time window (irrespective of accepted/pending), then we have a clear upper bound on the maximum amount of chunk storage a single validator can consume. If we clear chunks when they are accepted, then we don't have a well-defined limit, because once its chunks are accepted, the producing validator can consume more resources. (For illustration: with a 1 MiB per-producer cap and 100 validators, pending chunk storage is bounded by roughly 100 MiB; clearing on accept removes that bound within a window.)

x/dsmr/storage.go Outdated (resolved)
@@ -228,20 +238,24 @@ func (s *ChunkStorage[T]) VerifyRemoteChunk(c Chunk[T]) (*warp.BitSetSignature,
return nil, nil
}

// putVerifiedChunk assumes that the given chunk is guaranteed not to surpass the producer's rate limit.
// The rate limit is checked via a call to CheckRateLimit from BuildChunk as well as by VerifyRemoteChunk.
Collaborator

Could we make sure this comment is up to date? This is called to self-regulate within BuildChunk and it is called in the p2p handler ChunkSignatureRequestVerifier to rate limit peers, but this is not called in VerifyRemoteChunk atm.

Also, with the current code, it's not safe to call this within VerifyRemoteChunk because we call VerifyRemoteChunk within Accept. This means that if a peer consumed more bandwidth, this could fail within accept (which would fatal).

Contributor Author

updated.

tsachiherman and others added 2 commits February 13, 2025 14:42
Co-authored-by: aaronbuchwald <[email protected]>
Signed-off-by: Tsachi Herman <[email protected]>
x/dsmr/storage.go Outdated (resolved)
Signed-off-by: aaronbuchwald <[email protected]>
@aaronbuchwald aaronbuchwald enabled auto-merge (squash) February 13, 2025 22:44
@aaronbuchwald aaronbuchwald merged commit 2ad6559 into main Feb 13, 2025
10 checks passed
@aaronbuchwald aaronbuchwald deleted the tsachi/ratelimiter-simplified branch February 13, 2025 22:53
Development

Successfully merging this pull request may close these issues.

Implement chunk rate limiter
2 participants