kvserver: add raft.proposal.leader.applied.latency #143094
This commit adds raft.proposal.leader.applied.latency to StoreMetrics. The metric tracks the duration between a proposal being submitted to Raft and its application to the state machine on the local Raft leader.

While similar to 'raft.replication.latency', this metric starts counting only after the proposal has been submitted to Raft, excluding earlier processing steps such as acquiring proposal quota.

Only successful write commands are measured. Measurements are not recorded for:
- Failed proposals
- Read-only commands
- Read-write commands that don't result in actual writes

Note that this metric captures only application on the proposing leader node itself; it does not include the time taken for follower replicas to apply the changes.

Informs: cockroachdb#72393

Release note: none
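The recording rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `proposal`, `latencyRecorder`, `maybeRecordApplied`, and `exampleSamples` are hypothetical names, and a plain slice stands in for the histogram.

```go
package main

import (
	"errors"
	"time"
)

// proposal is a hypothetical, simplified stand-in for a Raft proposal.
type proposal struct {
	submittedToRaft time.Time // set when the proposal is handed to Raft
	isWrite         bool      // false for read-only commands
	wroteData       bool      // false for read-write commands that wrote nothing
	err             error     // non-nil if the proposal failed
}

// latencyRecorder is a hypothetical sink standing in for a histogram metric.
type latencyRecorder struct {
	samples []time.Duration
}

func (r *latencyRecorder) record(d time.Duration) {
	r.samples = append(r.samples, d)
}

// maybeRecordApplied records the submit-to-apply latency only for successful
// commands that actually wrote data, and reports whether a sample was taken.
func maybeRecordApplied(r *latencyRecorder, p proposal, appliedAt time.Time) bool {
	if p.err != nil || !p.isWrite || !p.wroteData {
		return false
	}
	if p.submittedToRaft.IsZero() {
		return false // never reached Raft, so there is nothing to measure
	}
	r.record(appliedAt.Sub(p.submittedToRaft))
	return true
}

// exampleSamples feeds one successful write, one read-only command, and one
// failed proposal through the recorder and returns what was recorded.
func exampleSamples() []time.Duration {
	base := time.Unix(0, 0)
	applied := base.Add(5 * time.Millisecond)
	r := &latencyRecorder{}
	maybeRecordApplied(r, proposal{submittedToRaft: base, isWrite: true, wroteData: true}, applied)
	maybeRecordApplied(r, proposal{submittedToRaft: base, isWrite: false}, applied)
	maybeRecordApplied(r, proposal{submittedToRaft: base, isWrite: true, wroteData: true, err: errors.New("rejected")}, applied)
	return r.samples
}
```

In this sketch only the first of the three proposals in `exampleSamples` produces a sample; the read-only command and the failed proposal are skipped.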
This commit adds raft.proposal.leader.ack.latency to StoreMetrics. This metric tracks the duration between a proposal being submitted to Raft and the local Raft leader acknowledging its success to the client. Note that, for asynchronous consensus, this may not include the time taken to apply the command to the state machine.

Only successful write commands are measured. Measurements are not recorded for:
- Failed proposals
- Read-only commands
- Read-write commands that don't result in actual writes

Informs: cockroachdb#72393

Release note: none
I know that Nathan proposed this in the original issue, but these two metrics seem very similar. The only difference is that cockroach/pkg/kv/kvserver/replica_raft.go Lines 1092 to 1094 in 2e9012e
which is just a bit earlier in the same handleRaftReady cycle that also applies the command. We already measure the latency of the entire ready cycle (in a histogram), so I'm not sure why we would need both metrics - histograms are fairly expensive (#137780 comes to mind - an overhead we haven't been able to fully remove even after @dhartunian spent plenty of time trying to optimize it away).

Long story short, I'm hesitant to grow the zoo around replication-related metrics with very similar ones. Instead, a metric that shows the true replication lag - how long it takes to get a quorum, or how much is in flight - would be more useful because it would give us deeper insight into the replication layer. It's tricky to get this right, I think. Half the battle is thinking it through ahead of time.

I see in #143094 that you don't need these metrics for your current project, so putting this on ice would be an option. Let me know if you think I'm missing something here!
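As a rough illustration of what "how long does it take to get a quorum" could mean, the sketch below computes the time at which a majority of replicas have acknowledged an entry. This is only one possible reading; `quorumLatency` and `exampleQuorumLatency` are hypothetical names, and the per-replica latencies are invented inputs.

```go
package main

import (
	"sort"
	"time"
)

// quorumLatency returns the latency at which a majority of replicas had
// acknowledged, given per-replica ack latencies (including the leader's own).
// It returns false if fewer than a quorum of acks are available.
func quorumLatency(acks []time.Duration, replicas int) (time.Duration, bool) {
	quorum := replicas/2 + 1
	if replicas <= 0 || len(acks) < quorum {
		return 0, false
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), acks...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// The quorum is reached when the quorum-th fastest ack arrives.
	return sorted[quorum-1], true
}

// exampleQuorumLatency uses three replicas with made-up ack latencies.
func exampleQuorumLatency() (time.Duration, bool) {
	acks := []time.Duration{
		1 * time.Millisecond, // leader's own append
		9 * time.Millisecond, // slow follower
		4 * time.Millisecond, // fast follower
	}
	return quorumLatency(acks, 3)
}
```

With three replicas the quorum is two, so the slow follower does not affect the result in this example.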
TFTR! Agreed that these two metrics are very similar to
Curious to learn more here - in this scenario, do clients continue to wait here cockroach/pkg/kv/kvserver/replica_write.go Line 215 in d81743d
cockroach/pkg/kv/kvserver/apply/task.go Lines 225 to 230 in c83c57d
The "stats" name overloading strikes again! #137780 is not related to the histograms @dhartunian optimized; it is related to SQL table stats histograms, i.e., histograms of SQL column values. After landing various optimizations there, I no longer measure significant overhead from them. #143230 will close that issue.
🙈 apologies, I meant to link #133306.
@wenyihu6: see here: cockroach/pkg/kv/kvserver/replica_application_cmd.go Lines 104 to 125 in 948e5fb
Most "regular" intent writes are async consensus at this point. So EndTxn is the important case that hits the optimization.