stats: reduce overhead of distinct estimation #140772
Labels
A-sql-table-stats
Table statistics (and their automatic refresh).
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-sql-queries
SQL Queries Team
target-release-25.2.0
Comments
axiomhq/hyperloglog#43, which was introduced in v0.2.1 (we are currently on v0.2.0), should be a fairly significant improvement, so an upgrade might benefit us. I also found some more minor improvements and submitted a PR here: axiomhq/hyperloglog#50
mgartner added a commit to mgartner/cockroach that referenced this issue (Mar 18, 2025)
The hyperloglog library has been upgraded from v0.2.0 to v0.2.5. See the commits in this upgrade here: axiomhq/hyperloglog@v0.2.0...v0.2.5 Fixes cockroachdb#140772 Release note: None
craig bot pushed a commit that referenced this issue (Mar 20, 2025)
142979: kvserver: update raft log stats with trunc state r=tbg a=pav-kv
This PR iterates on the raft log truncation code, and makes it more consolidated. It also fixes one bug: the log size update is now done in the same `Replica.mu` critical section with the `RaftTruncatedState` update. This is achieved by moving the truncated files size computation from post-apply to pre-apply stage. The latter change bears no performance implications because the pre-apply stage already reads from FS when determining whether a truncation affects any sideloaded entries.
Epic: none
Release note: none
143087: go.mod: update hyperloglog to v0.2.5 r=mgartner a=mgartner
The hyperloglog library has been upgraded from v0.2.0 to v0.2.5. See the commits in this upgrade here: axiomhq/hyperloglog@v0.2.0...v0.2.5
Fixes #140772
Release note: None
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
We currently use the hyperloglog library to estimate the number of distinct elements. I just collected a 50s CPU profile (about the time it took for ANALYZE to complete) on a cluster that only had `ANALYZE tpcc.customer` running, and this distinct estimation is the most expensive part of the stats collection (this was on dbb0baa plus a revert of 2831511 and another commit to introduce a cluster setting for using nil or non-nil DatumAlloc in stats):
[Image: non-nil datum alloc]
We should investigate whether it's possible to reduce this overhead. We recently upgraded the hyperloglog library, so there is no quick fix like that :/
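For readers unfamiliar with where the per-row cost comes from, here is a minimal HyperLogLog sketch in Go. It is a toy written for illustration, not the axiomhq/hyperloglog code and not what CockroachDB runs: the `Sketch` type, the `precision` constant, and the SplitMix64-style `mix64` stand-in hash are all assumptions. It only shows that every sampled value has to be hashed and folded into a register, which is the kind of work the profile attributes to distinct estimation.

```go
package main

import (
	"math"
	"math/bits"
)

// precision is the number of hash bits used to pick a register.
// 14 bits (16384 registers) is an illustrative choice, not a claim
// about the configuration used by any real library.
const precision = 14

// Sketch is a toy dense HyperLogLog: one small counter per register.
type Sketch struct {
	registers [1 << precision]uint8
}

// mix64 is the SplitMix64 finalizer, used here as a stand-in hash for
// uint64 values. Real datums would first be encoded to bytes and hashed.
func mix64(x uint64) uint64 {
	x += 0x9e3779b97f4a7c15
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
	x = (x ^ (x >> 27)) * 0x94d049bb133111eb
	return x ^ (x >> 31)
}

// Add records one value: hash it, use the top bits to choose a register,
// and keep the largest "rank" (1-based position of the first set bit in
// the remaining hash bits) seen for that register.
func (s *Sketch) Add(v uint64) {
	h := mix64(v)
	idx := h >> (64 - precision)
	rank := uint8(bits.LeadingZeros64(h<<precision)) + 1
	// If all remaining bits are zero, cap the rank at its maximum.
	if limit := uint8(64 - precision + 1); rank > limit {
		rank = limit
	}
	if rank > s.registers[idx] {
		s.registers[idx] = rank
	}
}

// Estimate returns the approximate number of distinct values added,
// using the standard raw estimator with the small-range (linear
// counting) correction.
func (s *Sketch) Estimate() float64 {
	m := float64(len(s.registers))
	sum := 0.0
	zeros := 0
	for _, r := range s.registers {
		sum += math.Ldexp(1, -int(r)) // 2^-r
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/m)
	est := alpha * m * m / sum
	if est <= 2.5*m && zeros > 0 {
		est = m * math.Log(m/float64(zeros))
	}
	return est
}
```

Calling `Add` once per value and `Estimate` at the end gives an approximate distinct count in fixed memory; adding the same value repeatedly does not change the result.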
There were some ideas floated around that we could avoid this expensive computation altogether for key columns if we were to scan the secondary indexes.
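One possible reading of the secondary-index idea, sketched in Go under stated assumptions: if a scan returns the encoded key-column values in sorted order, the exact distinct count falls out of comparing each key with the previous one, with no hashing or sketch at all. The function name `CountDistinctSorted` and the `[][]byte` representation of keys are hypothetical and not taken from the CockroachDB code base.

```go
package main

import "bytes"

// CountDistinctSorted returns the exact number of distinct keys in a
// slice that is already sorted (as an index scan over key columns would
// be, under the assumption stated above). Equal keys are adjacent in
// sorted order, so one comparison per row is enough.
func CountDistinctSorted(keys [][]byte) int {
	count := 0
	var prev []byte
	for i, k := range keys {
		if i == 0 || !bytes.Equal(k, prev) {
			count++
		}
		prev = k
	}
	return count
}
```

This only works when the input order matches the columns being counted, which is why the idea is tied to scanning an index on those columns rather than the primary index.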
Related to #135988.
[Image: nil datum alloc]
Jira issue: CRDB-47355