structlogging: restructure hot range logger for testability #142996

angles-n-daemons · 2025-03-17T17:59:38Z

structlogging: restructure hot range logger for testability

This change does a few things to improve the testability of the hot
ranges logger. This includes:

Logger:

* The introduction of a shouldLog function, which determines whether
  the system should log or not.
* The breakout of the logging action into its own function.

Tests:

* The addition of a setup and teardown utility for the hot range
  logger.
* The breakout of the default case and a timed case.
* Tests for hot ranges which exist in the system to start.

Fixes: #142995
Epic: CRDB-43150

cockroach-teamcity · 2025-03-17T17:59:50Z

This change is

xinhaoz · 2025-03-18T15:25:49Z

pkg/server/structlogging/hot_ranges_log.go

@@ -24,6 +24,13 @@ import (
 // ReportTopHottestRanges limits the number of ranges to be reported per iteration
 const ReportTopHottestRanges = 5

+// HotRangeLogManualTicker is a channel that can be used to force the hot range
+// the logging task to tick.


nit: is this an extra "the"?

ah it is, I'll get rid of it.

xinhaoz · 2025-03-18T18:09:44Z

pkg/server/structlogging/hot_ranges_log.go

+// logHotRanges collects the hot ranges from this node's status server and
+// sends them to the TELEMETRY log channel.
+func (s *hotRangesLoggingScheduler) logHotRanges(ctx context.Context, stopper *stop.Stopper) {
+	// early exit conditions


nit: use full sentences for comments

This can actually be removed.

xinhaoz · 2025-03-18T18:22:29Z

pkg/server/structlogging/hot_ranges_log.go

+// Within normal operation, there will only be one function listening to this
+// ticker, but in the tests there may be multiple "nodes" within the process.
+// Tests then will need to send multiple requests, to trigger all the nodes.
+var HotRangeLogManualTicker = make(chan struct{}, 0)


What if we provided a way to mock the ticker being used in tests instead of having both a manual test ticker and the regular one?

Not easily, because the channel isn't exposed in a way that an interface can be built around it. Would you prefer it? I can give an example of what it would look like here, or provide an alternative utility.
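For reference, a minimal sketch of what a mockable ticker could look like. Everything here (the `Ticker` interface, `realTicker`, `manualTicker`, `runLoop`, `countManualTicks`) is a hypothetical name made up for illustration and is not code from this PR or from the structlogging package. The loop depends only on the interface, so a test can hand it a ticker that fires on demand.

```go
package main

import (
	"fmt"
	"time"
)

// Ticker is a hypothetical abstraction over time.Ticker so that tests can
// substitute a manually driven implementation.
type Ticker interface {
	C() <-chan time.Time
	Stop()
}

// realTicker wraps time.Ticker for production use.
type realTicker struct{ t *time.Ticker }

func newRealTicker(d time.Duration) Ticker { return realTicker{t: time.NewTicker(d)} }
func (r realTicker) C() <-chan time.Time   { return r.t.C }
func (r realTicker) Stop()                 { r.t.Stop() }

// manualTicker fires only when a test calls Tick.
type manualTicker struct{ ch chan time.Time }

func newManualTicker() *manualTicker        { return &manualTicker{ch: make(chan time.Time)} }
func (m *manualTicker) C() <-chan time.Time { return m.ch }
func (m *manualTicker) Stop()               {}

// Tick blocks until the consuming loop has received the tick.
func (m *manualTicker) Tick() { m.ch <- time.Now() }

// runLoop calls onTick once per tick until done is closed.
func runLoop(t Ticker, done <-chan struct{}, onTick func()) {
	defer t.Stop()
	for {
		select {
		case <-t.C():
			onTick()
		case <-done:
			return
		}
	}
}

// countManualTicks drives runLoop with n manual ticks and reports how many
// times the callback ran.
func countManualTicks(n int) int {
	mt := newManualTicker()
	done := make(chan struct{})
	finished := make(chan struct{})
	count := 0
	go func() {
		defer close(finished)
		runLoop(mt, done, func() { count++ })
	}()
	for i := 0; i < n; i++ {
		mt.Tick()
	}
	close(done)
	<-finished // wait for the loop to exit before reading count
	return count
}

func main() {
	fmt.Println(countManualTicks(3))
}
```

With this shape, each "node" in a test would own its own ticker, so there would be no need to send several requests on one shared package-level channel. The trade-off is an extra interface and a constructor argument on the scheduler.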

xinhaoz · 2025-03-18T18:24:38Z

pkg/server/structlogging/hot_ranges_log_test.go

+	time.Sleep(intervalDuration * 2)
+	testutils.SucceedsSoon(t, func() error {


Typically we discourage time.Sleep to wait out intervals. You're using SucceedsSoon already so there's no need to sleep.

Makes sense, I'll get rid of this one.
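To illustrate why the fixed sleep is redundant, here is a rough sketch of the retry-until-success pattern. `succeedsWithin` and `waitForLogs` are hypothetical helpers written for this example; they stand in for a helper like testutils.SucceedsSoon and are not its actual implementation. The background goroutine simulates a logger that emits entries some time after the test starts.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// succeedsWithin retries fn with a short pause until it returns nil or the
// timeout elapses. Hypothetical stand-in for a SucceedsSoon-style helper.
func succeedsWithin(timeout time.Duration, fn func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s: %w", timeout, err)
		}
		time.Sleep(time.Millisecond)
	}
}

// waitForLogs simulates a background logger that emits `want` entries, then
// polls until all of them are visible. It returns the observed count.
func waitForLogs(want int64) (int64, error) {
	var logged int64
	go func() {
		for i := int64(0); i < want; i++ {
			time.Sleep(2 * time.Millisecond)
			atomic.AddInt64(&logged, 1)
		}
	}()
	err := succeedsWithin(5*time.Second, func() error {
		if atomic.LoadInt64(&logged) < want {
			return errors.New("not enough log entries yet")
		}
		return nil
	})
	return atomic.LoadInt64(&logged), err
}

func main() {
	n, err := waitForLogs(3)
	fmt.Println(n, err)
}
```

The polling loop returns as soon as the condition holds, so the test waits only as long as it has to, and a slow machine gets the full timeout instead of a fixed guess.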

xinhaoz · 2025-03-18T18:27:34Z

pkg/server/structlogging/hot_ranges_log_test.go

+
+	// very that there's no logged hot ranges, despite the system ticking
+	structlogging.HotRangeLogManualTicker <- struct{}{}
+	if time.Since(start) > intervalDuration {


This part seems a little fragile. From what I understand we want to test that no logs are sent before we've waited out the duration. Maybe there's a better way to do this. Can we check timestamps in the log spy?

Although, if we move to mock the ticker directly I think that might be more straightforward. I'm not really sure what the purpose of the manual ticker is.

This test we can actually get rid of. The ranges themselves actually won't come online before some period of time - so any waits may not do what we want.

xinhaoz · 2025-03-18T18:29:35Z

pkg/server/structlogging/hot_ranges_log_test.go

+	structlogging.TelemetryHotRangesStatsInterval.Override(ctx, &ts.ClusterSettings().SV, time.Millisecond)
+	structlogging.TelemetryHotRangesStatsLoggingDelay.Override(ctx, &ts.ClusterSettings().SV, 0*time.Millisecond)
+
+	structlogging.HotRangeLogManualTicker <- struct{}{}


Why do we need this if we have SucceedsSoon?

We don't necessarily need it - I'll remove it for now.

xinhaoz · 2025-03-25T20:21:46Z

pkg/server/structlogging/hot_ranges_log.go

+// shouldLog checks the below conditions to see whether it should emit logs.
+//   - Is the cluster setting server.telemetry.hot_ranges_stats.enabled true?
+func (s *hotRangesLoggingScheduler) shouldLog() bool {
+	return TelemetryHotRangesStatsEnabled.Get(&s.st.SV)
+}


Maybe this is a stylistic preference, but this function just checks 1 cluster setting and there's also only one caller of shouldLog as far as I can tell. Maybe we can just inline this call in maybeLogHotRanges?

No, your question is exactly correct - for now this function acts only as a passthrough. Part of the reasoning behind this is the additional checks which will shortly follow this PR. In the future the function will look like:

func shouldLog(lastLogTime time.Time, topReplicaCPU ) bool {
	settingIntervalDuration := IntervalSetting.Get(&s.st.SV)
	settingEnabled := EnabledSetting.Get(&s.st.SV)
	if settingEnabled && time.Since(lastLogTime) > settingIntervalDuration {
		return true
	}
	if topReplicaCPU > LogCPUThreshold {
		return true
	}
	return false
}

We're going to move the ticker interval to something small like 1m. From there we aim to log at the cluster setting's interval, or when the node in question goes over a certain load.

xinhaoz · 2025-03-26T15:14:24Z

pkg/server/structlogging/hot_ranges_log_test.go

-		// Get first 5 logs since the logging loop may have fired multiple times.
-		// We should have gotten 5 distinct range ids, one for each split point above.
-		logs := spy.Logs()[:5]
+		// Depend on a range which we don't exist to go anywhere.


I don't quite understand this comment.

Would it make sense to say "Look for a descriptor, which we always expect to exist in the system."

Yeah that sgtm! Ah, did you mean "expect" in the original comment?

Ahh yes! yeah - the original wording doesn't make too much sense.

xinhaoz · 2025-03-26T15:16:36Z

pkg/server/structlogging/hot_ranges_log_test.go

-			return nil
-		})
-		structlogging.TelemetryHotRangesStatsInterval.Override(ctx, &ts.ClusterSettings().SV, 1*time.Hour)
+// TestHotRangesStatsTenants tests that hot ranges stats are logged per node.


nit: Commented test name doesn't match - also I'm not sure the comment actually reflects what's being tested anymore.

Whoops, wrong PR, this should be changed now.

xinhaoz

LGTM, thanks for the updates!

angles-n-daemons · 2025-03-26T18:33:02Z

np, thanks for the review!

angles-n-daemons · 2025-03-26T19:32:10Z

bors r+

craig · 2025-03-26T20:12:28Z

Build succeeded:

angles-n-daemons requested a review from a team as a code owner March 17, 2025 17:59

angles-n-daemons requested review from xinhaoz and removed request for a team March 17, 2025 17:59

angles-n-daemons force-pushed the hot-range-logger-cleanup branch 2 times, most recently from ab85aff to 93bbfb0 Compare March 17, 2025 18:01

xinhaoz reviewed Mar 18, 2025

angles-n-daemons force-pushed the hot-range-logger-cleanup branch 6 times, most recently from 5be14fd to 3a0700e Compare March 21, 2025 16:32

xinhaoz reviewed Mar 26, 2025

angles-n-daemons force-pushed the hot-range-logger-cleanup branch from 3a0700e to 248495c Compare March 26, 2025 15:43

xinhaoz approved these changes Mar 26, 2025

angles-n-daemons force-pushed the hot-range-logger-cleanup branch from 248495c to d13ad97 Compare March 26, 2025 18:23

craig bot merged commit 07b1421 into cockroachdb:master Mar 26, 2025
23 of 24 checks passed

celeste-cockroachdb bot added the target-release-25.2.0 label Mar 26, 2025
