-
Notifications
You must be signed in to change notification settings - Fork 491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/compact: preserve blob references #4366
Conversation
1c0fcea
to
1122a0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r1.
Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde)
a discussion (no related file):
Just a high-level question about the approach -- I'll read carefully after that is resolved.
internal/compact/iterator.go
line 286 at r1 (raw file):
// // If PreserveBlobReferences is false, the compaction iterator may still // output InternalValues backed by blob references. Callers should be
I'm getting a bit confused by both seeing this in the iterator, and the various optionality in the comments ("may set this to true" etc.).
Shouldn't the iterator just expose LazyValue with the blob reference regardless of whether the compaction wants to preserve the reference or not, and let the caller make the decision? I think we were doing something like that in the prototype, and the only place where the compactionIter
would retrieve the underlying value was in mergeNext
.
internal/compact/iterator.go
line 288 at r1 (raw file):
// output InternalValues backed by blob references. Callers should be // prepared to handle InternalValues backed by blob references regardless of // the this value. Compactions that rewrite blob files will set this to
nit: regardless of this value
internal/compact/iterator.go
line 903 at r1 (raw file):
// of whether the compaction is attempting to preserve existing blob // references. If the compaction is preserving blob references, the // resulting value will end up in-place within the sstable.
Odd to see this comment here, given that is a decision made in the writer. Relates to my previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
internal/compact/iterator.go
line 286 at r1 (raw file):
Previously, sumeerbhola wrote…
I'm getting a bit confused by both seeing this in the iterator, and the various optionality in the comments ("may set this to true" etc.).
Shouldn't the iterator just expose LazyValue with the blob reference regardless of whether the compaction wants to preserve the reference or not, and let the caller make the decision? I think we were doing something like that in the prototype, and the only place where thecompactionIter
would retrieve the underlying value was inmergeNext
.
The logic in saveValue
is the crux. When we're saving a value that is a blob reference, we can either Clone or retrieve it. There's no point in cloning if we're ultimately going to retrieve it in the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on @jbowens and @RaduBerinde)
internal/compact/iterator.go
line 286 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
The logic in
saveValue
is the crux. When we're saving a value that is a blob reference, we can either Clone or retrieve it. There's no point in cloning if we're ultimately going to retrieve it in the caller.
So if we simply expose the LazyValue
, I think what you are saying is that the Clone
will first make a copy of the ValueOrHandle
(which is small for a blob reference) and then later have to copy out the value too. While with this extra information we can avoid the copy of the ValueOrHandle
? Is that correct?
Is this worth optimizing, given most compactions will reuse the blob reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
internal/compact/iterator.go
line 286 at r1 (raw file):
Previously, sumeerbhola wrote…
So if we simply expose the
LazyValue
, I think what you are saying is that theClone
will first make a copy of theValueOrHandle
(which is small for a blob reference) and then later have to copy out the value too. While with this extra information we can avoid the copy of theValueOrHandle
? Is that correct?
Is this worth optimizing, given most compactions will reuse the blob reference?
I think so because it also simplifies the lifecycle: we either clone the value or retrieve it. This was part of the intention of #4198.
36e1961
to
45e455c
Compare
bfaa8dc
to
ebb2512
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 14 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
a discussion (no related file):
Previously, sumeerbhola wrote…
Just a high-level question about the approach -- I'll read carefully after that is resolved.
Alright, I removed the logic that avoided unnecessary clones. This is ready for a review. Apologies there were some intermediate diffs that are now extracted out into #4376. Probably best to just review the entirety of the diff against base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 14 files reviewed, 8 unresolved discussions (waiting on @jbowens and @sumeerbhola)
a discussion (no related file):
Previously, jbowens (Jackson Owens) wrote…
Alright, I removed the logic that avoided unnecessary clones. This is ready for a review. Apologies there were some intermediate diffs that are now extracted out into #4376. Probably best to just review the entirety of the diff against base.
[nit] In the future, we may want to implement a more ergonomic test facility for blob refs, e.g. something where you define it with a friendly name e.g. blobref(a)
. The test can generate the filenum and stuff internally and keep it around so it can convert back when printing.
internal/compact/iterator_test.go
line 280 at r4 (raw file):
} type mockBlobValueFetcher struct{}
[nit] Could use a comment, like
// mockBlobValueFetcher is a dummy ValueFetcher implementation which produces string values
// reference the blobref.
internal/compact/iterator_test.go
line 287 at r4 (raw file):
ctx context.Context, handle []byte, fileNum base.DiskFileNum, valLen uint32, buf []byte, ) (val []byte, callerOwned bool, err error) { s := fmt.Sprintf("<a fetched value from blobref(%s, encodedHandle=%x, valLen=%d)>",
[nit] "a " seems extraneous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1, 3 of 3 files at r2, 2 of 8 files at r4, all commit messages.
Reviewable status: 8 of 14 files reviewed, 8 unresolved discussions (waiting on @jbowens and @RaduBerinde)
a discussion (no related file):
Adapt the compaction iterator to preserve blob file references by cloning the LazyValues. Compactions that do not rewrite blob files will avoid ever retrieving the corresponding values. Additionally, fix LazyValue.Clone to copy the BlobFileNum of the LazyFetcher. Informs cockroachdb#112.
ebb2512
to
a261ea4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: 6 of 14 files reviewed, 5 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
internal/compact/testdata/iter
line 1132 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] In the future, we may want to implement a more ergonomic test facility for blob refs, e.g. something where you define it with a friendly name e.g.
blobref(a)
. The test can generate the filenum and stuff internally and keep it around so it can convert back when printing.
Yeah, makes sense
Adapt the compaction iterator to preserve blob file references by cloning the
LazyValues. Compactions that do not rewrite blob files will avoid ever
retrieving the corresponding values.
Additionally, fix LazyValue.Clone to copy the BlobFileNum of the LazyFetcher.
Informs #112.