kv: use origin timestamp to validate lww semantics #143100
Overall this seems reasonable to me.
I left some nits but take them or leave them.
require.NoError(t, err)

// Verify the value was deleted
gr, err := db.Get(ctx, key)
If you wanted, you could construct a get with the option to return full mvcc value headers to verify that the value header was set. But I think the mvcc history tests probably cover that well enough so I leave it to your judgement.
Is that actually possible? I considered this when writing the test, but cmd_get.go has no way to set Tombstone: true on the MVCCGetOptions:
getRes, err := storage.MVCCGet(ctx, readWriter, args.Key, h.Timestamp, storage.MVCCGetOptions{
And I decided it wasn't worth the trouble to figure out how to use scan for this since the LDR tests and the mvcc tests verify the actual origin timestamp.
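For the record, the storage-level read would look roughly like the sketch below; the Tombstones option and result fields are assumptions modeled on the MVCCGet call quoted above, not something the request-level cmd_get.go path exposes:

// Hypothetical helper: read the key back with tombstones surfaced so the
// test can assert a tombstone was written. Option and field names are
// assumptions based on the storage.MVCCGet call quoted above.
func verifyTombstone(
	ctx context.Context, t *testing.T, reader storage.Reader,
	key roachpb.Key, ts hlc.Timestamp,
) {
	getRes, err := storage.MVCCGet(ctx, reader, key, ts, storage.MVCCGetOptions{
		Tombstones: true, // return the tombstone instead of treating the key as absent
	})
	require.NoError(t, err)
	require.NotNil(t, getRes.Value)
	require.False(t, getRes.Value.IsPresent()) // empty value => tombstone
}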
Force-pushed from 17398d3 to d2185fa
The PR updates the implementation of put, cput, delete, and range delete to return an error if a replicated write with an origin timestamp would overwrite a value with a more recent origin or MVCC timestamp.

No explicit version checks are included. In a mixed-version environment, a SQL writer running 25.1 and writing to a 25.2 KV server will encounter "condition failed" errors if an insert would overwrite a tombstone and retry the write until it is upgraded or the LDR PTS ages out. This is acceptable, as LDR remains in preview and the SQL writer is neither the default nor the recommended writer.

Release note: none
Epic: CRDB-48647
Force-pushed from d2185fa to 9336d4f
These statements take advantage of the KV-level LWW validation added by PR cockroachdb#143100 and do not explicitly validate the origin timestamp. Update and Delete specify the whole row in the WHERE clause, which is intended to enable future SQL optimizations that generate a CPut to replace the row instead.

Release note: none
Epic: CRDB-48647
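As a hypothetical illustration of the whole-row WHERE clause (schema and names invented), the replicated statements take a shape like the following, so a LWW loss surfaces as zero rows affected rather than a silent overwrite:

-- Invented schema for illustration only. The WHERE clause pins every
-- column of the expected prior row version, not just the primary key.
UPDATE tab SET v = 'new' WHERE k = 1 AND v = 'old';
DELETE FROM tab WHERE k = 1 AND v = 'old';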
Thanks for the review! bors r+
143100: kv: use origin timestamp to validate lww semantics r=jeffswenson a=jeffswenson

The PR updates the implementation of put, cput, delete, and range delete to return an error if a replicated write with an origin timestamp would overwrite a value with a more recent origin or MVCC timestamp.

No explicit version checks are included. In a mixed-version environment, a SQL writer running 25.1 and writing to a 25.2 KV server will encounter "condition failed" errors if an insert would overwrite a tombstone and retry the write until it is upgraded or the LDR PTS ages out. This is acceptable, as LDR remains in preview and the SQL writer is neither the default nor the recommended writer.

Release note: none
Epic: [CRDB-48647](https://cockroachlabs.atlassian.net/browse/CRDB-48647)

143169: mirror: fix non-determinism in case two packages have the same path r=rail a=rickystewart

This code previously assumed that `go mod download -json` would not produce two different versions of the same dependency with the same path. This is typically a sensible assumption but no longer holds in some niche scenarios. We use [replace](https://github.com/cockroachdb/cockroach/blob/65b2ed4fbdf5502f3fbe0af4ddbd30a7ac7eabb4/go.mod#L500) in `go.mod` to effectively import two different versions of the same dependency with the same path. This results in non-determinism in the mirroring code with respect to which version of the dependency we select. We now disambiguate with a path/version pair, which will be unique.

We also add some additional validation to check assumptions, so if these assumptions are ever broken in the future, the tool will fail loudly instead of proceeding silently and performing a potentially harmful operation.

Fixes #143168
Epic: CRDB-17171
Release note: None

Co-authored-by: Jeff Swenson <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
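A minimal sketch of the disambiguation described in the second commit message, with invented package, type, and function names; the real mirror tool's data structures may differ:

// Package and names invented for illustration: key the module map by
// (path, version) rather than path alone, and fail loudly on a duplicate
// pair instead of silently picking one version.
package mirror

import "fmt"

type Module struct {
	Path, Version string
}

type modKey struct {
	path, version string
}

func indexModules(mods []Module) (map[modKey]Module, error) {
	out := make(map[modKey]Module, len(mods))
	for _, m := range mods {
		k := modKey{path: m.Path, version: m.Version}
		if _, dup := out[k]; dup {
			return nil, fmt.Errorf("duplicate module %s@%s", m.Path, m.Version)
		}
		out[k] = m
	}
	return out, nil
}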
Build failed (retrying...)
// id to checkOriginTimestamp. That lets us correctly implement cross
// cluster ties while allowing a replicated write to lose ties against
// itself.
if originTimestamp.LessEq(existTS) {
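Simplified, the comparison above amounts to something like the following; this is illustrative only, not the PR's exact code:

// Illustrative only: a replicated write loses LWW when its origin
// timestamp is <= the existing value's origin timestamp (if any) or,
// failing that, the existing value's MVCC timestamp.
func losesLWW(originTS, existOriginTS, existMVCCTS hlc.Timestamp) bool {
	existTS := existMVCCTS
	if !existOriginTS.IsEmpty() {
		existTS = existOriginTS // prefer the origin timestamp when the existing value has one
	}
	return originTS.LessEq(existTS)
}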
Random thought: when the time comes to support FK checks and unique checks during LDR SQL writes, I wonder if the Get / Scan performing the check will need to use the OriginTimestamp to read "underneath" a newer value, in order to avoid a spurious FK violation when the foreign row has been modified more recently than the OriginTimestamp write?
Origin timestamp is only validated when writing KVs. So I would expect the foreign key checks to work since they are reads.
Is there something in LDR that keeps the most-recently-written OriginTimestamp consistent across rows?
What I'm imagining is a modification history on the origin cluster like:
CREATE TABLE parent (p INT PRIMARY KEY);
CREATE TABLE child (c INT PRIMARY KEY, p INT REFERENCES parent (p));
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (1, 1);
DELETE FROM child WHERE c = 1;
DELETE FROM parent WHERE p = 1;
Is there something in LDR that ensures the delete from parent always occurs after the insert and delete of child? If not, then it seems like when we replay the insert into child and perform an FK check, we'll need that FK check to happen at the correct MVCC time, which we can derive from the OriginTimestamp carried by every event.
The SQL layer must ensure that, at write time, foreign key and uniqueness checks pass based on the local cluster’s state. It's acceptable for a row to fail these checks even if it would have passed on the source cluster at that moment.
Currently, the system retries writes that fail foreign key or uniqueness checks, allowing it to eventually apply writes that arrive out of order. If a row continues to fail after several retries, it is moved to the DLQ.
This behavior is one reason I believe foreign key and uniqueness constraints may be ill-suited for tables replicated via LDR. These constraints don't guarantee convergence and are likely to result in rows being sent to the DLQ. In the future, we may be able to improve this by enforcing a deterministic order for replicated writes, but ensuring that order would fall under LDR’s responsibility, not the SQL layer’s.
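Roughly, the flow is as sketched below; all names are invented, and the real retry policy is an LDR implementation detail:

// All names invented for illustration. Apply a replicated row; if FK or
// uniqueness checks keep failing after a bounded number of retries, hand
// the row to the dead-letter queue instead of blocking the stream.
type Row struct{ Key, Value []byte } // stand-in for a replicated row

func applyWithRetries(ctx context.Context, row Row, apply func(context.Context, Row) error, dlq func(Row)) {
	const maxAttempts = 5 // assumed bound, not the real limit
	for i := 0; i < maxAttempts; i++ {
		if err := apply(ctx, row); err == nil {
			return
		}
	}
	dlq(row) // still failing after retries: route to the DLQ
}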
One quirk in the current implementation is that we set an origin timestamp on all index writes, even though only the timestamp on the primary key is actually load bearing. For instance, if a row is inserted and then updated without modifying the indexed fields, the indexes retain the original insert timestamp rather than reflecting the update.
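A concrete, hypothetical example of the quirk (schema invented):

-- The UPDATE touches only c, so the entry in idx is not rewritten and
-- keeps the INSERT's origin timestamp, while the primary key row carries
-- the UPDATE's newer origin timestamp.
CREATE TABLE t (a INT PRIMARY KEY, b INT, c INT, INDEX idx (b));
INSERT INTO t VALUES (1, 10, 100);
UPDATE t SET c = 200 WHERE a = 1;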
Oh, hmm, maybe deferred checks make this easier.
142368: crosscluster: handle lww condition failures r=jeffswenson a=jeffswenson

This change aims to correct the SQL writer's LWW semantics, enabling the retirement of the KV writer. If the SQL writer observes the condition failed errors introduced by #143100, it will drop the row as a LWW loss. This closes a LWW bug where an old replicated write can overwrite a recent local tombstone.

The first two commits in this PR are from:
* #143096
* #143100

Release note: none
Epic: [CRDB-48647](https://cockroachlabs.atlassian.net/browse/CRDB-48647)

Co-authored-by: Jeff Swenson <[email protected]>