
kv: use origin timestamp to validate lww semantics #143100

Merged · 1 commit · Mar 20, 2025

Conversation

jeffswenson (Collaborator)

The PR updates the implementation of put, cput, delete, and range delete
to return an error if a replicated write with an origin timestamp would
overwrite a value with a more recent origin or MVCC timestamp.

No explicit version checks are included. In a mixed-version environment,
a SQL writer running 25.1 and writing to a 25.2 KV server will encounter
"condition failed" errors if an insert would overwrite a tombstone, and
it will retry the write until the writer is upgraded or the LDR
protected timestamp (PTS) ages out. This is acceptable, as LDR is still
in preview and the SQL writer is neither the default nor the recommended
writer.

Release note: none
Epic: CRDB-48647
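
As a rough illustration of the check described above, here is a minimal, self-contained Go sketch of the comparison; the Timestamp type and checkOriginTimestamp helper are simplified stand-ins for hlc.Timestamp and the actual KV code, not the real implementation:

package main

import "fmt"

// Timestamp is a simplified stand-in for hlc.Timestamp.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

func (t Timestamp) LessEq(o Timestamp) bool {
	return t.WallTime < o.WallTime ||
		(t.WallTime == o.WallTime && t.Logical <= o.Logical)
}

// checkOriginTimestamp rejects a replicated write whose origin timestamp
// is not strictly newer than the existing value's timestamp (its origin
// timestamp if present, otherwise its MVCC timestamp).
func checkOriginTimestamp(originTS, existTS Timestamp) error {
	if originTS.LessEq(existTS) {
		return fmt.Errorf("condition failed: origin timestamp %v does not exceed existing %v", originTS, existTS)
	}
	return nil
}

func main() {
	older := Timestamp{WallTime: 100}
	newer := Timestamp{WallTime: 200}
	fmt.Println(checkOriginTimestamp(newer, older)) // <nil>: the replicated write wins
	fmt.Println(checkOriginTimestamp(older, newer)) // error: the replicated write loses LWW
}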

@jeffswenson jeffswenson requested review from a team as code owners March 18, 2025 22:39
@jeffswenson jeffswenson requested review from sumeerbhola and michae2 and removed request for a team March 18, 2025 22:39
@cockroach-teamcity (Member)

This change is Reviewable

@jeffswenson (Collaborator, Author)

jeffswenson commented Mar 18, 2025

This change depends on #143096. It was split out from #142368.

@stevendanna (Collaborator) left a comment

Overall this seems reasonable to me.

I left some nits but take them or leave them.

require.NoError(t, err)

// Verify the value was deleted
gr, err := db.Get(ctx, key)
stevendanna (Collaborator)

If you wanted, you could construct a get with the option to return full mvcc value headers to verify that the value header was set. But I think the mvcc history tests probably cover that well enough so I leave it to your judgement.

jeffswenson (Collaborator, Author)

Is that actually possible? I considered this when writing the test, but cmd_get.go has no way to set Tombstone: true on the MVCCGetOptions.

getRes, err := storage.MVCCGet(ctx, readWriter, args.Key, h.Timestamp, storage.MVCCGetOptions{

And I decided it wasn't worth the trouble to figure out how to use scan for this since the LDR tests and the mvcc tests verify the actual origin timestamp.
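
For reference, a hedged sketch of what such a storage-level read might look like, extending the MVCCGet call quoted above; whether MVCCGetOptions exposes a Tombstones field reachable from this code path is an assumption, not something verified against cmd_get.go:

getRes, err := storage.MVCCGet(ctx, readWriter, args.Key, h.Timestamp, storage.MVCCGetOptions{
	Tombstones: true, // assumed option: surface deletion tombstones instead of "not found"
})
if err != nil {
	return err
}
// A tombstone would surface as a value with empty RawBytes at the key's timestamp.
if getRes.Value != nil && len(getRes.Value.RawBytes) == 0 {
	// key was deleted at getRes.Value.Timestamp
}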

@jeffswenson jeffswenson force-pushed the jeffswenson-lww-kv-changes branch from 17398d3 to d2185fa on March 19, 2025 14:23
@jeffswenson jeffswenson force-pushed the jeffswenson-lww-kv-changes branch from d2185fa to 9336d4f on March 19, 2025 15:02
jeffswenson added a commit to jeffswenson/cockroach that referenced this pull request Mar 19, 2025
These statements take advantage of the KV-level LWW validation added by
PR cockroachdb#143100 and do not explicitly validate the origin timestamp. Update
and Delete specify the whole row in the WHERE clause, which is intended
to enable future SQL optimizations that generate a CPut to replace the
row instead.

Release note: none
Epic: CRDB-48647
@jeffswenson jeffswenson requested a review from stevendanna March 19, 2025 20:13
@jeffswenson (Collaborator, Author)

Thanks for the review!

bors r+

craig bot pushed a commit that referenced this pull request Mar 20, 2025
143100: kv: use origin timestamp to validate lww semantics r=jeffswenson a=jeffswenson

The PR updates the implementation of put, cput, delete, and range delete
to return an error if a replicated write with an origin timestamp would
overwrite a value with a more recent origin or MVCC timestamp.

No explicit version checks are included. In a mixed-version environment,
a SQL writer running 25.1 and writing to a 25.2 KV server will encounter
"condition failed" errors if an insert would overwrite a tombstone, and
it will retry the write until the writer is upgraded or the LDR
protected timestamp (PTS) ages out. This is acceptable, as LDR is still
in preview and the SQL writer is neither the default nor the recommended
writer.

Release note: none
Epic: [CRDB-48647](https://cockroachlabs.atlassian.net/browse/CRDB-48647)

143169: mirror: fix non-determinism in case two packages have the same path r=rail a=rickystewart

This code previously assumed that `go mod download -json` would not produce two different versions of the same dependency with the same path. This is typically a sensible assumption but no longer holds in some niche scenarios. We use [replace](https://github.com/cockroachdb/cockroach/blob/65b2ed4fbdf5502f3fbe0af4ddbd30a7ac7eabb4/go.mod#L500) in `go.mod` to effectively import two different versions of the same dependency with the same path. This results in non-determinism in the mirroring code with respect to which version of the dependency we select.

We now disambiguate with a path/version pair, which will be unique. We also add some additional validation to check assumptions so if these assumptions are ever broken in the future, the tool will fail loudly instead of proceeding silently and performing a potentially harmful operation.

Fixes #143168
Epic: CRDB-17171

Release note: None

Co-authored-by: Jeff Swenson <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
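
To illustrate the disambiguation described in #143169 above, here is a minimal, self-contained Go sketch; the module struct and sample data are hypothetical, not the mirror tool's actual code:

package main

import "fmt"

// module models the path/version pair used as the unique key.
type module struct {
	path    string
	version string
}

func main() {
	mods := []module{
		{path: "github.com/foo/bar", version: "v1.0.0"},
		{path: "github.com/foo/bar", version: "v2.1.0"}, // same path imported twice via a go.mod replace
	}
	seen := make(map[module]bool) // keyed by path AND version, so both entries are kept
	for _, m := range mods {
		if seen[m] {
			// Fail loudly instead of silently selecting one version.
			panic(fmt.Sprintf("duplicate module %s@%s", m.path, m.version))
		}
		seen[m] = true
	}
	fmt.Println("selected", len(seen), "unique modules")
}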
@craig (Contributor)

craig bot commented Mar 20, 2025

Build failed (retrying...):

@craig craig bot merged commit 1c8ef74 into cockroachdb:master Mar 20, 2025
24 checks passed
// id to checkOriginTimestamp. That lets us correctly implement cross
// cluster ties while allowing a replicated write to lose ties against
// itself.
if originTimestamp.LessEq(existTS) {
Collaborator

Random thought: when the time comes to support FK checks and unique checks during LDR SQL writes, I wonder if the Get / Scan performing the check will need to use the OriginTimestamp to read "underneath" a newer value, in order to avoid a spurious FK violation when the foreign row has been modified more recently than the OriginTimestamp write?

jeffswenson (Collaborator, Author)

Origin timestamp is only validated when writing KVs. So I would expect the foreign key checks to work since they are reads.

@michae2 (Collaborator) commented Mar 21, 2025

Is there something in LDR that keeps the most-recently-written OriginTimestamp consistent across rows?

What I'm imagining is a modification history on the origin cluster like:

CREATE TABLE parent (p INT PRIMARY KEY);
CREATE TABLE child (c INT PRIMARY KEY, p INT REFERENCES parent (p));
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (1, 1);
DELETE FROM child WHERE c = 1;
DELETE FROM parent WHERE p = 1;

Is there something in LDR that ensures the delete from parent always occurs after the insert and delete of child? If not, then it seems like when we replay the insert into child, and perform a FK check, we'll need that FK check to happen at the correct MVCC time, which we can find out from the OriginTimestamp that all events are at.

@jeffswenson (Collaborator, Author) commented Mar 21, 2025

The SQL layer must ensure that, at write time, foreign key and uniqueness checks pass based on the local cluster’s state. It's acceptable for a row to fail these checks even if it would have passed on the source cluster at that moment.

Currently, the system retries writes that fail foreign key or uniqueness checks, allowing it to eventually apply writes that arrive out of order. If a row continues to fail after several retries, it is moved to the DLQ.

This behavior is one reason I believe foreign key and uniqueness constraints may be ill-suited for tables replicated via LDR. These constraints don't guarantee convergence and are likely to result in rows being sent to the DLQ. In the future, we may be able to improve this by enforcing a deterministic order for replicated writes, but ensuring that order would fall under LDR’s responsibility, not the SQL layer’s.
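
A minimal sketch of the retry-then-DLQ behavior described above, with hypothetical names (applyReplicatedRow, maxRetries) rather than LDR's actual implementation:

package main

import (
	"errors"
	"fmt"
)

const maxRetries = 3 // illustrative; the real retry policy is LDR's

// applyReplicatedRow retries a write that fails constraint checks, since an
// out-of-order arrival may succeed on a later attempt, and hands the row to
// the DLQ once retries are exhausted.
func applyReplicatedRow(write func() error, sendToDLQ func(error)) {
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err = write(); err == nil {
			return
		}
	}
	sendToDLQ(err)
}

func main() {
	errFK := errors.New("foreign key check failed")
	applyReplicatedRow(
		func() error { return errFK }, // always fails in this toy example
		func(reason error) { fmt.Println("row sent to DLQ:", reason) },
	)
}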

jeffswenson (Collaborator, Author)

One quirk in the current implementation is that we set an origin timestamp on all index writes, even though only the timestamp on the primary key is actually load bearing. For instance, if a row is inserted and then updated without modifying the indexed fields, the indexes retain the original insert timestamp rather than reflecting the update.

Collaborator

Oh, hmm, maybe deferred checks make this easier.

craig bot pushed a commit that referenced this pull request Mar 22, 2025
142368: crosscluster: handle lww condition failures r=jeffswenson a=jeffswenson

This change aims to correct the SQL writer's LWW semantics, enabling the
retirement of the KV writer. If the SQL writer observes the condition
failed errors introduced by #143100, it will drop the row as an LWW
loss. This closes an LWW bug where an old replicated write can overwrite
a recent local tombstone.

The first two commits in this PR are from:
* #143096
* #143100

Release note: none
Epic: [CRDB-48647](https://cockroachlabs.atlassian.net/browse/CRDB-48647)

Co-authored-by: Jeff Swenson <[email protected]>
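
A minimal, self-contained Go sketch of the drop-as-LWW-loss behavior this commit message describes; errConditionFailed and applyRow are hypothetical stand-ins, not the actual crosscluster code:

package main

import (
	"errors"
	"fmt"
)

// errConditionFailed stands in for the condition-failed error that
// #143100 introduces for stale origin-timestamp writes.
var errConditionFailed = errors.New("condition failed: existing value is newer")

// applyRow drops the replicated row when the KV layer reports that the
// local value is newer, treating the failure as an LWW loss.
func applyRow(put func() error) error {
	if err := put(); err != nil {
		if errors.Is(err, errConditionFailed) {
			return nil // local value won LWW: drop the row and keep replicating
		}
		return err
	}
	return nil
}

func main() {
	err := applyRow(func() error { return errConditionFailed })
	fmt.Println("apply result:", err) // <nil>: row dropped as an LWW loss
}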