-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[reconfigurator] Move insertions into crucible_dataset
from executor RPW to rendezvous RPW
#7397
Conversation
I ran
Obviously R12 does not have the new rendezvous tables. I then upgraded to this branch. Immediately after the upgrade, prior to running schema migration,
A short time after schema migration, the Nexus instances came online, and we can see that it knows all the Crucible dataset records already exist:
and it populated
I then went through the standard "add sled" process. The first blueprint after adding the new sled sets up the sleds disks (including its debug datasets!) and NTP zone; after that blueprint was executed, we see 10 new debug datasets (with a different
The second blueprint after adding the new sled sets up Crucible datasets and zones; after that blueprint was executed, we see the expected output from the RPW and 38 rows in
I also confirmed I could create disks and instances, and that we see the new sled being used and
|
5464d11
to
6602604
Compare
// The dataset was already in the database. If it was in | ||
// service, we should have noted that it already exists. | ||
if prep.disposition == ArbitraryDisposition::InService { | ||
expected_stats.num_already_exist += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to have this code here, but are we ever actually triggering it via proptest? I thought this condition was only possible to hit with a concurrent Nexus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are triggering this, yes; commenting it out results in this failure:
thread 'crucible_dataset::tests::proptest_reconciliation' panicked at nexus/reconfigurator/rendezvous/src/crucible_dataset.rs:311:9:
Test failed: assertion `left == right` failed
left: CrucibleDatasetsRendezvousStats { num_inserted: 0, num_already_exist: 1, num_not_in_inventory: 0 }
right: CrucibleDatasetsRendezvousStats { num_inserted: 0, num_already_exist: 0, num_not_in_inventory: 0 }.
minimal failing input: prep = {
0: DatasetPrep {
disposition: InService,
in_inventory: false,
in_database: true,
},
}
However, you're right that the test doesn't trigger issues from concurrent Nexuses. There are two places where we increment num_already_exist
; the common one is "some previous invocation already recorded this dataset", which this test exercises:
omicron/nexus/reconfigurator/rendezvous/src/crucible_dataset.rs
Lines 113 to 117 in 6602604
// ... and not already present in the database ... | |
if existing_datasets.contains_key(&id) { | |
stats.num_already_exist += 1; | |
continue; | |
} |
We don't exercise the "concurrent Nexus made us lose the race" one:
omicron/nexus/reconfigurator/rendezvous/src/crucible_dataset.rs
Lines 143 to 146 in 6602604
// This means we hit the TOCTOU race mentioned above: when we | |
// queried the DB this row didn't exist, but another Nexus must have | |
// beat us to actually inserting it. | |
stats.num_already_exist += 1; |
|
||
for (id, prep) in prep { | ||
let id: DatasetUuid = u32_to_id(id); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) do you think this is easier or harder to read? I found myself wandering through the nested conditionals a bit; wondering if this is easier to parse for humans.
let in_db_before = prep.in_database;
let in_service = prep.disposition == ArbitraryDisposition::InService;
let in_inventory = prep.in_inventory;
let in_db_after = datastore_datasets.contains_key(&id);
// Validate rendezvous output
match (in_db_before, in_service, in_inventory) {
(true, true, _) => expected_stats.num_already_exist += 1,
(false, true, false) => expected_stats.num_not_in_inventory += 1,
(false, true, true) => expected_stats.num_inserted += 1,
(_, false, _) => (),
}
// Validate database state
match (in_db_before, in_service, in_inventory) {
(true, _, _) => assert!(in_db_after, "Existing dataset missing"),
(false, false, _) => assert!(!in_db_after, "Expunged dataset should not get new records"),
(false, true, false) => assert!(!in_db_after, "Should wait for inventory before inserting records"),
(false, true, true) => assert!(in_db_after, "In-service datasets in inventory should be inserted"),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that does seem clearer to me. Switched in a419aee
@@ -92,7 +92,11 @@ impl BlueprintRendezvous { | |||
|
|||
// Return the result as a `serde_json::Value` | |||
match result { | |||
Ok(()) => json!({}), | |||
Ok(stats) => json!({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking suggestion: I'd add a struct in nexus/types/src/internal_api/background.rs
that can be shared with omdb instead of free form json here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; fixed in 5410b4b
|
||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
pub struct RendezvousStats { | ||
pub crucible_dataset: CrucibleDatasetsRendezvousStats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking question: no stats for debug datasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in #7401 :)
This PR builds on #7386. I'll test both of them together on a racklette (will put results in a comment on this PR).
This PR contributes to addressing (but doesn't fully address) a couple issues:
crucible_dataset
rendezvous table, but we still need useful output fordebug_dataset
I'm inclined to say this PR fully addresses a couple other issues, but it's debatable:
dataset
as a general table is now gone and replaced with more specific rendezvous tables, so I think this one can be closed?