release-25.1: roachtest: fix missing binary for TPC-C in multitenant upgrade test #143184

Merged on Mar 26, 2025 (1 commit)

@blathers-crl blathers-crl bot commented Mar 20, 2025

Backport 1/1 commits from #143055 on behalf of @shubhamdhama.

/cc @cockroachdb/release

Fixes: #142807



Release justification: Low risk roachtest fix

Summary: In multitenant upgrade tests, the TPC-C workload may fail if the
required binary is missing on a node. This issue can occur when no tenant
is created on nodes with the previous binary version, and the workload
attempts to run using that binary.

A sample excerpt from the upgrade plan illustrates the process:
```
├── start cluster at version "v23.2.20" (1)
├── wait for all nodes (:1-4) to acknowledge cluster version '23.2' on system tenant (2)
├── set cluster setting "storage.ingest_split.enabled" to 'false' on system tenant (3)
├── run "maybe create some tenants" (4)
├── upgrade cluster from "v23.2.20" to "v24.1.13"
│   ├── prevent auto-upgrades on system tenant by setting `preserve_downgrade_option` (5)
│   ├── upgrade nodes :1-4 from "v23.2.20" to "v24.1.13"
│   │   ├── restart node 2 with binary version v24.1.13 (6)
│   │   ├── restart node 1 with binary version v24.1.13 (7)
│   │   ├── allow upgrade to happen on system tenant by resetting `preserve_downgrade_option` (8)
│   │   ├── restart node 3 with binary version v24.1.13 (9)
│   │   ├── restart node 4 with binary version v24.1.13 (10)
│   │   └── run "run workload on tenants" (11)
│   ├── run "run workload on tenants" (12)
```

Once all nodes are upgraded (step 10), we enter the finalization phase in
step 11. The cluster configuration then looks like this:

```
[mixed-version-test/11_run-run-workload-on-tenants] 2025/03/13 10:47:21 runner.go:423: current cluster configuration:
                      n1           n2           n3           n4
released versions     v24.1.13     v24.1.13     v24.1.13     v24.1.13
binary versions       24.1         24.1         24.1         24.1
cluster versions      24.1         24.1         24.1         24.1
```

This implies that the tenant also starts with the target version as we
finalize (see #138233). We then run the TPC-C workload on the tenant nodes
using the binary of the version we are migrating from, likely for
compatibility reasons. However, that binary may be absent if, during
step 4, probabilistic selection meant that no tenant was created with the
previous version. The fix is simple: upload the binary used to run TPC-C.
The upload first checks whether the binary is already present, so it adds
no extra overhead when it is.
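
The check-then-upload idea can be sketched in a few lines of Go. This is a minimal illustration only, not the roachtest code: the `node` and `uploader` types and the `ensureBinary` and `runWorkload` functions are hypothetical names invented here, and the node's filesystem is modeled as an in-memory set of staged versions.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a cluster node. It only records
// which versioned binaries have already been staged on it.
type node struct {
	id       int
	binaries map[string]bool
}

// uploader counts real uploads so that the skip-if-present behavior
// is observable. It is an illustration, not a roachtest type.
type uploader struct {
	uploads int
}

// ensureBinary stages the binary for version on n only if it is missing.
// It reports whether an upload actually happened.
func (u *uploader) ensureBinary(n *node, version string) bool {
	if n.binaries[version] {
		return false // already present: nothing to do
	}
	n.binaries[version] = true
	u.uploads++
	return true
}

// runWorkload models the failure mode: the workload cannot start
// when the binary it expects was never staged on the node.
func runWorkload(n *node, version string) error {
	if !n.binaries[version] {
		return fmt.Errorf("n%d: binary for %s not found", n.id, version)
	}
	return nil
}

func main() {
	// Assumed scenario: only the target binary is staged, because no
	// tenant was created with the previous version on this node.
	n := &node{id: 1, binaries: map[string]bool{"v24.1.13": true}}
	const workloadVersion = "v23.2.20"

	fmt.Println("without upload:", runWorkload(n, workloadVersion))

	u := &uploader{}
	u.ensureBinary(n, workloadVersion) // uploads the missing binary
	u.ensureBinary(n, workloadVersion) // second call is a no-op
	fmt.Println("with upload:", runWorkload(n, workloadVersion), "uploads:", u.uploads)
}
```

Calling `ensureBinary` unconditionally before the workload keeps the call site simple: the presence check lives in one place, and repeated calls do not trigger repeated uploads.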

Fixes: #140507
Informs: #142807
Release note: None
Epic: None
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-25.1-143055 branch from 923d8b7 to 48b9205 Compare March 20, 2025 09:09
@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Mar 20, 2025
blathers-crl bot commented Mar 20, 2025

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious issues or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning area's TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Mar 20, 2025
@cockroach-teamcity

This change is Reviewable

@blathers-crl blathers-crl bot added the backport-test-only Used to denote the backport has only non-production changes label Mar 20, 2025
@cthumuluru-crdb cthumuluru-crdb left a comment

LGTM!

@shubhamdhama shubhamdhama merged commit bf5d8cb into release-25.1 Mar 26, 2025
15 of 16 checks passed
@shubhamdhama shubhamdhama deleted the blathers/backport-release-25.1-143055 branch March 26, 2025 05:11
Labels
  • backport: Label PR's that are backports to older release branches
  • backport-test-only: Used to denote the backport has only non-production changes
  • blathers-backport: This is a backport that Blathers created automatically.
  • O-robot: Originated from a bot.
  • target-release-25.1.4
3 participants