test(robot): fix volume nodeID transient state that causes test case to fail after a cluster restart #2268
base: master
Actionable comments posted: 1
🧹 Nitpick comments (1)
e2e/tests/negative/cluster_restart.robot (1)
102-104: Add timeout parameters for health checks.
The volume health checks should have explicit timeouts to prevent test hangs if volumes fail to recover.
Consider adding timeout parameters:
    - And Wait for volume 0 healthy
    - And Wait for volume 1 healthy
    - And Wait for volume 2 healthy
    + And Wait for volume 0 healthy timeout=300
    + And Wait for volume 1 healthy timeout=300
    + And Wait for volume 2 healthy timeout=300
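The idea behind an explicit timeout can be sketched in Python as a deadline-based polling helper. This is only an illustration under assumptions: `wait_until`, its parameters, and the simulated health states are hypothetical, and the thread does not show whether the `Wait for volume ... healthy` keyword actually accepts a `timeout` argument.

```python
import time


def wait_until(predicate, timeout=300.0, interval=1.0):
    """Poll `predicate` until it is truthy or `timeout` seconds have elapsed.

    Hypothetical helper for illustration; not part of the test library
    discussed in this PR.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # Fail loudly instead of hanging the test run forever.
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Simulated health check: unhealthy twice, then healthy.
_states = iter([False, False, True])
became_healthy = wait_until(lambda: next(_states), timeout=5.0, interval=0.01)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted during the wait.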
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
e2e/tests/negative/cluster_restart.robot (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Summary
🔇 Additional comments (4)
e2e/tests/negative/cluster_restart.robot (4)
8-8: LGTM! Resource imports align with test requirements.
The addition of volume.resource and recurringjob.resource is appropriate for supporting the new test case functionality.
Also applies to: 16-16
24-24: LGTM! Tag addition improves test categorization.
Adding the explicit cluster tag improves test discoverability and maintains consistency with the new test case.
95-97: Verify snapshot/backup cleanup mechanism.
The test creates recurring snapshot and backup jobs but doesn't specify cleanup procedures. This could lead to resource accumulation over multiple test runs.
Consider adding cleanup steps or verifying if the test teardown handles this automatically.
86-86: Consider increasing replica count for volume 0.
Using a single replica for volume 0 might risk data durability during cluster restarts. Consider using 3 replicas for better resilience.
cd7fab6 to b2d9146
…to fail after a cluster restart Signed-off-by: Yang Chiu <[email protected]>
b2d9146 to 0fe7f0b
Actionable comments posted: 0
🧹 Nitpick comments (2)
e2e/libs/volume/crd.py (2)
441-449: LGTM! Fix for volume nodeID transient state.
The changes properly handle the transient state by:
- Waiting for volume to be in "attached" state
- Adding retry logic to wait for nodeID availability
However, the loop control variable i is not used within the loop body.
    - for i in range(self.retry_count):
    + for _i in range(self.retry_count):
🧰 Tools
🪛 Ruff (0.8.2)
444-444: Loop control variable i not used within loop body. Rename unused i to _i (B007)
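The retry pattern described in this comment can be sketched in isolation as follows. This is a minimal illustration and not the code in e2e/libs/volume/crd.py: `wait_for_node_id`, `get_volume`, `retry_interval`, and the dictionary shape are hypothetical stand-ins, and only `retry_count` and the `_i` naming come from the review.

```python
import time


def wait_for_node_id(get_volume, retry_count=5, retry_interval=0.01):
    """Poll until the volume reports a non-empty nodeID, or give up.

    `get_volume` is a hypothetical zero-argument callable returning a dict
    shaped like {"spec": {"nodeID": "..."}}; it stands in for whatever API
    call the real test library makes.
    """
    for _i in range(retry_count):  # loop variable is unused, hence the `_i` name
        node_id = get_volume().get("spec", {}).get("nodeID", "")
        if node_id:
            return node_id
        time.sleep(retry_interval)
    raise TimeoutError(f"nodeID still empty after {retry_count} attempts")


# Simulated transient state: nodeID is empty for the first two reads.
_responses = iter([
    {"spec": {"nodeID": ""}},
    {"spec": {"nodeID": ""}},
    {"spec": {"nodeID": "node-1"}},
])
resolved = wait_for_node_id(lambda: next(_responses))
```

Raising once the retries are exhausted, rather than returning an empty string, makes a volume that never leaves the transient state fail the test at the point of the wait.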
465-473: Fix unused loop control variable.
The loop control variable i is not used within the loop body.
    - for i in range(self.retry_count):
    + for _i in range(self.retry_count):
🧰 Tools
🪛 Ruff (0.8.2)
468-468: Loop control variable i not used within loop body. Rename unused i to _i (B007)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
e2e/libs/volume/crd.py (2 hunks)
e2e/tests/negative/cluster_restart.robot (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build images
- GitHub Check: Summary
🔇 Additional comments (2)
e2e/tests/negative/cluster_restart.robot (2)
22-22: LGTM! Tag addition improves test categorization.
The addition of the [Tags] cluster tag enhances test organization and discoverability.
Line range hint 22-89: Enhance data integrity verification after cluster restart.
While the test verifies that workloads are stable and "work" after restart, it should also validate data integrity to ensure no data corruption occurred during the cluster restart.
Add data integrity checks after the cluster restart:
      When Restart cluster
      And Wait for longhorn ready
      And Wait for workloads pods stable
      ... deployment 0 deployment 1 deployment 2 deployment 3 deployment 4 deployment 5
      ... statefulset 0 statefulset 1 statefulset 2 statefulset 3 statefulset 4 statefulset 5
    + Then Verify data integrity in deployment 0
    + And Verify data integrity in deployment 1
    + And Verify data integrity in deployment 2
    + And Verify data integrity in deployment 3
    + And Verify data integrity in deployment 4
    + And Verify data integrity in deployment 5
    + And Verify data integrity in statefulset 0
    + And Verify data integrity in statefulset 1
    + And Verify data integrity in statefulset 2
    + And Verify data integrity in statefulset 3
    + And Verify data integrity in statefulset 4
    + And Verify data integrity in statefulset 5
      Then Check deployment 0 works
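One common way to implement such a check is to record a checksum before the disruptive event and compare it afterwards. The Python sketch below shows that idea on a local temporary file. It is an assumption-laden illustration: `file_checksum` and the file layout are hypothetical, and the `Verify data integrity in ...` keywords suggested above are not confirmed in this thread to exist in the test suite.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "data.bin")
    with open(data_path, "wb") as f:
        f.write(os.urandom(4096))

    # Record the checksum before the disruptive event.
    checksum_before = file_checksum(data_path)

    # The disruptive event (a cluster restart in the real test) would happen here.

    # Unchanged data yields the same checksum.
    intact = file_checksum(data_path) == checksum_before

    # Any modification, even one appended byte, changes the checksum.
    with open(data_path, "ab") as f:
        f.write(b"x")
    corruption_detected = file_checksum(data_path) != checksum_before
```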
Which issue(s) this PR fixes:
Issue longhorn/longhorn#10203
What this PR does / why we need it:
fix volume nodeID transient state that causes test case to fail after a cluster restart
Special notes for your reviewer:
Additional documentation or context