Mark volume as uncertain if MarkVolumeAsAttached fails #129664
base: master
When a CSI driver's attachRequired changes from true to false after a successful volume attach, MarkVolumeAsAttached may fail, leaving the VolumeAttachment stranded. Mark the volume as uncertain so that the VolumeAttachment can be cleaned up on the next retry. Signed-off-by: hongkang <[email protected]>
Welcome @hkttty2009!
This issue is currently awaiting triage.
Hi @hkttty2009. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
Please add a release note: this is a bug fix and it introduces a user-facing change. Is it also possible to add an e2e test that verifies the fix? Thanks for the contribution!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: hkttty2009. It still needs approval from an approver in each of the affected files.
a443f93 to c135b59
I've added a release note and an e2e test. Please review the updates, thanks!
Hi @carlory, could you please review this PR? Thank you for your time and feedback!
Sorry for the delay, I will review it next week. It is on my to-do list.
ginkgo.DeferCleanup(func(ctx context.Context) {
	_, err := m.cs.StorageV1().VolumeAttachments().Get(ctx, attachmentName, metav1.GetOptions{})
	if err == nil {
		err := m.cs.StorageV1().VolumeAttachments().Delete(ctx, attachmentName, metav1.DeleteOptions{})
		framework.ExpectNoError(err, "Failed to delete VolumeAttachment: %v", err)
	} else if !apierrors.IsNotFound(err) {
		framework.ExpectNoError(err, "Failed to get VolumeAttachment")
	}
})
This makes the code simpler:
ginkgo.DeferCleanup(framework.IgnoreNotFound(m.cs.StorageV1().VolumeAttachments().Delete), attachmentName, metav1.DeleteOptions{})
updated.
framework.ExpectNoError(err, "Failed to get CSIDriver: %v", err)

ginkgo.By("Wait for the volumeattachment to be deleted")
err = e2evolume.WaitForVolumeAttachmentTerminated(ctx, attachmentName, m.cs, csiVolumeAttachmentTimeout)
If the CSI driver doesn't deploy the attacher component, then once the finalizer has been added to the VolumeAttachment, the expected state is Terminating, not terminated. In that case, we would have to remove the finalizer manually in order to clean up this resource later. The new e2e test does not redeploy the whole CSI driver with disableAttach: true; it only changes the CSIDriver attributes, so the VolumeAttachment can still be deleted from kube-apiserver. It looks good to me.
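The Terminating versus terminated distinction comes from finalizers: deleting an object that still carries a finalizer only marks it for deletion, and the object disappears once its last finalizer is removed. The toy model below illustrates that rule. The store type, its methods, and the finalizer name example.com/attacher are hypothetical, and the model says nothing about how kube-apiserver actually implements deletion.

```go
package main

import "fmt"

// object is a toy stand-in for an API object such as a VolumeAttachment.
type object struct {
	finalizers        []string
	deletionRequested bool // models metadata.deletionTimestamp being set
}

// store is a toy, in-memory model of finalizer semantics (hypothetical).
type store map[string]*object

// Delete removes the object at once if it has no finalizers; otherwise it
// only marks the object for deletion, i.e. it becomes "Terminating".
func (s store) Delete(name string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	if len(obj.finalizers) == 0 {
		delete(s, name)
		return
	}
	obj.deletionRequested = true
}

// RemoveFinalizer drops one finalizer; when none remain on an object that
// is already marked for deletion, the object is actually removed.
func (s store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	kept := obj.finalizers[:0]
	for _, f := range obj.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.finalizers = kept
	if obj.deletionRequested && len(obj.finalizers) == 0 {
		delete(s, name)
	}
}

// phase reports what a client polling the object would observe.
func (s store) phase(name string) string {
	obj, ok := s[name]
	switch {
	case !ok:
		return "terminated"
	case obj.deletionRequested:
		return "terminating"
	default:
		return "present"
	}
}

func main() {
	s := store{"va-1": {finalizers: []string{"example.com/attacher"}}}
	s.Delete("va-1")
	fmt.Println(s.phase("va-1")) // terminating: the finalizer blocks removal
	s.RemoveFinalizer("va-1", "example.com/attacher")
	fmt.Println(s.phase("va-1")) // terminated
}
```

If nothing ever removes the finalizer, the object stays in the "terminating" state indefinitely, which is why a wait for termination would time out in the scenario described above.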
Thank you for the comment. As you mentioned, retaining the attacher component allows the VolumeAttachment to be cleaned up from kube-apiserver. I'm currently developing a CSI driver that changed its attachRequired from true to false but still deploys the attacher component after the change. So I think retaining the attacher component better reflects real-world scenarios.
@hkttty2009 Do your CSI driver's controller capabilities include ControllerServiceCapability_RPC_PUBLISH_UNPUBLISH_VOLUME when you change the ATTACHREQUIRED field from true to false?
If it is not supported, the attacher won't remove the finalizer; see https://github.com/kubernetes-csi/external-attacher/blob/master/pkg/controller/trivial_handler.go#L50
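The concern can be sketched as a handler choice driven by the driver's advertised capabilities. This is a simplified, hypothetical model based only on the behavior described in the comment above: the capability type, the handler struct, and the managesFinalizer flag are illustrative names, not the external-attacher's code. A driver that does not advertise PUBLISH_UNPUBLISH_VOLUME gets a handler that never removes a finalizer left behind earlier.

```go
package main

import "fmt"

// capability is a hypothetical stand-in for a CSI controller capability.
type capability string

const publishUnpublish capability = "PUBLISH_UNPUBLISH_VOLUME"

// handler describes, in simplified form, how an attacher treats
// VolumeAttachments for a given driver (illustrative only).
type handler struct {
	name             string
	managesFinalizer bool // whether this handler removes its finalizer on detach
}

// handlerFor picks a handler from the advertised capabilities: drivers
// that support ControllerPublish/Unpublish get a full handler, all others
// get a trivial one that does not remove finalizers.
func handlerFor(caps []capability) handler {
	for _, c := range caps {
		if c == publishUnpublish {
			return handler{name: "csi", managesFinalizer: true}
		}
	}
	return handler{name: "trivial", managesFinalizer: false}
}

func main() {
	fmt.Println(handlerFor([]capability{publishUnpublish}).name)
	fmt.Println(handlerFor(nil).name, handlerFor(nil).managesFinalizer)
}
```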
If it is supported, why change ATTACHREQUIRED to false?
Our CSI driver supports the ControllerServiceCapability_RPC_PUBLISH_UNPUBLISH_VOLUME capability, but in practice its ControllerPublishVolume doesn't perform any operations.
This is an NFS CSI driver that doesn't actually require attachment, but for a long time its ATTACHREQUIRED was set to true. Changing it to false speeds up pod creation by eliminating the VolumeAttachment creation step.
I discovered the VolumeAttachment cleanup issue while validating CSI driver upgrades, which led me to submit this PR.
/cc @saad-ali
…ed changes Signed-off-by: hongkang <[email protected]>
c135b59 to b367b30
What type of PR is this?
/kind bug
What this PR does / why we need it:
After volumeAttacher.Attach succeeds, the VolumeAttachment has been created. If the CSI driver's ATTACHREQUIRED then changes from true to false, MarkVolumeAsAttached fails.
The volume is then removed from the desired state of world (DSW), leaving the VolumeAttachment stranded without ever being cleaned up.
This PR marks the volume as uncertain instead, so that the VolumeAttachment can be cleaned up on the next retry.
Which issue(s) this PR fixes:
Fixes #129572
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: