-
Notifications
You must be signed in to change notification settings - Fork 764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: update netlink dep to latest commit #3237
base: master
Are you sure you want to change the base?
Conversation
- this fixes interrupted sys call errors in the plugin
@@ -32,7 +32,7 @@ require ( | |||
github.com/sirupsen/logrus v1.9.3 | |||
github.com/spf13/pflag v1.0.5 | |||
github.com/stretchr/testify v1.10.0 | |||
github.com/vishvananda/netlink v1.3.0 | |||
github.com/vishvananda/netlink v1.3.1-0.20250303224720-0e7078ed04c8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1.3.1 hasn't been released yet. This uses the latest commit at the time of creating this PR (until we get v1.3.1 release of netlink): vishvananda/netlink@0e7078e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could handle the error in the code here:
amazon-vpc-cni-k8s/pkg/utils/cniutils/cni_utils.go
Lines 55 to 58 in 8dd2a5a
addrs, err := netLink.AddrList(link, netlink.FAMILY_V6) | |
if err != nil { | |
return fmt.Errorf("could not list addresses: %v", err) | |
} |
with something like:
if errors.Is(err, unix.EINTR) || err != nil {
return fmt.Errorf("could not list addresses: %v", err)
}
but we might end up ignoring unix.EINTR's which really matter.
P.S.: updated code link
@vadasambar thanks for updating this. Did you get chance to test? If you did, can you comment with the reprod you had in the overview? |
This PR is only updating a dependency. If the tests are passing (for this repo), even if the issue is not fixed (I think the issue will be fixed), we would be in a better state. The only problem I see is using a master/trunk version of a dependency. Provided upstream dependency is doing its due diligence, we should be fine? Let me know what you think. |
@haouc ^ |
I don't see the urgency to use package commit while upgrading dependency. We can update package when it is officially released. Even with this update we only get benefit of error categorization. But we either we retry silently using this new update which is not officially released yet or we let kubelet retry on failure. It doesn't seem worth the risk of importing package with a tag with no official release as of now. |
We could wait for the upstream to update the dependency but we don't know when that would happen. The fix was merged in Sep 2024.
We are throwing an error in the CNI (now) and letting kubelet retry vs using the result from call (before) even if the system call was interrupted (and then keep on retrying until the result contains the the value we were looking for). If we let kubelet retry, it's going to add to the pod startup time. P.S.: the problem is only going to get worse with increased pod churn or larger clusters since the probability of sys call getting interrupted is only going to go up. I guess we have 3 options now:
|
What type of PR is this?
bug
Which issue does this PR fix?:
#3196
What does this PR do / Why do we need it?:
This PR fixes the frequent interrupted sys call errors like these:
Testing done on this change:
Will this PR introduce any new dependencies?:
No
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No
Does this change require updates to the CNI daemonset config files to work?:
No
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.