My colleague @hongliangl tried to upgrade netlink to a recent commit to pick up a required change. However, our CI became flaky when validating the new netlink version. We identified that the flakiness started with the commit merged via #941.
The above commit switches the socket created by `Subscribe` to non-blocking mode when groups are provided. However, it didn't change how messages are received from the socket, so the receiver goroutine runs into a busy loop and takes 100% CPU.
```
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 412687 root      20   0 1968136   8172   2660 S 103.6   0.0   0:15.53 netlink-reprodu
```
To fix it, all subscribers' receiver goroutines should use poll or select to wait for events before receiving messages from the socket. I could take a stab at fixing the implementation, but I wonder if we could first revert the commit that introduced the bug, to unblock projects that need other changes in the library.
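A minimal sketch of the wait-then-receive approach, assuming Linux and using only the Go standard library: the receiver blocks in select(2) until the socket is readable and only then calls `recvfrom`, retrying the wait (not the read) on `EINTR` or `EAGAIN`. A Unix socketpair stands in for the netlink socket, and the helpers `fdSet`, `recvWhenReady`, and `receiveOneEvent` are hypothetical names for illustration; this is not necessarily how the library's fix is implemented.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
	"unsafe"
)

// fdSet (hypothetical) marks fd as a member of set. FdSet.Bits uses a different
// word size on different architectures, so bits-per-word is derived from it.
func fdSet(set *syscall.FdSet, fd int) {
	nbits := 8 * int(unsafe.Sizeof(set.Bits[0]))
	set.Bits[fd/nbits] |= 1 << (uint(fd) % uint(nbits))
}

// recvWhenReady (hypothetical) blocks in select(2) until fd is readable and
// only then calls recvfrom, so an idle receiver sleeps in the kernel instead
// of spinning. It also returns how many times it waited in select.
func recvWhenReady(fd int, buf []byte) ([]byte, int, error) {
	waits := 0
	for {
		var rset syscall.FdSet
		fdSet(&rset, fd)
		waits++
		// A nil timeout makes select block until fd becomes readable.
		if _, err := syscall.Select(fd+1, &rset, nil, nil, nil); err != nil {
			if errors.Is(err, syscall.EINTR) {
				continue // interrupted by a signal: wait again
			}
			return nil, waits, err
		}
		n, _, err := syscall.Recvfrom(fd, buf, 0)
		if errors.Is(err, syscall.EAGAIN) {
			continue // spurious readiness: go back to waiting, not spinning
		}
		if err != nil {
			return nil, waits, err
		}
		return buf[:n], waits, nil
	}
}

// receiveOneEvent (hypothetical) sets up a non-blocking socketpair standing in
// for the subscription socket, sends one message after a delay, and receives
// it with recvWhenReady.
func receiveOneEvent() (string, int, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return "", 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])
	if err := syscall.SetNonblock(fds[0], true); err != nil {
		return "", 0, err
	}

	// Simulate an event that arrives while the receiver is idle.
	go func() {
		time.Sleep(100 * time.Millisecond)
		_, _ = syscall.Write(fds[1], []byte("hello"))
	}()

	msg, waits, err := recvWhenReady(fds[0], make([]byte, 4096))
	return string(msg), waits, err
}

func main() {
	msg, waits, err := receiveOneEvent()
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %q after %d select call(s)\n", msg, waits)
}
```

select(2) only handles file descriptors below `FD_SETSIZE` (1024), which is one reason poll(2) (for example `unix.Poll` from `golang.org/x/sys/unix`) is usually the better fit in a long-running process; `syscall.Select` is used here only to keep the sketch within the standard library.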
@tnqn Do you mind fixing the implementation after the revert? Would be a great opportunity for me to learn.
Sure, I'm working on a fix and will ping you once I open a PR. Sorry for reverting the commit; I'm just not sure how long it will take to land the fix, and we need some changes in the recent commit to unblock our project.
tnqn linked a pull request on May 23, 2024 that will close this issue.
The receiving code in question is in `netlink/addr_linux.go`, lines 367 to 372 at commit 856e190.
The issue can be reproduced by the following code:
The process would always take 100%+ CPU: