fix: skip transient grpc errors in pktmon #1265

Open
matmerr wants to merge 3 commits into main from matmerr/pktmontesting


@matmerr (Member) commented Jan 23, 2025

Description

There are known transient errors that show up when running gRPC on Windows with host networking, and a few of them don't require restarting the client/server. The goal here is to tolerate those transient errors while preserving the restart behavior for critical errors.


Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.


@matmerr requested a review from a team as a code owner on January 23, 2025 21:33
@matmerr force-pushed the matmerr/pktmontesting branch from 8770822 to de3319d
@matmerr force-pushed the matmerr/pktmontesting branch from de3319d to d24986e
// commonly seen with:
// {"error":"failed to receive pktmon event: rpc error: code = Internal desc = unexpected EOF"}
// {"error":"failed to receive pktmon event: rpc error: code = Internal desc = received 65576-bytes data exceeding the limit 65535 bytes"}
// {"error":"failed to receive pktmon event: rpc error: code = Internal desc = grpc: failed to unmarshal the received message: proto: cannot parse invalid wire-format data"}


Does the Windows team have work items captured to fix these errors?

return errors.Wrapf(err, "failed to get current working directory for pktmon")
}

cmd := pwd + "\\" + "controller-pktmon.exe"
Member


Would it be better to locate this with exec.LookPath?


func (p *WindowsGRPCManager) Stop() error {
if p.pktmonCmd != nil {
err := p.pktmonCmd.Process.Kill()
Member


This is fire-and-forget. Do we care? If so, we need p.pktmonCmd.Process.Wait() (and probably some context timeout).

3 participants