Issue with Timeout::Error from Channel#wait_for_confirms #351
Please use the mailing list for questions. There are separate timeouts for continuations and heartbeats. The default these days is 15 or greater; 10 is not particularly high. Or just use streaming confirms with callbacks instead of blocking waits.
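Both of those knobs are connection options in Bunny. A minimal sketch of raising them (the URL and values here are illustrative, not from this issue; note that `continuation_timeout` is given in milliseconds while `heartbeat` is in seconds):

```ruby
begin
  require "bunny"
rescue LoadError
  # bunny gem not installed; the options hash below still illustrates the knobs
end

# continuation_timeout bounds how long Bunny waits on synchronous protocol
# operations (the kind of wait behind wait_for_confirms); heartbeat governs
# how often client and broker exchange liveness frames.
BUNNY_OPTS = {
  heartbeat: 15,                 # seconds
  continuation_timeout: 15_000   # milliseconds
}.freeze

def open_tuned_connection(url = ENV.fetch("AMQP_URL", "amqp://localhost"))
  conn = Bunny.new(url, **BUNNY_OPTS)
  conn.start
  conn
end
```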
Thanks for the quick response. Forgot about the mailing list. I'll check out that issue and try some higher timeouts to see if that helps. We looked into the streaming confirms, but those unfortunately weren't very compatible with the various places we use publishing. Might be time to re-investigate that, though. Thanks again.
@justin-yesware can you send the link to this problem on the mailing list? I'm having the same trouble sporadically. Thanks!
@tigluiz this can be caused by rabbitmq/rabbitmq-server#227; bumping the timeout is the only thing a client can do.
@michaelklishin thanks!
FWIW, we wound up cranking the heartbeat up to 60s. This has helped, as have some hardware upgrades on our Rabbit cluster. We still see these occasionally, and when we do, no confirmed publishes appear to work after the first one. Fortunately, we're on Heroku, so the 24h process restart they do typically saves us. We also run into this via our Resque workers, but they re-fork every n jobs, which again clears out any connection to RabbitMQ that may be experiencing this issue. We haven't had much time (~1 week) to confirm this configuration will work long term, but so far so good. Let me know if you'd like me to summarize this issue and post it to the mailing list.
Sorry to revive an old issue, @michaelklishin, but how does one implement this?
Apparently streaming confirms were not implemented in Bunny #432. Contributions would be considered. |
We're experiencing an issue where various workers of ours (some using Sneakers, some using Resque) are seeing higher rates of the exception above. We've tried this with various versions of Sneakers and Bunny, and it's pretty consistent. We're currently on the latest versions of both gems, and I recently cranked the heartbeat up from 2s to 10s. I'm not sure that had any effect, as we're continuing to see these.
We use a provider for our Rabbit cluster (CloudAMQP) and from their Server logs stream I can see messages like this occasionally:
These correlate time-wise with when we start seeing the `Timeout::Error` from our publishers. At that point, at least one worker I was watching seemed to be stuck with a bad connection, and no auto-reconnect via Bunny was happening. If any of the publish code raises any exception other than `Timeout::Error`, we disconnect and reconnect just to be sure. I was under the impression a `Timeout::Error` was not thrown in the case of an actual network error, and that Bunny would auto-reconnect if it was.

Also, all of the workers doing the publishing run on Heroku. I'm not sure that has anything to do with it, but it seemed worth mentioning.
Unfortunately we've not been able to reproduce this outside of our production system and the code is a bit strewn across multiple classes to simply paste here, but we're establishing our connection using Bunny like this:
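A minimal sketch of such a setup (this is illustrative, not the author's actual code; the `CLOUDAMQP_URL` variable, option values, and method name are assumptions):

```ruby
begin
  require "bunny"
rescue LoadError
  # bunny gem not installed; the method below is still defined for illustration
end

# One long-lived connection per process; channels are created per thread later.
def rabbit_connection
  @rabbit_connection ||= begin
    conn = Bunny.new(
      ENV.fetch("CLOUDAMQP_URL", "amqp://guest:guest@localhost:5672"),
      heartbeat: 15,
      automatically_recover: true  # Bunny reconnects after detected network failures
    )
    conn.start
    conn
  end
end
```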
And we're publishing using a block of code like this:
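A sketch of what a confirm-based publish path can look like (again illustrative rather than the author's code; it assumes `confirm_select` was already called on the channel backing the exchange):

```ruby
begin
  require "bunny"
rescue LoadError
  # bunny gem not installed; the method below is still defined for illustration
end

# Publish and block until the broker confirms delivery. Assumes the channel
# behind `exchange` is already in confirm mode (confirm_select was called).
def publish_with_confirm(exchange, payload, routing_key)
  exchange.publish(payload, routing_key: routing_key, persistent: true)
  # wait_for_confirms returns false if any message was nacked, and raises
  # Timeout::Error if no confirm arrives within the continuation timeout
  exchange.channel.wait_for_confirms or raise "message was nacked by the broker"
rescue Timeout::Error
  # After a confirm timeout the channel state is suspect; the safest recovery
  # is usually to discard the channel and retry on a fresh one
  raise
end
```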
We are using thread-local channels based on this single Bunny connection but have also tuned the Sneakers worker down to 1 thread to minimize that as a cause.
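Thread-local channel caching over a single shared connection can be sketched like this (the thread-local key and method name are illustrative); Bunny channels are not meant to be shared across threads, so each thread lazily creates and reuses its own:

```ruby
begin
  require "bunny"
rescue LoadError
  # bunny gem not installed; the method below is still defined for illustration
end

# Each thread gets its own channel, memoized in thread-local storage.
def channel_for_current_thread(connection)
  Thread.current[:bunny_channel] ||= begin
    ch = connection.create_channel
    ch.confirm_select  # put the channel into confirm mode once, at creation
    ch
  end
end
```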
Any insight into what to look for next, or whether we should be disconnecting and reconnecting when we see these timeouts, would be great. I'm pretty sure reconnecting on timeout has caused us issues in the past with leaked connections, but we'll give it another shot if that's the correct way to handle them.
Thanks, Justin