-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exception while processing command leave IOThread in bad state (Not invoking inEvent to serve new command) #905
Comments
Do you think this commit might help you ? ece1430 |
One more question. Are you able to reproduce it ? A NPE is always a bug. So it should be resolver too. |
We are able to reproduce it. I'm not sure if it's easy to write a test for it yet, since it's a race condition, but I'm working on that this week. |
Meanwhile, a simple patch like:
could make the job. |
Here is a "Unittest" that will reproduce the issue:
|
Indeed, I quickly get a
|
It is not obvious, but I recently find out that the IOThread has to drain all commands in its mailbox in order to be signaled again for new commands. This is because draining the messages off underlying YPipe will reset the YPipe.c to -1, which cause "flush()" to return false on sender, and only then the sender of new command will signal the IOThread to invoke inEvent().
However, an exception while IOThread is processing the command will stop it from draining its mailbox. As a result, the IOThread will be left in running state but never waked up again to serve inEvent even though there are new commands sent to it.
The exception we hit is the same as the one mentioned in #871.
In our case, we understand that it is caused by a race condition between worker thread notifying the IOThread to ActivateWrite on a backed up socket and the heartbeat reconnecting to the remote end. This results the ActivateWrite command arrives to a new StreamEngine object just set up for the new connection, which is not fully initialized pending handshake.
We learnt that it is not a good idea to enable heartbeat on Pull socket as the back pressure will trigger the heartbeat to reconnect frequently (given a reasonably low heartbeat timeout).
I see few different ways to handle the exception better, for example, let the IOThread die instead of fail silently, handle the exception inside the inEvent or even call iothread.inEvent without the signal.
I am curious on your thoughts on this.
Thanks!
The text was updated successfully, but these errors were encountered: