TcpTransport writeTimeout appears to be ignored #48
Comments
Soooo, this is kind of an interesting question. Right now, you can pick arbitrary timeouts for a write simply by choosing the timescale for deref(), but it may also make sense to force a timeout for all clients. That imposes an upper bound on deref timeouts, so I want to make sure the default behavior is sane and documented. How would you expect deref to interact with a transport timeout setting? And what do you think sane defaults are? 10 seconds?
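As an illustration of the interaction being discussed (this is not the riemann-java-client API, just a stdlib analogy), deref-with-timeout behaves like a blocking get on a future: the caller picks the bound, and a transport-level write timeout would effectively cap it for everyone. A minimal sketch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DerefSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for a write acknowledgement that never arrives
        // (e.g. the server is down and the write buffer is full).
        CompletableFuture<Boolean> ack = new CompletableFuture<>();

        try {
            // Caller-chosen deref timescale: wait at most 100 ms.
            ack.get(100, TimeUnit.MILLISECONDS);
            System.out.println("acked");
        } catch (TimeoutException e) {
            // A transport-level writeTimeout shorter than this bound
            // would fail the promise first, capping every caller
            // regardless of the deref timescale they chose.
            System.out.println("deref timed out");
        }
    }
}
```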
I am inclined to agree, practically speaking. In my particular case, I'm using a RiemannReporter -- in a perfect world, I just give it the host/port of the riemann service, and trust that it will "live well" and survive this situation. The DNS cache disabling patch I sent earlier made it at least workable, even if with a long delay after the riemann service restart. The pedant in me asks: is this variable being honored in any way? If not, maybe at least kill the fake variable. Sane defaults -- 60s or less. 10s is not aggressive, per se, but it's definitely not passive :) Does anybody ever want a timeout >60s when deploying the sort of system that uses riemann?
Yeah, I tend to instrument on a 1- or 5-second interval, so 10s is already stale. ;-) I'll have to dig into the source and see whether writeTimeout does anything right now. For a while I was aiming to have partial backpressure on async sends but wound up backing off that; I couldn't get stable behavior under load. My guess is it's unused, but we should hook it up to the Netty channel options.
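Whatever the eventual Netty wiring looks like, the behavior a write timeout needs to provide can be sketched with the stdlib alone: run the (possibly blocking) write on a worker and give up after a bound. The helper name and shape here are hypothetical, not the client's API:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WriteWithTimeout {
    private static final ExecutorService writer =
            Executors.newSingleThreadExecutor();

    // Hypothetical helper: attempt a write, but give up after
    // timeoutMs and treat the connection as dead. The real fix
    // would install this bound on the Netty channel instead.
    static boolean write(Runnable doWrite, long timeoutMs) {
        Future<?> f = writer.submit(doWrite);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the stuck write
            return false;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulate a write wedged on a full socket buffer.
        boolean ok = write(() -> {
            try { Thread.sleep(10_000); } catch (InterruptedException ignored) {}
        }, 100);
        System.out.println(ok ? "write ok" : "write timed out");
        writer.shutdownNow();
    }
}
```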
I can create a trivial diff to pass the param through, but anything beyond that is outside my current Java/Netty and riemann-client competence. Let me know what's useful.
Yeah, could you shoot me a PR for this? I'm, like, incredibly backlogged right now and haven't had good internet, but that'll make sure I get around to reviewing/merging. :) |
With the current state of the library, what is the correct behavior when a deref times out? Should I be calling
Reconnects have been automatic for 2 years or so, IIRC.
It does look like there is reconnection logic, but I couldn't see where a reconnection would happen if a deref timed out. Without a write timeout, it seems that you can get into a pathological situation (which I've seen many times) where the process doesn't see that the connection is closed, but the write buffer is full, so nothing else can be written to the socket. I guess I'll continue to close the connection on timeout.
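The close-on-timeout workaround described above can be sketched as a pattern, independent of the client's actual API (the Transport interface and method names here are hypothetical): when a bounded wait for a send fails, tear the connection down and reconnect rather than queueing more writes into a wedged socket.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReconnectOnTimeout {
    // Hypothetical minimal transport; the real client's API differs.
    interface Transport {
        boolean sendAndWait(String event, long timeoutMs); // false on timeout
        void close();
        void connect();
    }

    static void send(Transport t, String event, long timeoutMs) {
        if (!t.sendAndWait(event, timeoutMs)) {
            // Deref timed out: the socket may be wedged with a full
            // write buffer, so tear it down instead of queueing more.
            t.close();
            t.connect();
        }
    }

    public static void main(String[] args) {
        AtomicInteger reconnects = new AtomicInteger();
        Transport flaky = new Transport() {
            int calls = 0;
            public boolean sendAndWait(String e, long ms) { return calls++ > 0; }
            public void close() {}
            public void connect() { reconnects.incrementAndGet(); }
        };
        send(flaky, "service=foo", 100); // first send times out -> reconnect
        send(flaky, "service=foo", 100); // succeeds after reconnect
        System.out.println("reconnects=" + reconnects.get());
    }
}
```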
Both from my own testing, and from looking through the code, it appears that the writeTimeout AtomicBoolean is ignored. It looks like it should be ferried through to the Netty bootstrap, and then possibly have some exception handling added to make it do the right thing. (I dunno, I've never used Netty myself.) In any case, I see, in deployment, cases where it takes multiple minutes for the socket to recognize that it's dead (seeing TCP PUSH over and over); if the default of 5s were effective, this slow stream-failure detection would be avoided by the high-level timeout.