
TcpTransport writeTimeout appears to be ignored #48

Open
benfleis opened this issue Jan 16, 2015 · 8 comments

@benfleis

Both from my own testing and from looking through the code, it appears that the writeTimeout AtomicBoolean is ignored. It looks like it should be ferried through to the bootstrap, and then possibly have some exception handling added to make it do the right thing. (I dunno, I've never used netty myself.) In any case, in deployed systems I see cases where it takes multiple minutes for the socket to recognize that it's dead (TCP PUSH retransmitted over and over); if the default of 5s were effective, this high-level timeout would avoid the slow detection of stream failure.
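For concreteness, here is a sketch of what "ferrying it through" could look like with Netty 4's WriteTimeoutHandler (the client used Netty 3 at the time, where the handler also takes a Timer; the class and field names here are illustrative, not the client's actual code):

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.WriteTimeoutHandler;
import java.util.concurrent.TimeUnit;

// Illustrative only: fail a stalled write quickly instead of waiting
// minutes for TCP retransmission to give up. If a write is not flushed
// within the window, Netty fires a WriteTimeoutException, which an
// exception handler can turn into a close/reconnect.
public class WriteTimeoutInitializer extends ChannelInitializer<SocketChannel> {
  private final long writeTimeoutMs; // would come from TcpTransport's writeTimeout

  public WriteTimeoutInitializer(long writeTimeoutMs) {
    this.writeTimeoutMs = writeTimeoutMs;
  }

  @Override
  protected void initChannel(SocketChannel ch) {
    ChannelPipeline p = ch.pipeline();
    p.addLast("write-timeout",
        new WriteTimeoutHandler(writeTimeoutMs, TimeUnit.MILLISECONDS));
    // ... protobuf codec and the riemann message handler would follow here
  }
}
```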

@aphyr
Collaborator

aphyr commented Jan 16, 2015

Soooo, this is kind of an interesting question. Right now, you can pick arbitrary timeouts for a write simply by choosing the timescale for deref(), but it may also make sense to force a timeout for all clients. That imposes an upper bound on deref timeouts, so I want to make sure the default behavior is sane and documented. How would you expect deref to interact with a transport timeout setting? And what do you think sane defaults are? 10 seconds?
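The per-write timeout described here is chosen at deref time, roughly as follows (a sketch following the client's README; package names and exact signatures vary across versions):

```java
import com.aphyr.riemann.Proto.Msg;
import com.aphyr.riemann.client.RiemannClient;
import java.util.concurrent.TimeUnit;

public class DerefTimeoutExample {
  public static void main(String[] args) throws Exception {
    RiemannClient client = RiemannClient.tcp("my.riemann.server", 5555);
    client.connect();

    // The caller picks the timescale: block at most 5 seconds for the ack.
    // In this client, a timed deref returns null when the window elapses.
    Msg ack = client.event()
        .service("fridge")
        .state("running")
        .metric(5.3)
        .send()
        .deref(5, TimeUnit.SECONDS);

    if (ack == null) {
      // No ack in time. A transport-level write timeout, if honored,
      // would impose an upper bound on how long any deref can block.
    }
    client.close();
  }
}
```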

@benfleis
Author

I am inclined to agree, practically speaking. In my particular case, I'm using a RiemannReporter -- in a perfect world, I just give it the host/port of the riemann service and trust that it will "live well" and survive this situation. The DNS cache disabling patch I sent earlier made it at least workable, albeit with a long delay after the riemann service restart.

The pedant in me asks: is this variable being honored in any way? If not, maybe at least kill the fake variable.

Sane defaults -- 60s or less. 10s is not aggressive, per se, but it's definitely not passive :) Does anybody ever want a timeout >60s when deploying the sort of system that uses riemann?

@aphyr
Collaborator

aphyr commented Jan 16, 2015

Yeah, I tend to instrument on a 1- or 5-second interval, so 10s is already stale. ;-) I'll have to dig into the source and see whether WriteTimeout does anything right now. For a while I was aiming to have partial backpressure on async sends, but wound up backing off that; I couldn't get stable behavior under load. My guess is it's unused, but we should hook it up to the Netty channel options.

@benfleis
Author

I can create a trivial diff to pass the param through, but anything more is beyond my current java/netty + riemann-client scope, at least to do with any competence. Let me know what's useful.

@aphyr
Collaborator

aphyr commented Jan 20, 2015

Yeah, could you shoot me a PR for this? I'm, like, incredibly backlogged right now and haven't had good internet, but that'll make sure I get around to reviewing/merging. :)

@eric
Contributor

eric commented May 3, 2016

With the current state of the library, what is the correct behavior when a deref times out? Should I be calling .close() on the client and then .connect() again?

@aphyr
Collaborator

aphyr commented May 3, 2016

Reconnects have been automatic for 2 years or so, IIRC.

@eric
Contributor

eric commented May 3, 2016

It does look like there is reconnection logic, but I couldn't see where a reconnection would happen if a deref timed out. Without a write timeout, it seems you can get into a pathological situation (which I've seen many times) where the process doesn't see that the connection is closed, but the write buffer is full, so nothing else can be written to the socket.

I guess I'll continue to close the connection on timeout.
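That workaround, sketched out (assuming the 2016-era com.aphyr package names, that a timed deref returns null on timeout, and that close() followed by connect() is safe on the same client instance):

```java
import com.aphyr.riemann.Proto.Msg;
import com.aphyr.riemann.client.RiemannClient;
import java.util.concurrent.TimeUnit;

public class CloseOnTimeout {
  // Send one event; if the ack doesn't arrive in time, assume the
  // socket is wedged (e.g. full write buffer), close, and reconnect.
  public static void sendOrRecycle(RiemannClient client) throws Exception {
    Msg ack = client.event()
        .service("queue depth")
        .metric(17.0)
        .send()
        .deref(5, TimeUnit.SECONDS);

    if (ack == null) {
      client.close();
      client.connect(); // assumes reconnecting the same instance is supported
    }
  }
}
```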
