Potential deadlock with Vaadin #12272

mperktold · 2024-09-13T13:20:35Z

Jetty version(s)
Jetty 12.0.13

Jetty Environment
core, ee10

Java version/vendor (use: java -version)
openjdk version "21.0.3" 2024-04-16 LTS
OpenJDK Runtime Environment Temurin-21.0.3+9 (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (build 21.0.3+9-LTS, mixed mode, sharing)

OS type/version
Windows 11

Description
This is a repost of vaadin/flow#19938

We have several reports of our application not being able to shutdown. Apparently, some VaadinSessions stay alive and cannot be destroyed. Here are the thread dumps of two such cases:
StackTraces1.txt
StackTraces2.txt

I found some common patterns in these dumps:

One of the threads blocks on a Jetty semaphore while reading the request content of an UIDL request. Note that this thread holds the lock on the VaadinSession while blocking.

[email protected]/jdk.internal.misc.Unsafe.park(Native Method)
[email protected]/java.util.concurrent.locks.LockSupport.park(Unknown Source)
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(Unknown Source)
[email protected]/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
[email protected]/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
app//org.eclipse.jetty.ee10.servlet.AsyncContentProducer$LockedSemaphore.acquire(AsyncContentProducer.java:393)
app//org.eclipse.jetty.ee10.servlet.BlockingContentProducer.nextChunk(BlockingContentProducer.java:119)
app//org.eclipse.jetty.ee10.servlet.HttpInput.read(HttpInput.java:245)
app//org.eclipse.jetty.ee10.servlet.HttpInput.read(HttpInput.java:226)
[email protected]/sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
[email protected]/sun.nio.cs.StreamDecoder.implRead(Unknown Source)
[email protected]/sun.nio.cs.StreamDecoder.lockedRead(Unknown Source)
[email protected]/sun.nio.cs.StreamDecoder.read(Unknown Source)
[email protected]/java.io.InputStreamReader.read(Unknown Source)
[email protected]/java.io.BufferedReader.read1(Unknown Source)
[email protected]/java.io.BufferedReader.implRead(Unknown Source)
[email protected]/java.io.BufferedReader.read(Unknown Source)
[email protected]/java.io.Reader.read(Unknown Source)
app//com.vaadin.flow.server.communication.ServerRpcHandler.getMessage(ServerRpcHandler.java:503)
app//com.vaadin.flow.server.communication.ServerRpcHandler.handleRpc(ServerRpcHandler.java:253)
app//com.vaadin.flow.server.communication.UidlRequestHandler.synchronizedHandleRequest(UidlRequestHandler.java:114)
app//com.vaadin.flow.server.SynchronizedRequestHandler.handleRequest(SynchronizedRequestHandler.java:40)
app//com.vaadin.flow.server.VaadinService.handleRequest(VaadinService.java:1584)
app//com.vaadin.flow.server.VaadinServlet.service(VaadinServlet.java:398)
app//jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
app//org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)

A second thread blocks on the VaadinSession while trying to close the websocket:

[email protected]/jdk.internal.misc.Unsafe.park(Native Method)
[email protected]/java.util.concurrent.locks.LockSupport.park(Unknown Source)
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
[email protected]/java.util.concurrent.locks.ReentrantLock$Sync.lock(Unknown Source)
[email protected]/java.util.concurrent.locks.ReentrantLock.lock(Unknown Source)
app//com.vaadin.flow.server.VaadinService.lockSession(VaadinService.java:792)
app//com.vaadin.flow.server.VaadinService.findOrCreateVaadinSession(VaadinService.java:839)
app//com.vaadin.flow.server.VaadinService.findVaadinSession(VaadinService.java:684)
app//com.vaadin.flow.server.communication.PushHandler.handleConnectionLost(PushHandler.java:408)
app//com.vaadin.flow.server.communication.PushHandler.connectionLost(PushHandler.java:368)
app//com.vaadin.flow.server.communication.PushAtmosphereHandler.onStateChange(PushAtmosphereHandler.java:62)
app//org.atmosphere.cpr.AsynchronousProcessor.invokeAtmosphereHandler(AsynchronousProcessor.java:538)
app//org.atmosphere.cpr.AsynchronousProcessor.completeLifecycle(AsynchronousProcessor.java:480)
app//org.atmosphere.cpr.AsynchronousProcessor.endRequest(AsynchronousProcessor.java:584)
app//org.atmosphere.websocket.DefaultWebSocketProcessor.close(DefaultWebSocketProcessor.java:639)
app//org.atmosphere.container.JSR356Endpoint.onClose(JSR356Endpoint.java:318)
[email protected]/java.lang.invoke.LambdaForm$DMH/0x000000001e1a4000.invokeVirtual(LambdaForm$DMH)
[email protected]/java.lang.invoke.LambdaForm$MH/0x000000001f292000.invoke(LambdaForm$MH)
[email protected]/java.lang.invoke.LambdaForm$MH/0x000000001ee44800.invoke_MT(LambdaForm$MH)
app//org.eclipse.jetty.ee10.websocket.jakarta.common.JakartaWebSocketFrameHandler.notifyOnClose(JakartaWebSocketFrameHandler.java:295)
app//org.eclipse.jetty.ee10.websocket.jakarta.common.JakartaWebSocketFrameHandler.onClose(JakartaWebSocketFrameHandler.java:267)
app//org.eclipse.jetty.ee10.websocket.jakarta.common.JakartaWebSocketFrameHandler.onFrame(JakartaWebSocketFrameHandler.java:255)
app//org.eclipse.jetty.websocket.core.WebSocketCoreSession$IncomingAdaptor.onFrame(WebSocketCoreSession.java:680)
app//org.eclipse.jetty.websocket.core.AbstractExtension.nextIncomingFrame(AbstractExtension.java:145)
app//org.eclipse.jetty.websocket.core.internal.PerMessageDeflateExtension.nextIncomingFrame(PerMessageDeflateExtension.java:239)
app//org.eclipse.jetty.websocket.core.internal.PerMessageDeflateExtension$IncomingFlusher$$Lambda/0x000000001e90bd08.onFrame(Unknown Source)
app//org.eclipse.jetty.websocket.core.util.DemandingFlusher.emitFrame(DemandingFlusher.java:143)
app//org.eclipse.jetty.websocket.core.internal.PerMessageDeflateExtension$IncomingFlusher.handle(PerMessageDeflateExtension.java:382)
app//org.eclipse.jetty.websocket.core.util.DemandingFlusher.process(DemandingFlusher.java:167)
app//org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:262)
app//org.eclipse.jetty.util.IteratingCallback.succeeded(IteratingCallback.java:401)
app//org.eclipse.jetty.websocket.core.util.DemandingFlusher.onFrame(DemandingFlusher.java:105)
app//org.eclipse.jetty.websocket.core.internal.PerMessageDeflateExtension.onFrame(PerMessageDeflateExtension.java:96)
app//org.eclipse.jetty.websocket.core.ExtensionStack.onFrame(ExtensionStack.java:113)
app//org.eclipse.jetty.websocket.core.WebSocketCoreSession.onFrame(WebSocketCoreSession.java:463)
app//org.eclipse.jetty.websocket.core.WebSocketConnection.onFrame(WebSocketConnection.java:254)
app//org.eclipse.jetty.websocket.core.WebSocketConnection.fillAndParse(WebSocketConnection.java:447)
app//org.eclipse.jetty.websocket.core.WebSocketConnection.onFillable(WebSocketConnection.java:332)
app//org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
app//org.eclipse.jetty.http2.HTTP2StreamEndPoint.process(HTTP2StreamEndPoint.java:497)
app//org.eclipse.jetty.http2.HTTP2StreamEndPoint.processDataAvailable(HTTP2StreamEndPoint.java:484)
app//org.eclipse.jetty.http2.server.internal.ServerHTTP2StreamEndPoint.onDataAvailable(ServerHTTP2StreamEndPoint.java:40)
app//org.eclipse.jetty.http2.server.internal.HTTP2ServerConnection.onDataAvailable(HTTP2ServerConnection.java:158)
app//org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory$HTTPServerSessionListener.onDataAvailable(HTTP2ServerConnectionFactory.java:153)
app//org.eclipse.jetty.http2.HTTP2Stream.notifyDataAvailable(HTTP2Stream.java:861)
app//org.eclipse.jetty.http2.HTTP2Stream.processData(HTTP2Stream.java:543)
app//org.eclipse.jetty.http2.HTTP2Stream.onData(HTTP2Stream.java:461)
app//org.eclipse.jetty.http2.HTTP2Stream.process(HTTP2Stream.java:368)
app//org.eclipse.jetty.http2.HTTP2Session.onData(HTTP2Session.java:280)
app//org.eclipse.jetty.http2.HTTP2Connection.onData(HTTP2Connection.java:246)
app//org.eclipse.jetty.http2.parser.BodyParser.notifyData(BodyParser.java:103)
app//org.eclipse.jetty.http2.parser.DataBodyParser.onData(DataBodyParser.java:145)
app//org.eclipse.jetty.http2.parser.DataBodyParser.onData(DataBodyParser.java:140)
app//org.eclipse.jetty.http2.parser.DataBodyParser.parse(DataBodyParser.java:106)
app//org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:229)
app//org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:156)
app//org.eclipse.jetty.http2.parser.ServerParser.parse(ServerParser.java:121)
app//org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:342)
app//org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)
app//org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
app//org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
app//org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
app//org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
app//org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
app//org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
[email protected]/java.lang.Thread.runWith(Unknown Source)
[email protected]/java.lang.Thread.run(Unknown Source)

A third thread also blocks on the VaadinSession while handling a connection loss, but this one comes from the HeartbeatInterception:

[email protected]/jdk.internal.misc.Unsafe.park(Native Method)
[email protected]/java.util.concurrent.locks.LockSupport.park(Unknown Source)
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
[email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
[email protected]/java.util.concurrent.locks.ReentrantLock$Sync.lock(Unknown Source)
[email protected]/java.util.concurrent.locks.ReentrantLock.lock(Unknown Source)
app//com.vaadin.flow.server.VaadinService.lockSession(VaadinService.java:798)
app//com.vaadin.flow.server.VaadinService.findOrCreateVaadinSession(VaadinService.java:845)
app//com.vaadin.flow.server.VaadinService.findVaadinSession(VaadinService.java:690)
app//com.vaadin.flow.server.communication.PushHandler.handleConnectionLost(PushHandler.java:414)
app//com.vaadin.flow.server.communication.PushHandler.connectionLost(PushHandler.java:368)
app//com.vaadin.flow.server.communication.PushAtmosphereHandler$AtmosphereResourceListener.onDisconnect(PushAtmosphereHandler.java:113)
app//org.atmosphere.cpr.AtmosphereResourceImpl.onDisconnect(AtmosphereResourceImpl.java:752)
app//org.atmosphere.cpr.AtmosphereResourceImpl.notifyListeners(AtmosphereResourceImpl.java:644)
app//org.atmosphere.cpr.AtmosphereResponseImpl.handleException(AtmosphereResponseImpl.java:732)
app//org.atmosphere.cpr.AtmosphereResponseImpl.access$1500(AtmosphereResponseImpl.java:57)
app//org.atmosphere.cpr.AtmosphereResponseImpl$Stream.write(AtmosphereResponseImpl.java:958)
app//org.atmosphere.cpr.AtmosphereResponseImpl.write(AtmosphereResponseImpl.java:805)
app//org.atmosphere.interceptor.HeartbeatInterceptor.lambda$clock$0(HeartbeatInterceptor.java:367)
app//org.atmosphere.interceptor.HeartbeatInterceptor$$Lambda/0x0000000021c1cf78.call(Unknown Source)
[email protected]/java.util.concurrent.FutureTask.run(Unknown Source)
[email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
[email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[email protected]/java.lang.Thread.runWith(Unknown Source)
[email protected]/java.lang.Thread.run(Unknown Source)

I'm not sure where things go wrong, but this does look a bit suspicious to me.

The blocked thread is waiting in BlockingContentProducer.nextChunk until content is available.

From what I see, the semaphore should eventually be released in BlockingContentProducer.onContentAvailable when new content is available, which should be called from HttpInput.run.

I'm not actually seeing a deadlock here, but I find it suspicious that some threads wait on the VaadinSession and another holds it while blocking for some other reason.

How to reproduce?
Unfortunately, I don't have a reproducer. However, we have several reports of this on shutdown, so that might have to do something with it. I hop you can do something with the thread dumps.

The text was updated successfully, but these errors were encountered:

sbordet · 2024-09-13T17:54:00Z

I don't think this is a Jetty issue.

Vaadin should not grab a lock and then perform a blocking operation like ServletInputStream.read() while holding the lock (what the first stack trace does).

The other 2 stack traces are now waiting on VaadinService.lockSession().

Did you report this issue to Vaadin?

mperktold · 2024-09-14T10:18:27Z

Did you report this issue to Vaadin?

Yes, I did. I referenced the issue in the beginning of the description.

Vaadin should not grab a lock and then perform a blocking operation like ServletInputStream.read() while holding the lock (what the first stack trace does).

That was my impression as well. I cannot say for sure that this is what causes the deadlock, but it might be a good idea to avoid this regardless. Do you think the same is true for writing back the response?

sbordet · 2024-09-14T20:33:13Z

Do you think the same is true for writing back the response?

Yes, if a blocking API is used.

mperktold · 2024-09-16T09:14:02Z

Alright, I will forward these suggestions to Vaadin, thanks!

I am still interested, do you think that this is really the cause of the deadlock? Specifically, if you look at the threads waiting on the VaadinSession, are those the threads that would eventually release the BlockingContentProducer? Because to me, it looks like when content isn't immediately available, like it seems to be the case here, a read interested is registered in the selector, which should notify HttpInput when content becomes available. Can that really be affected by some request handler being blocked on a lock?

I can see some threads are waiting in ManagedSelector.select, wile the thread holding the VaadinSession also comes from a call to FillInterest.fillable in SelectableChannelEndPoint. Who would be responsible for releasing the BlockingContentProducer in this case?

sbordet · 2024-09-16T09:19:53Z

A blocker read will be eventually woken up, either when data is available, or when a timeout occurs.

The problem is that in both cases the wait can be really long (10s of seconds, minutes or even more), so if something else happens in the system that requires access to the Vaadin session, then these threads will be blocked for a long time.

The proper solution is to not perform blocking API calls with locks held.

Legioth · 2024-09-16T10:51:08Z

A blocker read will be eventually woken up, either when data is available, or when a timeout occurs.

Does this mean that you don't see any reason for why either of the two threads waiting for the Vaadin session would prevent the first thread from eventually proceeding? This in turn would imply that shutdown would still be blocked by that first thread regardless of whether that thread holds the Vaadin session lock while reading the request body (except if the client for some reason delays sending more bytes until one of the other threads have made progress)?

Legioth · 2024-09-16T12:57:33Z

...or is this a form of head-of-line blocking with a TCP-level buffer that full of bytes for one of the threads waiting for the lock which prevents the demultiplexer from reaching the bytes that would allow the first thread to proceed?

sbordet · 2024-09-16T13:17:58Z

@Legioth I am not sure I understand your comments.

A blocking operation performed with locks held is typically a mistake.

In this case, there is nothing that Jetty can do, it's a Vaadin issue.

Note that this will happen with any Servlet Container, not just Jetty, and for that matter with any InputStream or other class that exposes a blocking API.

Legioth · 2024-09-16T13:55:58Z

I understand the theoretical issues with dining philosophers and so on. Holding a lock while doing a blocking operation (e.g. acquiring another lock or blocking I/O) is generally fine when it comes to deadlocks as long as that blocking operation is independent from the lock. That's usually the case when a high-level abstraction calls down to a lower-level abstraction since the lower-level abstraction doesn't have any direct code path that leads to that lock.

It's a bit surprising that a HTTP request wouldn't be independent but I guess that's what TPC head-of-line blocking is all about. Could you confirm that this would be the likely underlying reason also from your perspective or is there some other factor at play here as well? If TCP head-of-line blocking is the expected cause, then this issue could be closed on my (and thus also Vaadin's) behalf.

I'm not saying that Vaadin's implementation might not be a mistake, but it's then at least a mistake that has taken more than 10 years for anyone to discover 😄. I would just prefer to have a proper understanding of the underlying mechanisms before I start digging into whether anything could be changed on Vaadin's side.

sbordet · 2024-09-16T14:54:20Z

@Legioth I think you have the locking understanding the other way around.
High level abstractions holding locks while calling low level abstractions is a recipe for problems.

It's a bit surprising that a HTTP request wouldn't be independent

They are.
It's Vaadin that it is establishing a dependency by grabbing a lock on the same Vaadin session.

that's what TPC head-of-line blocking is all about

TCP HoL blocking is just one of the multiple possible cases of blocking, but any blocking would show the problem.

I would not concentrate on TCP HoL blocking, as this is a problem at a different level.
Even if the protocol you are using does not suffer from TCP HoL blocking, for example HTTP/3, you would have problems.

The API called by Vaadin are blocking, so if the remote client is not sending data then the API will block, no matter whether the data is transported by a protocol that suffers from TCP HoL blocking.

I would not close the issue on the Vaadin side without a fix, no matter if it took 10 years to discover.

Legioth · 2024-09-17T07:23:50Z

I see the possibility of TCP HoL blocking as "proof" that there's nothing Jetty could do in this case regardless of whether that's actually what goes on in this specific case. This also means that this issue could be closed (which can only be done by the original reporter or project maintainers) while keeping the Vaadin issue open.

But... I'm also curious to gain a more in-depth understanding even though I also realize that it's not your duty to educate me. 😄

My statement about holding locks while calling a lower abstraction is maybe not accurate - it goes for calling any code where you are not sure about what shared resources it might try to use. A typical deadlock has two separate shared resources that are acquired in an inconsistent order. Vaadin's session lock is one of those resources but there also has to be some other shared resource for a deadlock to happen. Treating blocking I/O in general as a shared resource might be a good heuristic but I would like to have something more specific for my mental model. In the case of TCP HoL blocking, that specific other resource is the read buffer of the shared TCP connection which causes seemingly independent HTTP/2 requests to actually depend on each other. What other shared resources could there be that might cause a deadlock between concurrent HTTP requests or responses?

sbordet · 2024-09-17T08:32:46Z

@Legioth this is technically not a deadlock because there is no circular wait, but instead it is just a "hold and wait" situation, which would not happen if I/O reads/writes never block.

For HTTP/1.1, the wait in reads/writes is caused by TCP HoL blocking and TCP congestion, respectively.
For HTTP/2, the wait is typically caused by lack of data (for reads) or flow control stalling (for both reads and writes), but TCP may be involved too.
For HTTP/3, the wait is also caused by flow control stalling like HTTP/2, but TCP is not involved.

As for your mental model, I/O operations and blocking access to bounded resources (e.g. a connection pool towards a JDBC database, but also adding items to a bounded queue) are the "wait" part of the "hold and wait" situation, and you should avoid to "hold" when performing those "wait" operations.

In the case of TCP HoL blocking, that specific other resource is the read buffer of the shared TCP connection which causes seemingly independent HTTP/2 requests to actually depend on each other

While it is possible that few requests cause the HTTP/2 session flow control blocking (for reads and writes), or TCP HoL blocking, or TCP flow control blocking, so that other requests on the same connection would depend on the few ones, it does not seem to be what happens here.

This is just one request failing to provide the request content, and other requests from the same client waiting on the Vaadin session lock (which I assume be per-client).

Other requests from other clients should be able to proceed without problem, so there is no dependency.

Legioth · 2024-09-17T10:59:07Z

Thanks for the clarification. I think we have partially talked across each other based on slightly different initial assumptions.

I have been assuming that the first thread was stalled because the bytes it needed were held up by something from one of the two other threads, i.e. that the whole situation could be avoided if either of the two other threads could acquire the session lock and proceed to a point where they would indirectly release the bytes needed by the first thread.

It seems like your assumption is that the first thread was stalled because the needed bytes never even reached the server, i.e. that the first stalled thread is inevitable but a better locking strategy in Vaadin could prevent the two other threads from also being stalled.

sbordet · 2024-09-17T13:05:54Z

I have been assuming that the first thread was stalled because the bytes it needed were held up by something from one of the two other threads, i.e. that the whole situation could be avoided if either of the two other threads could acquire the session lock and proceed to a point where they would indirectly release the bytes needed by the first thread.

I doubt this is the case, looking at the stack traces.

It seems like your assumption is that the first thread was stalled because the needed bytes never even reached the server, i.e. that the first stalled thread is inevitable but a better locking strategy in Vaadin could prevent the two other threads from also being stalled.

Correct, this seems likely the case from the stack traces.

Legioth · 2024-09-17T13:39:35Z

I realized the same now when looking more closely at the stack traces. Thread 2 is a websocket close frame which shouldn't have any payload that remains to be read while thread 3 is a timer rather than any request handling.

mperktold · 2024-09-17T14:15:06Z

I just realized that we also use virtual threads in some cases, and those will not show up in our handmade thread dump.

This might only be remotely relevant for this particular case, since as you said, one of the threads is probably stuck inevitably. Still, it's quite possible that those virtual threads also wait on the VaadinSession, since we use them for background threads that need to access Vaadin components.

sbordet · 2024-09-17T14:32:27Z

@mperktold if you use Jetty 12, you can use VirtualThreadPool, which has built-in facilities to dump the virtual threads, including those that are suspended.
See https://jetty.org/docs/jetty/12/programming-guide/arch/threads.html#thread-pool-virtual-threads-virtual.

joakime · 2024-09-17T14:43:49Z

I realized the same now when looking more closely at the stack traces. Thread 2 is a websocket close frame which shouldn't have any payload that remains to be read while thread 3 is a timer rather than any request handling.

A WebSocket CLOSE Frame has payload.
It can contain up to 125 bytes of payload, of which is 2 bytes of close code, and 123 bytes of reason.
These are technically optional, but in practice, every implementation we've come across always provides the 2 byte close code at least. Most include the close reason.

mperktold · 2024-10-15T16:56:57Z

Some more insights that we gained in the meantime:

I wrote that this happens during shutdown, but that's not true. It's just that once this has happened, it prevents our shutdown logic, because some users are still logged in. The request that never provides the content happens much earlier though.

In our own installation, this happens like once every few days, but only for some users who typically work on notebooks with multiple tabs open.

So while I agree that it looks like in these cases, the client never sends the request content, I don't think that this would happen so frequently. Could there be any other circumstances that lead to the same behavior, maybe related to WIFI, or power save mode?

Could a bug in our or Vaadin's code potentially prevent the content from being read in this manner?

mperktold · 2024-11-06T16:58:37Z

In a new case, I have examined the logs and found the following exception:

Reset cancel_stream_error
    org.eclipse.jetty.io.EofException
        at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2.onFailure(HttpStreamOverHTTP2.java:593)
        at org.eclipse.jetty.http2.server.internal.HTTP2ServerConnection.onStreamFailure(HTTP2ServerConnection.java:204)
        at org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory$HTTPServerSessionListener.onFailure(HTTP2ServerConnectionFactory.java:173)
        at org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory$HTTPServerSessionListener.onReset(HTTP2ServerConnectionFactory.java:160)
        at org.eclipse.jetty.http2.HTTP2Stream.notifyReset(HTTP2Stream.java:876)
        at org.eclipse.jetty.http2.HTTP2Stream.onReset(HTTP2Stream.java:586)
        at org.eclipse.jetty.http2.HTTP2Stream.process(HTTP2Stream.java:357)
        at org.eclipse.jetty.http2.HTTP2Session.onReset(HTTP2Session.java:345)
        at org.eclipse.jetty.http2.HTTP2Connection.onReset(HTTP2Connection.java:258)
        at org.eclipse.jetty.http2.parser.BodyParser.notifyReset(BodyParser.java:139)
        at org.eclipse.jetty.http2.parser.ResetBodyParser.onReset(ResetBodyParser.java:94)
        at org.eclipse.jetty.http2.parser.ResetBodyParser.parse(ResetBodyParser.java:61)
        at org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:229)
        at org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:156)
        at org.eclipse.jetty.http2.parser.ServerParser.parse(ServerParser.java:121)
        at org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:342)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
        at java.base/java.lang.Thread.run(Unknown Source)
    Caused by: java.io.EOFException: Reset cancel_stream_error
        at org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory$HTTPServerSessionListener.onReset(HTTP2ServerConnectionFactory.java:159)
        ... 20 more

I understand that this error is unavoidable, as this is just what happens when the client closes the connection.

But a few minutes afterwards, it appears that we are again in the situation where a thread is stuck reading the content while holding the lock on the VaadinSession. I don't have a thread dump of this situation, so I don't know for sure, but the logs show a symptom that leads me to believe this is the case here (we start a timer task that locks the VaadinSession, but the logs neither show that the task is executed nor that it is cancelled, so it probably waits for the lock forever).

Is it possible that this error isn't handled correctly by Jetty, in the sense that some threads might still try to read something from the connection that is already closed? Or maybe this only arises when a lock like the one on the VaadinSession is held like in our case?

sbordet · 2024-11-06T18:05:59Z

Is it possible that this error isn't handled correctly by Jetty, in the sense that some threads might still try to read something from the connection that is already closed?

The connection is not closed, this is just a stream failure due to the fact that the client sent a reset to the server, just for that particular stream.

Jetty does not have threads reading or otherwise interacting with this canceled stream ever again: the failure is notified to the application, and if the application blocks you should see a clear stack trace.

mperktold · 2024-11-07T07:53:36Z

The connection is not closed, this is just a stream failure due to the fact that the client sent a reset to the server, just for that particular stream.

I see. Sorry, I am not very familiar with low level HTTP/2 stuff, thanks for the clarification.

Jetty does not have threads reading or otherwise interacting with this canceled stream ever again: the failure is notified to the application, and if the application blocks you should see a clear stack trace.

What if a thread was already reading in a blocking way from it, will it wake up in this case? I am talking about the first stack trace in my original post, where a thread is blocked while waiting on the request content. I don't know whether the thread was already blocked before the stream reset or afterwards, or whether it is actually the same stream or not. I just want to investigate whether these two things, i.e. a thread blocked waiting for request content, and a stream reset sent by the client, could lead to the behavior we are seeing.

On a related note: if the request was really blocked because the client never sends the content, what do you think would happen when the user closes the browser and maybe even shuts down the device? Would the server notice that and give up on the request content, or would it still block forever?

sbordet · 2024-11-07T08:53:45Z

What if a thread was already reading in a blocking way from it, will it wake up in this case?

Yes, it will be woken up with an exception (your first stack trace).

Upon receiving a request, a stream is always subject to the idle timeout.
If the client does not send the request content, the server will idle timeout the stream, and it will be the server sending a reset to the client.

mperktold · 2024-11-07T16:08:40Z

Yes, it will be woken up with an exception (your first stack trace).

That stack trace is from a thread dump, there is no exception. On the contrary, it shows that the request is still blocked after a very long time.

Upon receiving a request, a stream is always subject to the idle timeout.
If the client does not send the request content, the server will idle timeout the stream, and it will be the server sending a reset to the client.

I tried to replicate the scenario with a simple Jetty client and server, and indeed, when I shut down the client before it sends the request content, the server wakes up from blocking on the content and continues normally.

On the other hand, we have reports where the thread remains blocked over night, so I wonder how that is possible.

sbordet · 2024-11-07T19:29:30Z

That stack trace is from a thread dump, there is no exception. On the contrary, it shows that the request is still blocked after a very long time.

Yes, the discussion in this issue has been that there is locking that prevents the read to be woken up.

Can you reproduce the issue with Jetty DEBUG logs enabled?

mperktold · 2024-11-08T06:29:00Z

I think I managed to reproduce the issue with Jetty's HTTP2Client.

The server part is a echo servlet that simply writes the request content back to the response:

var server = new Server();
var connector = new ServerConnector(server, new HTTP2ServerConnectionFactory());
connector.setPort(8080);
server.addConnector(connector);
var servletContextHandler = new ServletContextHandler("/");
server.setHandler(servletContextHandler);
servletContextHandler.addServlet(new HttpServlet() {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        System.out.println("POST begin");
        try (var reader = req.getReader();
            var writer = resp.getWriter()
        ) {
            reader.transferTo(writer);
        }
        finally {
            System.out.println("POST end");
        }
    }
}, "/");
server.start();
server.join();

The client part uses the HTTP2Client to send a data frame:

try (var client = new HTTP2Client(new ClientConnector())) {
    client.start();
    Session session = client.connect(
        new InetSocketAddress("localhost", 8080),
        new ServerSessionListener() {}
    ).join();
    var request = new MetaData.Request(
        "POST",
        HttpURI.from("http://localhost:8080/"),
        HttpVersion. HTTP_2,
        HttpFields.EMPTY
    );
    var headersFrame = new HeadersFrame(request, null, false);
    var responseListener = new Stream.Listener() {
        @Override
        public void onDataAvailable(Stream stream) {
            while (true) {
                Stream.Data data = stream.readData();
                if (data == null) {
                    stream.demand();
                    return;
                }
                System.out.println(data);
                data.release();
                if (data.frame().isEndStream())
                    return;
            }
        }
    };
    Stream stream = session.newStream(headersFrame, responseListener).join();
    ByteBuffer content = StandardCharsets.UTF_8.encode("Hello Jetty");
    var dataFrame = new DataFrame(stream.getId(), content, false);
    System.out.println(stream.data(dataFrame).join());
    stream.demand();
    session.shutdown().join();
}

Note that the data frame is constructed with endStream==false, so the server naturally blocks for more content. If you then kill the client process, the server keeps blocking for the request content. I've even let this run over night, and the server was still blocking the next morning. And, if you get a thread dump, you can see that it is blocks at the exact same spot as I posted before.

I'm certainly misusing the HTTP2Client here, but I think there should be some way for the server to notice that the client is gone. It could be an exception that is thrown when the stream is closed, or some kind of idle timeout, or maybe something else.

If this is not a valid/helpful reproducer, I can also try to enable DEBUG logs.

Added test case. Signed-off-by: Simone Bordet <[email protected]>

Updated test case to use Input/OutputStream as well as Reader/Writer. Signed-off-by: Simone Bordet <[email protected]>

sbordet · 2024-11-08T18:18:48Z

@mperktold I'm looking into this.

Fixed test case expectations. Signed-off-by: Simone Bordet <[email protected]>

More tests. Signed-off-by: Simone Bordet <[email protected]>

sbordet · 2024-11-08T19:29:32Z

@mperktold there are multiple issues at play here.

Your use of reader.transferTo(writer) is such that the output is aggregated and not immediately sent from the server to the client. This is a minor issue, but I mentioned it in case you were surprised that the System.out in onDataAvailable() would not print anything: onDataAvailable() is not called because the content is aggregated on the server and not sent yet to the client.
The semantic of HTTP/2 session.shutdown() and session.close(...) is such that they are graceful closes. This means that streams that are pending will be waited for completion -- this is mandated by the specification, see this section.

I have ported your test (and more) into #12506.

Your test is calling session.shutdown().join(), and because you wait blocking and do not complete the pending streams, the server is waiting.

This is expected behavior when using the low-level HTTP/2 APIs: you have more control, but you have to be more precise in what your application is doing.

In case the client does not finish the pending streams, either by reset or by sending the last frame, the server will eventually idle timeout, and wake up the blocked read by throwing an exception.

In summary, Jetty behaves correctly.
So far, I have not yet seen evidence that this issue is a Jetty problem.

It might be something that we did not find yet, but you need to prove it is a Jetty fault 😄

mperktold · 2024-11-09T07:28:14Z

Thanks for your investigation and clarification. It's my first time using HTTP2Client directly, so I am surely not using it correctly.

But in part, I do that on purpose here, to see how the server reacts when the client fails to shut down gracefully.

In case the client does not finish the pending streams, either by reset or by sending the last frame, the server will eventually idle timeout, and wake up the blocked read by throwing an exception.

This would be perfect, but I was not seeing this in my tests. I tried again and made the following observations:

When I just let the client block in session.shutdown().join(), eventually both will stop blocking due to an idle timeout.
When I kill the client process before that, the server never goes into idle timeout.

So apparently, the idle timeout works on the client, but not in the server in this case. When it occurs on the client, it probably notifies the server about it via a GOAWAY frame or something similar, but the server doesn't seem to notice the idle timeout itself.

Can you maybe replicate this in your tests?

sbordet · 2024-11-09T09:40:52Z

@mperktold since you can replicate, can you please detail the exact steps you are doing, post the exact code, and DEBUG logs? Thanks!

mperktold added the Bug For general bugs on Jetty side label Sep 13, 2024

mperktold mentioned this issue Sep 16, 2024

Potential deadlock with Jetty vaadin/flow#19938

Open

kgrits mentioned this issue Oct 29, 2024

Jetty threads stuck on BlockingContentProvider.nextContent #12434

Open

sbordet added a commit that referenced this issue Nov 8, 2024

Issue #12272 - Potential deadlock with Vaadin.

473445b

Added test case. Signed-off-by: Simone Bordet <[email protected]>

sbordet linked a pull request Nov 8, 2024 that will close this issue

Issue #12272 - Potential deadlock with Vaadin. #12506

Open

sbordet added a commit that referenced this issue Nov 8, 2024

Issue #12272 - Potential deadlock with Vaadin.

9f8cc61

Updated test case to use Input/OutputStream as well as Reader/Writer. Signed-off-by: Simone Bordet <[email protected]>

sbordet added a commit that referenced this issue Nov 8, 2024

Issue #12272 - Potential deadlock with Vaadin.

279f710

Fixed test case expectations. Signed-off-by: Simone Bordet <[email protected]>

sbordet added a commit that referenced this issue Nov 8, 2024

Issue #12272 - Potential deadlock with Vaadin.

7b9b156

More tests. Signed-off-by: Simone Bordet <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential deadlock with Vaadin #12272

Potential deadlock with Vaadin #12272

mperktold commented Sep 13, 2024

sbordet commented Sep 13, 2024

mperktold commented Sep 14, 2024

sbordet commented Sep 14, 2024

mperktold commented Sep 16, 2024

sbordet commented Sep 16, 2024

Legioth commented Sep 16, 2024

Legioth commented Sep 16, 2024

sbordet commented Sep 16, 2024

Legioth commented Sep 16, 2024 •

edited

Loading

sbordet commented Sep 16, 2024

Legioth commented Sep 17, 2024

sbordet commented Sep 17, 2024

Legioth commented Sep 17, 2024

sbordet commented Sep 17, 2024

Legioth commented Sep 17, 2024

mperktold commented Sep 17, 2024

sbordet commented Sep 17, 2024

joakime commented Sep 17, 2024 •

edited

Loading

mperktold commented Oct 15, 2024

mperktold commented Nov 6, 2024

sbordet commented Nov 6, 2024

mperktold commented Nov 7, 2024

sbordet commented Nov 7, 2024

mperktold commented Nov 7, 2024

sbordet commented Nov 7, 2024

mperktold commented Nov 8, 2024

sbordet commented Nov 8, 2024

sbordet commented Nov 8, 2024

mperktold commented Nov 9, 2024

sbordet commented Nov 9, 2024

Potential deadlock with Vaadin #12272

Potential deadlock with Vaadin #12272

Comments

mperktold commented Sep 13, 2024

sbordet commented Sep 13, 2024

mperktold commented Sep 14, 2024

sbordet commented Sep 14, 2024

mperktold commented Sep 16, 2024

sbordet commented Sep 16, 2024

Legioth commented Sep 16, 2024

Legioth commented Sep 16, 2024

sbordet commented Sep 16, 2024

Legioth commented Sep 16, 2024 • edited Loading

sbordet commented Sep 16, 2024

Legioth commented Sep 17, 2024

sbordet commented Sep 17, 2024

Legioth commented Sep 17, 2024

sbordet commented Sep 17, 2024

Legioth commented Sep 17, 2024

mperktold commented Sep 17, 2024

sbordet commented Sep 17, 2024

joakime commented Sep 17, 2024 • edited Loading

mperktold commented Oct 15, 2024

mperktold commented Nov 6, 2024

sbordet commented Nov 6, 2024

mperktold commented Nov 7, 2024

sbordet commented Nov 7, 2024

mperktold commented Nov 7, 2024

sbordet commented Nov 7, 2024

mperktold commented Nov 8, 2024

sbordet commented Nov 8, 2024

sbordet commented Nov 8, 2024

mperktold commented Nov 9, 2024

sbordet commented Nov 9, 2024

Legioth commented Sep 16, 2024 •

edited

Loading

joakime commented Sep 17, 2024 •

edited

Loading