Intermittent deadlock when closing a channel using CloseAsync in 7.x #1751
Comments
Hi, thanks for the report. As I'm sure you're aware, there's not much to work with here 😸 Obviously, the gold standard is to provide code that reproduces this issue, or at least some idea of the steps to do so.
What does this mean? Do you have some way in your application to increase the frequency of channel closure?
We're running tests that create and close channels very frequently, and the test suite that does this the most is the one that usually gets stuck. Anyhow, I can try to dig into this further and see if I can provide something that will help you reproduce it. Thanks
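For concreteness, the kind of churn being described is roughly the loop below, sketched against the 7.x async API. This is not the actual test code; the host name and iteration count are placeholders.

```csharp
using RabbitMQ.Client;

// Sketch: repeatedly open and close channels on a single connection.
var factory = new ConnectionFactory { HostName = "localhost" };
await using IConnection connection = await factory.CreateConnectionAsync();

for (int i = 0; i < 1_000; i++)
{
    IChannel channel = await connection.CreateChannelAsync();
    // ... exercise the channel here ...
    await channel.CloseAsync();   // the call that intermittently hangs in 7.x
    await channel.DisposeAsync();
}
```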
@Andersso channel and connection churn are workloads explicitly recommended against.
It would be extremely helpful for you to share your test code. If you can't do that, describe the test as best you can:
My guess is that you could be hitting a
This is a related issue:
Also note that the management UI has connection and channel churn metrics, on the Overview page but also on the node page IIRC. So at the very least it should be easy to see the churn rate: is it 50 channels opened per second? Is it 200?
@Andersso @ZajacPiotr98 - I've modified a test app in this project to try and trigger the error in this issue, or the error in #1749, and it works fine every time in my environment:
Hi again, and sorry for the delayed response. I hope you guys had a good Christmas and New Year!

I've been working on reproducing the issue in a test project but haven't had any success. I've tried experimenting with different thread pool sizes, but it didn't seem to affect the outcome. Based on my investigation of my latest memory dump, there's no indication of thread pool starvation; all the threads in the pool are idle and waiting for work. It is also worth mentioning that my application is a console app, so it does not have a synchronization context.

Regarding the connection churn: wouldn't that have caused issues in the 6.x versions as well? We've had this setup running fine for years without any problems until the upgrade to 7.x.

I've done some additional digging by analyzing the memory dump. Specifically, I've looked at the tasks being awaited in the method that always seems to get stuck (according to the async dump):

It appears that the channel never gets completed, which prevents the method from ever completing.
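To make that symptom concrete: the shape of such a hang is an `await` on a `System.Threading.Channels` reader's `Completion` task that never finishes because nothing marks the writer complete. A standalone illustration only, not the client library's internal code:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Note: this Channel<T> is System.Threading.Channels, not RabbitMQ's IChannel.
var channel = Channel.CreateUnbounded<int>();

async Task ConsumeAsync(ChannelReader<int> reader)
{
    while (reader.TryRead(out int item))
    {
        Console.WriteLine(item);
    }

    // Completion only completes after Writer.Complete()/TryComplete() has been
    // called and all items are drained. If that never happens, this await hangs.
    await reader.Completion;
}

await channel.Writer.WriteAsync(1);
// channel.Writer.Complete();   // without this line, ConsumeAsync never returns
await ConsumeAsync(channel.Reader);
```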
@Andersso I'm going to publish a 7.1.0 alpha release right now. When it's ready, I'll follow up here. There have been a couple of fixes merged that might help here. Any luck reproducing this issue reliably?
@Andersso please give this version a try! https://www.nuget.org/packages/RabbitMQ.Client/7.1.0-alpha.0
Hey,
I also performed the test with the alpha version, with the same results. I also tried a workaround of passing a cancellation token to the call. In my case it was around 500 close requests in 2 minutes from one instance of my application (6 instances overall, 5 connections each, 5 RabbitMQ nodes with a 3 GiB high watermark). A second instance of the app had the same issue at around 1000 close requests in 4 minutes.
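A timeout-guarded close along those lines might look like the sketch below. The guarded call is assumed to be `IChannel.CloseAsync`, and the 5-second timeout and fallback are arbitrary choices, not a recommendation from the library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using RabbitMQ.Client;

static class ChannelCloseHelper
{
    // Sketch only: bound the close with a cancellation token so a hung
    // CloseAsync cannot block shutdown forever. The timeout value is arbitrary.
    public static async Task CloseWithTimeoutAsync(IChannel channel)
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
        try
        {
            await channel.CloseAsync(cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The close did not finish in time; fall back to disposing the channel.
            await channel.DisposeAsync();
        }
    }
}
```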
Thanks for your reports. I'll try to reproduce this issue locally, though I've had no luck so far. |
Fixes #1751 Attempt to fix deadlock by waiting on channel dispatcher first, then channel reader.
@Andersso @ZajacPiotr98 I'm wondering if you're running into this condition: https://stackoverflow.com/a/66521303 Is it possible to test my PR branch in your environments? If not, I can publish another alpha release. Thank you!
Hey, I do not have the infrastructure to use your repo directly. A NuGet package would be perfect! Thanks
@Andersso - I built the packages locally on my branch and uploaded them here: https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client/7.1.0-alpha.0.1
I tested this PR and the issue is still there. I added logs and it seems that for some reason
Thanks for the follow-up. I wish I could reproduce this! I think the best fix will be to either not await
@ZajacPiotr98 @Andersso I've uploaded a new version to MyGet: https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client/7.1.0-alpha.0.2 When the
I will run 7.1.0-alpha.0.2 over the weekend, fingers crossed! Sorry for my ignorance, but where does the log end up?
You have to configure an event listener. Use that class as a starting point in your own project. Instead of writing to
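For anyone else who wants the same diagnostics, a minimal listener along those lines might look like this sketch. The "rabbitmq" name filter is an assumption; match it to the client's actual EventSource name, and keep one instance alive for the lifetime of the app (for example in a static field), since disposing the listener stops the events.

```csharp
using System;
using System.Diagnostics.Tracing;

// Sketch: subscribe to any EventSource whose name mentions "rabbitmq"
// (an assumed filter) and print events to the console instead of a log sink.
public sealed class ConsoleRabbitMqEventListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name.Contains("rabbitmq", StringComparison.OrdinalIgnoreCase))
        {
            EnableEvents(eventSource, EventLevel.Verbose);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        string payload = eventData.Payload is null
            ? string.Empty
            : string.Join(", ", eventData.Payload);
        Console.WriteLine($"{eventData.EventSource.Name}/{eventData.EventName}: {payload}");
    }
}
```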
Hey again, sorry for the delayed response. Unfortunately, the issue is still present, and no log output has been observed (I did verify that the event listener is working). I will take another dive once I have a fresh memory dump. Thanks
@Andersso thanks for the report. Argh, I wish I could reproduce this issue here. I will try some other ideas and will publish a new release to MyGet. I REALLY appreciate you being willing to test and investigate.
Fixes #1751 See if not awaiting `_reader.Completion` fixes the issue.
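That PR's exact change is not shown here, but for context: one common alternative to awaiting a completion task unconditionally is to bound the wait, for example with `Task.WaitAsync`. A sketch with an arbitrary timeout, not the library's code:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

static class ReaderShutdownSketch
{
    // Sketch only: drain the reader, then wait for Completion with a bound
    // instead of awaiting it unconditionally. The timeout is arbitrary.
    public static async Task DrainAndWaitAsync(ChannelReader<int> reader)
    {
        while (reader.TryRead(out _))
        {
            // discard remaining items
        }

        try
        {
            await reader.Completion.WaitAsync(TimeSpan.FromSeconds(10));
        }
        catch (TimeoutException)
        {
            // Give up waiting rather than hanging the close path forever.
        }
    }
}
```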
@Andersso @ZajacPiotr98 - please test version 7.1.0-alpha.1.1, which includes the code in this pull request: #1782 https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client Thank you!
Hey, I can confirm that the memory leak fix did not solve the issue. I will try out the new package today. Thanks!
@lukebakken Would it be possible to release the memory leak fix as a minor version? We are also noticing a number of cancellation tokens and would greatly appreciate the update. Thank you!
@NathanielAB you probably mean "as a patch version"
I'll produce a new release once @Andersso and / or @ZajacPiotr98 confirm the fix in 7.1.0-alpha.1.1. It shouldn't take long. @NathanielAB you're more than welcome to use that version, of course! https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client
Hey, I have run the new pre-release package over the weekend and I haven't observed it getting stuck; it looks promising!
@Andersso thanks for letting us know!
Hey everyone, hot off the presses: https://www.nuget.org/packages/RabbitMQ.Client/7.1.0
Tried version 7.1.0 but still encountered locks. The MonitorHeld metric keeps increasing while the thread remains unchanged. Some of the locks seem to be gone, but it looks like the issue is still present.
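As an aside, MonitorHeld is the column reported by SOS's `syncblk` command in `dotnet-dump analyze`. A much coarser, in-process proxy is the runtime's own contention counter; the snippet below is only a sketch of snapshotting it, not a replacement for a dump, and measures a related but different metric.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: snapshot the runtime's total monitor contention count over a window.
// A value that keeps rising while the app should be idle is a cheap hint of
// lock trouble, though it does not show which lock is held.
long before = Monitor.LockContentionCount;
await Task.Delay(TimeSpan.FromSeconds(10));
long after = Monitor.LockContentionCount;
Console.WriteLine($"Monitor contentions in the last 10 s: {after - before}");
```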
Tried this fix, and now those leaks are no longer detected.
@DenisMayorko - can you reproduce the locks every time? |
Yes, the application gradually accumulates locks with each startup. Preventing the arrival of new messages does not reduce the amount. |
@DenisMayorko - do you see the same symptoms as initially reported in this issue - what appears to be a deadlock in
@DenisMayorko - let's continue discussion here: #1784
Hmm… I see a large number of ShutdownEvent instances and a corresponding deadlock, an increase in the number of timers, etc., but I don't see any mentions of Close or Dispose methods in the StackTrace. I'm starting to doubt that the issues are related, although the problem also seems to occur when shutting down the channel, judging by the call stack.
@Andersso @ZajacPiotr98 @DenisMayorko - Please upgrade to this version! Thank you for all of your debugging!
Describe the bug
Hi there,
Ever since upgrading from 6.x to 7.x, I've been running into intermittent deadlocks whenever I try to close a channel via `CloseAsync`. I haven't been able to reproduce it locally, and while I've done some remote debugging, I could not get any insight (all thread pool threads are idle, waiting for work).
I did, however, manage to run `dotnet-dump dumpasync` during one of these deadlocks and got the following info:
First dump
Second dump (another instance)
I noticed that in both dumps the stacks aren't displayed with the usual `Awaiting:` notation you often see in async stack traces, but that might be normal.
Reproduction steps
I haven't pinned down a reliable way to reproduce this, but calling `CloseAsync` more frequently seems to increase the chances of hitting the deadlock. It also appears more common on Linux than on Windows, though that might just be due to hardware differences rather than OS behavior.
Expected behavior
When calling `CloseAsync`, I'd expect the channel to close normally without causing a deadlock.
Additional context
No response