AutorecoveringConnection Memory Leak #1808

Closed
NathanielAB opened this issue Mar 11, 2025 · 13 comments · Fixed by #1814

@NathanielAB

Describe the bug

Hi, we have noticed a large number of TimerQueue instances when using the latest RabbitMQ client version, 7.1.1, leading to an increase in memory usage.

[Two screenshots attached]

Reproduction steps

Start multiple consumers in parallel threads and by time the memory usage increases

Expected behavior

No memory leaks when connecting to multiple consumers

Additional context

No response

@NathanielAB NathanielAB changed the title AutorecoveryConnection Memory Leak AutorecoveringConnection Memory Leak Mar 11, 2025
@lukebakken lukebakken added this to the 7.1.2 milestone Mar 11, 2025
@lukebakken lukebakken self-assigned this Mar 11, 2025
@lukebakken
Contributor

lukebakken commented Mar 11, 2025

@NathanielAB - it would be greatly appreciated if you took the time to do the following:

  • Provide code to reproduce the issue. Yes, you described how ("Start multiple consumers in parallel threads and by time the memory usage increases"), but to ensure I reproduce the issue the same way, I would appreciate code I can clone, compile and run. A single sentence can be easily mis-interpreted.
  • Describe the tool you're using to track allocations, and how to use it. I can't tell from your screenshot.
  • If you can, provide a PR to address the issue.

@lukebakken lukebakken modified the milestones: 7.1.2, 7.1.3 Mar 17, 2025
@lukebakken
Contributor

lukebakken commented Mar 17, 2025

@NathanielAB I can't make progress without more information.

Related issues / comments:

So, it seems that #1784 didn't really fix the issue around timer leaks.

Pinging people who worked on this issue before to see if they can contribute:

@ZajacPiotr98
@DenisMayorko

@NathanielAB
Author

Hi, sorry for the late reply. I will try to produce a small proof-of-concept console application.

The tool I used was Visual Studio: I captured a memory dump (.gcdump) and analyzed it there. It shows the roots (multiple TimerQueue references nested within each other) and the referenced types, that is, the types generating the TimerQueue instances, which are mostly from RabbitMQ. It seems to happen when a connection is recovered, because the specific method creating these instances is RecoverConnectionAsync.

@lukebakken
Contributor

Thank you @NathanielAB. It would also be great to know if you're using .NET Core or a netstandard2.0-TFM environment.

@NathanielAB
Author

.NET Core

@lukebakken
Contributor

@NathanielAB if you aren't able to provide code to reproduce the issue, could you log all first-chance-exceptions when this problem presents itself?
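
For readers unfamiliar with the request: first-chance exceptions can be logged in-process through the `AppDomain.CurrentDomain.FirstChanceException` event. The following is a minimal sketch and not code from this thread. The `firstChanceLog` list and the simulated exception are illustrative assumptions; a real application would forward the entries to its own logger.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.ExceptionServices;

// Illustrative in-memory sink (an assumption); replace with your logger.
var firstChanceLog = new List<string>();

// Subscribe as early as possible during startup so nothing is missed.
// Keep the handler trivial: an exception thrown inside it raises the event again.
EventHandler<FirstChanceExceptionEventArgs> handler = (sender, e) =>
{
    lock (firstChanceLog)
    {
        firstChanceLog.Add(
            $"FirstChanceException: {e.Exception.GetType().FullName}: {e.Exception.Message}");
    }
};
AppDomain.CurrentDomain.FirstChanceException += handler;

// Demonstration only: the exception is caught below, yet the event
// still fires because it is raised before any catch block runs.
try
{
    throw new OperationCanceledException("simulated cancellation");
}
catch (OperationCanceledException)
{
    // Swallowed on purpose.
}

AppDomain.CurrentDomain.FirstChanceException -= handler;

foreach (string line in firstChanceLog)
{
    Console.WriteLine(line);
}
```

Because the event fires for handled exceptions too, the log can be noisy. Filtering on the exception type or on stack frames that mention `RabbitMQ.Client` is one way to narrow it down.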

I think what may be happening is that an exception is being thrown here:

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/main/projects/RabbitMQ.Client/Impl/AutorecoveringConnection.cs#L297-L298

...and this prevents this cancellation token source from being disposed:

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/main/projects/RabbitMQ.Client/Impl/AutorecoveringConnection.cs#L303
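
To illustrate the general failure mode being hypothesized here, the sketch below contrasts a `Dispose()` call that is skipped when an exception is thrown with one placed in a `finally` block. This is not the code in `AutorecoveringConnection.cs`, nor the change made in the eventual fix. `FailingRecoveryStepAsync`, `RecoverLeakyAsync`, `RecoverSafeAsync` and `IsDisposed` are hypothetical names, and the five-minute timeout is an arbitrary assumption. The relevance to the reported TimerQueue growth is that a `CancellationTokenSource` created with a timeout holds an internal timer until it is disposed or the timeout elapses.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a recovery step that fails, for example a
// connect attempt that is cancelled. Not part of RabbitMQ.Client.
static async Task FailingRecoveryStepAsync(CancellationToken token)
{
    await Task.Yield();
    throw new OperationCanceledException("simulated failed recovery attempt");
}

// Leaky shape: Dispose() is reached only when no exception is thrown.
static async Task<CancellationTokenSource> RecoverLeakyAsync()
{
    var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
    try
    {
        await FailingRecoveryStepAsync(cts.Token);
        cts.Dispose(); // skipped when the line above throws
    }
    catch (OperationCanceledException)
    {
        // Swallowed so that a caller could retry.
    }
    return cts; // returned only so the demo can inspect it
}

// Safer shape: the finally block runs on success and on failure.
static async Task<CancellationTokenSource> RecoverSafeAsync()
{
    var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
    try
    {
        await FailingRecoveryStepAsync(cts.Token);
    }
    catch (OperationCanceledException)
    {
        // Swallowed so that a caller could retry.
    }
    finally
    {
        cts.Dispose();
    }
    return cts;
}

// Reading Token on a disposed source throws ObjectDisposedException.
static bool IsDisposed(CancellationTokenSource cts)
{
    try { _ = cts.Token; return false; }
    catch (ObjectDisposedException) { return true; }
}

CancellationTokenSource leaky = await RecoverLeakyAsync();
CancellationTokenSource safe = await RecoverSafeAsync();
bool leakyDisposed = IsDisposed(leaky);
bool safeDisposed = IsDisposed(safe);
Console.WriteLine($"leaky disposed: {leakyDisposed}, safe disposed: {safeDisposed}");

leaky.Dispose(); // clean up the demo's own leak
```

A `using` declaration (`using var cts = ...;`) gives the same guarantee as the explicit `finally` with less code, provided the source does not need to outlive the method.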

I'm making some code changes and can have a 7.1.3 alpha release available soon, as well.

@lukebakken
Contributor

@NathanielAB - you can get 7.1.3-alpha.0 from here:

https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client

It includes code from PR #1814.

@lukebakken
Contributor

@NathanielAB any updates from you? I'd like to wrap this issue up this week, thanks.

@NathanielAB
Author

Hi @lukebakken, we have run the alpha package and it seems to have helped with the leak, thank you!

@lukebakken
Contributor

OK, by "helped" you mean it's fixed, correct?

Could you please log first-chance exceptions? I would like to know which exception is causing the existing code to skip .Dispose() here:

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/main/projects/RabbitMQ.Client/Impl/AutorecoveringConnection.cs#L297-L298

@NathanielAB
Author

Yes, it seems to have been fixed. I will try to investigate the first-chance exceptions.

@NathanielAB
Author

I'm only seeing the following, and nothing in the Dispose method:

FirstChanceException event raised in Project: System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'RabbitMQ.Client.Framing.AutorecoveringConnection'.
   at RabbitMQ.Client.Framing.AutorecoveringConnection.<ThrowIfDisposed>g__ThrowDisposed|76_0()

and

FirstChanceException event raised in Project: System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.CancellationToken.ThrowIfCancellationRequested()
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at RabbitMQ.Client.Impl.SocketFactory.ConnectUsingAddressFamilyAsync(IPEndPoint endpoint, Func`2 socketFactory, AddressFamily family, TimeSpan connectionTimeout, CancellationToken cancellationToken)
   at RabbitMQ.Client.Impl.SocketFactory.OpenAsync(AmqpTcpEndpoint amqpTcpEndpoint, Func`2 socketFactory, TimeSpan connectionTimeout, CancellationToken cancellationToken)
   at RabbitMQ.Client.Impl.SocketFrameHandler.CreateAsync(AmqpTcpEndpoint amqpTcpEndpoint, Func`2 socketFactory, TimeSpan connectionTimeout, CancellationToken cancellationToken)
   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandlerAsync(AmqpTcpEndpoint endpoint, CancellationToken cancellationToken)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

@lukebakken
Contributor

Thank you! I will release version 7.1.3 today.
