Possible memoryleak in ResponseSocket #788

Open
mvburgh opened this issue Mar 28, 2019 · 30 comments

@mvburgh

mvburgh commented Mar 28, 2019

image

My code is already minimal: only a RequestSocket and a ResponseSocket.
I create a new RequestSocket every few seconds, driven by requests from a web controller.

Originally posted by @mvburgh in #737 (comment)

@mvburgh changed the title from "![image](https://user-images.githubusercontent.com/8941351/54089769-d7f1fc80-436c-11e9-837d-45cd2ab83508.png)" to "Possible memoryleak in ResponseSocket" on Mar 28, 2019
@KamranShahid

Is anyone working on this?

@mvburgh
Author

mvburgh commented Apr 24, 2019

Not me personally. I had a quick glance at the code but have not spotted a possible cause so far.

@Svisstack
Contributor

image

@mvburgh Have you made any progress on this since then?

@somdoron
Member

I will take a look next week; I'm on vacation this week.

Code that reproduces this would help.

@somdoron
Member

Also @Svisstack, can you check with socket.Options.Linger set to zero?
I suspect it might be the issue.
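
For readers who want to try the suggestion above, here is a minimal sketch of setting Linger to zero on a short-lived socket. It assumes the NetMQ 4 API (`RequestSocket`, `Options.Linger`, `SendFrame`, `ReceiveFrameString`). The class name `LingerExample`, its methods, and the address passed in are hypothetical and not taken from anyone's code in this thread.

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

// Hypothetical helper, for illustration only.
public static class LingerExample
{
    // Creates a RequestSocket that discards unsent messages as soon as it is closed.
    public static RequestSocket CreateClient(string address)
    {
        var client = new RequestSocket();
        client.Options.Linger = TimeSpan.Zero; // do not wait for pending messages on close
        client.Connect(address);
        return client;
    }

    // One request/reply round trip on a socket that is disposed right afterwards.
    public static string Ask(string address, string request)
    {
        using (var client = CreateClient(address))
        {
            client.SendFrame(request);
            return client.ReceiveFrameString(); // blocks until the reply arrives
        }
    }
}
```

The `using` block matters when sockets are created per request: it disposes each socket deterministically instead of leaving cleanup to the garbage collector.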

@Svisstack
Contributor

Svisstack commented Aug 23, 2019

image

The code that causes this is very simple: just a PublisherSocket with ~10 connected subscribers:

image

I'm not sure it's the same bug, as the original report was about the ResponseSocket.

On the Subscriber side, this bug does not exist.

From the memory dumps, we can see that there are probably too many Pub+PubSession objects, along with Pipe, YPipe, and YQueue objects, but all of the memory is allocated in YQueue+Chunk.

@Svisstack
Contributor

@somdoron I confirm that Linger equals {00:00:00} at the end of the Start() function in the snippet provided above (checked at Start(): return port;).

publisher.Options
{NetMQ.SocketOptions}
Affinity: 0
Backlog: 100
DelayAttachOnConnect: false
DisableTimeWait: false
Endian: Big
IPv4Only: true
Identity: null
LastEndpoint: "tcp://0.0.0.0:61584"
LastPeerRoutingId: null
Linger: {00:00:00}
MaxMsgSize: -1
MulticastHops: 1
MulticastRate: 100
MulticastRecoveryInterval: {00:00:10}
PgmMaxTransportServiceDataUnitLength: 'publisher.Options.PgmMaxTransportServiceDataUnitLength' threw an exception of type 'NetMQ.InvalidException'
ReceiveBuffer: 0
ReceiveHighWatermark: 1000
ReceiveLowWatermark: 0
ReceiveMore: false
ReconnectInterval: {00:00:00.1000000}
ReconnectIntervalMax: {00:00:00}
SendBuffer: 0
SendHighWatermark: 0
SendLowWatermark: 0
TcpKeepalive: false
TcpKeepaliveIdle: {-00:00:00.0010000}
TcpKeepaliveInterval: {-00:00:00.0010000}

@somdoron
Member

Do the subscribers come and go frequently?
It seems that setting linger to zero or a few seconds would solve it.

@somdoron
Member

Thanks. Do the subscribers come and go?
Can you check who is referencing the PubSession?

@Svisstack
Contributor

@somdoron Take a look at the incoming reference chart.

image

In my use case, the subscribers should not come and go frequently, but there could be a bug on my side causing that, and I am analyzing it at the moment.

@Svisstack
Contributor

I'm using the 4.0.0.239-pre version.

@somdoron
Member

Can you send me the report? Which application are you using?

somdoron AT gmail DOT com

I'm not in front of a computer this week, but I will take a look at the beginning of next week.

@Svisstack
Contributor

@somdoron No problem. I actually found an interesting fact: the leak is visible only on nodes where there is no communication between Publisher and Subscriber (silence), which is fine from the application's perspective.

@somdoron
Member

Can you extend this list:

https://user-images.githubusercontent.com/864295/63584247-d162e480-c59c-11e9-8480-9f4bd1532964.png

I want to see the root object causing the memory leak

@somdoron
Member

Also, can you show the incoming reference to the pipe class?

@Svisstack
Contributor

image

It looks like the Pipe is also referenced by Pub+Sub; however, I don't know whether it's the same instance.

@Svisstack
Contributor

Svisstack commented Aug 23, 2019

image

@somdoron paths to the root.

@somdoron
Member

Funny, I just figured it out myself.

At least in this case it is not a bug.

Once a single message is sent, everything will be freed.

In the memory snapshot I saw a pending command holding the reference, which causes the issue.

To avoid the issue, you can call socket.Poll with a zero timespan once in a while. This will also process pending commands.

Anyway, I think you have a case where subscribers come and go frequently.
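
The workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses `NetMQSocket.Poll(TimeSpan)` from NetMQ 4 and relies on the statement above that polling processes pending commands. The class `IdlePublisher`, the `tryGetNextMessage` delegate, and the one-second idle interval are hypothetical. The poll happens on the same thread that owns the socket, since NetMQ sockets are not thread-safe.

```csharp
using System;
using System.Threading;
using NetMQ;
using NetMQ.Sockets;

// Hypothetical publisher loop, for illustration only.
public static class IdlePublisher
{
    // tryGetNextMessage returns null when there is nothing to publish.
    public static void Run(Func<string> tryGetNextMessage, CancellationToken token)
    {
        using (var publisher = new PublisherSocket())
        {
            publisher.BindRandomPort("tcp://*");

            while (!token.IsCancellationRequested)
            {
                string message = tryGetNextMessage();
                if (message != null)
                {
                    // Sending gives the socket a chance to process its commands.
                    publisher.SendFrame(message);
                }
                else
                {
                    // Silent period: poll with a zero timeout so that pending
                    // commands are processed even though nothing is sent.
                    publisher.Poll(TimeSpan.Zero);
                    Thread.Sleep(TimeSpan.FromSeconds(1)); // assumed idle interval
                }
            }
        }
    }
}
```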

@Svisstack
Contributor

Svisstack commented Aug 23, 2019

Thanks. @somdoron, I appreciate the effort and in-depth knowledge of this project.

Enjoy your vacation.

Yes, judging by netstat, I may well have the come-and-go issue.

@mvburgh
Author

mvburgh commented Aug 23, 2019

Do the subscribers come and go frequently?
It seems that setting linger to zero or a few seconds would solve it.

In my case they come and go every few seconds as they are web api requests.

@somdoron
Member

@mvburgh, I will try to reproduce it next week.
Are you using only request/response sockets? Are you using a proxy? Do you happen to have a memory profiler report?

@mvburgh
Author

mvburgh commented Aug 24, 2019

No proxy here; for me it runs between a Windows service and a website.
I don't have a profiler report at hand.

@KamranShahid

KamranShahid commented Sep 14, 2019

I have the Majordomo pattern implemented with the broker in one Windows service (.NET Core 2.1) and the worker app in another Windows service (.NET Core 2.1).
https://github.com/NetMQ/Samples/tree/master/src/Majordomo
The worker service hosts 16 or 17 different types of workers, and each type can have multiple instances. What I was seeing is that when I assign 10 workers to each type, the broker application's memory increases over time.

It is probably due to the default heartbeat time. I am now trying a heartbeat time of 10 seconds on the worker side and 15 seconds on the broker.

Memory profiling is a bit difficult in my case, as I have set up the workers and the broker in separate applications for future scalability.

@somdoron
Member

@ReneOlsthoorn While the memory increases to 3 GB, are you still sending messages? Could it be that it happens only during silent periods?

@somdoron
Member

somdoron commented Sep 15, 2019 via email

@somdoron
Member

Can you share a memory profiler snapshot? That would help a lot.

@ReneOlsthoorn

Doron and others, the memory leak I was investigating was in our own product. My apologies for posting when it was not clear where the problem came from. I've deleted my comments so that new users don't get the wrong impression about NetMQ.
Keep up the good work!

@mvburgh
Author

mvburgh commented Dec 12, 2019

@somdoron I spent some more time on this last week, but neither setting linger to 0 nor calling socket.Poll() every now and then improved things. The increase stays in YQueue+Chunk and does not get freed over time.

@manu-st

manu-st commented Nov 18, 2020

We are also experiencing something similar in our app. We do not see a leak when we have one server and one client communicating via Request/Response sockets. However, if another client tries to connect to the server while it is already serving a client, the server leaks memory. Our design is that the server can serve only one client, so when a new client connects, the server sends it a message saying that it cannot communicate, and that's pretty much it. Once the client receives that message, it disconnects.

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

@stale stale bot added the stale label Apr 17, 2022