REQ/RESP with Python zmq as worker causes an 80–90 second initial delay on Linux but works fine on Windows #934

Open
rafamerlin opened this issue Sep 7, 2020 · 1 comment


Environment

NetMQ Version: 4.0.1.4
Operating System: Windows 10 + Linux Docker/WSL2
.NET Version: 3.1.401

Expected behaviour

I would expect something that works fine on Windows to behave the same way on Linux or in a Docker image.

Actual behaviour

When running on Linux and controlling the influx of requests with a ConcurrentQueue, I see a very long delay before processing starts, or sometimes it never starts at all, depending on the number of messages I send to the queue.

Steps to reproduce the behaviour

First of all, this is my first time using ZeroMQ, so I may be missing something critical here.

My idea is to have a pool of workers, each using a different TCP port, with a Python process spun up on that same port. These workers run for as long as the dotnet container runs. I use a BlockingCollection wrapping a ConcurrentQueue so that each worker, when free, takes the next request, gets the response, and signals on the queued message that a response was added. Each of these messages is created from a class carrying a ManualResetEventSlim that is set when the answer arrives; a rough sketch of the pattern is below.
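A simplified sketch of that shape (the class and member names here are illustrative, not the exact code from the repo):

```csharp
using System.Collections.Concurrent;
using System.Threading;

// One queued request; the producing Task blocks on Done until a
// worker has filled in Response and set the event.
class PendingRequest
{
    public string Payload;
    public string Response;
    public readonly ManualResetEventSlim Done = new ManualResetEventSlim(false);
}

class Pool
{
    // FIFO: a BlockingCollection wrapping a ConcurrentQueue, filled by
    // the producers and drained one by one by the worker threads.
    readonly BlockingCollection<PendingRequest> _queue =
        new BlockingCollection<PendingRequest>(new ConcurrentQueue<PendingRequest>());

    public string Send(string payload)
    {
        var req = new PendingRequest { Payload = payload };
        _queue.Add(req);
        req.Done.Wait();   // block until a worker sets the event
        return req.Response;
    }
}
```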

First I tried a very simple scenario that doesn't use queues or anything: just spinning up 3 workers and running a bunch of work across them.

To run these examples on Docker:

```
dotnet build -c Release
docker build -t test_netmq .
docker run -it test_netmq
```

So here's the simple example: https://github.com/rafamerlin/netmq_test/tree/simple_working

Then I implemented a queue (I'm not handling errors at all here; it's just for the sake of the example):
https://github.com/rafamerlin/netmq_test/tree/not_working

If you check this file: https://github.com/rafamerlin/netmq_test/blob/not_working/Program.cs

Changing the Enumerable.Range(0, 100) to Enumerable.Range(0, 1) makes it work almost instantaneously, but if I bump it to 100, it takes between 80 and 90 seconds for the "HELLO" message to reach the Python consumer and for it to return a "READY" back.
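The handshake itself is just a plain REQ round-trip over NetMQ, along these lines (the port is illustrative):

```csharp
using NetMQ;
using NetMQ.Sockets;

using (var client = new RequestSocket())
{
    client.Connect("tcp://localhost:20000"); // same port the Python worker listens on
    client.SendFrame("HELLO");
    string reply = client.ReceiveFrameString(); // expect "READY" from the worker
}
```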

I think the reason may be that I run a Task.Run() for each message while using the BlockingCollection to process them one by one. However, that doesn't make a lot of sense either, as the same code works fine on Windows.
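To be concrete, the dispatch side looks roughly like this (simplified, reusing the Pool sketch above; note that each in-flight task holds a thread-pool thread while it blocks in Done.Wait()):

```csharp
using System.Linq;
using System.Threading.Tasks;

var pool = new Pool();

// One Task per message; each enqueues its request and then blocks
// until a worker thread signals that the response arrived.
var tasks = Enumerable.Range(0, 100)
    .Select(i => Task.Run(() => pool.Send($"Request {i}")))
    .ToArray();
Task.WaitAll(tasks);
```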

So this is an example of what happens when I run the code as-is (with the 100 messages):

```
docker run test_netmq
20000 sending hello
20001 sending hello
20000 sent hello in 62ms
20001 sent hello in 49ms
20000 received ready in 80739ms
Received READY on Client 20000
Worker 20000 has processed Request 0
Worker 20000 has processed Request 1
Worker 20000 has processed Request 2
Worker 20000 has processed Request 3
Worker 20000 has processed Request 4
20001 received ready in 80731ms
...
```

After this happens, it devours the remaining messages, but the very first communication is always slow. If I add too many ClientWorkerPairs by increasing the number in pool.StartWorkerClientPair(2);, it sometimes just freezes and does nothing, without triggering any exceptions.

What intrigues me the most is that on Windows this runs fine. Below is the start of the console output on Windows, equivalent to the Docker one, running 2 workers and 100 messages. (To run it on my Windows machine I have to replace "python3" with "python" in the WorkerClientPair.cs file, but it's still Python 3.8.)

```
20001 sending hello
20000 sending hello
20001 sent hello in 18ms
20000 sent hello in 18ms
20000 received ready in 568ms
Received READY on Client 20000
20001 received ready in 568ms
Received READY on Client 20001
...
```

So, am I doing something wrong, or is this a known issue on Linux? I would like to keep using the Tasks to get the results, since I intend to use this in a message consumer that consumes multiple messages at the same time and hits this pool when needed. That means I need a queue guaranteeing that the first in is the first out, with only a single WorkerClientPool being called from different threads. I've seen that the RequestSocket is not thread-safe, but since I'm executing it from one single thread, and that thread is the one reading the queue, I was expecting to be alright. A sketch of what I mean by that single-owner thread is below.
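Continuing the PendingRequest sketch from above (again illustrative, not the exact repo code), only this dedicated thread ever touches the RequestSocket:

```csharp
using System.Collections.Concurrent;
using NetMQ;
using NetMQ.Sockets;

// One dedicated thread per worker: it alone owns the RequestSocket
// and drains the shared queue in FIFO order.
static void WorkerLoop(BlockingCollection<PendingRequest> queue, int port)
{
    using (var socket = new RequestSocket())
    {
        socket.Connect($"tcp://localhost:{port}");
        foreach (var req in queue.GetConsumingEnumerable())
        {
            socket.SendFrame(req.Payload);
            req.Response = socket.ReceiveFrameString();
            req.Done.Set(); // wake the Task that enqueued this request
        }
    }
}
```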

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

stale bot added the stale label Apr 17, 2022