NullPointerException: Cannot invoke "zmq.IMailbox.send(zmq.Command)" because "this.slots[tid]" is null #983

Open
inad9300 opened this issue Mar 26, 2024 · 4 comments

Comments

@inad9300

inad9300 commented Mar 26, 2024

Using version 0.5.4, I interrupted a thread that was receiving on an open subscription, then called ZMonitor.close() and ZContext.close(), and got the following NPE (the first exception is included for context):

Exception in thread "Thread-244" org.zeromq.ZMQException: Errno 4 : Interrupted function
        at org.zeromq.ZMQ$Socket.mayRaise(ZMQ.java:3732)
        at org.zeromq.ZMQ$Socket.recv(ZMQ.java:3530)
        at org.zeromq.ZMQ$Socket.recv(ZMQ.java:3502)
        ...
        at java.base/java.lang.Thread.run(Thread.java:840)

...

java.lang.NullPointerException: Cannot invoke "zmq.IMailbox.send(zmq.Command)" because "this.slots[tid]" is null
        at zmq.Ctx.sendCommand(Ctx.java:615)
        at zmq.ZObject.sendCommand(ZObject.java:410)
        at zmq.ZObject.sendPipeTermAck(ZObject.java:260)
        at zmq.pipe.Pipe.processPipeTermAck(Pipe.java:421)
        at zmq.ZObject.processCommand(ZObject.java:91)
        at zmq.Command.process(Command.java:79)
        at zmq.SocketBase.processCommands(SocketBase.java:1198)
        at zmq.SocketBase.inEvent(SocketBase.java:1365)
        at zmq.poll.Poller.run(Poller.java:276)
        at java.base/java.lang.Thread.run(Thread.java:840)

Note that this exception is rare: it has shown up only after many similar runs of the same code.

@fbacchella
Contributor

Did you try with release 0.6.0?

@inad9300
Author

I confirm this exception can occur in 0.6.0 (this happens sometimes in a scenario like the one described in #984; both issues may be due to the same underlying problem):

Exception in thread "ZMonitor-Sub[56]" java.lang.NullPointerException: Cannot invoke "zmq.IMailbox.send(zmq.Command)" because "this.slots[tid]" is null
        at zmq.Ctx.sendCommand(Ctx.java:662)
        at zmq.ZObject.sendCommand(ZObject.java:410)
        at zmq.ZObject.sendReapAck(ZObject.java:290)
        at zmq.SocketBase.processCommands(SocketBase.java:1183)
        at zmq.SocketBase.send(SocketBase.java:854)
        at zmq.SocketBase.send(SocketBase.java:792)
        at org.zeromq.ZMQ$Socket.send(ZMQ.java:3445)
        at org.zeromq.ZMQ$Socket.send(ZMQ.java:3359)
        at org.zeromq.ZStar$Plateau.run(ZStar.java:503)
        at org.zeromq.ZThread$ShimThread.run(ZThread.java:57)

@pmconrad

I had an NPE with version 0.5.4 at the same line as in the OP, but with a different stack trace. It turned out that, due to a race condition, I was calling CancellationToken::cancel after the socket-owning thread had closed the socket, so the fault was in my code after all (OTOH, should cancel() on a closed socket really throw an NPE?).

That said, I'm not sure the cancel() code really is correct. The real issue here is that access to slots[tid] is not properly synchronized, AFAICS. I guess that's the main reason why the documentation clearly says that a socket should only ever be used by the thread that created it, but the cancellation token deliberately crosses that thread boundary and therefore requires proper synchronization.
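
To make the race above concrete, here is a minimal caller-side sketch of how cancel() and close() could be serialized so that a late cancel becomes a no-op instead of reaching a closed socket. `GuardedCancel` and its two `Runnable` parameters are hypothetical names for illustration and are not part of JeroMQ; in real code `cancelAction` would presumably wrap the token's cancel() and `closeAction` the socket's close(). It is not a fix for the library itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical caller-side wrapper (not a JeroMQ class). */
final class GuardedCancel {
    private final Object lock = new Object();
    private final Runnable cancelAction; // assumed to wrap the token's cancel()
    private final Runnable closeAction;  // assumed to wrap the socket's close()
    private boolean closed;              // guarded by lock

    GuardedCancel(Runnable cancelAction, Runnable closeAction) {
        this.cancelAction = cancelAction;
        this.closeAction = closeAction;
    }

    /** May be called from any thread. Returns false if close() already ran. */
    boolean cancel() {
        synchronized (lock) {
            if (closed) {
                return false; // too late: skip instead of touching a closed socket
            }
            cancelAction.run();
            return true;
        }
    }

    /** Called by the socket-owning thread. Idempotent. */
    void close() {
        synchronized (lock) {
            if (closed) {
                return;
            }
            closed = true;
            closeAction.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger cancels = new AtomicInteger();
        GuardedCancel guard = new GuardedCancel(cancels::incrementAndGet, () -> { });
        System.out.println(guard.cancel()); // true: socket still open
        guard.close();
        System.out.println(guard.cancel()); // false: ignored after close
        System.out.println(cancels.get());  // 1
    }
}
```

The lock only orders the wrapper's own cancel() and close() calls; it does not make the underlying socket safe to use from several threads.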

@trevorbernard
Member

trevorbernard commented Jul 16, 2024

> I had an NPE with version 0.5.4 at the same line as in the OP, but with a different stack trace. It turned out that, due to a race condition, I was calling CancellationToken::cancel after the socket-owning thread had closed the socket, so the fault was in my code after all (OTOH, should cancel() on a closed socket really throw an NPE?).
>
> That said, I'm not sure the cancel() code really is correct. The real issue here is that access to slots[tid] is not properly synchronized, AFAICS. I guess that's the main reason why the documentation clearly says that a socket should only ever be used by the thread that created it, but the cancellation token deliberately crosses that thread boundary and therefore requires proper synchronization.

If that's the case, then use of cancel() should be discouraged and deprecated for the reasons you described. It would be better to use an officially supported pattern, e.g. sending a shutdown command to the socket's thread over the same socket or a separate command channel.
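
As a rough illustration of the command-channel idea, the sketch below uses only the JDK: a `BlockingQueue` stands in for the command channel (with JeroMQ this could be, for example, an inproc socket that the worker polls next to its subscription), and an `AutoCloseable` stands in for the socket. The class name `CommandShutdown`, the `STOP` sentinel and the `"WORK"` command are all hypothetical. The point it shows is that the resource is created, used and closed only by the thread that owns it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration; the queue stands in for a command socket. */
final class CommandShutdown {
    static final String STOP = "STOP"; // hypothetical shutdown command

    /** Starts a worker, asks it to stop, and returns the name of the thread that closed the resource. */
    static String runAndStop() throws InterruptedException {
        BlockingQueue<String> commands = new LinkedBlockingQueue<>();
        AtomicReference<String> closedBy = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            // The resource (a stand-in for the socket) lives entirely on this thread.
            try (AutoCloseable resource = () -> closedBy.set(Thread.currentThread().getName())) {
                while (true) {
                    String command = commands.take(); // blocks until a command arrives
                    if (STOP.equals(command)) {
                        break; // leave the loop; try-with-resources closes the resource here
                    }
                    // Any other command would be handled here.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }, "worker");

        worker.start();
        commands.put("WORK"); // ordinary command
        commands.put(STOP);   // shutdown request, sent instead of interrupting the thread
        worker.join();        // only after this would the shared context be closed
        return closedBy.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndStop()); // worker
    }
}
```

Because the controlling thread waits in join() before tearing anything else down, no other thread touches the resource while the worker may still be using it.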
