Replies: 3 comments 6 replies
-
I'll convert this to a discussion; you'll probably get more responses that way than if it languishes here as a "probably not our bug" type of issue.
-
Are you using Node.js as a shared library? I ask because I know that has been done successfully, and there is some documentation on experimental support for building Node.js as a shared library in https://github.com/nodejs/node/blob/main/doc/contributing/maintaining-shared-library-support.md
-
@viferga OK, I just wanted to understand whether you were leveraging what was already there or not. I doubt somebody is going to have time to install/debug the full MetaCall setup. If I were you, I'd start by removing functionality/parts so you can narrow down more specifically what is contributing to the crash. I.e., based on past experience with people using the shared library, it should be possible to embed Node.js, start it, run a script, and then shut down. If you incrementally remove parts until you get back to that, along the way you might get more insight into what specifically is the problem.
-
Version
v12.21.0 | v14.x | v15.x
Platform
Linux a8e75a23e0fd 5.11.0-49-generic #55-Ubuntu x86_64 GNU/Linux
Subsystem
node.h / node internals
What steps will reproduce the bug?
The setup for reproducing the bug is really complex to implement, so the best way to reproduce it is to clone the MetaCall repository and run the tests through Docker (Compose):
Then open the output file and you will see several failed tests; some of them are false positives, because some runtimes need to be recompiled to fully support sanitizers. If you check the output of the failed Node.js tests, you will see that most of them are due to this bug.
How often does it reproduce? Is there a required condition?
I think this bug is related to threading because it does not always happen; it's basically unpredictable, but the failure point is always the same. Just to provide some context: what I am doing here is embedding Node.js into a C/C++ application. In order to control when to delete the queues that keep Node.js alive, I maintain a counter of the open asynchronous handles. On each event loop iteration I check it, and once it drops to the number of asynchronous handles that MetaCall itself needs to run, I issue a destroy invoke that deletes all those queues, letting libuv shut down gracefully.
What is the expected behavior?
It should finalize gracefully without generating a segmentation fault.
What do you see instead?
Node.js crashes with a segmentation fault during finalization.
Additional information
I know this is a complex bug that is probably not caused by Node.js itself, but I think this is a valid use case, and it would be great if you could give me some hints on how to solve it properly. My current system for detecting asynchronous handles is the only way I have found to finalize Node.js gracefully, because the embedding API does not provide this. So I ended up doing this kind of hack in order to end the event loop properly, but there still seems to be some race condition. Thanks for your time.
If you want to review the methodology I have used, there's some interesting parts here:
This is the function that tries to do the destroy:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L5161
This is the part that checks whether there are async handles; otherwise, it hooks into the event loop to check the async handles on each iteration:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4870
And from here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4803
...to here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4856
Those are the hooks for the event loop; I have two hooks, one in the prepare stage and one in the check stage.
If everything goes well, all the async queues are destroyed here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4997
Sorry for the spaghetti code; it will eventually be refactored. Even if the code is bad, it has a good test suite with Valgrind and sanitizers, so it can be refactored without fear in the future. My manpower is limited right now.