Replies: 3 comments 6 replies
-
I'll convert this to a discussion; you'll probably get more responses that way than if it languishes here as a "probably not our bug" type of issue.
-
Are you using Node.js as a shared library? I ask because I know that has been done successfully, and there is some documentation on experimental support for building Node.js as a shared library in https://github.com/nodejs/node/blob/main/doc/contributing/maintaining-shared-library-support.md
-
@viferga OK, I just wanted to understand whether you were leveraging what was already there or not. I doubt somebody is going to have time to install/debug the full MetaCall setup. If I were you, I'd start by removing functionality/parts so you can narrow down more specifically what is contributing to the crash. I.e., based on past experience with people using the shared library, it should be possible to embed Node.js, start it, run a script, and then shut down. If you incrementally remove parts until you get back to that, along the way you might get more insight into what specifically is the problem.
-
Version
v12.21.0 | v14.x | v15.x
Platform
Linux a8e75a23e0fd 5.11.0-49-generic #55-Ubuntu x86_64 GNU/Linux
Subsystem
node.h / node internals
What steps will reproduce the bug?
The setup for reproducing the bug is really complex to implement, so the best way to reproduce it is to clone the MetaCall repository and run the tests through Docker (Compose):
Then open the output file and you will see several failed tests; some of them are false positives, because some runtimes need to be recompiled to fully support sanitizers. If you check the output of the failed Node.js tests, you will see that most of them are due to this bug.
How often does it reproduce? Is there a required condition?
I think this bug is related to threading because it does not always happen; it's basically unpredictable, but the failure point is always the same. Just to provide some context: what I am doing here is embedding Node.js into a C/C++ application. In order to control when to delete the queues that keep Node.js alive, I maintain a counter of the open asynchronous handles. On each event loop iteration I check it, and once it drops to the number of asynchronous handles that MetaCall itself needs to run, I issue a destroy invoke that deletes all those queues, letting libuv shut down gracefully.
What is the expected behavior?
It should finalize gracefully without generating a segmentation fault.
What do you see instead?
Node.js crashes with a segmentation fault during finalization.
Additional information
I know this is a complex bug that is probably not caused by Node.js itself, but I think this is a valid use case, and it would be great if you could give me some hints on how to solve it properly. My current system for detecting asynchronous handles is the only way I have found to finalize Node.js gracefully, because the embedding API does not provide this. So I ended up doing this kind of hack in order to end the event loop properly, but there still seems to be some race condition. Thanks for your time.
If you want to review the methodology I have used, there's some interesting parts here:
This is the function that tries to do the destroy:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L5161
This is the part that checks whether there are async handles; otherwise, it hooks into the event loop to check the async handles on each iteration:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4870
And from here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4803
...to here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4856
Those are the hooks for the event loop; I have two hooks, one in the prepare stage and one in the check stage.
If everything goes well, all the async queues are destroyed here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4997
Sorry for the spaghetti code; it will eventually be refactored. Even if the code is bad, it has a good test suite with Valgrind and sanitizers, so it can be refactored without fear in the future. My manpower is limited right now.