Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyGILState_Ensure in one thread causes the thread-local storage of other threads to be GCed #130394

Open
kevin85421 opened this issue Feb 21, 2025 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@kevin85421
Copy link

kevin85421 commented Feb 21, 2025

Bug report

Bug description:

# Download https://gist.github.com/kevin85421/4616545d3bed2ebcfe5d0a01bdfde3ff
g++ example.cc -I/usr/include/python3.12 -lpython3.12 -lpthread -o example
./example
  • The above script creates two C++ threads, default_pool and custom_pool, and calls Python code.
    • Phase 1 (void init_python_thread): Each thread
      • Calls PyGILState_Ensure()
      • Runs Python code that assigns the value 1 to the variable a and creates a thread-local state using threading.local().
      • Calls PyEval_SaveThread to release GIL.
    • Phase 2 (void release_gstate): Each thread
      • Calls PyEval_RestoreThread(*tstate) to acquire GIL and restore thread state.
      • Runs Python code to print a
      • Runs Python code to print thread-local state.
      • Calls PyGILState_Release

The thread who is the first thread to call PyGILState_Ensure() will fail to print thread-local state in step 2.

In the following example log,

  • default_pool calls PyGILState_Ensure() and writes a=1 and thread-local state.
  • custom_pool calls PyGILState_Ensure() and writes a=1 and thread-local state. => My current guess is that PyGILState_Ensure() here makes the thread-local state of default_pool be GCed.
[C++][08:21:37.740] Hello from the default_pool in init_python_thread
[C++][08:21:37.741] Hello from the custom_pool in init_python_thread
Hello from default_pool, a = 1!
[C++][08:21:37.744] Hello from the default_pool in release_gstate
Hello from custom_pool, a = 1!
a = 1 in release_gstate!
[C++][08:21:37.744] Hello from the custom_pool in release_gstate
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: '_thread._local' object has no attribute 'name'
a = 1 in release_gstate!
thread_local.name = custom_pool in release_gstate!

CPython versions tested on:

3.12

Operating systems tested on:

Linux

@kevin85421 kevin85421 added the type-bug An unexpected behavior, bug, or error label Feb 21, 2025
@kevin85421 kevin85421 changed the title PyGILState_Ensure in one thread causes the thread-local storage of other threads to be GCed [wip] PyGILState_Ensure in one thread causes the thread-local storage of other threads to be GCed Feb 21, 2025
@kevin85421 kevin85421 changed the title [wip] PyGILState_Ensure in one thread causes the thread-local storage of other threads to be GCed PyGILState_Ensure in one thread causes the thread-local storage of other threads to be GCed Feb 21, 2025
@ZeroIntensity
Copy link
Member

Hi! I'm not familiar with boost::asio::post, but your gist looks a little wrong. (Don't blame yourself, we've done a bad job of documenting how these functions work.)

This may seem a little counterintuitive, but the GIL can also be released randomly in-between execution of Python instructions, not just calls to PyEval_SaveThread. I'm seeing the following behavior based on the output:

  • Thread 1 (default_pool) starts, and thread_local is assigned to a threading.local() and the name attribute is set. The GIL is then released explicitly by PyEval_SaveThread.
  • Thread 2 (custom_pool) is now unblocked and takes the GIL, but completely overwrites the thread_local from before, losing the prior name attribute.
  • At this point, Python decides another thread should execute (based on the switch interval), and releases the GIL without having set the attribute yet.
  • release_gstate for thread 1 is now unblocked and gets the GIL, but since thread_local was blown away before, it doesn't have the name anymore, resulting in an AttributeError.
  • init_python_thread for thread 2 gets the GIL again and finishes up, then hands off the GIL to release_gstate.

That aside, I would suggest avoiding passing PyGILState_STATE and PyThreadState pointers up to the main function like you're currently doing. It's not wrong, but it's error-prone; if you accidentally use either of those in a different thread than where they were created, you'll get all sorts of hard-to-debug crashes.


I've seen several reports before due to PyGILState_Ensure inconsistencies. It would definitely be helpful to tell us where things need to be documented better, so we don't run into problems like these. Where were you reading, and what kind of information would be useful to you?

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 21, 2025
@kevin85421
Copy link
Author

kevin85421 commented Feb 21, 2025

Hi @ZeroIntensity, thank you for the quick reply!

Thread 2 (custom_pool) is now unblocked and takes the GIL, but completely overwrites the thread_local from before, losing the prior name attribute.

This differs from my understanding of how threading.local() behaves. I expected that thread_local.name in default_pool would be stored somewhere like thread_local_dict[default_pool_thread_id]["name"], and the one for custom_pool would be stored somewhere like thread_local_dict[custom_pool_thread_id]["name"]. The following example is also able to show this behavior:

# Download https://gist.github.com/kevin85421/2d4fd5620ffdd7a1dfd36fd734fa7afe
python3 example.py

Worker-0 set value: Worker-0
Worker-1 set value: Worker-1
Worker-2 set value: Worker-2
Worker-0 read value: Worker-0
Worker-1 read value: Worker-1
Worker-2 read value: Worker-2

At this point, Python decides another thread should execute (based on the switch interval), and releases the GIL without having set the attribute yet.

My observation based on the following example is that the GIL doesn't implicitly switch. Instead, the thread holds the thread should call PyEval_SaveThread or PyGILState_Release explicitly. Do I misunderstand anything?

# Download: https://gist.github.com/kevin85421/d34df2fac0af85efe25b3c3e4bf0ceea
g++ switchinterval.cc -I/usr/include/python3.12 -lpython3.12 -lpthread -o switchinterval
./switchinterval

[C++][17:18:34.651] Sleep 10 seconds
[C++][17:18:34.651] Hello from the default_pool in init_python_thread
[C++][17:18:34.651] Hello from the custom_pool in init_python_thread
Hello from default_pool, a = 1!
sys.getswitchinterval() = 0.005
[C++][17:18:44.651] Hello from the default_pool in release_gstate
Hello from custom_pool, a = 1! <---------- `custom_pool` gets the GIL after `default_pool` calls `PyGILState_Release`
sys.getswitchinterval() = 0.005
[C++][17:18:44.651] Hello from the custom_pool in release_gstate
a = 1 in release_gstate!
a = 1 in release_gstate!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: '_thread._local' object has no attribute 'name'
thread_local.name = custom_pool in release_gstate!

My current guess is that when custom_pool calls PyGILState_Ensure, it accidentally overwrites some data associated with default_pool. Consequently, the thread-local storage detects that the default_pool thread is out of scope and removes the key default_pool_thread_id from the thread-local dictionary.

The following link points to the function that cleans up data related to a thread when it is garbage-collected.

wrthread = ref(thread, thread_deleted)

@ZeroIntensity
Copy link
Member

This differs from my understanding of how threading.local() behaves. I expected that thread_local.name in default_pool would be stored somewhere like thread_local_dict[default_pool_thread_id]["name"], and the one for custom_pool would be stored somewhere like thread_local_dict[custom_pool_thread_id]["name"]. The following example is also able to show this behavior:

That would normally be right, but the important part about your example is that you redeclare the threading.local() object when calling init_python_thread. The prior thread-local dictionary containing name for thread 1 is completely lost!

In fact, I'm not even really sure what happens when you set a threading.local() attribute outside of the main thread. If the switch interval isn't messing things up, I suspect it's just getting set for the second thread, and then the first thread has no name at all (which would also explain the AttributeError).

You should refactor your code to something more like:

Py_Initialize();
PyRun_SimpleString("import threading; thread_local = threading.local()");
thread1 = run_in_thread("thread_local.name = 'first'");
thread2 = run_in_thread("thread_local.name = 'second'");
// ...
Py_Finalize();

@kevin85421
Copy link
Author

kevin85421 commented Feb 21, 2025

That would normally be right, but the important part about your example is that you redeclare the threading.local() object when calling init_python_thread.

Oops, my bad. I don't notice that I redeclare it. The following gist can get the threading local state in both threads.

https://gist.github.com/kevin85421/4b646b0ee07e5168b333d3325a59ca4c

@kevin85421
Copy link
Author

Where were you reading, and what kind of information would be useful to you?

This is the doc I read: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock. There are something possible to improve:

  • Provide some runnable small examples (something like the gists) and explain the expected results of the examples.
  • What will happen to the C++ threads if PyGILState_ or PyThreadState goes out of scope?
  • Summarize the lifecycle of the GIL (when and how to acquire/release it).
  • How do I debug? If a deadlock occurs, how can I check whether a thread is waiting for the GIL and determine which thread holds it?

#130394 (comment)

Following up on my comment: Will the GIL be released based on the switch interval in this case?

That aside, I would suggest avoiding passing PyGILState_STATE and PyThreadState pointers up to the main function like you're currently doing. It's not wrong, but it's error-prone; if you accidentally use either of those in a different thread than where they were created, you'll get all sorts of hard-to-debug crashes.

To provide more context, I am working on ray-project/ray, a distributed computing engine. Users add the @ray.remote annotation to their Python functions or classes, and then Ray launches these functions (Ray tasks) or classes (Ray actors) on the underlying C++ threads in the Ray cluster.

Currently, we don't use Python.h. Instead, we pass the Python code as a callback from Python (via Cython) to C++. However, the C++ threads are considered as Python threads only when it executes the Python callback functions.

A C++ thread:

  • Executes Python function f1, which writes data to threading.local().
  • After f1 finishes, the Python interpreter considers the thread dead and garbage collects its threading local state.
  • Executes Python function f2, which attempts to access data from threading.local() → Fails.
    • From the Python interpreter's perspective, the thread with the same ID was dead and then became alive again which should not happen in Python IIUC.

To avoid this issue, I want the Python interpreter to treat the C++ thread as a Python thread, regardless of whether it is executing a Python function. My current solution:

  1. When we initialize the C++ thread, call PyGILState_Ensure to register the thread with the Python interpreter. Then, call PyEval_SaveThread to release GIL so that other threads can get GIL. At the same time, hold both gstate and tstate.

  2. Call PyEval_RestoreThread to restore thread state before executing Python function.

  3. Execute Python function

  4. Call PyEval_SaveThread after the Python function finishes

  5. Repeat step 2 - 4

  6. Call PyGILState_Release before stopping the C++ thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants