Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-130115: fix return value of threading.get_ident for the main thread on 32bit musl #130391

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

vfazio
Copy link
Contributor

@vfazio vfazio commented Feb 21, 2025

CPython's pthread-based thread identifier relies on pthread_t being able to be represented as an unsigned integer type.

This is true in most Linux libc implementations where it's defined as an unsigned long, however musl typedefs it as a struct *.

If the pointer has the high bit set and is cast to PyThread_ident_t, the resultant value can be sign-extended (https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Arrays-and-pointers-implementation.html). This can cause issues when comparing against threading._MainThread's identifier. The main thread's identifier value is retrieved via _get_main_thread_ident which is backed by an unsigned long which truncates sign extended bits.

  >>> hex(threading.main_thread().ident)
  '0xb6f33f3c'
  >>> hex(threading.current_thread().ident)
  '0xffffffffb6f33f3c'

Work around this by conditionally compiling in some code for non-glibc based Linux platforms that are at risk of sign-extension to return a PyLong based on the main thread's unsigned long thread identifier if the current thread is the main thread.

musl isn't "officially" supported in PEP 11, however platform detection was added in c163d7f and similar PRs have been merged in the past which target it 5633c4f

This PR is intended to be a "minimum" to get this working. Longer term there should maybe be work to keep pthread_t opaque and not make assumptions about its type.


I'm open to changing the implementation.

Options I've considered:

  • modifying PyThread_get_thread_ident_ex with a conditional compile directive to cast through either uintptr_t or unsigned long on potentially affected platforms, which would restore pre GH-110829: Ensure Thread.join() joins the OS thread #110848 behavior for musl. I'm hesitant to bandaid this code, but doing this shouldn't break anything we hadn't made assumptions about prior to the aforementioned PR.

  • Adding a configure check to test if pthread_t is arithmetic and then assuming it's a pointer if the check fails. There already exists a similar check for pthread_key_t for the deprecated (but not removed) TLS API (see PTHREAD_KEY_T_IS_COMPATIBLE_WITH_INT). However pthread_t could just as easily be a struct in some other libc, though maybe we don't care about this part. Either way, it seemed like too big a change that will hopefully get reverted at some point and i didn't want to risk leaving a configure check that will live forever without re-review (examples: aforementioned TLS API, the getaddrinfo IPv6 check added 20+ years ago, etc)

  • adding a new struct member to _PyRuntime that is a PyThread_ident_t type and update thread__get_main_thread_ident to return its value. Adding and maintaining an additional struct member seemed like overkill


CPython's pthread-based thread identifier relies on pthread_t being able
to be represented as an unsigned integer type.

This is true in most Linux libc implementations where it's defined as an
unsigned long, however musl typedefs it as a struct *.

If the pointer has the high bit set and is cast to PyThread_ident_t, the
resultant value can be sign-extended [0]. This can cause issues when
comparing against threading._MainThread's identifier. The main thread's
identifier value is retrieved via _get_main_thread_ident which is backed
by an unsigned long which truncates sign extended bits.

  >>> hex(threading.main_thread().ident)
  '0xb6f33f3c'
  >>> hex(threading.current_thread().ident)
  '0xffffffffb6f33f3c'

Work around this by conditionally compiling in some code for non-glibc
based Linux platforms that are at risk of sign-extension to return a
PyLong based on the main thread's unsigned long thread identifier if the
current thread is the main thread.

[0]: https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Arrays-and-pointers-implementation.html

Signed-off-by: Vincent Fazio <[email protected]>
Signed-off-by: Vincent Fazio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant