
Support free-threaded Python 3.13 #503

Open
andfoy opened this issue Feb 13, 2025 · 15 comments · May be fixed by #508

Comments

@andfoy
Contributor

andfoy commented Feb 13, 2025

This tracking issue aims to collect all the issues and PRs related to the current free-threaded CPython 3.13 (a.k.a. "no-GIL") builds.
For context, this new experimental CPython build lets multithreaded programs get around the GIL limitation, thus reducing execution time, while introducing a whole new set of concurrency and parallelism issues.

So far, as part of a general effort, several key packages in the PyData ecosystem have been tested for concurrency issues in order to produce adequate and working wheels for free-threaded Python.

We invite the community and maintainers with extensive knowledge of the project to highlight the existence of any potential thread-safety issues that currently exist in numexpr.

Here are some resources that might be useful in the context of free-threaded Python

@rgommers

rgommers commented Feb 18, 2025

Hi @FrancescAlted & other maintainers, I'd like to add a bit of context to this issue. Our team at Quansight has been working on free-threading support for widely used open source packages, and with numpy now in good shape things are unblocked for numexpr support. We are happy to contribute, or if you'd prefer to do it yourself please do let us know.

Here is what I think will at least need doing:

In addition, there are threading-related APIs in numexpr, like set_num_threads and the corresponding environment variables (NUMEXPR_MAX_THREADS, OMP_NUM_THREADS). If the package or end user calling numexpr doesn't use the Python threading module (such usage is currently rare), then there's probably nothing to worry about. If it does use threading, there's a risk of thread oversubscription under free-threading, just like when one uses multiprocessing today on a default (with-GIL) CPython build. I'm not sure there's anything to do here, but it's worth thinking about.
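
To illustrate the oversubscription concern, here is a minimal sketch (not from this thread or numexpr's docs; the expression, array sizes and chunking are made up) of a caller that uses the Python threading module and caps numexpr's internal pool with set_num_threads so that each evaluate() call stays single-threaded:

import threading
import numpy as np
import numexpr as ne

ne.set_num_threads(1)  # keep numexpr single-threaded inside each evaluate() call

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
results = [None] * 4
chunk = len(a) // 4

def worker(i):
    # each Python thread evaluates its own chunk of the arrays
    s, e = i * chunk, (i + 1) * chunk
    results[i] = ne.evaluate("2*a + b", local_dict={"a": a[s:e], "b": b[s:e]})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Without the set_num_threads(1) cap, four Python threads times numexpr's default pool size could oversubscribe the available cores.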

@FrancescAlted
Contributor

Hi Ralf. We are currently busy, so we would be happy if you want to proceed and contribute a PR.

Regarding oversubscription, yeah, I can see the issue; is there a way to detect whether we are in the free-threaded Python interpreter? This way we could at least raise a warning about it.

FWIW, there is also #502. While you are at work on this one, you may be interested in fixing that too; up to you.

@rgommers

Thanks Francesc.

I can see the issue; is there a way to detect whether we are in the free-threaded Python interpreter? This way we could at least raise a warning about it.

There is no public Python API for this, since the separate free-threaded build is supposed to be temporary and to become the default (and only) CPython build in a few years. That said, there is certainly a need right now. The standard way we've been doing this detection (from https://py-free-threading.github.io/running-gil-disabled/) is:

import sysconfig
is_freethreaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

A warning seems reasonable if oversubscription is detected. The more tricky part is probably to detect whether multiple Python threads are being used at the level above numexpr. I'm not sure how to do this. It's the same puzzle as with multiprocessing. In NumPy we recommend using threadpoolctl to control threading behavior, since we can't do it well within NumPy. Maybe that's the way to go here?
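
As a rough illustration of how the detection above could feed such a warning (my own sketch, not something proposed verbatim in this thread; the message text is made up):

import sysconfig
import warnings

if bool(sysconfig.get_config_var("Py_GIL_DISABLED")):
    # the build supports free-threading; warn that Python-level threading plus
    # numexpr's internal pool may oversubscribe the available cores
    warnings.warn(
        "free-threaded CPython detected: combining Python threads with "
        "numexpr's internal thread pool may oversubscribe CPU cores",
        RuntimeWarning,
    )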

@FrancescAlted
Contributor

Thanks Francesc.

I can see the issue; is there a way to detect whether we are in the free-threaded Python interpreter? This way we could at least raise a warning about it.

There is no public Python API for this, since the separate free-threaded build is supposed to be temporary and to become the default (and only) CPython build in a few years. That said, there is certainly a need right now. The standard way we've been doing this detection (from https://py-free-threading.github.io/running-gil-disabled/) is:

import sysconfig
is_freethreaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

A warning seems reasonable if oversubscription is detected.

Ok, so my vote is to use this for now.

The more tricky part is probably to detect whether multiple Python threads are being used at the level above numexpr. I'm not sure how to do this. It's the same puzzle as with multiprocessing. In NumPy we recommend using threadpoolctl to control threading behavior, since we can't do it well within NumPy. Maybe that's the way to go here?

Maybe. I didn't know about threadpoolctl, but feel free to add some notes in the docs (like the README) to suggest users go that way.
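
For reference, a hedged sketch of what such a docs note could show. One assumption worth flagging: threadpoolctl limits the native pools it recognizes (OpenMP, BLAS libraries); whether it also covers numexpr's own pool is not established in this thread, so the sketch caps numexpr explicitly as well:

import numpy as np
import numexpr as ne
from threadpoolctl import threadpool_limits

a = np.arange(1_000_000, dtype=np.float64)

with threadpool_limits(limits=1):  # cap the thread pools threadpoolctl knows about (OpenMP, BLAS)
    ne.set_num_threads(1)          # cap numexpr's own pool separately, to be safe
    result = ne.evaluate("sin(a) + 1")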

@rgommers

rgommers commented Mar 19, 2025

I checked off all the TODOs in my comment higher up; they were all done in gh-504 (merged as part of gh-505). Follow-up actions from that are:

  • Fix an issue with init_sentinels_done (xref gh-504#comment)
  • Migrate wheel build jobs to native Linux aarch64 runners and start building cp313t aarch64 wheels
  • Docs: add a small script using threading in Python code in combination with the new numexpr. And perhaps, if it helps demonstrate speedups, add a pandas (using numexpr under the hood) example. (xref gh-504#comment)

@andfoy
Contributor Author

andfoy commented Mar 19, 2025

fix an issue with init_sentinels_done

This one should be checked off; the revert that @FrancescAlted did was because of some leftover code that I didn't remove while I was trying to check the locality of that variable.

@rgommers

This one should be checked off; the revert that @FrancescAlted did was because of some leftover code that I didn't remove while I was trying to check the locality of that variable.

So it got merged in gh-505? Or no changes are needed because there was never a problem to begin with?

@andfoy
Contributor Author

andfoy commented Mar 19, 2025

No changes are needed regarding that item; thanks for the clarifying question, @rgommers.

@FrancescAlted
Contributor

Do you plan to continue with this anytime soon? I am trying to figure out whether I should do a release or wait a bit more. If you think you can finish in, say, a couple of weeks, I'll wait. Thanks!

@andfoy
Contributor Author

andfoy commented Mar 25, 2025

I'm going to open a PR regarding the docs today, as well as one for native aarch64 builds.

@andfoy
Contributor Author

andfoy commented Mar 26, 2025

@FrancescAlted could this benchmark script already exemplify using numexpr with threads? The only change required would be setting NUMEXPR_NUM_THREADS=1: https://github.com/pydata/numexpr/blob/master/bench/large_array_vs_numpy.py

@FrancescAlted
Contributor

Doh, is that working for you? I am getting errors of this kind:

> NUMEXPR_NUM_THREADS=1 python bench/large_array_vs_numpy.py                             (py3.13)
Benchmarking Expression 1:
NumPy time (threaded over 32 chunks with 2 threads): 1.361437 seconds
Exception in thread Thread-4 (benchmark_numexpr_re_evaluate):
Traceback (most recent call last):
  File "/Users/faltet/miniforge3/envs/py3.13/lib/python3.13/threading.py", line 1041, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/Users/faltet/miniforge3/envs/py3.13/lib/python3.13/threading.py", line 992, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/faltet/blosc/numexpr/bench/large_array_vs_numpy.py", line 96, in benchmark_numexpr_re_evaluate
    time_taken = timeit.timeit(
        lambda: ne.re_evaluate(
    ...<2 lines>...
        number=num_runs,
    )
  File "/Users/faltet/miniforge3/envs/py3.13/lib/python3.13/timeit.py", line 237, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/Users/faltet/miniforge3/envs/py3.13/lib/python3.13/timeit.py", line 180, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "/Users/faltet/blosc/numexpr/bench/large_array_vs_numpy.py", line 97, in <lambda>
    lambda: ne.re_evaluate(
            ~~~~~~~~~~~~~~^
        local_dict={"a": a[start:end], "b": b[start:end], "c": c[start:end]}
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ),
    ^
  File "/Users/faltet/blosc/numexpr/numexpr/necompiler.py", line 1010, in re_evaluate
    args = getArguments(argnames, local_dict, global_dict, _frame_depth=_frame_depth)
  File "/Users/faltet/blosc/numexpr/numexpr/necompiler.py", line 761, in getArguments
    for name in names:
                ^^^^^
TypeError: 'NoneType' object is not iterable
numexpr time (threaded with re_evaluate over 32 chunks with 2 threads): 0.993288 seconds
numexpr speedup: 1.37x

@andfoy
Contributor Author

andfoy commented Mar 26, 2025

Let me check!

@andfoy
Contributor Author

andfoy commented Mar 26, 2025

@FrancescAlted the issue boils down to

Previously, as the cache was global, different threads could access the results left by other threads; however, since the cache is now thread-local, each thread needs to recompute its cached values independently.
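
A minimal sketch of what that implies for the benchmark (the expression, sizes and chunking below are illustrative, not the actual script): each thread now has to prime its own cache with an evaluate() call before it can use re_evaluate():

import threading
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

def worker(start, end):
    local = {"a": a[start:end], "b": b[start:end]}
    # the first call compiles the expression and fills this thread's cache...
    ne.evaluate("2*a + 3*b", local_dict=local)
    # ...so later calls in the same thread can reuse it via re_evaluate()
    for _ in range(10):
        ne.re_evaluate(local_dict=local)

threads = [threading.Thread(target=worker, args=(i * 500_000, (i + 1) * 500_000))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()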

@FrancescAlted
Contributor

Ok. Then feel free to modify the benchmark to follow the new behavior. The idea is to have something that users can find educational about the new functionality.
