Avoid (some) incref/decref immortality checks on 3.12+ #1044

Closed
JukkaL opened this issue Dec 30, 2023 · 4 comments · Fixed by python/mypy#18459

@JukkaL

JukkaL commented Dec 30, 2023

Python 3.12 added support for immortal objects (python/cpython#19474). This adds an immortality check to incref/decref operations, which adds some overhead. Programs that get no advantage from immortal objects could benefit from skipping these checks (enabled with a mypyc optimization flag). Skipping the checks seems safe, based on this comment in Include/object.h in Python 3.12.0:

In 64+ bit systems, an object will be marked as immortal by setting all of the
lower 32 bits of the reference count field, which is equal to: 0xFFFFFFFF

Using the lower 32 bits makes the value backwards compatible by allowing
C-Extensions without the updated checks in Py_INCREF and Py_DECREF to safely
increase and decrease the objects reference count. The object would lose its
immortality, but the execution would still be correct.

Reference count increases will use saturated arithmetic, taking advantage of
having all the lower 32 bits set, which will avoid the reference count to go
beyond the refcount limit. Immortality checks for reference count decreases will
be done by checking the bit sign flag in the lower 32 bits.
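The scheme this comment describes can be modeled in a few lines of Python. This is a toy model only. It assumes a 64-bit build and treats the refcount field as a plain integer. `Obj`, `incref_checked` and `decref_checked` are hypothetical names, not CPython APIs; the real macros are C code in Include/object.h. Note that the increment branches on wrap-around and the decrement branches on the sign bit. Those extra branches are the overhead this issue is about.

```python
MASK32 = 0xFFFFFFFF           # lower 32 bits of the refcount field
SIGN32 = 0x80000000           # sign bit of the lower 32 bits
IMMORTAL_REFCNT = 0xFFFFFFFF  # "all of the lower 32 bits" set


class Obj:
    """Toy object: a 64-bit reference count plus a 'freed' flag."""

    def __init__(self, refcnt: int = 1) -> None:
        self.refcnt = refcnt
        self.freed = False


def is_immortal(obj: Obj) -> bool:
    # Decref-side test: the sign bit of the lower 32 bits is set.
    return (obj.refcnt & SIGN32) != 0


def incref_checked(obj: Obj) -> None:
    # Saturated increment of the lower 32 bits: if adding 1 would wrap
    # them to zero, the object is immortal and the count is left alone.
    low = obj.refcnt & MASK32
    new_low = (low + 1) & MASK32
    if new_low == 0:
        return
    obj.refcnt = (obj.refcnt & ~MASK32) | new_low


def decref_checked(obj: Obj) -> None:
    if is_immortal(obj):
        return  # immortal objects are never decremented or freed
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True  # stands in for calling the type's dealloc


immortal = Obj(IMMORTAL_REFCNT)
for _ in range(3):
    incref_checked(immortal)
    decref_checked(immortal)
    decref_checked(immortal)  # even unbalanced decrefs are ignored

mortal = Obj(1)
incref_checked(mortal)  # refcnt 2
decref_checked(mortal)  # refcnt 1
decref_checked(mortal)  # refcnt 0, so the object is freed
```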

Immortality checks in the Python runtime, stdlib and C extensions will still be present, but if most time is spent in code compiled using mypyc, skipping the checks could improve performance somewhat. In the mypy self check, skipping them gave a 1.9% performance improvement. I wouldn't be surprised if some use cases saw a 5-10% improvement. The self check improvement is small enough that it doesn't seem essential to use this in mypy wheels.

JukkaL added the speed label on Dec 30, 2023
@hauntsaninja

I think decref accounts for about 20% of mypy runtime on Python 3.11 when checking code that does `import torch` (python/mypy#17919), so this could be a nice win.

Do you know off the top of your head if there's other low-hanging fruit in mypyc regarding refcounting? Skimming the C code, I think I do see some redundant incref+decref pairs...

@JukkaL

JukkaL commented Nov 4, 2024

I think much of the decref cost comes from freeing objects. Having per-type freelists should help with this (e.g. #1018, but many classes could benefit). Mypy allocates a lot of temporary objects, and if we had small per-type freelists for selected types, we could avoid a lot of expensive allocation/free operations.
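Here is a Python model of the freelist idea. A real freelist would live in mypyc's generated C and would recycle memory blocks. This toy only shows the bookkeeping: a bounded stack of released objects for one type, checked before falling back to a fresh allocation. `FreeList`, `Pair` and the cap of 4 are hypothetical choices, not taken from #1018.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class FreeList(Generic[T]):
    """Hypothetical bounded free list for one type (a model, not mypyc code)."""

    def __init__(self, factory: Callable[[], T], max_size: int = 8) -> None:
        self._factory = factory
        self._max_size = max_size
        self._free: List[T] = []
        self.allocations = 0  # times we fell back to a real allocation

    def acquire(self) -> T:
        # Reuse a previously released object if one is available.
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return self._factory()

    def release(self, obj: T) -> None:
        # Keep the object for reuse only while below the cap;
        # otherwise drop it so it is freed normally.
        if len(self._free) < self._max_size:
            self._free.append(obj)

    def __len__(self) -> int:
        return len(self._free)


class Pair:
    __slots__ = ("a", "b")

    def __init__(self) -> None:
        self.a = 0
        self.b = 0


pool: FreeList[Pair] = FreeList(Pair, max_size=4)
for i in range(1000):
    p = pool.acquire()
    p.a, p.b = i, i + 1  # the caller reinitializes a recycled object
    pool.release(p)
```

In a loop that repeatedly creates and discards temporary objects, only the first `acquire` allocates. The cap keeps a burst of releases from holding on to unbounded memory.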

Another idea would be to identify the top N most expensive compiled functions in a CPU profile, and manually look for redundant incref/decref operations by inspecting the generated source or IR. Any redundant operations you find are likely to be worth fixing, or at least worth investigating.
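The standard library profiler can handle the first step of that workflow: finding the most expensive functions. This is a minimal sketch. `top_functions` and the toy workload are hypothetical. It reads the `stats` dictionary of `pstats.Stats`, which is long-standing but not formally documented. cProfile only sees Python-level calls, so calls between compiled functions would likely need a C-level profiler instead.

```python
import cProfile
import pstats
from typing import Callable, List, Tuple


def top_functions(func: Callable[[], object], n: int = 5) -> List[Tuple[str, float]]:
    """Run func under cProfile; return the n entries with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    rows = [
        (funcname, own_time)
        for (_file, _line, funcname), (_cc, _nc, own_time, _ct, _callers)
        in stats.stats.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


def slow() -> int:
    total = 0
    for i in range(300_000):
        total += i * i
    return total


def fast() -> int:
    return 1


def workload() -> None:
    slow()
    for _ in range(100):
        fast()


for name, seconds in top_functions(workload, n=3):
    print(f"{seconds:.4f}s  {name}")
```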

We could possibly avoid some incref/decref pairs if we supported borrowing of final attributes. Here is an example where this would help:

```python
from typing import Final

class C:
    def __init__(self, s: str) -> None:
        self.s: Final = s

def foo(s: str) -> None: ...

def bar(c: C) -> None:
    foo(c.s)  # Redundant incref/decref, since c.s can't be freed during the call
```

We'd also need to make various attributes (e.g. in mypy.types) final to benefit from this.

@JukkaL

JukkaL commented Jan 2, 2025

I've been working on upgrading the mypyc benchmark runner to use Python 3.13 (previously it was using 3.8), and I noticed that some benchmarks are clearly slower on 3.13, and much of the impact seems to come from immortality checks. For example, the richards benchmark went from 0.0022s on 3.11 to 0.0038s on 3.12 (around a 70% increase in execution time).
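For reference, here is the relative slowdown implied by those two timings:

```latex
\frac{0.0038\,\mathrm{s} - 0.0022\,\mathrm{s}}{0.0022\,\mathrm{s}} = \frac{0.0016\,\mathrm{s}}{0.0022\,\mathrm{s}} \approx 0.73
```

That is roughly a 73% increase, consistent with the "around 70%" figure.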

This now seems high priority, even if the impact on self check is not large. Also, we should try to avoid the overhead by default instead of putting the optimization behind a compiler flag.

@JukkaL

JukkaL commented Jan 3, 2025

I have a basic draft implementation that skips immortality checks for native classes and some mutable built-in types that can't be safely shared between subinterpreters. This speeds up richards by about 30%, and self check by about 1.6%. It looks like this may give us back about half of the perf regression from immortal objects, while still allowing the necessary object sharing between subinterpreters (I think e.g. None must be immortal). I'll try to get this included in the next mypy public release.
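Here is a rough consistency check of "about half", using the richards timings from the earlier comment. Two assumptions are not stated in the thread. First, the 30% speedup is applied to the 0.0038 s figure, even though that timing is from 3.12 and the draft may have been measured on 3.13. Second, "30% faster" can be read two ways, so both readings are computed. `recovered_fraction` is a hypothetical helper.

```python
def recovered_fraction(baseline: float, regressed: float, after_fix: float) -> float:
    """Share of the regression (regressed - baseline) won back by the fix."""
    return (regressed - after_fix) / (regressed - baseline)


baseline = 0.0022   # richards on 3.11, seconds
regressed = 0.0038  # richards on 3.12, seconds

# Reading A (assumption): "30% faster" means 1.3x throughput.
after_throughput = regressed / 1.3
# Reading B (assumption): "30% faster" means 30% less runtime.
after_runtime = regressed * (1 - 0.30)

frac_a = recovered_fraction(baseline, regressed, after_throughput)  # about 0.55
frac_b = recovered_fraction(baseline, regressed, after_runtime)     # about 0.71
```

Under reading A, about 55% of the regression is recovered, which matches "about half". Reading B would give roughly 71%.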

JukkaL self-assigned this on Jan 3, 2025
JukkaL changed the title from "Optionally avoid incref/decref immortality checks on 3.12+" to "Avoid (some) incref/decref immortality checks on 3.12+" on Jan 14, 2025
x612skm pushed a commit to x612skm/mypy-dev that referenced this issue on Feb 24, 2025 (python#18459):

Fixes mypyc/mypyc#1044.

The addition of object immortality in Python 3.12 (PEP 683) introduced
an extra immortality check to incref and decref operations. Objects with
a specific reference count are treated as immortal, and their reference
counts are never updated.

It turns out that this slowed down certain workloads a lot (up to a
70% increase in runtime compared to 3.11). This PR reduces the impact
of immortality via a few optimizations:

1. Assume instances of native classes and list objects are not immortal
(skip immortality checks).
2. Skip incref of certain objects in some contexts when we know that
they are immortal (e.g. avoid incref of `None`).
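The two optimizations amount to choosing an incref flavor from the static type. Here is a hypothetical sketch of that policy. The names `IncrefKind`, `choose_incref` and `NATIVE_CLASSES` are invented for illustration. mypyc's real implementation works on its IR types, and it skips increfs of known-immortal objects only in some contexts.

```python
from enum import Enum


class IncrefKind(Enum):
    SKIP = "skip"            # object known to be immortal: no incref at all
    UNCHECKED = "unchecked"  # plain refcount increment, no immortality test
    CHECKED = "checked"      # regular Py_INCREF with the immortality test


# Hypothetical set of classes compiled as native classes.
NATIVE_CLASSES = {"Node", "Parser"}


def choose_incref(static_type: str) -> IncrefKind:
    """Hypothetical policy mirroring the two optimizations listed above."""
    if static_type == "None":
        return IncrefKind.SKIP       # optimization 2: None is immortal on 3.12+
    if static_type == "list" or static_type in NATIVE_CLASSES:
        return IncrefKind.UNCHECKED  # optimization 1: assumed never immortal
    return IncrefKind.CHECKED        # str, int, object, ...: may be immortal
```

In this sketch the decision is made once, at compile time, so the emitted code pays nothing at runtime to pick the cheaper operation.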

The second change should be clear. We generally depend on CPython
implementation details to improve performance, and this seems safe to do
here as well.

The first change could turn immortal objects into non-immortal ones. For
native classes this is a decision we can arguably make -- native classes
don't properly support immortality, and they can't be shared between
subinterpreters. As discussed in PEP 683, skipping immortality checks
here is acceptable even in cases where somebody tries to make a native
instance immortal, but this could have some performance or memory use
impact. The performance gains make this a good tradeoff.

Since lists are mutable, they can't be safely shared between
subinterpreters, so again not dealing with immortality is acceptable. It
could reduce performance in some use cases by deimmortalizing lists, but
this potential impact seems marginal compared to faster incref and
decref operations on lists, which are some of the more common objects in
Python programs.
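Here is why deimmortalizing stays correct, sketched with a toy model. The names are hypothetical, a 64-bit build is assumed, and this is not mypyc's generated C. An unchecked incref pushes an immortal count past `0xFFFFFFFF`, so the object stops looking immortal while that reference is held. But with billions of references already on record, a balanced incref/decref pair can never free it.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000
IMMORTAL_REFCNT = 0xFFFFFFFF


class Obj:
    def __init__(self, refcnt: int = 1) -> None:
        self.refcnt = refcnt
        self.freed = False


def is_immortal(obj: Obj) -> bool:
    # Same test as CPython's decref: sign bit of the lower 32 bits.
    return (obj.refcnt & SIGN32) != 0


def incref_unchecked(obj: Obj) -> None:
    # Plain 64-bit increment without any immortality test.
    obj.refcnt += 1


def decref_unchecked(obj: Obj) -> None:
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True


lst = Obj(IMMORTAL_REFCNT)   # pretend some list had been made immortal
incref_unchecked(lst)        # refcnt is now 0x1_0000_0000; lower 32 bits are 0
lost = not is_immortal(lst)  # True: "deimmortalized" while the ref is held
decref_unchecked(lst)        # back to 0xFFFFFFFF, never reaching zero
```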

On Python 3.13, this speeds up self check by about 1.5% and the
richards benchmark by 30-35% (!). Some other benchmarks also see
smaller improvements.