Change MSBuild to use the Clang backend #690


Closed
Fidget-Spinner opened this issue Jul 2, 2024 · 18 comments

Comments

@Fidget-Spinner (Collaborator)

MSBuild has supported using a Clang backend for a while now: https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170

MSVC has been somewhat unpredictable in comparison to the other compilers. We've had reports that it does not benefit from Python 3.11 [1], of random slowdowns [2], of ceval.c being too big for MSVC to inline/optimize properly [3], and now of slowdowns that hit it and no other platform [4].

We should seriously consider switching to another compiler, because it's starting to hinder our own productivity and performance goals on Windows.

@brandtbucher (Member)

@zooba

@zooba commented Jul 2, 2024

Last time I ran benchmarks, clang wasn't any better than MSVC on Windows (the clang-cl support works because I updated it to run this benchmark).

It's also a really significant compatibility change, and there will certainly be users who want to keep building with MSVC. So I don't think we get to drop MSVC support entirely; a switch may only impact the python.org releases (possibly excluding the Store package, as I'm not sure whether clang support extends that far).

Personally, I doubt it's worth the effort. But if someone wants to make the effort to show that it's both faster and remains binary compatible[1], then go for it.

Footnotes

  1. For stable ABI purposes only, of course.

@erozenfeld

@Fidget-Spinner I work on the MSVC compiler and will take a look at the issues you reported.

> We've had reports that it does not benefit from Python 3.11 [1], of random slowdowns [2]

You have two identical links above, I suspect that's not what you intended?

@Fidget-Spinner (Collaborator, Author)

@erozenfeld thank you for your attention. Sorry if the initial issue came across as dunking on the MSVC compiler or as overly negative.

Yes, the link should be #321 instead. But out of all the links, I think the only actionable one is python/cpython#121263. We've had a problem where ceval.c's _PyEval_EvalFrameDefault is so big that MSVC stops inlining small functions into it, forcing us to use macros all over.

Thank you for your time!

@erozenfeld commented Jul 17, 2024

@Fidget-Spinner Thank you for the clarification. I spent some time looking at python/cpython#121263.

I found one compiler issue you are running into. Normally we compile functions in bottom-up call graph order, i.e. callees before callers. That way we have more information for optimizations, including inlining. However, with multithreaded compilation we schedule huge functions for earlier compilation in order to improve the overall multithreaded compilation time. The downside is that these huge functions don't always have information about their callees, which results in worse code generation (and worse inlining decisions) for them. We are considering disabling that scheduling tweak for PGO (or perhaps only for huge functions that are hot according to PGO info). In the meantime, there is a switch that disables it: /d2:-noadditionalscheduling. It should be passed to link.exe.

I reverted the macro-ization in python/cpython@722229e and checked what happens with /d2:-noadditionalscheduling. Of the functions that you turned into macros, PyStackRef_Is is now inlined at all 17 call sites. Note that PyStackRef_AsPyObjectBorrow, PyStackRef_TYPE, and PyStackRef_FromPyObjectImmortal are inlined even without /d2:-noadditionalscheduling.

Unfortunately, PyStackRef_AsPyObjectSteal, PyStackRef_AsPyObjectNew, PyStackRef_FromPyObjectSteal, PyStackRef_FromPyObjectNew, PyStackRef_CLOSE and PyStackRef_DUP are over the limit when the caller is huge and are not inlined. I'll keep looking at whether the inlining heuristics can be tweaked, but I suggest adding /d2:-noadditionalscheduling, since it enables a bunch of inlines outside the set that was turned into macros in python/cpython@722229e. For example, calls to the following functions are now inlined: _Py_EnsureFuncTstateNotNULL, Py_IS_TYPE, _Py_DECREF_SPECIALIZED, _Py_DECREF_NO_DEALLOC, _PyLong_IsNegative, _PyLong_IsCompact, _PyLong_IsNonNegativeCompact, _PyLong_IsZero, PyFloat_FromDouble, PyType_HasFeature, PyType_Check, _PyFunction_SetVersion, and _PyObject_GC_IS_TRACKED. I'm curious whether that will improve your benchmark results.

@erozenfeld

One more note: the compilation order scheduling tweak mentioned above affects not only _PyEval_EvalFrameDefault but 158 other functions considered "big", so their codegen may be negatively affected as well. Adding /d2:-noadditionalscheduling should help with the codegen of those functions too.

@mdboom (Contributor) commented Jul 18, 2024

I'll run a build with /d2:-noadditionalscheduling over our benchmark suite and report back. (This may take a couple of days: I just got back from vacation to find our Windows benchmarking machine behaving strangely, so I'll need to fix that first.)

@mdboom (Contributor) commented Jul 19, 2024

The /d2:-noadditionalscheduling flag helps, but not as much as the macro-ifying changes in python/cpython#121270.

I took the parent of python/cpython#121270, 93156880efd14ad7adc7d3512552b434f5543890, which still used static inline, and applied the /d2:-noadditionalscheduling flag. This resulted in a 2% speedup on 64-bit Windows, vs. a 5% speedup from converting all of those new functions to macros.

To confirm this, I am going to run 2 more experiments over the weekend:

  • The effect of applying the flag to CPython main
  • The effect of reverting #121790 against current CPython main and applying the flag

@mdboom (Contributor) commented Jul 19, 2024

The results are in:

(All of this is with PGO, as is all of our benchmarking).

So, it's clear that the flag helps (2% wins are hard to come by), and perhaps we should always use it for PGO builds (where the slower build times are already expected). But it's also clear that the macros (forcible inlining) are still helpful.

Thanks again for all your help on this @erozenfeld. I'd be happy to help benchmark any ways to tweak the inlining heuristics that you think might be helpful.

@Fidget-Spinner (Collaborator, Author) commented Feb 8, 2025

@zooba I'd like to restart this discussion. Clang now promises ABI compatibility with MSVC for everything we care about in CPython: https://clang.llvm.org/docs/MSVCCompatibility.html

Would you be open to allowing a clang backend to be specified in the MSBuild solution files? With the new tail-calling interpreter, clang 19 is a significant step up in performance, even without PGO. We can keep the MSVC backend as an alternative in ./build.bat. However, I'd like to push for us to try making clang what we build the 3.14 Windows releases with.

This isn't just about performance, either. @colesbury has mentioned multiple times that we should consider switching to clang, because MSVC is blocking some work we (both Meta and Faster CPython) want to do on StackRefs in CPython. We need StackRefs to make free-threading the default too. So there are multiple incentives here.

@Zheaoli commented Feb 9, 2025

Update

There is now an official binary release of clang 19.1.0 for Windows, FYI: https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.0

I will test the workflow this week.

I can work on this issue. But there's a critical problem I think we need to address: for now, there is no official clang 19 binary for the Windows platform, and the newest clang version is v18.1.8, which is bundled with Visual Studio 2022 (17.12) Preview. FYI: https://learn.microsoft.com/en-us/gaming/gdk/_content/gc/tools-pc/visualstudio/gr-vs-clang

So how about we switch to clang 18 first and upgrade to 19 once a Visual Studio release ships with clang 19? Or do we need to maintain a custom binary for our environment?

cc @Fidget-Spinner

@Fidget-Spinner (Collaborator, Author) commented Feb 9, 2025

@Zheaoli thanks for taking this up! You might want to wait for @mdboom to give some suggestions too on Monday. He's made some changes to bench_runner to work with clang. Apparently it's just a one-line change here: https://github.com/faster-cpython/bench_runner/blob/164361bbb9a82dcb9bdaba8f155d18f4629b0b1a/bench_runner/templates/_benchmark.src.yml#L115 (see the additional inputs.clang options).

@chris-eibl commented Feb 9, 2025

> @zooba I'd like to restart this discussion. Clang now promises ABI compatibility with MSVC for everything we care about in CPython https://clang.llvm.org/docs/MSVCCompatibility.html

Chrome has been using clang-cl for quite some time now: https://blog.llvm.org/2018/03/clang-is-now-used-to-build-chrome-for.html
That post also includes a long list of motivations, immediately followed by a pros/cons section.

> [...] no PGO [...]

I've made some small changes to enable PGO in the clang-cl case. I can craft a pull request if there is interest in it.

Likewise, I've spotted some places where we could adapt some #ifs to let clang-cl "see" some tweaks that are currently missed, because clang-cl does (must) not define __GNUC__, e.g. in _Py_HOT_FUNCTION.

But I think all that would need a build bot first?

@Fidget-Spinner (Collaborator, Author)

@chris-eibl can you open a draft PR please so we can estimate how much effort this is?

@Zheaoli sorry, I think Chris might take this over, as he seems to already have something working. I might still need your help testing it, though.

@Zheaoli commented Feb 9, 2025

> @chris-eibl can you open a draft PR please so we can estimate how much effort this is?
>
> @Zheaoli sorry I think Chris might take over this as he seems like he already has something working. Though I might still need your help testing this.

No problem, it's ok for me lol

@chris-eibl

Draft PR is here: python/cpython#129907

Help and feedback more than welcome :)

@char101 commented Feb 10, 2025

Hi, I'd like to add a few requests for this issue:

  1. If using clang-cl, please also enable HAVE_COMPUTED_GOTOS and USE_COMPUTED_GOTOS
  2. If possible, please support a custom clang directory (not just the clang distributed with Visual Studio, which often lags behind by a major version or so).
  3. What about LTO in addition to PGO?

I usually add these properties to compile Python with clang-cl:

Directory.Build.props

  <PropertyGroup>
    <LLVMInstallDir>A:\Lang\clang\20.1.0-rc1</LLVMInstallDir>
    <LLVMToolsVersion>20</LLVMToolsVersion>
  </PropertyGroup>

pyproject.props

  <PropertyGroup Label="Globals">
    <LLVMInstallDir>A:\Lang\clang\20.1.0-rc1</LLVMInstallDir>
    <LLVMToolsVersion>20</LLVMToolsVersion>
  </PropertyGroup>
  <ItemDefinitionGroup Condition="'$(PlatformToolset)' == 'ClangCL'">
    <ClCompile>
      <AdditionalOptions>-Wno-deprecated-non-prototype -Wno-unused-label -Wno-pointer-sign -Wno-incompatible-pointer-types-discards-qualifiers -Wno-unused-function %(AdditionalOptions)</AdditionalOptions>
      <AdditionalOptions Condition="'$(Configuration)' != 'Debug'">/clang:-flto=thin /clang:-O3 /clang:-march=native /clang:-mno-retpoline /clang:-fomit-frame-pointer %(AdditionalOptions)</AdditionalOptions>
    </ClCompile>
  </ItemDefinitionGroup>

@Fidget-Spinner (Collaborator, Author)

This has too big a splash radius and would require a PEP, so I'm closing this. We will add build support for clang-cl to the Windows builds, but won't switch to it as the default.

Fidget-Spinner closed this as not planned on Feb 10, 2025.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants