Autodiff batching #137880

ZuseZ4 · 2025-03-02T06:48:49Z

Enzyme supports batching, a technique best known from machine learning, where it is used when training neural networks.
A typical training loop passes in some data (e.g. an image) and a target vector in each iteration. Based on how close the prediction is to the target, we compute the loss, then use backpropagation to compute the gradients and update the weights.
Doing this one sample at a time is quite inefficient, so in practice you pass in a batch of 8, 16, or more images and targets and compute the gradients for all of them at once, which allows better optimizations.

Enzyme supports batching in two ways. The first one (which I implemented here) accepts a batch size N,
and each Dual/Duplicated argument then has not one but N shadow arguments. So instead of

for i in 0..100 {
   df(x[i], y[i], 1234);
}

You can now do

for i in (0..100).step_by(4) {
   df(x[i+0], x[i+1], x[i+2], x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}

which will give the same results, but allows better compiler optimizations. See the testcase for details.
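To make the first mode more concrete, here is a hand-written sketch of what "N shadow arguments" means for a forward-mode derivative. It does not use the `#[autodiff]` macro or Enzyme at all: the function `f`, the derivative functions `df` and `df_batched`, and their signatures are assumptions chosen for illustration, and the code Enzyme actually generates will look different. The point is only that one batched call can share work (here the primal value and one partial derivative) across all N tangents while producing the same numbers as N separate calls.

```rust
/// Primal function used for illustration: f(x) = x0^2 + 3 * x1.
fn f(x: &[f64; 2]) -> f64 {
    x[0] * x[0] + 3.0 * x[1]
}

/// Hand-written scalar forward-mode derivative (hypothetical signature):
/// returns the primal value and the directional derivative along `dx`.
fn df(x: &[f64; 2], dx: &[f64; 2]) -> (f64, f64) {
    (f(x), 2.0 * x[0] * dx[0] + 3.0 * dx[1])
}

/// Hand-written batched variant with N = 4 shadow arguments (hypothetical signature).
/// The primal value and the partial derivative 2 * x0 are computed once
/// and reused for every shadow.
fn df_batched(x: &[f64; 2], dx: [&[f64; 2]; 4]) -> (f64, [f64; 4]) {
    let primal = f(x);
    let two_x0 = 2.0 * x[0];
    let tangents = dx.map(|d| two_x0 * d[0] + 3.0 * d[1]);
    (primal, tangents)
}

fn main() {
    let x = [1.5, -2.0];
    let seeds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]];

    // Four separate scalar calls, one per shadow.
    let separate: Vec<f64> = seeds.iter().map(|dx| df(&x, dx).1).collect();

    // One batched call that takes all four shadows at once.
    let (primal, batched) =
        df_batched(&x, [&seeds[0], &seeds[1], &seeds[2], &seeds[3]]);

    assert_eq!(primal, f(&x));
    assert_eq!(separate, batched.to_vec());
    println!("primal = {primal}, tangents = {batched:?}");
}
```

In this sketch the shared work is trivial, but for larger functions the same structure is what gives the optimizer room to reuse intermediate values and vectorize across the batch.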

There is a second variant, where we can mark certain arguments so that, instead of passing in N shadow arguments, Enzyme assumes that the single shadow argument is N times longer. For example, instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next few days.
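The relationship between the two argument layouts can be sketched in plain Rust. This is only an illustration of the "4 slices of 12 floats versus one slice of 48 floats" example: the helper names `pack` and `unpack` are hypothetical, and the assumption that the N shadows sit back to back in the longer buffer is mine, not something this PR specifies.

```rust
/// Batch width N and per-shadow length, taken from the example in the text.
const WIDTH: usize = 4;
const LEN: usize = 12;

/// Concatenate N separate shadows into one buffer that is N times longer.
/// (Assumed layout: shadow 0 first, then shadow 1, and so on.)
fn pack(shadows: &[[f32; LEN]; WIDTH]) -> Vec<f32> {
    shadows.iter().flatten().copied().collect()
}

/// View the long buffer as N shadows of LEN elements each, without copying.
fn unpack(packed: &[f32]) -> Vec<&[f32]> {
    assert_eq!(packed.len(), WIDTH * LEN, "expected WIDTH * LEN elements");
    packed.chunks_exact(LEN).collect()
}

fn main() {
    // Fill each shadow with distinct values so the layout is observable.
    let mut shadows = [[0.0f32; LEN]; WIDTH];
    for (b, shadow) in shadows.iter_mut().enumerate() {
        for (j, v) in shadow.iter_mut().enumerate() {
            *v = (b * LEN + j) as f32;
        }
    }

    // First mode: 4 slices with 12 floats each.
    // Second mode: one slice with 48 floats.
    let packed = pack(&shadows);
    assert_eq!(packed.len(), 48);

    // Splitting the long buffer recovers the individual shadows.
    let views = unpack(&packed);
    for (b, view) in views.iter().enumerate() {
        assert_eq!(*view, &shadows[b][..]);
    }
    println!("{} shadows of {} floats in one buffer of {}", WIDTH, LEN, packed.len());
}
```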

I will also add more tests for both modes.

For anyone who prefers a more interactive explanation, here is a video of Tim's LLVM dev talk, in which he presents his work: https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add more documentation to the dev guide and the user docs in another PR.

r? ghost

Tracking:

- rust-lang#124509
- rust-lang#135283

ZuseZ4 · 2025-03-02T07:15:26Z

@rustbot label +F-autodiff

bors · 2025-03-08T00:27:49Z

☔ The latest upstream changes (presumably #138177) made this pull request unmergeable. Please resolve the merge conflicts.

tests/ui/autodiff/autodiff_illegal.stderr

tests/codegen/autodiff.rs

ZuseZ4 · 2025-04-03T03:15:35Z

Ok, so this is enough for one PR.
It adds most of the batching infrastructure, but it only explicitly tests it for forward-mode autodiff. It also adds support for sret in combination with forward-mode-batching.

There are three cases which I left for a follow-up PR, to keep this PR from getting too large:

- Reverse-mode batching.
- sret handling for reverse mode (scalar/batching) or forward mode (scalar).
- The second batching mode. Right now we support batching where each (non-const) arg is passed N times, which allows fusing N function calls (e.g. in a loop) into one call. There is a second mode, which accepts just one shadow arg (similar to scalar mode), but where each arg is instead N times larger (e.g. a vector now has N times the length).

Now that I have more features implemented, it is also becoming clearer to me what this code should look like, so I did some refactoring, even though I tried to split most of that out into the previous cleanup PR.

I'll replace the todos with proper errors, even though the remaining work hopefully won't stay open for many days.
Let me know what else you think could be improved. (I also generally assume I'll do another refactor once all of batching is merged, since by then I'll know how much code we have where.)

rustbot · 2025-04-03T03:15:40Z

Some changes occurred in compiler/rustc_codegen_ssa/src/codegen_attrs.rs

cc @jdonszelmann

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

compiler/rustc_builtin_macros/src/autodiff.rs

compiler/rustc_codegen_llvm/src/back/lto.rs

compiler/rustc_codegen_llvm/src/builder/autodiff.rs

compiler/rustc_ast/src/expand/autodiff_attrs.rs

compiler/rustc_builtin_macros/src/autodiff.rs

tests/codegen/autodiffv.rs

compiler/rustc_codegen_llvm/src/builder/autodiff.rs

compiler/rustc_codegen_llvm/src/llvm/enzyme_ffi.rs

compiler/rustc_codegen_llvm/src/context.rs

ZuseZ4 · 2025-04-03T19:58:08Z

Thank you for all the feedback! I think I have addressed everything. Do you have any other comments?

The GitHub UI thinks I still have some requested changes from you left, but I can't find them.

oli-obk · 2025-04-03T20:37:37Z

The GitHub UI doesn't care about resolving comments... Re-reviewing now.

oli-obk · 2025-04-03T20:42:25Z

Please squash the review commits. If you don't want to fold the review commits into the appropriate earlier commits, squashing all of the commits in this PR is fine by me.

ZuseZ4 · 2025-04-03T21:28:54Z

The individual commits build nicely on their own, so it was easy to clean up the history.

@bors r=@oli-obk

bors · 2025-04-03T21:28:56Z

📌 Commit 2898b90 has been approved by oli-obk

It is now in the queue for this repository.

Rollup of 14 pull requests

Successful merges:

- rust-lang#137869 (Demote i686-pc-windows-gnu to Tier 2)
- rust-lang#137880 (Autodiff batching)
- rust-lang#138546 (Add integer to string formatting tests)
- rust-lang#138947 (Refactor Apple version handling in the compiler)
- rust-lang#138950 (replace extra_filename with strict version hash in metrics file names)
- rust-lang#139213 (Run coretests and alloctests with cg_clif in CI)
- rust-lang#139274 (Rustdoc: typecheck settings.js)
- rust-lang#139295 (Remove creation of duplicate `AnonPipe`)
- rust-lang#139298 (Allow for missing invisible close delim when reparsing an expression.)
- rust-lang#139313 (Deduplicate some `rustc_middle` function bodies by calling the `rustc_type_ir` equivalent)
- rust-lang#139317 (compiletest: Encapsulate all of the code that touches libtest)
- rust-lang#139322 (Add helper function for checking LLD usage to `run-make-support`)
- rust-lang#139335 (Pass correct param-env to `error_implies`)
- rust-lang#139342 (Add a mailmap entry for myself)

Failed merges:

- rust-lang#138949 (Rename `is_like_osx` to `is_like_darwin`)

r? `@ghost`
`@rustbot` modify labels: rollup

ZuseZ4 · 2025-04-04T18:30:32Z

This PR isn't part of any rollup right now, so I pushed a 3-line bugfix to compiler/rustc_codegen_llvm/src/builder/autodiff.rs. I discovered the bug during more extended testing while working on the second mode.

@bors r=@oli-obk

bors · 2025-04-04T18:30:35Z

📌 Commit 89d8948 has been approved by oli-obk

It is now in the queue for this repository.

Rollup of 11 pull requests

Successful merges:

- rust-lang#136457 (Expose algebraic floating point intrinsics)
- rust-lang#137880 (Autodiff batching)
- rust-lang#137897 (fix pthread-based tls on apple targets)
- rust-lang#138024 (Allow optimizing out `panic_bounds_check` in Unicode checks.)
- rust-lang#138546 (Add integer to string formatting tests)
- rust-lang#138826 (StableMIR: Add `associated_items`.)
- rust-lang#138950 (replace extra_filename with strict version hash in metrics file names)
- rust-lang#139274 (Rustdoc: typecheck settings.js)
- rust-lang#139285 (use lower case to match other error messages)
- rust-lang#139341 (Apply `Recovery::Forbidden` when reparsing pasted macro fragments.)
- rust-lang#139389 (make `Arguments::as_statically_known_str` doc(hidden))

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of rust-lang#137880 - EnzymeAD:autodiff-batching, r=oli-obk

rustbot added the A-attributes, S-waiting-on-review, and T-compiler labels Mar 2, 2025

rustbot added the F-autodiff `#![feature(autodiff)]` label Mar 2, 2025

ZuseZ4 force-pushed the autodiff-batching branch 2 times, most recently from 0243b2b to a1865e2 Compare March 13, 2025 05:53

ZuseZ4 force-pushed the autodiff-batching branch from b76368b to 722b3d0 Compare March 24, 2025 05:48

ZuseZ4 added the F-batching `#![feature(batching)]` label Mar 25, 2025

ZuseZ4 mentioned this pull request Mar 25, 2025

Expose experimental LLVM features for GPU offloading rust-lang/rust-project-goals#109

Open

4 tasks

ZuseZ4 force-pushed the autodiff-batching branch from 722b3d0 to 963b3ce Compare March 31, 2025 22:28

ZuseZ4 commented Apr 3, 2025

tests/ui/autodiff/autodiff_illegal.stderr (resolved)

ZuseZ4 commented Apr 3, 2025

tests/codegen/autodiff.rs (resolved)

ZuseZ4 marked this pull request as ready for review April 3, 2025 03:15

ZuseZ4 requested a review from oli-obk April 3, 2025 03:15

ZuseZ4 closed this Apr 3, 2025

ZuseZ4 reopened this Apr 3, 2025

ZuseZ4 force-pushed the autodiff-batching branch from 5935252 to e16de5d Compare April 3, 2025 06:50

oli-obk requested changes Apr 3, 2025

ZuseZ4 requested a review from oli-obk April 3, 2025 19:58

oli-obk approved these changes Apr 3, 2025

ZuseZ4 added 2 commits April 3, 2025 17:19

add the autodiff batch mode frontend

087ffd7

add autodiff batching middle-end

e0c8ead

ZuseZ4 force-pushed the autodiff-batching branch from 51a79e3 to 2898b90 Compare April 3, 2025 21:26

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label Apr 3, 2025

ZuseZ4 mentioned this pull request Apr 3, 2025

Tracking Issue for autodiff #124509

Open

7 tasks

Zalathar mentioned this pull request Apr 4, 2025

Rollup of 14 pull requests #139344

Closed

Zalathar mentioned this pull request Apr 4, 2025

Rollup of 8 pull requests #139358

Closed

ZuseZ4 added 3 commits April 4, 2025 14:24

add autodiff batching backend

b7c63a9

add new tests for autodiff batching and update old ones

79e17bc

add new flag to print the module post-AD, before opts

89d8948

ZuseZ4 force-pushed the autodiff-batching branch from 2898b90 to 89d8948 Compare April 4, 2025 18:29

Zalathar mentioned this pull request Apr 5, 2025

Rollup of 11 pull requests #139396

Merged

bors merged commit c6bf3a0 into rust-lang:master Apr 5, 2025
6 checks passed

rustbot added this to the 1.88.0 milestone Apr 5, 2025
