Autodiff batching #137880
Conversation
@rustbot label +F-autodiff
☔ The latest upstream changes (presumably #138177) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 0243b2b to a1865e2
Force-pushed from b76368b to 722b3d0
Force-pushed from 722b3d0 to 963b3ce
Ok, so this is enough for one PR. There are three cases which I left for a follow-up PR, so as not to make this one too large.
Now that I have more features implemented, it has also become a bit clearer to me how this code should look, so I did some refactoring, even though I tried to split most of that out into the previous cleanup PR. I'll replace the todos with proper errors, even though those remaining tasks hopefully won't stay open for many days.
Some changes occurred in compiler/rustc_codegen_ssa/src/codegen_attrs.rs and in compiler/rustc_codegen_ssa.
Force-pushed from 5935252 to e16de5d
Thank you for all the feedback! I think I have addressed everything; do you have any other comments? The GitHub UI thinks I still have some requested changes from you left, but I can't find them.
GitHub UI doesn't care about resolving comments... Re-reviewing now.
Please squash the review commits. If you don't want to fold the review commits into the appropriate earlier commits, squashing all of the commits in this PR is fine by me.
Force-pushed from 51a79e3 to 2898b90
Rollup of 14 pull requests
Successful merges:
- rust-lang#137869 (Demote i686-pc-windows-gnu to Tier 2)
- rust-lang#137880 (Autodiff batching)
- rust-lang#138546 (Add integer to string formatting tests)
- rust-lang#138947 (Refactor Apple version handling in the compiler)
- rust-lang#138950 (replace extra_filename with strict version hash in metrics file names)
- rust-lang#139213 (Run coretests and alloctests with cg_clif in CI)
- rust-lang#139274 (Rustdoc: typecheck settings.js)
- rust-lang#139295 (Remove creation of duplicate `AnonPipe`)
- rust-lang#139298 (Allow for missing invisible close delim when reparsing an expression.)
- rust-lang#139313 (Deduplicate some `rustc_middle` function bodies by calling the `rustc_type_ir` equivalent)
- rust-lang#139317 (compiletest: Encapsulate all of the code that touches libtest)
- rust-lang#139322 (Add helper function for checking LLD usage to `run-make-support`)
- rust-lang#139335 (Pass correct param-env to `error_implies`)
- rust-lang#139342 (Add a mailmap entry for myself)
Failed merges:
- rust-lang#138949 (Rename `is_like_osx` to `is_like_darwin`)
r? `@ghost`
`@rustbot` modify labels: rollup
Force-pushed from 2898b90 to 89d8948
Rollup of 11 pull requests
Successful merges:
- rust-lang#136457 (Expose algebraic floating point intrinsics)
- rust-lang#137880 (Autodiff batching)
- rust-lang#137897 (fix pthread-based tls on apple targets)
- rust-lang#138024 (Allow optimizing out `panic_bounds_check` in Unicode checks.)
- rust-lang#138546 (Add integer to string formatting tests)
- rust-lang#138826 (StableMIR: Add `associated_items`.)
- rust-lang#138950 (replace extra_filename with strict version hash in metrics file names)
- rust-lang#139274 (Rustdoc: typecheck settings.js)
- rust-lang#139285 (use lower case to match other error messages)
- rust-lang#139341 (Apply `Recovery::Forbidden` when reparsing pasted macro fragments.)
- rust-lang#139389 (make `Arguments::as_statically_known_str` doc(hidden))
r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#137880 - EnzymeAD:autodiff-batching, r=oli-obk
Enzyme supports batching, which is especially well known on the ML side when training neural networks.
There we normally have a training loop, where in each iteration we pass in some data (e.g. an image) and a target vector. Based on how close the prediction is, we compute the loss and then use backpropagation to compute the gradients and update the weights.
Doing that one sample at a time is quite inefficient, so what you normally do is pass in a batch of 8/16/... images and targets and compute the gradients for all of them at once, which allows better optimizations.
Enzyme supports batching in two ways. The first one (which I implemented here) just accepts a batch size N, and then each Dual/Duplicated argument has not one but N shadow arguments. So instead of
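```rs
for i in 0..100 {
    df(x[i], y[i], 1234);
}
```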
You can now do
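```rs
for i in (0..100).step_by(4) {
    df(
        x[i], x[i + 1], x[i + 2], x[i + 3],
        y[i], y[i + 1], y[i + 2], y[i + 3],
        1234,
    );
}
```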
which gives the same results but allows better compiler optimizations. See the test case for details.
There is a second variant, where we can mark certain arguments, and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e., instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats; see the sketch below. I'll implement this over the next few days.
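To make the difference between the two variants concrete, here is a minimal sketch of what the two generated signatures could look like for a batch width of 4 and 12-float inputs. The function names, types, and exact shapes are my assumptions for illustration, not the PR's actual generated code:

```rs
// Hypothetical signatures illustrating the two batching layouts; in practice
// Enzyme generates these functions, so names and shapes here are only a sketch.

// Variant 1 (this PR): each Dual/Duplicated argument gets 4 separate
// shadow arguments, one per batch entry (4 slices of 12 floats each).
fn df_separate_shadows(
    _x: &[f32; 12],
    _shadow0: &mut [f32; 12],
    _shadow1: &mut [f32; 12],
    _shadow2: &mut [f32; 12],
    _shadow3: &mut [f32; 12],
) {
    // body generated by Enzyme
}

// Variant 2 (planned follow-up): one contiguous shadow buffer that is
// N times longer, here 4 * 12 = 48 floats.
fn df_contiguous_shadow(_x: &[f32; 12], _shadows: &mut [f32; 48]) {
    // body generated by Enzyme
}
```

The second layout trades per-entry pointers for a single contiguous buffer, which is often friendlier to vectorization and avoids threading N separate arguments through the call.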
I will also add more tests for both modes.
For anyone preferring a more interactive explanation, here's a video of Tim's LLVM dev talk, where he presents this work: https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- rust-lang#124509
- rust-lang#135283