Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow arena_matrix to use move semantics #2928

Merged
merged 73 commits into from
May 31, 2024

Conversation

SteveBronder
Copy link
Collaborator

@SteveBronder SteveBronder commented Aug 2, 2023

Summary

This PR does two things

  1. Allows arena_matrix to use move semantics
  • If an arena_matrix' s constructor is passed an rvalue, instead of allocating memory onto the stack allocator and then doing a copy, arena_matrix now pushes that rvalue into a chainable_ptr that sits in the chainable stacks var_alloc_stack_ and is deleted after stan::math::recover_memory() is called. This saves us the allocation and the copy, instead we just delay paying the deletion cost which we would have paid anyway when the temporary was deleted
  1. After talking with @bob-carpenter I think I'd like to remove the hard copies we do for reverse mode functions using Eigen types. Right now our pattern is to make an arena matrix for the result, pass that to the reverse mode callback, and then when we return from the function we make a hard copy into an actual Eigen::Matrix type. The hard copy into the Eigen matrix is to make sure assignments done later to the matrix (i.e. X(1, 1) = a) do not overwrite the memory necessary for the reverse pass.

Instead, I suggest we just do what Eigen does and include docs that tell people that auto is unsafe to use recklessly with the stan math library. Instead users who want to do assignment to elements of the matrices should have the assigned object's type be the appropriate Eigen matrix type.

For example, in the below code, using auto for the assignment from the function and then doing an assignment to an element of the matrix would mess up the reverse mode pass of the function the matrix was generated in. However, assigning the result of the function to an actual matrix type and then doing the element assignment is safe.

Eigen::Matrix<var, -1, 1> y;
Eigen::Matrix<var, -1, -1> X;
// Bad!! Will change memory used by reverse pass callback within multiply!
auto mu = multiply(X, y);
mu(4) = 1.0;
// Good! Will not change memory used by reverse pass callback within multiply
Eigen::Matrix<var, -1, 1> mu_good = multiply(X, y);
mu_good(4) = 1.0;

This is a breaking change so I think we should do a major version bump. In the tests below I've found anywhere from a 5% to 15% speed increase from using this.

The graph below plots the relative % improvement of using move semantics relative to the current develop. We are testing the expression below, which just does a bunch of multiplies, adds, and then a sum.

var lp = sum(stan::math::add(stan::math::multiply(stan::math::multiply(x, y), std::move(y)), std::move(x)));

The variables x and y in the above are matrices of the same size. Only the forward pass of reverse mode autodiff is used in the benchmark since this should not have any effect on the reverse passes performance.

compare_plot

We can see that for small sizes this is great. As we get to larger sizes it's still good, but at that point a lot of the actual computation and fetching data for the operations is taking up enough time that the memory allocation is not a huge concern.

You can run this performance benchmark yourself using the following

git clone https://github.com/SteveBronder/stan-perf
cd stan-perf
git checkout tmp-values
cmake -S . -B "build" -DCMAKE_BUILD_TYPE=Release
cd ./build
make -j4 move_ex move_ex_orig
# WARNING: Will use a lot of memory for the large problems and take a while
taskset -c 1 ./benchmarks/move_stuff/move_ex --benchmark_out_format=csv --benchmark_out=./benchmarks/move_stuff/move_ex.csv --benchmark_report_aggregates_only=false --benchmark_repetitions=30 --benchmark_report_aggregates_only=false --benchmark_display_aggregates_only=true --benchmark_enable_random_interleaving=true
taskset -c 1 ./benchmarks/move_stuff/move_ex_orig --benchmark_out_format=csv --benchmark_out=./benchmarks/move_stuff/move_ex_orig.csv  --benchmark_report_aggregates_only=false --benchmark_display_aggregates_only=true --benchmark_repetitions=30 --benchmark_enable_random_interleaving=true
Rscript ./plots/move_stuff/plots.R

Tests

Tests are added to the arena_matrix_test file for checking moves work. This can be run with

python ./runTests.py ./test/unit/math/rev/core/arena_matrix_test

Side Effects

Yes

In order to utilize this we need to use perfect forwarding in our reverse mode functions. See the new multiply and add signatures in this PR for an example. I think I could do this for most of our functions that can use it with a day or two of work, though now our functions will be more "modern" but also harder to understand.

Release notes

Allows arena_matrix to use move semantics along with the stack allocator.

Checklist

  • Math issue #(issue number)

  • Copyright holder: Simons Foundation

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

SteveBronder and others added 22 commits August 1, 2023 18:11
…e-move-semantics' into feature/reverse-mode-move-semantics
…e-move-semantics' into feature/reverse-mode-move-semantics
…e-move-semantics' into feature/reverse-mode-move-semantics
…-dev/math into feature/reverse-mode-move-semantics
@SteveBronder
Copy link
Collaborator Author

@t4c1 sorry but would you mind taking a quick glance at this?

@andrjohns would you feel comfortable reviewing this?

return normal_lpdf<propto, T_y, T_loc, T_scale>(y, mu, sigma);
inline return_type_t<T_y, T_loc, T_scale> normal_log(T_y&& y, T_loc&& mu,
T_scale&& sigma) {
return normal_lpdf<propto>(y, mu, sigma);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming you want to use perfect forwarding here. That would be:

Suggested change
return normal_lpdf<propto>(y, mu, sigma);
return normal_lpdf<propto>(std::forward<T_y>(y), std::forward<T_loc>(mu), std::forward<T_scale>(sigma));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thanks! The normal distribution changes are mostly just to show what the new signatures would look like. But I'd like to make them so that we can forward them into operands_and_partials.

@SteveBronder
Copy link
Collaborator Author

@andrjohns would you mind taking a look at this?

@syclik
Copy link
Member

syclik commented Sep 6, 2023

@SteveBronder, this is rad!

I'm concerned with the impact it has on our use of auto through the Math codebase. Will we need to go through the current uses of auto everywhere to be safe? Or does this really only affect downstream usage.

Just to be absolutely clear, this is a "concern" and it is not blocking in my mind. It seems like we should do this; I just want to be cognizant of the effort to get this properly done before embarking on it.

@syclik
Copy link
Member

syclik commented Apr 22, 2024

@SteveBronder, I'm going to edit the PR to have the base branch be 5.0-breaking-changes instead of develop.

@syclik syclik changed the base branch from develop to 5.0-breaking-changes April 22, 2024 13:12
@syclik syclik added the breaking change Requires a breaking change to the Math library label Apr 22, 2024
Add support for Windows ARM64
@SteveBronder
Copy link
Collaborator Author

Thanks! I'll update the merge conflicts tmrw

@SteveBronder
Copy link
Collaborator Author

@syclik @andrjohns fixed the merge conflict, all good for me to merge to 5.0?

@andrjohns
Copy link
Collaborator

@syclik @andrjohns fixed the merge conflict, all good for me to merge to 5.0?

Yep, all good from me

@syclik
Copy link
Member

syclik commented Apr 26, 2024 via email

@@ -179,6 +177,93 @@ We can see in the above that the standard style of a move (the constructor takin
But in Stan, particularly for reverse mode, we need to keep memory around even if it's only temporary for when we call the gradient calculations in the reverse pass.
And since memory for reverse mode is stored in our arena allocator no copying happens in the first place.

Functions for Stan math's reverse mode autodiff should use [_perfect forwarding_](https://drewcampbell92.medium.com/understanding-move-semantics-and-perfect-forwarding-part-3-65575d523ff8) arguments. This arguments are a template parameter with `&&` next to them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar. Not sure exactly what should be used, but "This arguments" is off.

Maybe "Template arguments that allow perfect forwarding"?

}
```

A perfect forwarding argument of a function accepts any reference type as it's input argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "it's" --> "its"


The `std::forward<T>` in the in the code above tells the compiler that if `T` is deduced to be an rvalue reference (such as `Eigen::MatrixXd&&`), then it should be moved to `my_other_function`, where there it can possibly use another objects move constructor to reuse memory.

In Stan, perfect forwarding is used in reverse mode functions which can accept an Eigen matrix type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest putting this paragraph above L190. That way the words describe the perfect forwarding and then the expansion of what it does is written out after it.

As written, this sentence doesn't quite make sense here: the code above doesn't have std::forward.


The use of auto with the Stan Math library should be used with care, like in [Eigen](https://eigen.tuxfamily.org/dox/TopicPitfalls.html).
Along with the cautions mentioned in the Eigen docs, there are also memory considerations when using reverse mode automatic differentiation.
When returning from a function in the Stan math library with an Eigen matrix output with a scalar `var` type, the actual returned type will often be an `arena_matrix<Eigen::Matrix<...>>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize "Stan Math"

Along with the cautions mentioned in the Eigen docs, there are also memory considerations when using reverse mode automatic differentiation.
When returning from a function in the Stan math library with an Eigen matrix output with a scalar `var` type, the actual returned type will often be an `arena_matrix<Eigen::Matrix<...>>`.
The `arena_matrix` class is an Eigen matrix where the underlying array of memory is located in Stan's memory arena.
The `arena_matrix` that is returned by Stan functions is normally the same one resting in the callback used to calculate gradients in the reverse pass.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change "Stan functions" to "Math functions"

The `arena_matrix` that is returned by Stan functions is normally the same one resting in the callback used to calculate gradients in the reverse pass.
Directly changing the elements of this matrix would also change the memory the reverse pass callback sees which would result in incorrect calculations.

The simple solution to this is that when you use a math library function that returns a matrix and then want to assign to any of the individual elements of the matrix, assign to an actual Eigen matrix type instead of using auto.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"math library" -> "Math library"

/*! @tparam T the type to check */
template <typename T>
using require_not_arena_matrix_t
= require_t<bool_constant<!is_arena_matrix<std::decay_t<T>>::value>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this defined with bool_constant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require_t wants back a struct with a bool value member so we wrap the !struct<T>::value in bool constant for that.

We should really just make a negate<> type trait so the above can become

require_t<negate<is_arena_matrix<std::decay_t<T>>>>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'd like to do that in a separate PR

@@ -49,25 +49,91 @@ class arena_matrix<MatrixType, require_eigen_dense_base_t<MatrixType>>
ChainableStack::instance_->memalloc_.alloc_array<Scalar>(size),
size) {}

private:
template <typename T>
constexpr auto get_rows(const T& x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions are a bit hard to understand. Could you add doxygen doc to explain what the intent is here? It's not coming through directly through code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two function definitions are identical. Is that intentional?

Copy link
Collaborator Author

@SteveBronder SteveBronder Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two function definitions are identical. Is that intentional?

One gets the rows in case the conditional is true and the other gets the cols if the conditional is true. And vice versa for false

@@ -280,7 +330,7 @@ class arena_matrix<MatrixType, require_eigen_sparse_base_t<MatrixType>>
template <typename Expr,
require_not_same_t<Expr, arena_matrix<MatrixType>>* = nullptr>
arena_matrix& operator=(Expr&& expr) {
*this = arena_matrix(std::forward<Expr>(expr));
new (this) arena_matrix(std::forward<Expr>(expr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind walking me through this? I think I know what it's doing, but I want to double check.

(Nothing too long; even just a couple of sentences.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! This is a good question because it's kind of complicated why we use placement new here.

This signature is for assigning Eigen types that are not arena matrices. This is using placement new. The placement new is new but instead of calling malloc to assign new memory it is given a pointer as an argument to assign the memory for the newly constructed object. When we use an arena matrix we are doing something like

arena_matrix<Eigen::MatrixXd> my_arena = x;

So on the lhs the arean matrix is constructed using the constructor with no arguments arena_matrix(). Then we call the assignment operator here. Since this is already constructed, inherits from Eigen::Map and an Eigen::Map cannot replace the memory it's data points to we use placement new here for the assignment.

Eigen explicitly says to do this in it's documentation for map

Tip: to change the array of data mapped by a Map object, you can use the C++ placement new syntax:

Example:

int data[] = {1,2,3,4,5,6,7,8,9};
Map<RowVectorXi> v(data,4);
cout << "The mapped vector v is: " << v << "\n";
new (&v) Map<RowVectorXi>(data+4,5);
cout << "Now v is: " << v << "\n";

https://eigen.tuxfamily.org/dox/classEigen_1_1Map.html

@@ -129,49 +129,49 @@ inline var multiply(const T1& A, const T2& B) {
* @tparam T1 type of the scalar
* @tparam T2 type of the matrix or expression
*
* @param A scalar
* @param a scalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks!

@syclik
Copy link
Member

syclik commented Apr 26, 2024

Minor comments on the doc.

I'm more curious about some of the code changes.

Is there any place where we might have made a mistake by using auto internally? Do we need to sweep through the code base searching for that based on the changes here, or are we ok?

@SteveBronder
Copy link
Collaborator Author

Is there any place where we might have made a mistake by using auto internally? Do we need to sweep through the code base searching for that based on the changes here, or are we ok?

I ctrl+f'd for auto in the library and it all seemed fine. Since we are already careful with using it with Eigen types we have been pretty careful on its usage

@SteveBronder
Copy link
Collaborator Author

@syclik and @andrjohns anymore Qs? Else I'm going to merge this to the 5.0 branch

@syclik
Copy link
Member

syclik commented May 31, 2024

No, no more questions!! Please merge when you get a chance!!!

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Requires a breaking change to the Math library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants