-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ad embedded expression tests to expect_ad testing framework #2837
base: develop
Are you sure you want to change the base?
Conversation
…ev/math into feature/inbedded-expression-tests
…ndled by the ad testing framework
…rmal version is executed
…rmal version is executed
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
a concern: this seems to nearly double the time it takes to run the mix unit tests: A random run from #2983 took 1h31m to run the mix tests: https://jenkins.flatironinstitute.org/blue/organizations/jenkins/Stan%2FMath/detail/PR-2983/4/pipeline/209 That is already the longest part of that parallel stage, so the added time does directly impact the overall runtime of the pipeline |
I agree it takes more time, but so many countless times we've had users add functions here in Stan math that failed upstream in stanc3 that I think it's worth having these. Also as you can see in the PR there we places in Stan math that actually had small bugs we missed! |
I think there are alternative ways of avoiding the issue only cropping up in stanc3. The simplest I can think of is just having the expression tests (which only take ~25 minutes currently) pull not only from the If the automated tests here are strictly better, we could also lessen the time/resource impact by removing the previous stage for expression testing? |
I don't like this. Our PR process is already pretty bespoke and I think this can suffer a lot from user to keyboard error. The full test suite takes like 12 hours to run so honestly an extra 2 hours is actually kind of fine imo. Like it doesn't change the time I will wait to come back and look if my PR is done. If we wanted to we could have flags that look at what folders have code changes and only run the expression tests with
Yes! this covers a lot more of the library, though it does not cover the distributions. But I can modify the distribution code to use the expression test framework as well. |
Yeah, I just wish if we were changing this time in increments of 2 hours it would be going the other way. I think a shorter feedback loop would be dramatically beneficial for contributors, even if it was still the kind of thing where you only got 2 tries in a work day (versus the current ~1 try). If we're willing to just throw up our hands and say that will never happen, then yes, two hours here or there doesn't matter that much until we start saturating the build machines
We could also have a flag (kind of like the current |
I think if we want this as a goal what we could do is have a "quick mode" that is the default when a PR is opened that reads the diff, looks at the name of the files changed, then greps and only runs the tests with the same name in the test folder. And turn off all upstream testing. Then after an initial review we fall over to "test everything mode" which runs overnight. Or alt we could have the tests and headers only include strictly what they need. Then we can kind of do the same thing as above but with a more broad scope of what tests need to be run. We used to have a setup like that but header errors were extremely annoying so we opted for the "include everything" scheme.
Is there a way to tell github / jenkins "after this PR is approved run the tests one more time with this new flag and do not allow a merge until all tests pass"? That would be a pretty easy / automated way to turn this on and off |
That is already in place, sort of. It just continues on to run everything else after those initial tests. It does make some subset of possible errors fail-fast, which is nice. Being much stricter about including what we use would let us run exactly the correct subset of tests, which would be nice. @syclik and I have talked about this a couple times, especially for the distribution tests which also take a huge chunk of time to run.
I know a lot of projects use a merge bot for things like this. When a approving review is submitted, rather than click merge themselves they leave a comment like |
Just catching up. Given what I've read, I think it's sensible to do two things:
How do we get this moving forward? The loss of two hours of wait time to prevent upstream failures is worth it, I think. |
I just wanted to make sure it was clear: these upstream failures are in PRs, not in anything user-facing. So it prevents some developer frustration and churn here in that situation. The same thing could be prevented if the user was told anywhere how to preemptively run the tests, which is what the discussion in #2736 was about |
Needing to compile all of The |
It would be possible, but I don't think that is an easy change in the current test infrastructure, since Care would need to be taken to make sure the output was still readable from this |
I was thinking of the steps:
That way the compilation is still parallelised, but each call is only building and immediately running one test. Whether or not this would be particularly 'clean', I'm not sure |
I think something of that flavor could work as long as anything shared between tests (like TBB or Sundials) was already built, otherwise you'd get filesystem races. Worth exploring things like that. |
…s' into feature/inbedded-expression-tests
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
@WardBrian do you have any idea what is going on with the benchmark tests failing? I'm not seeing an error in the logs? |
It looks like the failure is coming from
Seems weird? |
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
@WardBrian I think this is good to merge after the release is finished |
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
Summary
This adds a simple version of the expression testing framework directly into the call to
expect_ad()
. This helps fix #2736 by making sure any function that goes through the expression tests will also go check that it is safe to pass Eigen expressions to.The files for all of this are in
test/unit/math/expr_tests.hpp
with the main function beingcheck_expr_test(...)
. This function takes in any set of arguments and for each type that is an Eigen matrix or vector it checks if the function is 'safe' to pass expressions to. The word 'safe' in this context means that, if an unary expression was passed as an argument to the function, is it only called as many times as there are individual elements in the underlying Eigen object? The original Eigen object is wrapped in an unaryFunction that calls thecounterOp
functor from the original expression testing framework.Tests
Tests can be run with
The tests include examples for one, two, and three inputs to functions. They check that for each combination of inputs we fail the correct number of Eigen objects passed to each function. Each set of inputs also has an associated test that just runs a 'good' funtion which does not fail the tests.
Side Effects
This will increase compile times a bit
Release notes
Adds a simple expression tests framework to the autodiff test framework
Checklist
Math issue Document Stan Math expressions requirements #2736
Copyright holder: Steve Bronder
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested