Set change_t_via_interpolation! to use the init mode from the callback #1102

BenChung · 2024-10-31T22:49:46Z

Checklist

Appropriate tests were added
Any code changes were done in a way that does not break public API
All documentation related to code changes were updated
The new code follows the
contributor guidelines, in particular the SciML Style Guide and
COLPRAC.
Any new documentation only uses public API

Additional context

isaacsas · 2024-11-13T16:17:29Z

This PR seems to have directly broken tests for lots of packages? (Just look at the downstream failures and where they are crashing.) Seems like this shouldn't have been put into a non-breaking DiffEqBase release...

ChrisRackauckas · 2024-11-13T16:19:35Z

There is no other way to pull off a function argument change though, so...

But that's not the point. The actual issue is a bit illusive but being looked into.

isaacsas · 2024-11-13T16:26:36Z

There is no other way to pull off a function argument change though, so...

Make a breaking release?

ChrisRackauckas · 2024-11-13T16:28:06Z

Every week? Can you spend 12 hours each week bumping versions for it?

isaacsas · 2024-11-13T16:28:45Z

I'm just trying to figure out how we can get a bit of stability back. Things have been really rough for an extended period of time at this point.

ChrisRackauckas · 2024-11-13T16:31:53Z

Non-public internal APIs are just an issue for updating. We already have JuliaLang/julia#55516 as an open request about it. We have to acknowledge that there is a tooling issue and not come up with a solution that is just stopping anything from ever changing: we'll get the tools but have to do the best we can in the meantime.

isaacsas · 2024-11-13T16:40:23Z

Is there some kind of offline testing approach that could be developed as a bit of a hedge while waiting on that (i.e. PI author runs tests with master for impacted packages locally and confirms they pass before making releases)? Could we add a CI script to also test packages against the master of their dependencies to look at in cases like this (so we could then wait to make releases until such tests are running successfully)?

isaacsas · 2024-11-13T16:42:07Z

There just seems to be a lot of PR merged, release rapidly made, and then problems discovered going on lately. There must be something we can do in the overall dev or CI process to help prevent or catch such situations (while waiting on better tooling).

ChrisRackauckas · 2024-11-13T16:53:01Z

Is there some kind of offline testing approach that could be developed as a bit of a hedge while waiting on that (i.e. PI author runs tests with master for impacted packages locally and confirms they pass before making releases)?

This was offline tested.

Could we add a CI script to also test packages against the master of their dependencies to look at in cases like this (so we could then wait to make releases until such tests are running successfully)?

It would need to grab specific PR branches.

There just seems to be a lot of PR merged, release rapidly made, and then problems discovered going on lately. There must be something we can do in the overall dev or CI process to help prevent or catch such situations (while waiting on better tooling).

For internal APIs its not really all that possible other than the test locally and make sure you're right. Sometimes that ends up being wrong simply because of the number of versions going around. But I don't think this is really talking about the core problem anyways, the core problem is that the package split PRs are just a ton of work and are prone to have minor interface issues that take time to remedy since they can be hard to test, since any combined test will not find it.

ChrisRackauckas · 2024-11-13T19:56:08Z

@BenChung my understanding is that no one can reproduce this locally?

BenChung · 2024-11-14T01:06:45Z

At least I haven't been able to. I've tried 1.10 and 1.11 and matching all the versions exactly and it doesn't behave the same. I'm going to try rerunning with Act to see if I can make it happen. So far, everything local works great and 1.10.6 runners on CI also seem to work, while 1.11 runners don't. I have no idea why at this point.

Is there some kind of offline testing approach that could be developed as a bit of a hedge while waiting on that (i.e. PI author runs tests with master for impacted packages locally and confirms they pass before making releases)?

I tested all of this locally and added tests downstream of it that exercise this code. The implementations even got merged 1-2 releases in advance of this usage of the interface.

I think that the best answer here is to make the integration tests on PRs smarter; right now, it's difficult to interpret their outputs because there's almost always at least a few broken tests which is understandable but makes it hard to tell if any new tests have been broken. A mechanism to diff the breakage would be very useful - or just marking the broken ones as such - would probably help a lot.

BenChung · 2024-11-14T01:21:11Z

Okay, the problem for Catalyst is that the compat bound for OrdinaryDiffEqCore excludes the implementation of this call (which made it in in 1.10.0, whereas the compat in Catalyst is set at 1.9.0). I think that the best fix is to bump the compat for DiffEqBase on OrdinaryDiffEqCore and that should force the versions to get resolved.

isaacsas · 2024-11-14T01:59:39Z

@BenChung just to add, most of my comments above are not in reference to this specific issue, but more a general observation about stability issues we’ve been having going back months or more (maybe even as far back as last Fall).

Anything that could be done in CI / GitHub actions to help catch such issues more often would be much appreciated. But I think they are likely never going to be eliminated / made infrequent as long as the current approach of rapidly making releases from master right after merging PRs is maintained. Getting a higher level of stability really seems like it will ultimately require a more conservative release process, perhaps more in line with how big libraries like SciPy, PyTorch, or Jax do things (or even Julia itself).

edit: i.e. where there are release branches with feature freeze points, followed by extended testing periods before new releases are made.

ChrisRackauckas · 2024-11-14T06:30:52Z

SciPy, PyTorch, or Jax do things (or even Julia itself).

Jax has had more than 1 breaking release per month for the last few months? v0.4.33, v0.4.34, and v0.4.35 all broke the downstream packages in ways that required updates in order to work. Are you sure you're characterizing this promise land correctly?

Anything that could be done in CI / GitHub actions to help catch such issues more often would be much appreciated. But I think they are likely never going to be eliminated / made infrequent as long as the current approach of rapidly making releases from master right after merging PRs is maintained.

The core issue is once again JuliaLang/julia#55516. You cannot solve this with CI or Github Actions, it is strictly not solvable through that unless you're also creating auto-registration systems to effectively mimic the request of that PR.

To be clear the issue is the following:

Internal core shared APIs in deeper stack packages can make changes to functions which are non-breaking changes (i.e. things not exposed to the public) which require changes from our own packages. This is by definition non-breaking from a semvar definition, but something that will cause friction between two packages.
The package split PRs mean that we now have to track approximately 60 more packages through any updates.
If any of the updates are not done in time, then you can have a bumpy update and notice the friction.

This is always true and is simply a downside to having the package splits. (3) has always been possible, but we've normally been able to get downstream PRs in fast enough that you generally couldn't notice. Now that we have potentially 60 packages to update when solvers need updating (due to solver splits), it can be very unmanagable to hit all of the required releases, and what you're seeing is missed releases. Extra versioning restrictions are then the exact opposite of helpful, since the entire issue is figuring out how to get everything to run together on the latest code.

The root of the issue of course is (1). It's perfectly fine to change internal APIs, but that means that those pieces need to be kept in more lock-step with each other in order to function. DiffEqBase and OrdinaryDiffEq are fairly intertwined and there are definitely going to be moments where this will happen, unless you start to define all their internals as public API, which causes even more issues with trying to get everything to bump the release and is even more prone to subtle bugs.

The problem is that (1) is an issue because packages are not split necessarily by functionality but due to load times. DiffEqBase is kept separate so that dependencies can be kept small, not necessarily because it makes sense for it to be a different repository (though there are some shared elements with Sundials/ODEInterface etc. so it would need a bit of a refactor at this point). When you're splitting packages because of load times and not because they are actually different packages, you end up with links in the internals that make it so you sometimes can only have lock-step compatibility.

So it's kind of a pick your poison: do you want load times dramatically lowered by a few orders of magnitude, or do you want to version the different pieces together? The split modules is required in order to support the coming Pkg features in the future, and we know the timeline on that because it's in the roadmap, and the lowered load times was the most requested feature, so that determines the direction. But it's going to be painful until Pkg features lands. I have some mitigation tools like bump all packages SciML/OrdinaryDiffEq.jl#2469 but ultimately the problem is that separate loading and separate packages are orthogonal concerns which currently have to be coupled.

Set change_t_via_interpolation! to use the init mode from the callback

b8f1fd4

BenChung mentioned this pull request Oct 31, 2024

Allow configuration of the initialization done after interpolation while applying an affect SciML/OrdinaryDiffEq.jl#2507

Merged

5 tasks

Format

ed18a42

ChrisRackauckas merged commit f6aab6d into SciML:master Nov 12, 2024
19 of 44 checks passed

BenChung mentioned this pull request Nov 12, 2024

Fix callback reinit (test that #2994 is fixed) SciML/ModelingToolkit.jl#3197

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set change_t_via_interpolation! to use the init mode from the callback #1102

Set change_t_via_interpolation! to use the init mode from the callback #1102

BenChung commented Oct 31, 2024

isaacsas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

isaacsas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

isaacsas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

isaacsas commented Nov 13, 2024

isaacsas commented Nov 13, 2024 •

edited

Loading

ChrisRackauckas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

BenChung commented Nov 14, 2024

BenChung commented Nov 14, 2024

isaacsas commented Nov 14, 2024 •

edited

Loading

ChrisRackauckas commented Nov 14, 2024

Set change_t_via_interpolation! to use the init mode from the callback #1102

Set change_t_via_interpolation! to use the init mode from the callback #1102

Conversation

BenChung commented Oct 31, 2024

Checklist

Additional context

isaacsas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

isaacsas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

isaacsas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

isaacsas commented Nov 13, 2024

isaacsas commented Nov 13, 2024 • edited Loading

ChrisRackauckas commented Nov 13, 2024

ChrisRackauckas commented Nov 13, 2024

BenChung commented Nov 14, 2024

BenChung commented Nov 14, 2024

isaacsas commented Nov 14, 2024 • edited Loading

ChrisRackauckas commented Nov 14, 2024

isaacsas commented Nov 13, 2024 •

edited

Loading

isaacsas commented Nov 14, 2024 •

edited

Loading