Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Children in spawned gltf scenes are incorrectly ordered #17720

Closed
rparrett opened this issue Feb 7, 2025 · 3 comments · Fixed by #17858
Closed

Children in spawned gltf scenes are incorrectly ordered #17720

rparrett opened this issue Feb 7, 2025 · 3 comments · Fixed by #17858
Labels
A-ECS Entities, components, systems, and events A-glTF Related to the glTF 3D scene/model format A-Scenes Serialized ECS data stored on the disk C-Bug An unexpected or incorrect behavior D-Complex Quite challenging from either a design or technical perspective. Ask for help! S-Needs-Design This issue requires design work to think about how it would best be accomplished
Milestone

Comments

@rparrett
Copy link
Contributor

rparrett commented Feb 7, 2025

Bevy version

main, bisected to #17687

Relevant system information

SystemInfo { os: "macOS 15.3 Sequoia", kernel: "24.3.0", cpu: "Apple M4 Max", core_count: "16", memory: "64.0 GiB" }
AdapterInfo { name: "Apple M4 Max", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }

Also failing on linux/mac in the example runner.

What you did

cargo run --example gltf_skinned_mesh

What went wrong

thread 'Compute Task Pool (1)' panicked at examples/animation/gltf_skinned_mesh.rs:61:68:
called `Result::unwrap()` on an `Err` value: QueryDoesNotMatch(12v1 with components Transform, GlobalTransform, Visibility, InheritedVisibility, ViewVisibility, VisibilityClass, ChildOf, PreviousGlobalTransform, Mesh3d, SkinnedMesh, Name, MeshMaterial3d<StandardMaterial>, Aabb)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Encountered a panic in system `gltf_skinned_mesh::joint_animation`!
Encountered a panic in system `bevy_app::main_schedule::Main::run_main`!
@rparrett rparrett added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Feb 7, 2025
@rparrett
Copy link
Contributor Author

rparrett commented Feb 9, 2025

It seems that the order of the children within the spawned hierarchy has changed.

// Bevy 0.15.2
8v1
 9v1
  GltfNode0
   Mesh
   GltfNode1
    GltfNode2

// Bevy main
8v1
 9v1
  GltfNode0
   GltfNode1
    GltfNode2
   Mesh

So the following from gltf_skinned_mesh is no longer true:

// First joint is the second child of the mesh node.

So the following patch fixes the example:

         // First joint is the second child of the mesh node.
-        let first_joint_entity = mesh_node_parent[1];
+        let first_joint_entity = mesh_node_parent[0];
         // Get `Children` in the first joint.
         let first_joint_children = parents.get(first_joint_entity).unwrap();

However, if I'm understanding the situation right, this may be pointing to some sort of bug in scene spawning or entity cloning that we want to fix.

@rparrett
Copy link
Contributor Author

rparrett commented Feb 9, 2025

cc @cart

@mockersf mockersf added this to the 0.16 milestone Feb 9, 2025
@cart
Copy link
Member

cart commented Feb 11, 2025

Nice catch!

This became visible in #17687 because that fixed the "duplicate entities on spawn" bug introduced in #17398. But the reordering was actually introduced in #17398.

The problem is that we're spawning a flat (arbitrarily ordered) list of entities in the SceneSpawner (ex: iter archetypes, spawn each entity in the archetype). We ignore the Children component, meaning we are spawning entities, some of which have ChildOf, and letting the arbitrary spawn order determine the final order of the Children. Definitely not shippable behavior.

The original RelationshipTarget is the only thing holding the order information. We cannot discard it.

Some options:

  1. Spawn hierarchically, which would ensure the order is preserved (ex: spawn starting from a list of roots, and then trigger spawns for all relationshiptarget components with LINKED_SPAWN). This would largely fix the problem. But for any non-LINKED_SPAWN relationships, their order would still be lost (and therefore this isn't a compelling option). And spawning in archetype order also has better memory access patterns in theory (although its hard to say if we're actually getting benefits from that cache friendliness here).
  2. We could do a final "fixup pass" where we replace all relationship target entities with a mapped version of their original collection. This has the downside of doing all of the "setup code" with the wrong order, then doing additional work to fix it up retroactively. This overhead puts it in the "last-resort-hack" category for me.
  3. We could spawn the mapped version of the collection, and then somehow suppress the Relationship (ex: ChildOf) insert hook logic to avoid inserting the out-of-order duplicates. This could be accomplished efficiently with a flag on World or something, which we set when spawning / cloning and reset when finished.
  4. We could "reverse" the process: spawn a mapped version of the RelationshipTarget collection, fully ignore the Relationship component (to avoid inserting duplicates), and then drive the inserts of the Relationship component via the RelationshipTarget collection. This has the downside of forcing archetype moves unnecessarily (spawning without ChildOf and then inserting later is an archetype move). The approach on main does not (theoretically) suffer from unnecessary archetype moves, as we can spawn the empty Children (preallocated with the space necessary for each child) and the children are each spawned directly with ChildOf. In practice main still has archetype moves because we're spawning each component one-by-one, but that will change once we switch to using the EntityCloner to spawn.

I was leaning toward (3), as it seemed like our "least bad" option. It preserves order across all relationship types, doesn't introduce overhead (just one more branch ... and in fact it is actually faster because we're skipping the insert code), and it allows us to spawn/clone in arbitrary orders (including cache-efficient orders like iterating archetypes or tables). We can trust that the contents of the RelationshipTarget collections are valid, as they were generated via the normal hooks flows.

Sadly it has a fatal flaw: cloning can trigger hooks/observers, which can trigger arbitrary spawns. In those cases, with a global flag the triggered spawns wouldn't properly maintain their relationships. This means we need this configuration to be properly scoped to the call (ex: add a new set of spawn / insert methods that pass the relevant information to the hook so it can skip only where relevant).

So thats what I'm leaning toward now. If anyone has other ideas let me know.

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events A-Scenes Serialized ECS data stored on the disk A-glTF Related to the glTF 3D scene/model format D-Complex Quite challenging from either a design or technical perspective. Ask for help! S-Needs-Design This issue requires design work to think about how it would best be accomplished and removed S-Needs-Triage This issue needs to be labelled labels Feb 11, 2025
@rparrett rparrett changed the title gltf_skinned_mesh example panics Children in spawned gltf scenes are incorrectly ordered Feb 11, 2025
github-merge-queue bot pushed a commit that referenced this issue Feb 11, 2025
# Objective

Fix gltf validation errors in `Fox.glb`.

Inspired by #8099, but that issue doesn't appear to describe a real bug
to fix, as far as I can tell.

## Solution

Use the latest version of the Fox from
[glTF-Sample-Assets](https://github.com/KhronosGroup/glTF-Sample-Assets/blob/main/Models/Fox/glTF-Binary/Fox.glb).

## Testing

Dropped both versions in https://github.khronos.org/glTF-Validator/

`cargo run --example animated_mesh` seems to still look fine.

Before:

```
The asset contains errors.
"numErrors": 126,
"numWarnings": 4184,
```

After:

```
The asset is valid.
"numErrors": 0,
"numWarnings": 0,
```

## Discussion

The 3d testbed was panicking with
```
thread 'main' panicked at examples/testbed/3d.rs:288:60:
called `Result::unwrap()` on an `Err` value: QueryDoesNotMatch(35v1 with components Transform, GlobalTransform, Visibility, InheritedVisibility, ViewVisibility, ChildOf, Children, Name)
```
Which is bizarre. I think this might be related to #17720, or maybe the
structure of the gltf changed.

I fixed it by using updating the testbed to use a more robust method of
finding the correct entity as is done in `animated_mesh`.
@cart cart moved this to Respond (With Priority) in @cart's attention Feb 13, 2025
github-merge-queue bot pushed a commit that referenced this issue Mar 5, 2025
)

Fixes #17720

## Objective

Spawning RelationshipTargets from scenes currently fails to preserve
RelationshipTarget ordering (ex: `Children` has an arbitrary order).
This is because it uses the normal hook flow to set up the collection,
which means we are pushing onto the collection in _spawn order_ (which
is currently in archetype order, which will often produce mismatched
orderings).

We need to preserve the ordering in the original RelationshipTarget
collection. Ideally without expensive checking / fixups.

## Solution

One solution would be to spawn in hierarchy-order. However this gets
complicated as there can be multiple hierarchies, and it also means we
can't spawn in more cache-friendly orders (ex: the current per-archetype
spawning, or future even-smarter per-table spawning). Additionally,
same-world cloning has _slightly_ more nuanced needs (ex: recursively
clone linked relationships, while maintaining _original_ relationships
outside of the tree via normal hooks).

The preferred approach is to directly spawn the remapped
RelationshipTarget collection, as this trivially preserves the ordering.
Unfortunately we can't _just_ do that, as when we spawn the children
with their Relationships (ex: `ChildOf`), that will insert a duplicate.

We could "fixup" the collection retroactively by just removing the back
half of duplicates, but this requires another pass / more lookups /
allocating twice as much space. Additionally, it becomes complicated
because observers could insert additional children, making it harder
(aka more expensive) to determine which children are dupes and which are
not.

The path I chose is to support "opting out" of the relationship target
hook in the contexts that need that, as this allows us to just cheaply
clone the mapped collection. The relationship hook can look for this
configuration when it runs and skip its logic when that happens. A
"simple" / small-amount-of-code way to do this would be to add a "skip
relationship spawn" flag to World. Sadly, any hook / observer that runs
_as the result of an insert_ would also read this flag. We really need a
way to scope this setting to a _specific_ insert.

Therefore I opted to add a new `RelationshipInsertHookMode` enum and an
`entity.insert_with_relationship_insert_hook_mode` variant. Obviously
this is verbose and ugly. And nobody wants _more_ insert variants. But
sadly this was the best I could come up with from a performance and
capability perspective. If you have alternatives let me know!

There are three variants:

1. `RelationshipInsertHookMode::Run`: always run relationship insert
hooks (this is the default)
2. `RelationshipInsertHookMode::Skip`: do not run any relationship
insert hooks for this insert (this is used by spawner code)
3. `RelationshipInsertHookMode::RunIfNotLinked`: only run hooks for
_unlinked_ relationships (this is used in same-world recursive entity
cloning to preserve relationships outside of the deep-cloned tree)

Note that I have intentionally only added "insert with relationship hook
mode" variants to the cases we absolutely need (everything else uses the
default `Run` mode), just to keep the code size in check. I do not think
we should add more without real _very necessary_ use cases.

I also made some other minor tweaks:

1. I split out `SourceComponent` from `ComponentCloneCtx`. Reading the
source component no longer needlessly blocks mutable access to
`ComponentCloneCtx`.
2. Thanks to (1), I've removed the `RefCell` wrapper over the cloned
component queue.
3. (1) also allowed me to write to the EntityMapper while queuing up
clones, meaning we can reserve entities during the component clone and
write them to the mapper _before_ inserting the component, meaning
cloned collections can be mapped on insert.
4. I've removed the closure from `write_target_component_ptr` to
simplify the API / make it compatible with the split `SourceComponent`
approach.
5. I've renamed `EntityCloner::recursive` to
`EntityCloner::linked_cloning` to connect that feature more directly
with `RelationshipTarget::LINKED_SPAWN`
6. I've removed `EntityCloneBehavior::RelationshipTarget`. This was
always intended to be temporary, and this new behavior removes the need
for it.

---------

Co-authored-by: Viktor Gustavsson <[email protected]>
@cart cart moved this from Respond (With Priority) to Responded in @cart's attention Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events A-glTF Related to the glTF 3D scene/model format A-Scenes Serialized ECS data stored on the disk C-Bug An unexpected or incorrect behavior D-Complex Quite challenging from either a design or technical perspective. Ask for help! S-Needs-Design This issue requires design work to think about how it would best be accomplished
Projects
Status: Responded
Development

Successfully merging a pull request may close this issue.

4 participants