Use a tree to match routes #4786

pcattori · 2022-12-06T19:46:23Z

pcattori
Dec 6, 2022
Maintainer

The Representation Principle: Once a problem is described using an appropriate representation, the problem is almost solved
-- Patrick H. Winston

tl;dr: Let's just copy what routers like DNS do!

Routes are inherently hierarchical as each / represents a subpath.
A tree directly models these hierarchies.
Ubiquitous, reliable, and performant routers use trees to match routes.
For example, DNS uses a trie (a specific type of tree, also known as a prefix tree) to store and search domain names efficiently.

The main benefit of representing routes as a tree is that our route matching will be simpler, leading to more robust and predictable matching. By robustness, I mean that redundant or ambiguous routes will not be representable in a tree like they are now.
And by predictable, I mean that it will be simpler to explain and understand why the route was chosen as the best match for an incoming path.

Additionally, all matching will run in optimal $O(input)$ time, so we should also see perf improvements, especially at larger scales.

Robust matching

Currently, /one/:two and /one/:three can coexist as routes and we arbitrarily (but consistently) choose one of these to have a higher priority.
While we could prune these routes manually, a better solution is to never allow redundant routes like these to coexist in the first place.

Encoding routes into a tree would ensure that we never have redundant routes, as redundant routes would occupy the same edge.
We can even warn or error when an ambiguous route is being added to the tree: WARN: '/one/:three' is ambiguous since '/one/:two' is already a route.
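As a rough sketch of how that warning could fall out of the data structure: if every dynamic segment under the same parent shares a single edge, then /one/:two and /one/:three end at the same node and the second insert can be detected. The names here (`RouteNode`, `createNode`, `insertRoute`) are hypothetical and are not part of React Router or Remix.

```typescript
// Hypothetical node shape: static children keyed by segment, plus at most
// one dynamic child, so every ":param" under the same parent shares one edge.
interface RouteNode {
  staticChildren: Map<string, RouteNode>;
  dynamicChild?: RouteNode;
  route?: string; // the pattern registered at this node, if any
}

const createNode = (): RouteNode => ({ staticChildren: new Map() });
const toSegments = (path: string): string[] => path.split("/").filter(Boolean);

// Inserts a pattern; returns a warning instead of storing a redundant route.
function insertRoute(root: RouteNode, pattern: string): string | undefined {
  let node = root;
  for (const segment of toSegments(pattern)) {
    if (segment.startsWith(":")) {
      const next = node.dynamicChild ?? createNode();
      node.dynamicChild = next;
      node = next;
    } else {
      const next = node.staticChildren.get(segment) ?? createNode();
      node.staticChildren.set(segment, next);
      node = next;
    }
  }
  if (node.route !== undefined) {
    return `WARN: '${pattern}' is ambiguous since '${node.route}' is already a route.`;
  }
  node.route = pattern;
  return undefined;
}

const root = createNode();
insertRoute(root, "/one/:two"); // stored, returns undefined
console.log(insertRoute(root, "/one/:three"));
// prints: WARN: '/one/:three' is ambiguous since '/one/:two' is already a route.
```

Whether this should warn or throw is a policy choice; the sketch only shows that the tree makes the collision visible at insert time.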

Simple, fast, deterministic matching

Trees natively support $O(input)$ lookups where $input$ is the number of nested segments of the incoming path.
We would then do one tree lookup to match an incoming path to a route.

Note that this is indeed optimal: $input$ is the size of the incoming path, not the number of nodes in the tree, and any matcher has to read the whole path. In terms of the tree, this would be $O(\log n)$ where $n$ is the number of nodes, if the tree is balanced.

Mutually exclusive sibling branches?

One decision we'll have to make is if dynamic segments are automatically mutually exclusive of their sibling paths or not.
For example, if we have routes /one/two and /one/:param1/three, the tree would look like:

/
└── one
    ├── two
    └── :param1
        └── three

Suppose the incoming path is /one/two/three. Can we tell at the /one node whether to go down the two branch or the :param1 branch?
If :param1 is modeled to be exclusive of its siblings (e.g. two), then yes!
We'd know to go down the two branch.
Then we'd conclude that there is no valid match for /one/two/three.
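To make the exclusive case concrete, here is a minimal sketch of a lookup that never backtracks: at each node a static child wins if it exists, and the dynamic child is only considered otherwise. All names (`RouteNode`, `insertRoute`, `matchExclusive`) are made up for illustration and do not reflect the current matchRoutes implementation.

```typescript
interface RouteNode {
  staticChildren: Map<string, RouteNode>;
  dynamicChild?: RouteNode;
  route?: string; // the pattern registered at this node, if any
}

const createNode = (): RouteNode => ({ staticChildren: new Map() });
const toSegments = (path: string): string[] => path.split("/").filter(Boolean);

function insertRoute(root: RouteNode, pattern: string): void {
  let node = root;
  for (const segment of toSegments(pattern)) {
    if (segment.startsWith(":")) {
      const next = node.dynamicChild ?? createNode();
      node.dynamicChild = next;
      node = next;
    } else {
      const next = node.staticChildren.get(segment) ?? createNode();
      node.staticChildren.set(segment, next);
      node = next;
    }
  }
  node.route = pattern;
}

// One step per path segment, so the walk is O(input). Assumes dynamic
// segments are exclusive of their static siblings (no backtracking).
function matchExclusive(root: RouteNode, path: string): string | undefined {
  let node = root;
  for (const segment of toSegments(path)) {
    const next = node.staticChildren.get(segment) ?? node.dynamicChild;
    if (!next) return undefined;
    node = next;
  }
  return node.route;
}

const root = createNode();
insertRoute(root, "/one/two");
insertRoute(root, "/one/:param1/three");
console.log(matchExclusive(root, "/one/two/three")); // prints: undefined
console.log(matchExclusive(root, "/one/foo/three")); // prints: /one/:param1/three
```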

If we don't assume mutual exclusivity of sibling branches, then we'd have to traverse both two and :param1.
Like before, the two branch would not yield a valid match, but in this case :param1 branch would yield a match for /one/:param1/three.¹

In this case, we could have gotten multiple valid matches if we replaced /one/two with /one/two/three.
For now, we can pick the "best" match out of these via the existing scoring of routes, though we can improve on this as well.
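For comparison, a sketch of the non-exclusive variant: the walk explores both the static and the dynamic branch at each node and collects every route that matches. As before, `RouteNode`, `insertRoute` and `matchAll` are hypothetical names, and picking the "best" of the returned matches is left to whatever ranking is used afterwards.

```typescript
interface RouteNode {
  staticChildren: Map<string, RouteNode>;
  dynamicChild?: RouteNode;
  route?: string; // the pattern registered at this node, if any
}

const createNode = (): RouteNode => ({ staticChildren: new Map() });
const toSegments = (path: string): string[] => path.split("/").filter(Boolean);

function insertRoute(root: RouteNode, pattern: string): void {
  let node = root;
  for (const segment of toSegments(pattern)) {
    if (segment.startsWith(":")) {
      const next = node.dynamicChild ?? createNode();
      node.dynamicChild = next;
      node = next;
    } else {
      const next = node.staticChildren.get(segment) ?? createNode();
      node.staticChildren.set(segment, next);
      node = next;
    }
  }
  node.route = pattern;
}

// Collects every matching route, visiting the static branch before the
// dynamic one. Branches that cannot match are pruned as soon as they fail.
function matchAll(node: RouteNode, segments: string[]): string[] {
  const [head, ...rest] = segments;
  if (head === undefined) return node.route !== undefined ? [node.route] : [];
  const matches: string[] = [];
  const staticChild = node.staticChildren.get(head);
  if (staticChild) matches.push(...matchAll(staticChild, rest));
  if (node.dynamicChild) matches.push(...matchAll(node.dynamicChild, rest));
  return matches;
}

const root = createNode();
insertRoute(root, "/one/two/three");
insertRoute(root, "/one/:param1/three");
console.log(matchAll(root, toSegments("/one/two/three")));
// prints: [ '/one/two/three', '/one/:param1/three' ]
```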

Changing routes at runtime

If we represent our routes as a tree, we'll be able to use standard tree algorithms to efficiently match paths to a route.
Additionally, tries support $O(input)$ insertions and deletions, so we can efficiently support route modifications at runtime if needed.
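A sketch of what runtime removal might look like, under the same assumed node shape as a plain segment trie. Both `insertRoute` and `removeRoute` touch one node per segment of the pattern, and removal prunes nodes that end up empty on the way back up. These helpers are illustrative only and not an existing API.

```typescript
interface RouteNode {
  staticChildren: Map<string, RouteNode>;
  dynamicChild?: RouteNode;
  route?: string; // the pattern registered at this node, if any
}

const createNode = (): RouteNode => ({ staticChildren: new Map() });
const toSegments = (path: string): string[] => path.split("/").filter(Boolean);
const isEmpty = (node: RouteNode): boolean =>
  node.route === undefined && node.staticChildren.size === 0 && !node.dynamicChild;

function insertRoute(root: RouteNode, pattern: string): void {
  let node = root;
  for (const segment of toSegments(pattern)) {
    if (segment.startsWith(":")) {
      const next = node.dynamicChild ?? createNode();
      node.dynamicChild = next;
      node = next;
    } else {
      const next = node.staticChildren.get(segment) ?? createNode();
      node.staticChildren.set(segment, next);
      node = next;
    }
  }
  node.route = pattern;
}

// Removes the route for the given pattern segments. Returns true when
// `node` is left empty so that the caller can prune it from its parent.
function removeRoute(node: RouteNode, segments: string[]): boolean {
  const [head, ...rest] = segments;
  if (head === undefined) {
    delete node.route;
  } else if (head.startsWith(":")) {
    if (node.dynamicChild && removeRoute(node.dynamicChild, rest)) {
      delete node.dynamicChild;
    }
  } else {
    const child = node.staticChildren.get(head);
    if (child && removeRoute(child, rest)) node.staticChildren.delete(head);
  }
  return isEmpty(node);
}
```

For example, after inserting /one/two and /one/:param1/three, calling `removeRoute(root, toSegments("/one/two"))` drops only the two branch and leaves the :param1 branch in place.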

Longest common static prefix

By default, lookups in our route tree would prefer static paths over dynamic paths so that /one matches /one over /:param.
This mirrors the "longest common prefix" matching found in DNS and other routers.

However, if we decide not to make dynamic segments mutually exclusive of their siblings, then we'd need to still pick the "best" match from among the valid matches.

A simple method to extend "longest common prefix" is to prefer "longest common static prefix".

For example, consider these two routes:

/one/two/:foo
/one/:bar/three

Which one should /one/two/three match?
The longest common static prefix for each is:

/one/two/:foo -> /one/two
/one/:bar/three -> /one

So /one/two/:foo should be chosen.
Note that we are measuring "longest" in terms of number of segments, not characters.

For two matching routes with the same number of static segments, we keep going until a tie is broken where one route has a static segment and one route has a dynamic segment.

For example, matching the path /one/two/baz/three to the routes:

/one/two/:bar1/:bar2 -> /one/two/:/:
/one/two/:foo/three -> /one/two/:/three

where we use : to denote a dynamic segment so that it's easy to compare the routes.
Both match the same up to /one/two/:, but then /one/two/:foo/three matches statically while the other route matches dynamically.
So /one/two/:foo/three should be chosen.
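The tie-breaking rule above can be sketched as a comparator over route patterns that are already known to match the same path. It walks both patterns segment by segment and stops at the first position where one is static and the other is dynamic. The function names are hypothetical, and the sketch assumes plain static and `:param` segments only (no splats or optional segments).

```typescript
const toSegments = (path: string): string[] => path.split("/").filter(Boolean);
const isStatic = (segment: string): boolean => !segment.startsWith(":");

// Negative when `a` should win, positive when `b` should win, 0 on a tie.
function compareByStaticPrefix(a: string, b: string): number {
  const aSegments = toSegments(a);
  const bSegments = toSegments(b);
  const length = Math.min(aSegments.length, bSegments.length);
  for (let i = 0; i < length; i++) {
    const aStatic = isStatic(aSegments[i]);
    const bStatic = isStatic(bSegments[i]);
    // The first position where exactly one route is static breaks the tie.
    if (aStatic !== bStatic) return aStatic ? -1 : 1;
  }
  return 0;
}

// Picks the preferred route among patterns that all match the same path.
function pickBest(matches: string[]): string | undefined {
  return [...matches].sort(compareByStaticPrefix)[0];
}

console.log(pickBest(["/one/:bar/three", "/one/two/:foo"]));
// prints: /one/two/:foo
console.log(pickBest(["/one/two/:bar1/:bar2", "/one/two/:foo/three"]));
// prints: /one/two/:foo/three
```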

Currently, we use a scoring scheme with arbitrary weights to rank valid matches.
Switching to longest common static prefix would be a breaking change.
We can still switch internally to a tree for matching and use the existing scoring to pick between valid matches returned by the tree lookup to vastly simplify and speed up our implementation without any breaking changes.

I think longest common static prefix captures most people's intuitions about how matching should work better than our scoring scheme and is much more easily explained.
I also think that our scoring was trying to capture the same intuition in a less robust way.
So it's likely that virtually all use-cases would not be breaking, and I would bet that any use-cases that are breaking are probably behaving unpredictably for users today.

So we could philosophically call this a bug fix to our current routing rather than a breaking change.
That's probably what I would do, but I realize this could be contentious.

To make it less contentious, we could try to see how closely it matches existing usage in our test suite, examples, and existing Remix projects.
But we can also treat this as a breaking change if we want to be super risk-averse.

Again, with context, I think I can convince you that this really is a bug fix, but let's talk about it 😁

¹ For large paths with many dynamic segments, lookup would take $O(2^{dynamic})$ where $dynamic$ is the number of dynamic segments. Compare that to the exponentially faster $O(input)$ when dynamic segments are mutually exclusive to their siblings. Note that $O(2^{dynamic})$ is the best we can hope to do in the worst case as there are $2^{dynamic}$ possible combinations for a path with dynamic segments. But this should be a rare case as we don't expect multiple routes of the form /:a1/:b2/:c3/:d4/:e5/:f6/some-static-segment-at-the-end to be a common use-case.

brophdawg11 · 2022-12-06T20:30:26Z

brophdawg11
Dec 6, 2022
Maintainer

I really like the idea of "longest matching static prefix" but I do think it would constitute a breaking change based on the ambiguity/flexibility of today's routes. Our unit tests will give some indication of the severity, but we'll have no way of making that determination for real-world apps. So it might need to be a v7 thing.

I've been envisioning something like this trie walk but where we keep some version of the current scoring (for v6). So we would walk all matching paths to completion (while pruning whole sub-sections of the tree) and we just score along the way using the same values as we use today. So it's a single trie walk (probably higher than O(log(n)) since we walk multiple paths?) to determine all matching/scored paths, then sort and pick the highest score. In the vast majority of cases we'd expect 1 matching route so there would be no additional sorting cost. In the minority of cases we'd have maybe a few matching routes and the sorting wouldn't constitute a large overhead.

Even with this approach, we'd still be significantly cheaper from a O(?) standpoint than we are today.

1 reply

pcattori Dec 6, 2022
Maintainer Author

Changed my complexity analysis to refer to input rather than n for clarity, since n is often used to denote the number of nodes in the tree rather than the number of segments/branches in the lookup input.

But yes, we'd be doing O(input) operations, which is optimal since you have to read the whole input to match it (unless you're doing probabilistic matching, which we're not).

brophdawg11 · 2023-01-18T17:31:52Z

brophdawg11
Jan 18, 2023
Maintainer

@pcattori something to keep in mind when we implement this: remix-run/react-router#9925

3 replies

machour Jan 18, 2023
Maintainer

@brophdawg11 by the way, do you think we should revisit partial dynamic params support? I feel like people are going to keep being confused by this, so if it's something we can support, let's do it.

pcattori Jan 18, 2023
Maintainer Author

@machour we actually supported partial dynamic params incidentally before, so from an implementation point of view, it's definitely doable. The only limitation here encoding-wise is that users would need to escape $ since currently foo$bar is interpreted statically, so you'd then need foo[$]bar to escape it. But that's easy to do.

I think partial dynamic params would be good to support to keep the mental model for the user as simple as possible.

brophdawg11 Jan 18, 2023
Maintainer

I'd open a new discussion for partial param support - I know Michael has some opinions there relating to keeping route matching patterns as simple as possible.

gyx1000 · 2023-07-18T12:49:29Z

gyx1000
Jul 18, 2023

@pcattori, I made this in order to have a better understanding of react-router.

However, I have a few questions about this proposal.

I probably don't have the ability to understand correctly, but what do you mean by representing routes as a tree?
Currently, matchRoutes receives a RouteObjectType[] object which can be represented as a "tree" because there is a parent->child relationship.
Were you thinking of changing the object passed to matchRoutes or building a new object when calling it?

What interests me most is the fact that you could add/delete routes at runtime.
Indeed, it would take O(input) if we had a tree where each node contained a single segment, but this would imply a more substantial refactor.

blowery · 2024-03-19T16:49:31Z

blowery
Mar 19, 2024

👋 We use react-router on Tumblr. When profiling production traffic, I'm seeing matchPaths show up as the most expensive thing our little node server does when servicing requests, so I'm pretty intrigued by this. If folks want a "real world" place to test improvements, I'm happy to try out some patches.

We're currently working around how slow matchPaths is by patching it to memoize the results of flattening and scoring routes. While that helps, it's very specific to how our routes are organized and probably wouldn't work as a general solution.
