Rust: Implement basic type inference in QL #18632

Merged 11 commits on Mar 14, 2025

@hvitved hvitved commented Jan 30, 2025

Overview

This PR adds basic type inference for Rust, implemented in QL. Type inference and method call resolution are mutually recursive:

struct MyStruct<T>(T);

impl<S> MyStruct<S> {
    fn foo(self) -> S {
        match self {
            MyStruct(t) => t
        }
    }
}

let x = MyStruct(0i64);
let y = x.foo();

In the example above, in order to resolve the call x.foo() we need to know the type of x, and in order to resolve the type of x.foo() we need to know the return type of foo (meaning we need to be able to resolve the call).

The example above involves generics, and we want to be able to conclude that x.foo() has type i64 and not simply the type S, which is the declared return type of foo.

This means that we need to be able to represent the fact that x has type MyStruct<i64>, and while it is tempting to represent constructed types like this using a newtype encoding, such an encoding will become mutually recursive with type inference/call resolution, meaning (at best) poor performance and (at worst) non-monotonic recursion. Another issue with using a newtype encoding is that we could only construct a constructed generic type once we are able to infer all type arguments (type arguments will be encoded as cons-lists), and in case one of the type arguments cannot be inferred, we will have to give up completely.

Instead, we index all type resolution relations with a type path, which is conceptually a (possibly empty) list of type argument indices. For example, the type MyStruct<i64> can be represented as

| Type path | Type |
| --- | --- |
| "" | MyStruct |
| "0" | i64 |

Type paths are represented as strings, so there will be no need for additional mutual recursion.
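
As a rough illustration of this encoding (a sketch in Rust, not the QL implementation), the table above can be modeled as a map from type-path strings to type names. The names `TypeAtPath`, `my_struct_i64`, and `type_at` are hypothetical and do not appear in the PR.

```rust
use std::collections::BTreeMap;

/// A type, flattened into (type path -> type name) entries.
/// The empty path "" denotes the root of the type.
/// Hypothetical name, used only for this illustration.
type TypeAtPath = BTreeMap<String, String>;

/// Builds the flattened form of `MyStruct<i64>` from the table above.
fn my_struct_i64() -> TypeAtPath {
    let mut t = TypeAtPath::new();
    t.insert(String::new(), "MyStruct".to_string());
    t.insert("0".to_string(), "i64".to_string());
    t
}

/// Looks up the type at `path`, if one is known.
fn type_at<'a>(t: &'a TypeAtPath, path: &str) -> Option<&'a str> {
    t.get(path).map(String::as_str)
}
```

Because every entry stands on its own, a type argument that cannot be inferred is simply a missing entry. The remaining entries stay usable, which is the property that a single constructed-type value would not provide.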

Using type path indexing, we can infer the following types for the program above:

| Entity | Type path | Unbound type |
| --- | --- | --- |
| self | "" | MyStruct |
| self | "0" | S |
| x | "" | MyStruct |
| x | "0" | i64 |

which means we will be able to match S against the type i64. We can then combine this with the return type of foo to get the correct return type of the call.
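
The matching step can be sketched in Rust as follows. This is a simplified model under stated assumptions, not the shared library's `Matching` module: it handles a single parameter, ignores base types and explicit type arguments, and assumes that nested type paths are written as dot-separated indices (for example "0.1"). All function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Flattened type: type path -> type name (hypothetical representation).
type TypeAtPath = BTreeMap<String, String>;

/// Convenience constructor for a flattened type.
fn flat(entries: &[(&str, &str)]) -> TypeAtPath {
    entries.iter().map(|(p, t)| (p.to_string(), t.to_string())).collect()
}

/// Concatenates two type paths (assumed dot-separated).
fn append(prefix: &str, suffix: &str) -> String {
    match (prefix.is_empty(), suffix.is_empty()) {
        (true, _) => suffix.to_string(),
        (_, true) => prefix.to_string(),
        _ => format!("{}.{}", prefix, suffix),
    }
}

/// Returns the remainder of `path` after `prefix`, if `prefix` is a prefix of `path`.
fn strip<'a>(path: &'a str, prefix: &str) -> Option<&'a str> {
    if prefix.is_empty() {
        Some(path)
    } else if path == prefix {
        Some("")
    } else {
        path.strip_prefix(prefix)?.strip_prefix('.')
    }
}

/// Computes the type of a call from the declared parameter type, the declared
/// return type, the inferred argument type, and the declared type parameters.
fn call_type(
    param: &TypeAtPath,
    ret: &TypeAtPath,
    arg: &TypeAtPath,
    type_params: &[&str],
) -> TypeAtPath {
    let mut out = TypeAtPath::new();
    for (ret_path, ret_ty) in ret {
        if !type_params.contains(&ret_ty.as_str()) {
            // Not a type parameter: the declared type is already final.
            out.insert(ret_path.clone(), ret_ty.clone());
            continue;
        }
        // Find where the type parameter occurs in the declared parameter type,
        // and copy everything the argument has below that position.
        for (param_path, _) in param.iter().filter(|(_, t)| *t == ret_ty) {
            for (arg_path, arg_ty) in arg {
                if let Some(suffix) = strip(arg_path, param_path) {
                    out.insert(append(ret_path, suffix), arg_ty.clone());
                }
            }
        }
    }
    out
}
```

For the example program, `param` is the declared type of `self`, `ret` is the declared return type `S`, and `arg` is the inferred type of `x`. The result has `i64` at the empty path, which corresponds to the type of `x.foo()`.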

Implementation

The implementation is split into a shared language-agnostic type inference library (in a new typeinference QL pack) and a Rust-specific implementation on top.

The shared library defines the TypePath class, and provides the module Matching for matching types of arguments with types of parameters in the declarations that they target (like matching i64 with S in the example above). Matching takes into account that type arguments may be supplied explicitly (like MyStruct::<i64>::foo(x)) and that base types may need to be considered in order to find a match.

The entry predicate of the Rust-specific implementation is

Type resolveType(AstNode n, TypePath path)

which assigns type-path-indexed types to AST nodes. Using the Matching module from the shared library, the implementation infers types for record expressions Foo { bar: baz }, call expressions x.foo() and Foo::bar(x), and field expressions x.field (in mutual recursion with resolveType in the latter two cases). For call expressions, we take implicit borrows and implicit dereferences into account, and base types in the context of Rust translate to trait bounds and impl blocks.
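
For reference, here is a small Rust program that exercises each of the expression kinds listed above. The names `Foo`, `get`, `into_bar`, and `demo` are made up for this illustration. The comments state the types that the Rust compiler assigns, which is what the analysis aims to recover; they are not output from the QL implementation.

```rust
struct Foo<T> {
    bar: T,
}

impl<T> Foo<T> {
    // Takes `&self`, so a call `x.get()` implicitly borrows `x`.
    fn get(&self) -> &T {
        &self.bar
    }

    // Takes `self` by value.
    fn into_bar(self) -> T {
        self.bar
    }
}

fn demo() -> (i64, i64, i64) {
    let baz: i64 = 1;
    let x = Foo { bar: baz }; // record expression: Foo<i64>
    let a = *x.get(); // method call with an implicit borrow of `x`: &i64, then i64
    let b = x.bar; // field expression: i64
    let c = Foo::into_bar(x); // path call expression: i64
    (a, b, c)
}
```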

Known limitations

The implementation has a lot of known limitations, for example:

  • Missing support for a lot of expression kinds.
  • Missing support for patterns.
  • Missing support for operator calls (should be handled like method calls).
  • Missing support for constrained impl blocks.
  • Missing support for dyn trait types.
  • Missing support for array types.
  • Missing support for tuple types.
  • Missing support for associated types (should be handled like type parameters, they are the reason for the two new missing data flow results).
  • Type matching does not take variance into account; when matching return types, it should be contravariant and not covariant.
  • Missing support for the Deref trait and more generally type coercions.
  • Missing support for union types.

Evaluation

DCA shows that the implementation adds a negligible overhead to the total analysis time (~5%). Overall, we mostly lose call edges compared to what we get from rust-analyzer (except for the diem project), but this is really not surprising given the known limitations above, as well as the known limitations of our path resolution implementation.

Note to the reviewer

Commit-by-commit review is encouraged.

@github-actions github-actions bot added the Rust Pull requests that update Rust code label Jan 30, 2025
@hvitved hvitved force-pushed the rust/type-inference branch 4 times, most recently from 66bea42 to 2d0e953 Compare February 4, 2025 12:10
@hvitved hvitved force-pushed the rust/type-inference branch 7 times, most recently from c88c9fe to a8540b1 Compare February 7, 2025 14:58
@hvitved hvitved force-pushed the rust/type-inference branch 3 times, most recently from 02184af to 4f723dd Compare February 10, 2025 12:26
@hvitved hvitved force-pushed the rust/type-inference branch 6 times, most recently from 486f813 to a254679 Compare February 13, 2025 09:02
@hvitved hvitved force-pushed the rust/type-inference branch 3 times, most recently from 42e5970 to be2ec0f Compare February 18, 2025 14:36

hvitved commented Mar 13, 2025

> My understanding is that at a high level the type inference works by starting with types that are known (usually from declarations) and then having those types propagate through the AST (forwards and backwards). A big part of that is the Matching module which pairs declarations with usages of the declaration, and propagates declared types from the declaration to the use.
>
> This is somewhat different from traditional type inference by unification where one builds bigger and bigger sets of terms that have equal type, with the hope that the set is eventually unified with a fully concrete type. However, in QL we can never "update" anything in place, and hence the present approach works better as all the sub-steps produce final accurate information, coupled with the path approach that makes it possible to build different layers of a type in smaller steps.

Your understanding is absolutely correct.
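
The "nothing is updated in place" point can be illustrated with a small monotone fixpoint in Rust. This is only a toy model under assumptions: facts are (node, type path, type) triples, and the `edges` relation (pairs of AST nodes assumed to share a type, such as a `let` variable and its initializer) is a hypothetical simplification of what the Matching module derives.

```rust
use std::collections::BTreeSet;

/// A fact: "node has type `ty` at type path `path`".
type Fact = (&'static str, &'static str, &'static str);

/// Propagates facts along `edges` until nothing new is derived.
/// Facts are only ever added, never changed or removed.
fn propagate(seed: &[Fact], edges: &[(&'static str, &'static str)]) -> BTreeSet<Fact> {
    let mut facts: BTreeSet<Fact> = seed.iter().copied().collect();
    loop {
        let mut derived = Vec::new();
        for &(node, path, ty) in &facts {
            for &(a, b) in edges {
                // Types flow in both directions along an edge.
                if a == node {
                    derived.push((b, path, ty));
                }
                if b == node {
                    derived.push((a, path, ty));
                }
            }
        }
        let before = facts.len();
        facts.extend(derived);
        if facts.len() == before {
            return facts;
        }
    }
}
```

Each layer of a type (the root and each type argument) is a separate fact, so the root of a type can propagate even when a type argument is not yet known.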

> • We already use "resolution" for matching paths to declarations. In this PR "resolve" often seems to be synonymous with "infer". Could we instead use "infer" and reserve "resolve" for path resolution only? So for instance resolveType would be inferType?

Good idea, I'll change it.

> • In C# constraints on types are based on classes and interfaces, both of which are types themselves. So a constraint is something like "type A is a subtype of type B" whereas in Rust constraints are "type A implements trait T". That is, it's not a type * type relation but a type * trait relation. Currently this is handled by treating traits as types and trait implementation like subtyping.
>   I wonder if things could be a bit clearer by instead introducing a TypeConstraint class in the shared interface as a generalization. Perhaps this could also reduce the number of types produced in examples like the one in the documentation for resolveType/2 where a term has both a type and a trait as its type. Unless there's a reason why this wouldn't work, this is something that I'd like to play around with.

I don't know if this could work, but feel free to try it out. Note that we also need type constraints for traits that extend other traits.

@hvitved hvitved force-pushed the rust/type-inference branch from e21ecc3 to 3bb89ea Compare March 13, 2025 14:05

hvitved commented Mar 13, 2025

@paldepind: Thanks for the review, I have (hopefully) addressed all your comments. I also decided to rebase, so we can see the combined effects with #18228 on DCA.

@hvitved hvitved requested a review from paldepind March 13, 2025 18:52

@paldepind paldepind left a comment

Thanks for addressing my comments :) Looks great to me 🎉 🎉

@hvitved hvitved merged commit cf0b3b5 into github:main Mar 14, 2025
39 checks passed
@hvitved hvitved deleted the rust/type-inference branch March 14, 2025 08:43
Declaration decl, DeclarationPosition dpos, Type base, TypePath path, Type t
) {
t = decl.getDeclaredType(dpos, path) and
path.isCons(base.getATypeParameter(), _)
Contributor

This relies crucially on TypeParameters being unique to a given Type. Should that be checked in a consistency check somewhere?

Contributor

Also, presumably it would be equivalent to replace path.isCons(base.getATypeParameter(), _) with base = decl.getDeclaredType(dpos, TypePath::nil()) and path != TypePath::nil(), and wouldn't that be more efficient in terms of the amount of string manipulation?

Contributor Author

The rewrite will not work if a declaration has multiple declared types at dpos, which there is no reason to rule out. But I noticed that declarationBaseType is only used in a context where t is in fact a TypeParameter, so we may as well specialize it.
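
To make the string-manipulation concern concrete, here is a hedged Rust sketch of what a cons-style decomposition of a string-encoded type path could look like. The function name `is_cons` mirrors the QL predicate, but the dot-separated encoding is an assumption made for illustration and may differ from the actual TypePath representation.

```rust
/// Splits a non-empty type path into its first element and the remaining path.
/// Returns `None` for the empty path. Assumes elements are separated by '.'.
fn is_cons(path: &str) -> Option<(&str, &str)> {
    if path.is_empty() {
        return None;
    }
    match path.split_once('.') {
        Some((head, tail)) => Some((head, tail)),
        // A single element: the remaining path is empty.
        None => Some((path, "")),
    }
}
```

Under this encoding, every `is_cons` check splits a string, whereas comparing a path against the empty path is a plain equality test.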

Contributor Author

> This relies crucially on TypeParameters being unique to a given Type. Should that be checked in a consistency check somewhere?

Good idea; not all type parameters need to belong to a type though (e.g. method type parameters in C#), but they should belong to at most one type.
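
A consistency check of this kind could look roughly like the following Rust sketch. In the PR it would be a QL consistency query; the function name and the (type, type parameter) input relation are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given (type, type parameter) ownership pairs, returns the type parameters
/// that belong to more than one type. Type parameters that belong to no type
/// (for example method type parameters) never appear in the input.
fn shared_type_parameters<'a>(owners: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    let mut by_param: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(ty, tp) in owners {
        by_param.entry(tp).or_default().insert(ty);
    }
    by_param
        .into_iter()
        .filter(|(_, tys)| tys.len() > 1)
        .map(|(tp, _)| tp)
        .collect()
}
```

An empty result means the "at most one type" invariant holds for the given relation.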

Comment on lines +671 to +672
accessBaseType(a, apos, target, base, pathToTypeParam.append(path), t) and
declarationBaseType(target, dpos, base, pathToTypeParam, tp) and
Contributor

To check my understanding: The join of base here is the point where we introduce covariance, correct? Perhaps leave a comment in this spot about covariance if this is the place that needs updating to potentially support contravariance in the future?

Contributor Author

Correct.

Comment on lines +690 to +692
directTypeMatch(a, target, path, t, tp)
or
baseTypeMatch(a, target, path, t, tp)
Contributor

So adjustAccessType is applied in directTypeMatch, but not in baseTypeMatch? Maybe an example to clarify why we have this distinction would be good to add in the qldoc for adjustAccessType.

Contributor Author

Correct, only used in directTypeMatch. I think we would have to add it in accessBaseType as well.

@paldepind
Contributor

I've resolved the comments that should be addressed over in #19081.
