Rust: Implement basic type inference in QL #18632

hvitved · 2025-01-30T09:42:18Z

Overview

This PR adds basic type inference for Rust, implemented in QL. Type inference and method call resolution are mutually recursive:

struct MyStruct<T>(T);

impl<S> MyStruct<S> {
    fn foo(self) -> S {
        match self {
            MyStruct(t) => t
        }
    }
}

let x = MyStruct(0);
let y = x.foo();

In the example above, in order to resolve the call x.foo() we need to know the type of x, and in order to resolve the type of x.foo() we need to know the return type of foo (meaning we need to be able to resolve the call).

The example above involves generics, and we want to be able to conclude that x.foo() has type i64 and not simply the type S, which is the declared return type of foo.

This means that we need to be able to represent the fact that x has type MyStruct<i64>, and while it is tempting to represent constructed types like this using a newtype encoding, such an encoding will become mutually recursive with type inference/call resolution, meaning (at best) poor performance and (at worst) non-monotonic recursion. Another issue with using a newtype encoding is that we could only construct a constructed generic type once we are able to infer all type arguments (type arguments will be encoded as cons-lists), and in case one of the type arguments cannot be inferred, we will have to give up completely.

Instead, we index all type resolution relations with a type path, which is conceptually a (possibly empty) list of type argument indices. For example, the type MyStruct<i64> can be represented as

| Type path | Type |
| --- | --- |
| `""` | `MyStruct` |
| `"0"` | `i64` |

Type paths are represented as strings, so there will be no need for additional mutual recursion.
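As a rough illustration of the encoding (not the QL implementation), the sketch below flattens a structured type into path-indexed rows. The `Ty` struct, the `flatten` function, and the use of `.` as a separator between indices are assumptions made for this example only.

```rust
/// A toy structured type such as `MyStruct<i64>` (hypothetical representation).
#[derive(Debug, Clone)]
struct Ty {
    name: String,
    args: Vec<Ty>,
}

impl Ty {
    fn new(name: &str, args: Vec<Ty>) -> Ty {
        Ty { name: name.to_string(), args }
    }
}

/// Flattens `ty` into (type path, type name) rows.
/// A path is a list of type argument indices; "" is the root.
/// Assumption: indices are joined with '.', which the PR does not specify.
fn flatten(ty: &Ty) -> Vec<(String, String)> {
    fn go(ty: &Ty, path: String, out: &mut Vec<(String, String)>) {
        out.push((path.clone(), ty.name.clone()));
        for (i, arg) in ty.args.iter().enumerate() {
            let child = if path.is_empty() {
                i.to_string()
            } else {
                format!("{}.{}", path, i)
            };
            go(arg, child, out);
        }
    }
    let mut out = Vec::new();
    go(ty, String::new(), &mut out);
    out
}
```

Calling `flatten` on `MyStruct<i64>` yields the two rows of the table above. Each row is a flat tuple of strings, so a partially known type can still be represented by the rows that are known.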

Using type path indexing, we can infer the following types for the program above:

| Entity | Type path | Unbound type |
| --- | --- | --- |
| `self` | `""` | `MyStruct` |
| `self` | `"0"` | `S` |
| `x` | `""` | `MyStruct` |
| `x` | `"0"` | `i64` |

which means we will be able to match S against the type i64. We can then combine this with the return type of foo to get the correct return type of the call.
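The matching step can be sketched outside QL as follows. This is a simplified model under stated assumptions, not the shared library's algorithm: `Row`, `rows`, `append`, `strip`, and `call_type` are hypothetical names, paths use `.` as an assumed separator, and base types and explicit type arguments are ignored.

```rust
/// One row of a path-indexed type: (type path, type name).
type Row = (String, String);

/// Builds rows from string pairs (helper for this sketch only).
fn rows(pairs: &[(&str, &str)]) -> Vec<Row> {
    pairs.iter().map(|(p, t)| (p.to_string(), t.to_string())).collect()
}

/// Concatenates two type paths; "" is the empty path.
fn append(a: &str, b: &str) -> String {
    match (a.is_empty(), b.is_empty()) {
        (true, _) => b.to_string(),
        (_, true) => a.to_string(),
        _ => format!("{}.{}", a, b),
    }
}

/// If `prefix` is a prefix of `path`, returns the remaining suffix.
fn strip<'a>(path: &'a str, prefix: &str) -> Option<&'a str> {
    if prefix.is_empty() {
        Some(path)
    } else if path == prefix {
        Some("")
    } else {
        path.strip_prefix(prefix)?.strip_prefix('.')
    }
}

/// Binds each type parameter occurring in the declared parameter type
/// against the argument type, then substitutes the bindings into the
/// declared return type.
fn call_type(
    type_params: &[&str],
    declared_param: &[Row],
    arg: &[Row],
    declared_ret: &[Row],
) -> Vec<Row> {
    // (type parameter, path below the parameter, bound type)
    let mut bindings: Vec<(&str, &str, &str)> = Vec::new();
    for (p, tp) in declared_param {
        if type_params.contains(&tp.as_str()) {
            for (q, t) in arg {
                if let Some(suffix) = strip(q, p) {
                    bindings.push((tp.as_str(), suffix, t.as_str()));
                }
            }
        }
    }
    let mut out = Vec::new();
    for (r, t) in declared_ret {
        if type_params.contains(&t.as_str()) {
            // Replace the type parameter by everything bound to it.
            for (tp, suffix, bound) in &bindings {
                if *tp == t.as_str() {
                    out.push((append(r, suffix), bound.to_string()));
                }
            }
        } else {
            out.push((r.clone(), t.clone()));
        }
    }
    out
}
```

With the rows for `self` as the declared parameter type, the rows for `x` as the argument type, and a declared return type of `S` at path `""`, `call_type` returns the single row `("", "i64")`, matching the conclusion above.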

Implementation

The implementation is split into a shared language-agnostic type inference library (in a new typeinference QL pack) and a Rust-specific implementation on top.

The shared library defines the TypePath class, and provides the module Matching for matching types of arguments with types of parameters in the declarations that they target (like matching i64 with S in the example above). Matching takes into account that type arguments may be supplied explicitly (like MyStruct::<i64>::foo(x)) and that base types may need to be considered in order to find a match.
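For reference, the two ways a type argument can reach `foo` look like this in plain Rust. The functions `inferred` and `explicit` are made up for this illustration; only `MyStruct` and `foo` come from the example in the overview.

```rust
struct MyStruct<T>(T);

impl<S> MyStruct<S> {
    fn foo(self) -> S {
        self.0
    }
}

/// `S` is inferred from the type of the receiver `x`.
fn inferred() -> i64 {
    let x = MyStruct(7i64);
    x.foo()
}

/// `S` is supplied explicitly, so the literal 7 is typed as i64.
fn explicit() -> i64 {
    MyStruct::<i64>::foo(MyStruct(7))
}
```

In the first function the binding for `S` comes from matching the argument type against the declared parameter type; in the second it comes directly from the written type argument.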

The entry predicate of the Rust-specific implementation is

Type resolveType(AstNode n, TypePath path)

which assigns type-path-indexed types to AST nodes. Using the Matching module from the shared library, the implementation infers types for record expressions Foo { bar: baz }, call expressions x.foo() and Foo::bar(x), and field expressions x.field (in mutual recursion with resolveType in the latter two cases). For call expressions, we take implicit borrows and implicit dereferences into account, and base types in the context of Rust translate to trait bounds and impl blocks.
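The implicit borrows and dereferences mentioned above are the ones Rust itself applies at method calls. A small plain-Rust illustration (the `Counter` type and `demo` function are invented for this example):

```rust
struct Counter {
    n: i64,
}

impl Counter {
    fn get(&self) -> i64 {
        self.n
    }

    fn bump(&mut self) {
        self.n += 1;
    }
}

fn demo() -> i64 {
    // Record expression: the type of `c` is `Counter`.
    let mut c = Counter { n: 0 };
    // Implicit borrow: equivalent to `Counter::bump(&mut c)`.
    c.bump();
    let r = &c;
    let rr = &r;
    // Implicit dereference: `rr` has type `&&Counter`, yet `get` is found
    // by dereferencing until the receiver type `&Counter` matches.
    rr.get()
}
```

To resolve `rr.get()` from the type of `rr`, an inference has to look through the references before looking up `get` in the impl block for `Counter`.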

Known limitations

The implementation has a lot of known limitations, for example:

- Missing support for a lot of expression kinds.
- Missing support for patterns.
- Missing support for operator calls (should be handled like method calls).
- Missing support for constrained impl blocks.
- Missing support for dyn trait types.
- Missing support for array types.
- Missing support for tuple types.
- Missing support for associated types (should be handled like type parameters, they are the reason for the two new missing data flow results).
- Type matching does not take variance into account; when matching return types, it should be contravariant and not covariant.
- Missing support for the Deref trait and more generally type coercions.
- Missing support for union types.

Evaluation

DCA shows that the implementation adds a negligible overhead to the total analysis time (~5 %). Overall, we mostly lose call edges compared to what we get from rust-analyzer (except for the diem project), but this is really not surprising given the known limitations above, as well as the known limitations of our path resolution implementation.

Note to the reviewer

Commit-by-commit review is encouraged.

hvitved · 2025-03-13T12:27:25Z

My understanding is that at a high level the type inference works by starting with types that are known (usually from declarations) and then having those types propagate through the AST (forwards and backwards). A big part of that is the Matching module, which pairs declarations with usages of the declaration and propagates declared types from the declaration to the use.

This is somewhat different from traditional type inference by unification where one builds bigger and bigger sets of terms that have equal type, with the hope that the set is eventually unified with a fully concrete type. However, in QL we can never "update" anything in place, and hence the present approach works better as all the sub-steps produce final accurate information, coupled with the path approach that makes it possible to build different layers of a type in smaller steps.

Your understanding is absolutely correct.

We already use "resolution" for matching paths to declarations. In this PR "resolve" often seems to be synonymous with "infer". Could we instead use "infer" and reserve "resolve" for path resolution only? So for instance resolveType would be inferType?

Good idea, I'll change it.

In C# constraints on types are based on classes and interfaces, both of which are types themselves. So a constraint is something like "type A is a subtype of type B" whereas in Rust constraints are "type A implements trait T". That is, it's not a type * type relation but a type * trait relation. Currently this is handled by treating traits as types and trait implementation like subtyping.
I wonder if things could be a bit clearer by instead introducing a TypeConstraint class in the shared interface as a generalization. Perhaps this could also reduce the number of types produced in examples like the one in the documentation for resolveType/2 where a term has both a type and a trait as its type. Unless there's a reason why this wouldn't work, this is something that I'd like to play around with.

I don't know if this could work, but feel free to try it out. Note that we also need type constraints for traits that extend other traits.

hvitved · 2025-03-13T14:07:34Z

@paldepind : Thanks for the review, I have (hopefully) addressed all your comments. I also decided to rebase, so we can see the combined effects with #18228 on DCA.

paldepind

Thanks for addressing my comments :) Looks great to me 🎉 🎉

shared/typeinference/codeql/typeinference/internal/TypeInference.qll

aschackmull · 2025-03-19T15:03:57Z

+        Declaration decl, DeclarationPosition dpos, Type base, TypePath path, Type t
+      ) {
+        t = decl.getDeclaredType(dpos, path) and
+        path.isCons(base.getATypeParameter(), _)


This relies crucially on TypeParameters being unique to a given Type. Should that be checked in a consistency check somewhere?

Also, presumably it would be equivalent to replace path.isCons(base.getATypeParameter(), _) with base = decl.getDeclaredType(dpos, TypePath::nil()) and path != TypePath::nil(), and wouldn't that be more efficient in terms of the amount of string manipulation?

The rewrite will not work if a declaration has multiple declared types at dpos, which there is no reason to rule out. But I noticed that declarationBaseType is only used in a context where t is in fact a TypeParameter, so we may as well specialize it.

This relies crucially on TypeParameters being unique to a given Type. Should that be checked in a consistency check somewhere?

Good idea; not all type parameters need to belong to a type though (e.g. method type parameters in C#), but they should belong to at most one type.
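The shape of such a consistency check, sketched in Rust rather than QL: given (type, type parameter) ownership pairs, report every type parameter that belongs to more than one type. The function name and the input representation are assumptions for illustration; type parameters that belong to no type (such as method type parameters) simply do not appear in the input.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the type parameters that are owned by more than one type.
/// `owns` holds hypothetical (type, type parameter) pairs.
fn non_unique_type_params<'a>(owns: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    let mut owners: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for (ty, tp) in owns {
        owners.entry(*tp).or_default().insert(*ty);
    }
    owners
        .into_iter()
        .filter(|(_, tys)| tys.len() > 1)
        .map(|(tp, _)| tp)
        .collect()
}
```

An empty result means the uniqueness assumption holds for the given pairs.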

shared/typeinference/codeql/typeinference/internal/TypeInference.qll

aschackmull · 2025-03-19T15:27:41Z

+          accessBaseType(a, apos, target, base, pathToTypeParam.append(path), t) and
+          declarationBaseType(target, dpos, base, pathToTypeParam, tp) and


To check my understanding: The join of base here is the point where we introduce covariance, correct? Perhaps leave a comment in this spot about covariance if this is the place that needs updating to potentially support contravariance in the future?

shared/typeinference/codeql/typeinference/internal/TypeInference.qll

aschackmull · 2025-03-19T15:45:36Z

+        directTypeMatch(a, target, path, t, tp)
+        or
+        baseTypeMatch(a, target, path, t, tp)


So adjustAccessType is applied in directTypeMatch, but not in baseTypeMatch? Maybe an example to clarify why we have this distinction would be good to add in the qldoc for adjustAccessType.

Correct, only used in directTypeMatch. I think we would have to add it in accessBaseType as well.

paldepind · 2025-03-20T15:58:07Z

I've resolved the comments that should be addressed over in #19081.

github-actions bot added the Rust label (Pull requests that update Rust code) Jan 30, 2025

hvitved force-pushed the rust/type-inference branch 4 times, most recently from 66bea42 to 2d0e953 Compare February 4, 2025 12:10

hvitved force-pushed the rust/type-inference branch 7 times, most recently from c88c9fe to a8540b1 Compare February 7, 2025 14:58

hvitved force-pushed the rust/type-inference branch 3 times, most recently from 02184af to 4f723dd Compare February 10, 2025 12:26

hvitved force-pushed the rust/type-inference branch 6 times, most recently from 486f813 to a254679 Compare February 13, 2025 09:02

hvitved force-pushed the rust/type-inference branch 3 times, most recently from 42e5970 to be2ec0f Compare February 18, 2025 14:36

hvitved added 4 commits March 13, 2025 13:23

Rust: Use type inference to resolve method calls and field accesses

e8505ad

Rust: Use type inference in path resolution test

fcdffc4

Rust: Add more consistency checks

795ba25

Rust: Fix bug in path resolution library

2394f2f

hvitved added 3 commits March 13, 2025 13:34

Rust: Use 'infer' instead of 'resolve' in type inference library

78280af

Address review comments

af91152

Rust: Move type inference/path resolution out of elements folder

3bb89ea

hvitved force-pushed the rust/type-inference branch from e21ecc3 to 3bb89ea Compare March 13, 2025 14:05

Rust: Update expected test output

255f06b

hvitved requested a review from paldepind March 13, 2025 18:52

Address review comments

c3739d4

paldepind approved these changes Mar 14, 2025

hvitved merged commit cf0b3b5 into github:main Mar 14, 2025
39 checks passed

hvitved deleted the rust/type-inference branch March 14, 2025 08:43

aschackmull reviewed Mar 19, 2025