First draft of an update to the Overloads chapter (DRAFT: DO NOT MERGE) #1839
base: main
erictraut commented Aug 13, 2024
- Attempts to clearly define the algorithm for overload matching.
- Describes checks for overload consistency, overlapping overloads, and implementation consistency.
(two quick comments)
…overloads * 'overloads' of https://github.com/erictraut/typing: [pre-commit.ci] auto fixes from pre-commit.com hooks # Conflicts: # docs/spec/overload.rst
…overloads * 'overloads' of https://github.com/erictraut/typing: [pre-commit.ci] auto fixes from pre-commit.com hooks
Thanks for working on this tricky area! I haven't finished review yet, but may be called away soon, so I'm submitting the comments I have so far. (EDIT: I've now completed my review.)
We typically wait for a proposed spec change to be accepted by the TC prior to writing conformance tests. In this case, I think it's advisable to write the conformance tests prior to acceptance. This will help us validate the proposed spec changes and tell us if (and to what extent) these changes will be disruptive for existing stubs and current type checker implementations. I would normally volunteer to write the conformance tests, but in this case I think it would be preferable for someone else to write the tests based on their reading of the spec update. If I write the tests, there's a real possibility that they will match what's in my head but not accurately reflect the letter of the spec. There's also a possibility that I'll miss some important cases in the tests. If someone else writes the tests, they can help identify holes and ambiguities in the spec language. Is there anyone willing to volunteer to write a draft set of conformance tests for this overload functionality? I'm thinking that there should be four new test files:
If this is more work than any one person wants to volunteer for, we could split it up.
I am willing to work on conformance tests for this, but I probably can't get to it until the core dev sprint, Sept 23-27. I realize that implies a delay to moving forward with this PR. Happy for someone else to get to it first.
I've completed the first set of tests (for the "Invalid overload definitions" section of the spec). I just realized that I named it It's slow going adding these tests, because running One other limitation of the conformance suite that I've observed is that sometimes the rules differ for stub files vs normal files, but as far as I can see the conformance suite tooling doesn't support a stub file as an actual test file, only as an importable resource for a non-stub file. Has there been prior discussion of lifting this limitation?
I don't think this has been a requirement yet. The test infrastructure can be updated if necessary.
When a type checker checks the implementation for consistency with overloads,
it should first apply any transforms that change the effective type of the
implementation including the presence of a ``yield`` statement in the
Is there a circumstance in which the presence of a `yield` in the body of a function changes the meaning of its return type annotation? My understanding is that it does not: a generator must be manually annotated with a `Generator` or `Iterator` return type. So I'm not sure how to write a test for this mention of "presence of a `yield` statement." Should this be removed from the text?
You're correct, the presence of a `yield` statement doesn't change the effective type of the return type annotation. It does affect the inferred return type if the type checker (like pyright) implements return type inference. I think I meant to say "`async` keyword" rather than "`yield` statement". This should be changed in the spec.
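A minimal sketch of the distinction, using a hypothetical overloaded coroutine function `fetch` and a hypothetical generator `count_up` (neither comes from the spec). With `async`, the annotation `-> Union[int, str]` describes the awaited value, while calling the function actually returns a coroutine, so its effective type differs from what is written. With `yield`, the author already has to spell the return type as `Iterator[...]` or `Generator[...]`, so no transform applies.

```python
import asyncio
import inspect
from typing import Iterator, Union, overload


# Hypothetical overloaded coroutine function (illustrative names only).
@overload
async def fetch(key: int) -> int: ...
@overload
async def fetch(key: str) -> str: ...
async def fetch(key: Union[int, str]) -> Union[int, str]:
    # Annotated as returning `int | str`, but because of `async` the effective
    # type is Callable[[int | str], Coroutine[Any, Any, int | str]]. That
    # transformed type is what a consistency check should compare.
    return key


def count_up(n: int) -> Iterator[int]:
    # By contrast, `yield` does not transform the annotation: the author
    # must already write Iterator[...] (or Generator[...]) here.
    yield from range(n)


coro = fetch(3)
print(inspect.iscoroutine(coro))  # True: calling an async def returns a coroutine
print(asyncio.run(coro))          # 3
print(list(count_up(3)))          # [0, 1, 2]
```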
Step 1: Examine the argument list to determine the number of
positional and keyword arguments. Use this information to eliminate any
overload candidates that are not plausible based on their
input signatures.

- If no candidate overloads remain, generate an error and stop.
- If only one candidate overload remains, it is the winning match. Evaluate
  it as if it were a non-overloaded function call and stop.
- If two or more candidate overloads remain, proceed to step 2.

Step 2: Evaluate each remaining overload as a regular (non-overloaded)
call to determine whether it is compatible with the supplied
argument list. Unlike step 1, this step considers the types of the parameters
and arguments. During this step, do not generate any user-visible errors.
Simply record which of the overloads result in evaluation errors.

- If all overloads result in errors, proceed to step 3.
- If only one overload evaluates without error, it is the winning match.
  Evaluate it as if it were a non-overloaded function call and stop.
- If two or more candidate overloads remain, proceed to step 4.
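Steps 1 and 2 can be sketched as a runtime toy. This assumes that `inspect.Signature.bind` is a fair proxy for the arity check and that `isinstance` on argument values is a fair proxy for assignability; a real type checker works on types, not values. The overloads `takes_int`, `takes_str`, and `takes_two` are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


# Hypothetical overload signatures: plain functions whose parameter
# annotations stand in for each overload's parameter types.
def takes_int(x: int) -> str: ...
def takes_str(x: str) -> bytes: ...
def takes_two(x: int, y: int) -> float: ...


def arity_ok(sig: Callable[..., Any], args: Sequence[Any], kwargs: Dict[str, Any]) -> bool:
    """Step 1: can the *shape* of the argument list bind to this signature?"""
    try:
        inspect.signature(sig).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


def types_ok(sig: Callable[..., Any], args: Sequence[Any], kwargs: Dict[str, Any]) -> bool:
    """Step 2 (toy): isinstance() on runtime values stands in for checking
    that each argument type is assignable to its parameter type."""
    bound = inspect.signature(sig).bind(*args, **kwargs)
    return all(
        isinstance(value, sig.__annotations__.get(name, object))
        for name, value in bound.arguments.items()
    )


def steps_1_and_2(
    overloads: List[Callable[..., Any]],
    args: Sequence[Any],
    kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[str, List[str]]:
    kwargs = kwargs or {}
    # Step 1: cheap arity filter; argument types are not consulted.
    candidates = [f for f in overloads if arity_ok(f, args, kwargs)]
    if not candidates:
        return ("error", [])
    if len(candidates) == 1:
        # Sole survivor wins and is evaluated as a non-overloaded call,
        # which may still report argument type errors.
        return ("winner", [candidates[0].__name__])
    # Step 2: type-aware evaluation; failures are recorded, not reported.
    passing = [f for f in candidates if types_ok(f, args, kwargs)]
    if not passing:
        return ("step 3", [f.__name__ for f in candidates])
    if len(passing) == 1:
        return ("winner", [passing[0].__name__])
    return ("step 4", [f.__name__ for f in passing])


OVERLOADS = [takes_int, takes_str, takes_two]
print(steps_1_and_2(OVERLOADS, (1,)))        # ('winner', ['takes_int'])
print(steps_1_and_2(OVERLOADS, ("a", "b")))  # ('winner', ['takes_two'])
```

In the second call only `takes_two` survives the arity filter, so it is chosen even though `"a"` and `"b"` are not `int`; evaluating it as a non-overloaded call would then report those argument errors.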
The separation of step 1 and step 2 here results in a more complex algorithm than if they were combined into "iterate the overloads looking for the first one that binds to the call without error, if none, issue a no-matching-overload error." Because in your typing-meetup presentation, you emphasized that we have a complex algorithm for overload matching due to legacy, I assumed that this complexity must originate from long-time mypy behavior. So I was surprised to find that while this algorithm matches pyright behavior, mypy appears to use the simpler one-iteration combined algorithm. Pyre agrees with pyright.
I don't have strong feelings here (I could see arguments in favor of either behavior), but it seems that if we don't have agreement on this algorithm between existing type checkers (and thus don't have a clear backwards-compatibility argument in favor of specifying one behavior), perhaps we should have some discussion of the pros and cons of specifying this more-complex algorithm? Or some input from mypy developers (@hauntsaninja?) on whether mypy would be willing to switch to the algorithm described here?
I think mypy is using this same algorithm. Why do you think it isn't? In the mypy playground example that you provided, there are two overload signatures. Step one eliminates one, which leaves only one remaining. When it is evaluated as "as if it were a non-overloaded function call", it also fails, which leaves no valid overloads. Mypy reports the error as such. Perhaps the wording in mypy's error is leading you to believe that it's using some different process.
I purposely separated steps 1 and 2 to try to simplify both the description of the algorithm and the potential implementation. It's much cheaper to filter based on arity. Evaluating types of argument expressions is orders of magnitude more expensive, especially if they need to be re-evaluated for every overload signature due to bidirectional type inference. It therefore doesn't make sense for a type checker to combine steps 1 and 2 if it cares about performance.
As far as I can see, the only way to observe the difference in the algorithm in the semantics, is by observing the selected return type (and secondarily, the emitted diagnostics). Mypy selects a return type of Any, and does not emit argument type errors relative to any one overload. Both of these behaviors suggest that it is simply checking all overloads and concluding that no overload matches (consistent with the single-step algorithm), not picking the sole arity-matching overload as the winner and then checking the call normally relative to that overload, as described here.
Pyright and pyre, in contrast, both infer the return type of the overload whose arity matches, not Any, and both emit errors about mismatched argument types, relative to only that overload. This behavior is consistent with the algorithm described here.
I agree with you about performance, but I think the spec should concern itself with the simplest possible description of the semantics, not with performance. I don't agree that separating steps 1 and 2 makes the description simpler ("check each call in order and discard any that result in errors" is a simple single step and doesn't require any separate discussion of arity vs types), and I'm not sure whether the return type difference above should be specified.
It's possible that internally mypy is doing a separate first arity pass, for performance reasons. But its observable behavior in the return type selected is as if it does not; it doesn't match what is specified here.
The tests I've written currently specify the return type behavior of pyright and pyre in this case, and mark mypy as out of compliance because it says the return type is Any. If instead the linked behaviors of mypy and pyright should both be permitted by the conformance suite, I'm fine with that outcome, but in that case I don't believe the conformance suite can observe any distinction between the algorithm as specified vs an implementation that combines step 1 and step 2.
You can't draw any conclusions from the type that is evaluated for an errant call expression. The spec is (and should be) silent on what type should be evaluated in the face of an error. That's true of all expression types that result in errors of various types (syntactical and semantic). Once an error is reported, all bets are off in terms of what type you should expect for type evaluation. Mypy's evaluation of `Any` is reasonable for a type checker not associated with a language server. Pyright's behavior (where it attempts to "guess" the most likely intended return type) is more appropriate for a language server. I would object if someone were to propose that the spec should specify the evaluated type in the face of an error.
I think the description of the algorithm is simpler and clearer if steps 1 and 2 are separated. It sounds like you see it differently. I guess it's subjective.
Hmm. All type checkers, including mypy, currently always evaluate a call expression to the annotated return type of the called function, even if there are argument type or arity errors in the call. So it still seems that mypy is not really following the algorithm as described here, in that it is not evaluating the sole arity-matching overload as if that were the signature of the function.
I guess there is some ambiguity in what the overload spec means here, because we are attempting to specify overloads without having specified regular call evaluation yet. But if we assume that call evaluation would be specified to not mandate any return type when the call errors, then I think that there is no observable semantic difference between the one-step and two-step version of this part of the algorithm, so it's just a matter of which description is easier to understand. Since that's subjective, and there is some value in describing a more performant algorithm, I'm good with the current text.
I'll update the conformance suite to not expect any particular return type in case no overload fully matches.
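The observability point can be checked with a small model, again using hypothetical overloads and `isinstance` as an assumed stand-in for type assignability. `one_step` is the combined algorithm described above as mypy's observed behavior; `two_step` is the arity-first split. They pick the same overload for well-typed calls and differ only for the errant call, where the spec leaves the evaluated type unspecified.

```python
import inspect
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical overloads: the call ("a", "b") matches only the second by arity.
def takes_int(x: int) -> str: ...
def takes_two(x: int, y: int) -> float: ...

OVERLOADS: List[Callable[..., Any]] = [takes_int, takes_two]


def binds(fn: Callable[..., Any], args: Sequence[Any]) -> bool:
    try:
        inspect.signature(fn).bind(*args)
    except TypeError:
        return False
    return True


def well_typed(fn: Callable[..., Any], args: Sequence[Any]) -> bool:
    # Toy stand-in for assignability: isinstance() against the annotations.
    bound = inspect.signature(fn).bind(*args)
    return all(isinstance(v, fn.__annotations__[k]) for k, v in bound.arguments.items())


def one_step(args: Sequence[Any]) -> Optional[str]:
    """Pick the first overload that binds with no arity or type error."""
    for fn in OVERLOADS:
        if binds(fn, args) and well_typed(fn, args):
            return fn.__name__
    return None  # "no overload matches"


def two_step(args: Sequence[Any]) -> Optional[str]:
    """Arity filter first; a sole survivor is chosen even if its types fail.
    Later steps are omitted: with several survivors, the first well-typed
    one is returned for brevity."""
    candidates = [fn for fn in OVERLOADS if binds(fn, args)]
    if len(candidates) == 1:
        return candidates[0].__name__
    for fn in candidates:
        if well_typed(fn, args):
            return fn.__name__
    return None


for call in [(1,), (1, 2), ("a", "b")]:
    print(call, one_step(call), two_step(call))
# (1,) takes_int takes_int
# (1, 2) takes_two takes_two
# ('a', 'b') None takes_two
```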
> I'll update the conformance suite to not expect any particular return type in case no overload fully matches.
Done.
Step 4: If the argument list is compatible with two or more overloads,
determine whether one or more of the overloads has a variadic parameter
(either ``*args`` or ``**kwargs``) that maps to a corresponding argument
that supplies an indeterminate number of positional or keyword arguments.
If so, eliminate overloads that do not have a variadic parameter.

- If this results in only one remaining candidate overload, it is
  the winning match. Evaluate it as if it were a non-overloaded function
  call and stop.
- If two or more candidate overloads remain, proceed to step 5.
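A simplified sketch of step 4, assuming the caller already knows whether the argument list contains an indeterminate-length `*xs` or `**kw`. It ignores the "maps to a corresponding argument" detail and the unpacked-tuple and unpacked-TypedDict cases. `func2_fixed` and `func2_variadic` are hypothetical copies of the `func2` overloads discussed in this thread.

```python
import inspect
from typing import Any, Callable, List


def has_kind(fn: Callable[..., Any], kind: Any) -> bool:
    """Does fn declare a parameter of the given kind (*args or **kwargs)?"""
    return any(p.kind is kind for p in inspect.signature(fn).parameters.values())


def step_4(
    candidates: List[Callable[..., Any]],
    unpacks_sequence: bool,
    unpacks_mapping: bool,
) -> List[Callable[..., Any]]:
    """Simplified: if the call passes an indeterminate-length *xs (or **kw)
    and some candidate has the matching variadic parameter, drop the
    candidates that lack it."""
    remaining = list(candidates)
    checks = [
        (unpacks_sequence, inspect.Parameter.VAR_POSITIONAL),
        (unpacks_mapping, inspect.Parameter.VAR_KEYWORD),
    ]
    for unpacked, kind in checks:
        if unpacked and any(has_kind(f, kind) for f in remaining):
            remaining = [f for f in remaining if has_kind(f, kind)]
    return remaining


# Hypothetical stand-ins for the two func2 overloads.
def func2_fixed(p1: int, /) -> int: ...
def func2_variadic(p1: int, p2: int, /, *args: int) -> int: ...

# Assume both survived step 2 for a call like func2(*x4).
print([f.__name__ for f in step_4([func2_fixed, func2_variadic], True, False)])
# ['func2_variadic']
```

If no candidate has a matching variadic parameter, the filter leaves the list unchanged and matching proceeds to step 5.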
This step seems to assume that unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error, even though this is unsound.
This does appear to describe the actual behavior of pyright (even in strict mode), mypy, and pyre.
As far as I can find, this behavior is not specified anywhere.
Should we acknowledge in the text that this step assumes an unsound call binding strategy, which is not (yet?) specified?
> This step seems to assume that unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error, even though this is unsound.
That is not what was intended. What I was trying to say is that `*args` and `**kwargs` parameters (which are of indeterminate length) should be matched against unpacked sequences or mappings of indeterminate length.
The text does not say anything about matching a fixed number of parameters. It talks only about `*args` or `**kwargs`.
I guess that `*args` and `**kwargs` parameters can be of determinate length if they use an unpacked tuple or unpacked TypedDict, respectively, so maybe the text needs to be clearer to indicate that these cases are exempt from this rule.
> What I was trying to say is that `*args` and `**kwargs` parameters (which are of indeterminate length) should be matched against unpacked sequences or mappings of indeterminate length.
Yes, this is also what I understood.
The reason I took this to imply that "unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error", is that if the latter is not the case, then I don't see how step 4 could ever eliminate an overload that wasn't already eliminated by step 2. The overloads that could pass step 2 and then be eliminated by step 4, would be overloads where an unpacked argument of indeterminate length successfully ("without error", in order to pass step 2) matched against an overload without a corresponding variadic parameter.
So without the implication I mentioned, step 4 would be redundant. (And this is immediately relevant to the conformance suite, because the only tests I could write for step 4 have to rely on "unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error", which means we are now at least implicitly specifying that. Which may be the right thing to do, since type checkers already appear to agree on it.)
I think this thread is a second case of "overload handling has to be built on top of some specification for binding a call to a single (non-overloaded) signature, and we don't have an explicit specification for that yet, which makes some things less clear in the overload spec".
> And this is immediately relevant to the conformance suite, because the only tests I could write for step 4 have to rely on "unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error", which means we are now at least implicitly specifying that.
Oh, actually, this is not true. The test I wrote would pass even if we required strict handling of indeterminate-length unpacked arguments. It would just pass because of step 2 instead, and step 4 would be redundant.
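The redundancy argument can be illustrated with a toy binding model in which `Sig`, `Call`, and the `lenient` flag are all hypothetical. Under lenient binding (the behavior reported above for pyright, mypy, and pyre), both `func2` overloads survive for `func2(*x4)` and step 4 selects the variadic one. Under strict binding, any survivor of a call containing an unknown-length `*xs` must already have `*args`, so step 4 never removes anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sig:
    """Hypothetical summary of an overload's positional parameters."""
    name: str
    required: int       # required positional parameters
    has_var_args: bool  # does it declare *args?


@dataclass(frozen=True)
class Call:
    """Hypothetical summary of a call's positional arguments."""
    fixed: int            # ordinary positional arguments
    unknown_unpack: bool  # does it also pass *xs, where len(xs) is unknown?


def binds(sig: Sig, call: Call, lenient: bool) -> bool:
    """Stand-in for steps 1 and 2 (this toy has no argument types)."""
    if not call.unknown_unpack:
        if sig.has_var_args:
            return call.fixed >= sig.required
        return call.fixed == sig.required
    if lenient:
        # Assume *xs has whatever length makes the call fit.
        return sig.has_var_args or call.fixed <= sig.required
    # Strict: ordinary arguments must fill every required parameter,
    # and *xs must be absorbed entirely by *args.
    return sig.has_var_args and call.fixed >= sig.required


def step_2(sigs: List[Sig], call: Call, lenient: bool) -> List[Sig]:
    return [s for s in sigs if binds(s, call, lenient)]


def step_4(sigs: List[Sig], call: Call) -> List[Sig]:
    """Prefer overloads with *args when the call unpacks an unknown-length *xs."""
    if call.unknown_unpack and any(s.has_var_args for s in sigs):
        return [s for s in sigs if s.has_var_args]
    return sigs


FUNC2 = [Sig("one_int", 1, False), Sig("two_ints_and_args", 2, True)]
star_x4 = Call(fixed=0, unknown_unpack=True)  # models func2(*x4)

lenient_survivors = step_2(FUNC2, star_x4, lenient=True)
print([s.name for s in lenient_survivors])                   # ['one_int', 'two_ints_and_args']
print([s.name for s in step_4(lenient_survivors, star_x4)])  # ['two_ints_and_args']
print(step_2(FUNC2, star_x4, lenient=False))                 # []
```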
> So without the implication I mentioned, step 4 would be redundant.
I see what you mean. Yes, both mypy and pyright allow an unpacked argument of indeterminate (unknown) length to match against a fixed number of parameters. That can produce false negatives, but flagging this as an error would produce false positives. Since this is a common use case in Python, I think it would be pretty annoying to emit an error for it. Neither mypy nor pyright does.
```python
from typing import Literal, overload

x1 = [1]
x4 = [1, 2, 3, 4]

def func1(p1: int, /) -> Literal[1]: ...

reveal_type(func1(*x1))  # No type checker error, no runtime error
reveal_type(func1(*x4))  # No type checker error, runtime error

@overload
def func2(p1: int, /) -> Literal[1]: ...
@overload
def func2(p1: int, p2: int, /, *args: int) -> Literal[3]: ...
def func2(*args: int) -> int: ...

reveal_type(func2(*x4))  # Literal[1] (pyright), Literal[3] (mypy)
reveal_type(func2(1, 2, *x4))  # Literal[3]
```
And yes, you're correct that without this behavior specified, it's still possible for two type checkers to be conformant with the existing spec but still differ in how they interpret overloaded calls.
Yeah, it seems like the lenient behavior shared by all type checkers is the one we'd need to specify here. So I don't really think any change to the spec text is needed.
Thanks for the example! It showed that my test for step 4 wasn't adequate, as it wasn't catching the fact that pyright doesn't seem to prefer the variadic (second) overload of `func2` for the call `func2(*x4)`, where I believe this spec says that it should? I've updated the test to catch this.
for all remaining overloads are :term:`equivalent`, proceed to step 6.

If the return types are not equivalent, overload matching is ambiguous. In
this case, assume a return type of ``Any`` and stop.
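The visible part of this rule (the condition that precedes it is cut off in this excerpt) can be sketched as follows. The sketch treats `==` on annotation objects as a crude, assumed stand-in for type equivalence, and `ov_a`, `ov_b`, and `ov_c` are hypothetical remaining overloads. Note that the ambiguous case yields `Any`, not a union of the candidates' return types.

```python
import inspect
from typing import Any, Callable, List, Tuple, Union


def return_type_check(candidates: List[Callable[..., Any]]) -> Tuple[str, Any]:
    """If every remaining overload's declared return type is equivalent,
    proceed to step 6; otherwise matching is ambiguous and the call's
    type is assumed to be Any."""
    returns = [inspect.signature(f).return_annotation for f in candidates]
    if all(r == returns[0] for r in returns[1:]):
        return ("step 6", returns[0])
    return ("ambiguous", Any)


# Hypothetical overloads left over after the earlier steps.
def ov_a(x: int) -> Union[int, str]: ...
def ov_b(x: object) -> Union[str, int]: ...
def ov_c(x: object) -> bytes: ...

# Union[int, str] and Union[str, int] compare equal, so this proceeds to step 6.
print(return_type_check([ov_a, ov_b]))
# int | str vs bytes: ambiguous, so the result is Any.
print(return_type_check([ov_a, ov_c]))
```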
Is there a correctness requirement for the return type here to be assumed to be `Any`? It seems to me that it would also be valid for a type-checker to use the union of all the ambiguous matching overloads. I would prefer for the specification not to prevent that option. (No type checker currently does this.)
Existing stubs (including typeshed, numpy, pandas, and others) assume that the result will be `Any` in this case, so I don't think this is something we can change at this point. An earlier version of pyright generated a union, and it resulted in many false positive errors and lots of unhappy users. I think it's important for the spec to specify `Any` here so stub authors can rely on the behavior.
Hmm, ok, that's useful info, thank you. If you happen to know of any old pyright issues where unhappy users surfaced these problems with the union behavior, I would be curious to take a look at some real world cases relying on this.
I believe I've completed the test suite, with reasonably good coverage of everything specified as a "should". I intentionally avoided adding tests either way for behaviors specified as a "may". I also added the capability to have stub test files in the conformance suite, and added I aimed to write tests that reflect the specification as it currently exists in this PR, to help illuminate where type checkers currently do and don't conform to this spec. I commented inline on some points where I wonder if we should adjust the spec.
@carljm, thanks for doing this! I'll try to find time next week to review your test code and update the draft spec if your test uncovered any areas of ambiguity or lack of clarity.