Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Add support for re-export conventions for imports #15848

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Jan 31, 2025

Summary

This PR adds support for re-export conventions for imports.

The implementation differs for stub files and runtime files as follows:

  • For stub files, this is enforced by using the unresolved-import and the inferred type is Unknown by way of making the symbol is unbound
  • For runtime files, this is enforced by using a new rule implicit-reexport and the type is inferred as usual

The current implementation updates the Symbol::Type variant to include a new ReExport enum which specifies whether the symbol is defined in an import statement and if so if it's being implicitly or explicitly re-exported. This is then used downstream to perform the check.

This implementation does not yet support __all__ and * imports as both are features that needs to be implemented independently.

Other potential solution

While implementing this, another potential solution would be to use the visibility qualifier for a symbol to specify whether this is an implicitly exported symbol which translates to:

enum Visibility {
    Public,
    Private,
    ImplicitExport,
}

This would then mean that this enum could be created for the Definition while building the semantic index and it could also consider the __all__ values. One issue that I saw with this is that the values in __all__ could come from another module which isn't possible to do in the semantic index builder because that does not cross the file boundary.

closes: #14099
closes: #15476

Test Plan

Add test cases, update existing ones if required.

@dhruvmanila dhruvmanila added the red-knot Multi-file analysis & type inference label Jan 31, 2025
Copy link

codspeed-hq bot commented Jan 31, 2025

CodSpeed Performance Report

Merging #15848 will not alter performance

Comparing dhruv/re-export (3f92a06) with main (1f7a29d)

Summary

✅ 32 untouched benchmarks

@dhruvmanila dhruvmanila changed the base branch from main to dhruv/add-missing-imports February 1, 2025 10:30
dhruvmanila added a commit that referenced this pull request Feb 3, 2025
## Summary

Related to #15848, this PR adds the imports explicitly as we'll now flag
these symbols as undefined.

---------

Co-authored-by: Alex Waygood <[email protected]>
Base automatically changed from dhruv/add-missing-imports to main February 3, 2025 09:27
@dhruvmanila dhruvmanila force-pushed the dhruv/re-export branch 6 times, most recently from 0abf09b to 66aa74c Compare February 4, 2025 12:18
@dhruvmanila dhruvmanila marked this pull request as ready for review February 4, 2025 12:31
@MichaReiser
Copy link
Member

MichaReiser commented Feb 4, 2025

Hmm, it's not obvious to me why we see a 4% regression in the incremental benchmark. It either suggests that we now create more ingredients (which I didn't see) or that some queries are invalidated more often (which they shouldn't, because the benchmark only inserts a comment). Did you look into where we're spending more time?

@dhruvmanila
Copy link
Member Author

dhruvmanila commented Feb 4, 2025

Hmm, it's not obvious to me why we see a 4% regression in the incremental benchmark. It either suggests that we now create more ingredients (which I didn't see) or that some queries are invalidated more often (which they shouldn't, because the benchmark only inserts a comment). Did you look into where we're spending more time?

Oh, I missed that completely as I didn't see the comment being updated and I didn't expect this change to regress. Let me look into it. Thanks for catching that.

Edit: I think I've a hunch on where it might be but need to confirm. The Symbol type now includes one additional information for the Type variant. Symbol is the return type of various salsa queries so that might be it but as said need to confirm.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adjusting here as we figured out on the fly what the right semantics should be! The difference in behavior between the stub and non-stub case definitely made this more complex than I'd initially realized. In hindsight we could have postponed the non-stub opt-in rule entirely, but I think we would have wanted it eventually, so this way it forces us to consider it in the design.

This first review pass is mostly just about the tests and the semantics; will do a more detailed review of the code changes once we have that stuff nailed down.

@dhruvmanila
Copy link
Member Author

dhruvmanila commented Feb 5, 2025

Todo:

Comment on lines 2623 to +2601
qualifiers: TypeQualifiers,
re_export: ReExport,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider encoding the information on whether or not the type is a re-export in the qualifiers field? It feels like "metadata carried along with the type" in a similar way to the way that whether a symbol is a class variable or not is additional metadata that we carry around with the type of the symbol. We could possibly even rename TypeQualifiers to TypeMetadata, and add documentation saying that the metadata we record includes (but is not limited to) type qualifiers such as ClassVar?

I suppose the ReExport::Maybe variant means that we'd have to have two separate flags on the TypeQualifiers struct: MAYBE_REEXPORT and DEFINITE_REEXPORT.

Curious what @carljm thinks of this idea

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea and it seems more prominent now because for the binding, the Type would also need to contain the re-export metadata. My only concern with this would be that the metadata structure would be shared for both declared and binding type so there's a chance that we could mistakenly set metadata for binding type while it's only relevant for a declared type. For example, setting the Final qualifier. Any thoughts on this? If it's not a big concern then I might prefer to go this route.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feeling I've had in looking at this PR is that this should be metadata on a Definition (possibly actually just on the Import and ImportFrom variants of DefinitionKind, which might save some space if those are not currently the largest variants). Then it would be handled in symbol_from_bindings and symbol_from_declarations (where we'd ignore non-re-exported definitions if querying a public global symbol from a stub file). In general, I think we should be looking for ways for this metadata to spread less widely, not more widely, because the cases where it is relevant are limited. I don't think it needs to travel with Types in general, and I think ideally code in TypeInferenceBuilder would be totally unaffected by this change.

If we were just implementing the stub-file semantics this approach would be pretty simple, I think. The trickier part is the implicit-reexport rule, where we do want the type from the definition, plus some metadata that allows us to emit a diagnostic. To do this, we need symbol_from_bindings, symbol_from_declarations, and symbol to be able to pass back the metadata via Symbol that "this type came from, or maybe came from, a not-re-exported definition", and then we do at least need the import handling code in TypeInferenceBuilder to inspect that metadata and emit a diagnostic.

I'm very tempted to just consider the opt-in implicit-reexport rule out of scope for now, because it's adding a lot of complexity here, and I think the much more important thing right now is to fix the builtins.pyi problem. The ability to also opt-in to this rule in diagnostic form for non-stub files is nice to have, but not high priority right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to go with Carl's suggestion here; he's looked at this more closely than I have and it sounds very reasonable!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feeling I've had in looking at this PR is that this should be metadata on a Definition (possibly actually just on the Import and ImportFrom variants of DefinitionKind, which might save some space if those are not currently the largest variants). Then it would be handled in symbol_from_bindings and symbol_from_declarations (where we'd ignore non-re-exported definitions if querying a public global symbol from a stub file). In general, I think we should be looking for ways for this metadata to spread less widely, not more widely, because the cases where it is relevant are limited. I don't think it needs to travel with Types in general, and I think ideally code in TypeInferenceBuilder would be totally unaffected by this change.

I looked into implementing based on this comment, I originally thought of this as well but I didn’t pursue this because of the requirement of inferring the types.

To give context, the plan is to define a is_reexported field on both ImportDefinitionKind and ImportFromDefinitionKind and use that as Definition::is_reeexported which would default to true for non-import definitions.

This will be then used in symbol_from_declarations and symbol_from_bindings to filter out the definitions. This creates a problem as both functions does not know whether the symbol is coming from the current module or from another module. This means that for the following code (file other.pyi):

from typing import Any

# error: Name `Any` used when not defined
reveal_type(Any)

The reason being that while looking up the Any symbol in the reaveal_type call, we’ll first encounter the import definition which doesn’t re-export the symbol but the symbol_from_* does not know that this symbol is coming from the current module itself. We’ll need to pass additional info for the symbol_from_* functions to only consider the re-exports from another module.

Implementation diff: https://github.com/astral-sh/ruff/compare/dhruv/re-export-2?expand=1


Given the above context, I took a stab at addressing some of the short comings for conditional re-exports via the existing implementation and I think I've addressed Carl's feedback from #15848 (comment).

I've added four more examples to test conditional re-exports.

@dhruvmanila dhruvmanila force-pushed the dhruv/re-export branch 3 times, most recently from 90b0e80 to 85f5fd0 Compare February 7, 2025 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
4 participants