Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More precisely type pipe methods #10038

Merged
merged 12 commits into from
Feb 19, 2025
Merged

Conversation

chuckwondo
Copy link
Contributor

@chuckwondo chuckwondo commented Feb 7, 2025

Improved precision of type annotations on pipe methods to address shortcomings described in #9997.

  • Added pytest plugin that supports testing type annotations, along with relevant tests that use the plugin

  • Bumped mypy version, which includes a fix to something that previously required a few "type: ignore" comments, which I was able to remove

  • Combined mypy and mypy-min jobs into a single matrix job to avoid duplication

  • Added pytest mypy tests to mypy job

  • Improved mypy pre-commit hook, by reducing nearly 100 errors/warnings (when hook was run manually) to 6, but didn't manage to resolve the final 6 errors, as I was spending too much time on it. I suspect this hook has been "broken" for quite some time, but unnoticed, given that the hook must be run manually (and is not triggered in CI), so I'm wondering if this hook should be removed, particularly since 'act' can now be used to locally run the mypy github job (per changes added to this PR)

  • Closes Improve type signatures of pipe methods to enable type checkers to flag erroneous usages #9997

  • Tests added

Copy link

welcome bot commented Feb 7, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, new testing approach looks super; thank you!

Left a couple of comments

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a scope creep going on, but it's a good direction, thanks!

def pipe(
self,
func: Callable[..., T] | tuple[Callable[..., T], str],
func: tuple[Callable[..., T], str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not this as well?

Suggested change
func: tuple[Callable[..., T], str],
func: tuple[Callable[P, T], str]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do that because when we pass the function as part of a tuple we don't have enough information to precisely type the function's parameters.

In your suggested change, the ParamSpec P represents all of the function's parameters, but we need to know all of the parameters excluding one (the one that takes the data value, as identified by the name given in the second value of the tuple).

If that's as clear as mud, let me try to clarify with more detail.

In the first form, where we pass only a function as the first argument to pipe, we expect the function to take the data value as the first argument, meaning that we know exactly where in the function's list of parameters the data parameter is: the first position.

This means we can type the function more precisely, like so, indicating that the data parameter is first, concatenated with zero or more positional and keyword parameters (and returning a value of some type T):

# Self is a DataWithCoords (DataArray, Dataset, others?), or DataTree
# P represents all parameters to func *excluding* Self
func: Callable[Concatenate[Self, P], T]

Therefore, after passing func to pipe, we must pass all arguments except for the data (self) argument, and this is represented by P, which excludes Self:

# pipe expects a function followed by all arguments to pass to the function,
# *except* for the data argument, which pipe will *implicitly* pass.
def pipe(f: Callable[Concatenate[Self, P], T], *args: P.args, **kwargs: P.kwargs) -> T: ...

However, when we pass a function/keyword 2-tuple as the first argument to pipe, we have no idea what position the keyword parameter is in func's type signature. We only know that it's not the first parameter.

This means, we cannot do the following, as you suggest, because in this case P includes the data parameter, but it must not, and there's no way of omitting it without knowing precisely what position it's in:

# We don't know where Self is within P, so we have no way of defining P
# as meaning all of func's parameters *except* for Self. Therefore, this
# signature indicates that we can *explicity* pass a data argument, but that's
# not correct.
def pipe(func: tuple[Callable[P, T], str], *args: P.args, **kwargs: P.kwargs) -> T: ...

To clarify, although Callable[P, T] is valid for the function itself within the tuple, using *args: P.args, **kwargs: P.kwargs for the rest of the parameters to pipe in this case is incorrect, because it means that we can explicitly pass the data argument in the mix there (because P includes Self), but the whole point of pipe, of course, is to implicitly pass the data argument, and thus not allow it to be passed explicitly.

Technically speaking, as far as mypy is concerned, your suggestion probably make no difference from what I propose, but in terms of the information it conveys to the reader, it is incorrect.

This is why I discourage the use of the tuple form, and instead recommend the use of a lambda (or another function def with args reordered such that the data arg is first). Even using a lambda to reorder things allows mypy to be more helpful than is possible with the tuple form.

For example (taken from the test cases in xarray/tests/test_dataset_typing.yml in this PR):

    from xarray import Dataset

    def f(arg: int, ds: Dataset) -> Dataset:
        return ds

    # Since we cannot provide a precise type annotation when passing a tuple to
    # pipe, there's not enough information for type analysis to indicate that
    # we are missing an argument for parameter `arg`, so we get no error here.

    ds = Dataset().pipe((f, "ds"))
    reveal_type(ds)  # N: Revealed type is "xarray.core.dataset.Dataset"

    # Rather than passing a tuple, passing a lambda that calls `f` with args in
    # the correct order allows for proper type analysis, indicating (perhaps
    # somewhat cryptically) that we failed to pass an argument for `arg`.

    ds = Dataset().pipe(lambda data, arg: f(arg, data))

    # mypy produces the following output for the line above, as it should,
    # indicating that we forgot to pass an argument to pipe, which pipe needs
    # to pass to `f` in the 2nd position.
    error: No overload variant of "pipe" of "DataWithCoords" matches argument type "Callable[[Any, Any], Dataset]"  [call-overload]
    note: Possible overload variants:
    note:     def [P`9, T] pipe(self, func: Callable[[Dataset, **P], T], *args: P.args, **kwargs: P.kwargs) -> T
    note:     def [T] pipe(self, func: tuple[Callable[..., T], str], *args: Any, **kwargs: Any) -> T

IMHO, the tuple form should not be supported, for this very reason, but I don't expect that deprecating that form would get much traction from anybody else.

@chuckwondo
Copy link
Contributor Author

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

@headtr1ck
Copy link
Collaborator

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

Ah sorry, just that the initial issue was only about the pipe typing and now mainly ci stuff was changed that probably should go into it's own PR for easier reviewing.

@headtr1ck
Copy link
Collaborator

Are we sure about pytest-mypy? There has not been a release for over 2 years. Is there any other larger project using this?

@chuckwondo
Copy link
Contributor Author

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

Ah sorry, just that the initial issue was only about the pipe typing and now mainly ci stuff was changed that probably should go into it's own PR for easier reviewing.

Thanks for clarifying. In part, that was because I wasn't sure how else to wire up these changes and related tests, and didn't want to write duplicate lines in the ci file.

@chuckwondo
Copy link
Contributor Author

Are we sure about pytest-mypy? There has not been a release for over 2 years. Is there any other larger project using this?

That's a different one. I've pulled in this one, which was most recently released Dec 2024: https://pypi.org/project/pytest-mypy-plugins/

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. I would still like to hold the line on ensuring that running basic commands works, I left one comment. But otherwise, great to merge from me!

Comment on lines 157 to 164
# As noted in the comment on the previous step, we must run type annotation tests
# separately, and we will run them even if the preceding tests failed. Further,
# we must restrict these tests to run only when matrix.env is empty, as this is
# the only case when all of the necessary dependencies are included such that
# spurious mypy errors due to missing packages are eliminated.
- name: Run mypy tests
if: ${{ always() && matrix.env == '' }}
run: python -m pytest xarray/tests/test_*.yml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite committed to ensuring that running a simple pytest works! I really think it's important that running tests doesn't require looking up various commands in .github/workflows/ci.yaml

If we want to have tests that don't run by default, then we can use pytest marks and run them with an additional options; check out --run-nightly for an example...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took a bit of digging, but I think I've made the changes you're looking for. If not, please provide specific guidance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you v much

numpy,
]
additional_dependencies:
# Type stubs plus additional dependencies from ci/requirements/environment.yml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these, that's helpful

@chuckwondo chuckwondo requested a review from max-sixty February 19, 2025 00:10
# no explicit test functions on which we can apply a pytest.mark.mypy
# decorator. Therefore, we mark them via this name-based, automatic
# marking approach, meaning that each test case must contain "mypy" in the
# name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, not ideal to use the test name, but v nice comment

@max-sixty max-sixty merged commit 0caf096 into pydata:main Feb 19, 2025
31 checks passed
Copy link

welcome bot commented Feb 19, 2025

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again! celebration gif

@max-sixty
Copy link
Collaborator

Excellent, thank you very much @chuckwondo ! Any other follow-ups very welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve type signatures of pipe methods to enable type checkers to flag erroneous usages
4 participants