-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More precisely type pipe
methods
#10038
Conversation
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, new testing approach looks super; thank you!
Left a couple of comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of a scope creep going on, but it's a good direction, thanks!
def pipe( | ||
self, | ||
func: Callable[..., T] | tuple[Callable[..., T], str], | ||
func: tuple[Callable[..., T], str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not this as well?
func: tuple[Callable[..., T], str], | |
func: tuple[Callable[P, T], str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot do that because when we pass the function as part of a tuple we don't have enough information to precisely type the function's parameters.
In your suggested change, the ParamSpec P
represents all of the function's parameters, but we need to know all of the parameters excluding one (the one that takes the data value, as identified by the name given in the second value of the tuple).
If that's as clear as mud, let me try to clarify with more detail.
In the first form, where we pass only a function as the first argument to pipe
, we expect the function to take the data value as the first argument, meaning that we know exactly where in the function's list of parameters the data parameter is: the first position.
This means we can type the function more precisely, like so, indicating that the data parameter is first, concatenated with zero or more positional and keyword parameters (and returning a value of some type T
):
# Self is a DataWithCoords (DataArray, Dataset, others?), or DataTree
# P represents all parameters to func *excluding* Self
func: Callable[Concatenate[Self, P], T]
Therefore, after passing func
to pipe
, we must pass all arguments except for the data (self) argument, and this is represented by P
, which excludes Self
:
# pipe expects a function followed by all arguments to pass to the function,
# *except* for the data argument, which pipe will *implicitly* pass.
def pipe(f: Callable[Concatenate[Self, P], T], *args: P.args, **kwargs: P.kwargs) -> T: ...
However, when we pass a function/keyword 2-tuple as the first argument to pipe
, we have no idea what position the keyword parameter is in func
's type signature. We only know that it's not the first parameter.
This means, we cannot do the following, as you suggest, because in this case P
includes the data parameter, but it must not, and there's no way of omitting it without knowing precisely what position it's in:
# We don't know where Self is within P, so we have no way of defining P
# as meaning all of func's parameters *except* for Self. Therefore, this
# signature indicates that we can *explicity* pass a data argument, but that's
# not correct.
def pipe(func: tuple[Callable[P, T], str], *args: P.args, **kwargs: P.kwargs) -> T: ...
To clarify, although Callable[P, T]
is valid for the function itself within the tuple, using *args: P.args, **kwargs: P.kwargs
for the rest of the parameters to pipe
in this case is incorrect, because it means that we can explicitly pass the data argument in the mix there (because P
includes Self
), but the whole point of pipe
, of course, is to implicitly pass the data argument, and thus not allow it to be passed explicitly.
Technically speaking, as far as mypy is concerned, your suggestion probably make no difference from what I propose, but in terms of the information it conveys to the reader, it is incorrect.
This is why I discourage the use of the tuple form, and instead recommend the use of a lambda
(or another function def with args reordered such that the data arg is first). Even using a lambda
to reorder things allows mypy to be more helpful than is possible with the tuple form.
For example (taken from the test cases in xarray/tests/test_dataset_typing.yml
in this PR):
from xarray import Dataset
def f(arg: int, ds: Dataset) -> Dataset:
return ds
# Since we cannot provide a precise type annotation when passing a tuple to
# pipe, there's not enough information for type analysis to indicate that
# we are missing an argument for parameter `arg`, so we get no error here.
ds = Dataset().pipe((f, "ds"))
reveal_type(ds) # N: Revealed type is "xarray.core.dataset.Dataset"
# Rather than passing a tuple, passing a lambda that calls `f` with args in
# the correct order allows for proper type analysis, indicating (perhaps
# somewhat cryptically) that we failed to pass an argument for `arg`.
ds = Dataset().pipe(lambda data, arg: f(arg, data))
# mypy produces the following output for the line above, as it should,
# indicating that we forgot to pass an argument to pipe, which pipe needs
# to pass to `f` in the 2nd position.
error: No overload variant of "pipe" of "DataWithCoords" matches argument type "Callable[[Any, Any], Dataset]" [call-overload]
note: Possible overload variants:
note: def [P`9, T] pipe(self, func: Callable[[Dataset, **P], T], *args: P.args, **kwargs: P.kwargs) -> T
note: def [T] pipe(self, func: tuple[Callable[..., T], str], *args: Any, **kwargs: Any) -> T
IMHO, the tuple form should not be supported, for this very reason, but I don't expect that deprecating that form would get much traction from anybody else.
Sorry, I'm not following. Would you mind clarifying? |
Ah sorry, just that the initial issue was only about the pipe typing and now mainly ci stuff was changed that probably should go into it's own PR for easier reviewing. |
Are we sure about |
Thanks for clarifying. In part, that was because I wasn't sure how else to wire up these changes and related tests, and didn't want to write duplicate lines in the ci file. |
That's a different one. I've pulled in this one, which was most recently released Dec 2024: https://pypi.org/project/pytest-mypy-plugins/ |
Mypy 1.15 includes fix for <python/mypy#9031>, allowing several "type: ignore" comments to be removed.
In addition, enhance mypy job configuration to support running it locally via `act`. Fixes pydata#9997
9a3f95e
to
95084f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. I would still like to hold the line on ensuring that running basic commands works, I left one comment. But otherwise, great to merge from me!
.github/workflows/ci.yaml
Outdated
# As noted in the comment on the previous step, we must run type annotation tests | ||
# separately, and we will run them even if the preceding tests failed. Further, | ||
# we must restrict these tests to run only when matrix.env is empty, as this is | ||
# the only case when all of the necessary dependencies are included such that | ||
# spurious mypy errors due to missing packages are eliminated. | ||
- name: Run mypy tests | ||
if: ${{ always() && matrix.env == '' }} | ||
run: python -m pytest xarray/tests/test_*.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite committed to ensuring that running a simple pytest
works! I really think it's important that running tests doesn't require looking up various commands in .github/workflows/ci.yaml
If we want to have tests that don't run by default, then we can use pytest marks and run them with an additional options; check out --run-nightly
for an example...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took a bit of digging, but I think I've made the changes you're looking for. If not, please provide specific guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you v much
.pre-commit-config.yaml
Outdated
numpy, | ||
] | ||
additional_dependencies: | ||
# Type stubs plus additional dependencies from ci/requirements/environment.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these, that's helpful
# no explicit test functions on which we can apply a pytest.mark.mypy | ||
# decorator. Therefore, we mark them via this name-based, automatic | ||
# marking approach, meaning that each test case must contain "mypy" in the | ||
# name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, not ideal to use the test name, but v nice comment
Excellent, thank you very much @chuckwondo ! Any other follow-ups very welcome! |
Improved precision of type annotations on
pipe
methods to address shortcomings described in #9997.Added pytest plugin that supports testing type annotations, along with relevant tests that use the plugin
Bumped mypy version, which includes a fix to something that previously required a few "type: ignore" comments, which I was able to remove
Combined
mypy
andmypy-min
jobs into a single matrix job to avoid duplicationAdded pytest mypy tests to
mypy
jobImproved mypy pre-commit hook, by reducing nearly 100 errors/warnings (when hook was run manually) to 6, but didn't manage to resolve the final 6 errors, as I was spending too much time on it. I suspect this hook has been "broken" for quite some time, but unnoticed, given that the hook must be run manually (and is not triggered in CI), so I'm wondering if this hook should be removed, particularly since 'act' can now be used to locally run the mypy github job (per changes added to this PR)
Closes Improve type signatures of
pipe
methods to enable type checkers to flag erroneous usages #9997Tests added