More precisely type `pipe` methods #10038

chuckwondo · 2025-02-07T22:01:05Z

Improved precision of type annotations on pipe methods to address shortcomings described in #9997.

Added pytest plugin that supports testing type annotations, along with relevant tests that use the plugin
Bumped mypy version, which includes a fix to something that previously required a few "type: ignore" comments, which I was able to remove
Combined mypy and mypy-min jobs into a single matrix job to avoid duplication
Added pytest mypy tests to mypy job
Improved the mypy pre-commit hook, reducing nearly 100 errors/warnings (when the hook was run manually) to 6. I didn't manage to resolve the final 6 errors, as I was spending too much time on them. I suspect this hook has been "broken" for quite some time without anyone noticing, given that it must be run manually (and is not triggered in CI), so I'm wondering whether it should be removed, particularly since 'act' can now be used to run the mypy GitHub job locally (per changes added in this PR)
Closes #9997 (Improve type signatures of pipe methods to enable type checkers to flag erroneous usages)
Tests added

welcome · 2025-02-07T22:01:08Z

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

max-sixty

Looks really good, new testing approach looks super; thank you!

Left a couple of comments

.github/workflows/ci-additional.yaml

xarray/core/groupby.py

headtr1ck

Bit of a scope creep going on, but it's a good direction, thanks!

headtr1ck · 2025-02-08T13:52:12Z

xarray/core/common.py

    def pipe(
        self,
-        func: Callable[..., T] | tuple[Callable[..., T], str],
+        func: tuple[Callable[..., T], str],


Why not this as well?

Suggested change

func: tuple[Callable[..., T], str],

func: tuple[Callable[P, T], str]

We cannot do that because when we pass the function as part of a tuple we don't have enough information to precisely type the function's parameters.

In your suggested change, the ParamSpec P represents all of the function's parameters, but we need to know all of the parameters excluding one (the one that takes the data value, as identified by the name given in the second value of the tuple).

If that's as clear as mud, let me try to clarify with more detail.

In the first form, where we pass only a function as the first argument to pipe, we expect the function to take the data value as the first argument, meaning that we know exactly where in the function's list of parameters the data parameter is: the first position.

This means we can type the function more precisely, like so, indicating that the data parameter is first, concatenated with zero or more positional and keyword parameters (and returning a value of some type T):

    # Self is a DataWithCoords (DataArray, Dataset, others?), or DataTree
    # P represents all parameters to func *excluding* Self
    func: Callable[Concatenate[Self, P], T]

Therefore, after passing func to pipe, we must pass all arguments except for the data (self) argument, and this is represented by P, which excludes Self:

    # pipe expects a function followed by all arguments to pass to the function,
    # *except* for the data argument, which pipe will *implicitly* pass.
    def pipe(f: Callable[Concatenate[Self, P], T], *args: P.args, **kwargs: P.kwargs) -> T: ...

However, when we pass a function/keyword 2-tuple as the first argument to pipe, we have no idea what position the keyword parameter is in func's type signature. We only know that it's not the first parameter.

This means we cannot do the following, as you suggest, because in this case P includes the data parameter, but it must not, and there's no way of omitting it without knowing precisely what position it's in:

    # We don't know where Self is within P, so we have no way of defining P
    # as meaning all of func's parameters *except* for Self. Therefore, this
    # signature indicates that we can *explicitly* pass a data argument, but that's
    # not correct.
    def pipe(func: tuple[Callable[P, T], str], *args: P.args, **kwargs: P.kwargs) -> T: ...

To clarify, although Callable[P, T] is valid for the function itself within the tuple, using *args: P.args, **kwargs: P.kwargs for the rest of the parameters to pipe in this case is incorrect, because it means that we can explicitly pass the data argument in the mix there (because P includes Self), but the whole point of pipe, of course, is to implicitly pass the data argument, and thus not allow it to be passed explicitly.

Technically speaking, as far as mypy is concerned, your suggestion probably makes no difference from what I propose, but in terms of the information it conveys to the reader, it is incorrect.

This is why I discourage the use of the tuple form, and instead recommend the use of a lambda (or another function def with args reordered such that the data arg is first). Even using a lambda to reorder things allows mypy to be more helpful than is possible with the tuple form.

For example (taken from the test cases in xarray/tests/test_dataset_typing.yml in this PR):

    from xarray import Dataset

    def f(arg: int, ds: Dataset) -> Dataset:
        return ds

    # Since we cannot provide a precise type annotation when passing a tuple to
    # pipe, there's not enough information for type analysis to indicate that
    # we are missing an argument for parameter `arg`, so we get no error here.
    ds = Dataset().pipe((f, "ds"))
    reveal_type(ds)  # N: Revealed type is "xarray.core.dataset.Dataset"

    # Rather than passing a tuple, passing a lambda that calls `f` with args in
    # the correct order allows for proper type analysis, indicating (perhaps
    # somewhat cryptically) that we failed to pass an argument for `arg`.
    ds = Dataset().pipe(lambda data, arg: f(arg, data))

    # mypy produces the following output for the line above, as it should,
    # indicating that we forgot to pass an argument to pipe, which pipe needs
    # to pass to `f` in the 2nd position.
    error: No overload variant of "pipe" of "DataWithCoords" matches argument type "Callable[[Any, Any], Dataset]"  [call-overload]
    note: Possible overload variants:
    note:     def [P`9, T] pipe(self, func: Callable[[Dataset, **P], T], *args: P.args, **kwargs: P.kwargs) -> T
    note:     def [T] pipe(self, func: tuple[Callable[..., T], str], *args: Any, **kwargs: Any) -> T

IMHO, the tuple form should not be supported, for this very reason, but I don't expect that deprecating that form would get much traction from anybody else.

chuckwondo · 2025-02-11T18:22:33Z

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

headtr1ck · 2025-02-11T18:52:19Z

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

Ah sorry, just that the initial issue was only about the pipe typing, and now mainly CI stuff was changed that probably should go into its own PR for easier reviewing.

headtr1ck · 2025-02-11T18:54:12Z

Are we sure about pytest-mypy? There has not been a release for over 2 years. Is there any other larger project using this?

chuckwondo · 2025-02-11T19:11:07Z

Bit of a scope creep going on, but it's a good direction, thanks!

Sorry, I'm not following. Would you mind clarifying?

Ah sorry, just that the initial issue was only about the pipe typing, and now mainly CI stuff was changed that probably should go into its own PR for easier reviewing.

Thanks for clarifying. In part, that was because I wasn't sure how else to wire up these changes and related tests, and didn't want to write duplicate lines in the ci file.

chuckwondo · 2025-02-11T19:17:49Z

Are we sure about pytest-mypy? There has not been a release for over 2 years. Is there any other larger project using this?

That's a different one. I've pulled in this one, which was most recently released Dec 2024: https://pypi.org/project/pytest-mypy-plugins/

max-sixty

Looks really good. I would still like to hold the line on ensuring that running basic commands works, I left one comment. But otherwise, great to merge from me!

max-sixty · 2025-02-14T22:43:27Z

.github/workflows/ci.yaml

+      # As noted in the comment on the previous step, we must run type annotation tests
+      # separately, and we will run them even if the preceding tests failed.  Further,
+      # we must restrict these tests to run only when matrix.env is empty, as this is
+      # the only case when all of the necessary dependencies are included such that
+      # spurious mypy errors due to missing packages are eliminated.
+      - name: Run mypy tests
+        if: ${{ always() && matrix.env == '' }}
+        run: python -m pytest xarray/tests/test_*.yml


I'm quite committed to ensuring that running a simple pytest works! I really think it's important that running tests doesn't require looking up various commands in .github/workflows/ci.yaml

If we want to have tests that don't run by default, then we can use pytest marks and run them with an additional option; check out --run-nightly for an example...

It took a bit of digging, but I think I've made the changes you're looking for. If not, please provide specific guidance.

Looks great, thank you v much
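For readers unfamiliar with the marker-plus-option pattern discussed in this thread, a minimal conftest.py sketch might look like the following. The option name `--run-mypy` and the marker name `mypy` are assumptions made for illustration, and this sketch deselects the marked tests rather than skipping them; it is not necessarily what this PR does:

```python
# conftest.py (illustrative sketch)


def pytest_addoption(parser):
    # Hypothetical opt-in flag; marked tests stay out of a plain `pytest` run.
    parser.addoption(
        "--run-mypy",
        action="store_true",
        default=False,
        help="also run tests marked 'mypy'",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "mypy: type annotation tests, run only with --run-mypy"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-mypy"):
        return  # opt-in given: keep everything that was collected

    selected, deselected = [], []
    for item in items:
        (deselected if "mypy" in item.keywords else selected).append(item)

    if deselected:
        # Report the dropped tests to pytest, then filter the list in place.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Under these assumptions, a plain `pytest` run leaves the marked tests out, while `pytest --run-mypy` includes them.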

max-sixty · 2025-02-14T22:44:14Z

.pre-commit-config.yaml

-            numpy,
-          ]
+        additional_dependencies:
+          # Type stubs plus additional dependencies from ci/requirements/environment.yml


Thanks for adding these, that's helpful

.pre-commit-config.yaml

max-sixty · 2025-02-19T01:07:38Z

conftest.py

+            # no explicit test functions on which we can apply a pytest.mark.mypy
+            # decorator.  Therefore, we mark them via this name-based, automatic
+            # marking approach, meaning that each test case must contain "mypy" in the
+            # name.


OK, not ideal to use the test name, but v nice comment
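The name-based marking described in the code comment above could be sketched roughly as follows. This is an assumption about the general shape of such a hook, not the code from this PR; it relies only on pytest's `add_marker`, which accepts a marker name as a string:

```python
# conftest.py (illustrative sketch)


def pytest_collection_modifyitems(config, items):
    # Test cases collected from YAML files have no Python function to
    # decorate, so apply the "mypy" marker based on the test case name.
    for item in items:
        if "mypy" in item.name:
            item.add_marker("mypy")
```

The trade-off noted in the review still applies: a test case whose name omits "mypy" would silently go unmarked.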

welcome · 2025-02-19T01:11:41Z

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

max-sixty · 2025-02-19T01:11:57Z

Excellent, thank you very much @chuckwondo ! Any other follow-ups very welcome!

max-sixty reviewed Feb 7, 2025

headtr1ck reviewed Feb 8, 2025

TomNicholas added the topic-typing label Feb 10, 2025

chuckwondo added 8 commits February 14, 2025 11:55

Upgrade mypy to 1.15

54475ed

Mypy 1.15 includes fix for <python/mypy#9031>, allowing several "type: ignore" comments to be removed.

Add type annotations to DataTree.pipe tests

369bb32

More precisely type pipe methods.

9815d61

In addition, enhance mypy job configuration to support running it locally via `act`. Fixes pydata#9997

Pin mypy to 1.15 in CI

e28f100

Revert mypy CI job changes

73f92c9

Add pytest-mypy-plugin and typestub packages

5d859b2

Add pytest-mypy-plugins to all conda env files

a6f9ef5

Remove dup pandas-stubs dep

95084f1

chuckwondo force-pushed the pipe-type-annotations branch from 9a3f95e to 95084f1 Compare February 14, 2025 18:33

chuckwondo requested review from max-sixty and headtr1ck February 14, 2025 20:23

max-sixty reviewed Feb 14, 2025

chuckwondo added 4 commits February 18, 2025 18:33

Revert pre-commit config changes

d92b992

Place mypy tests behind pytest mypy marker

a9307cf

Set default pytest numprocesses to 4

83b59fa

Ignore pytest-mypy-plugins for min version check

6d6083e

chuckwondo requested a review from max-sixty February 19, 2025 00:10

max-sixty reviewed Feb 19, 2025

max-sixty merged commit 0caf096 into pydata:main Feb 19, 2025
31 checks passed
