-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propose ADR 0001: Remove isinstance checks when setting parameters #153
Changes from 3 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,68 @@ | ||||||
# ADR 0001: Remove isinstance checks when setting parameters | ||||||
|
||||||
- Status: proposed | ||||||
- Deciders: Jan-Lukas, Neil, Simon | ||||||
- Date: 2024-04-15 | ||||||
|
||||||
## Context | ||||||
|
||||||
Sciline builds a data dependency graph based on type hints of callables. | ||||||
Dependencies can be fulfilled by setting values (instances of classes) as so called *parameters*. | ||||||
In an attempt to extend the correctness guarantees of the dependency graph, Sciline's `__setitem__` checks if the value is instance of the key (a type) when setting a parameter. | ||||||
|
||||||
This has led to a number of problems. | ||||||
For example, supporting different file handles types is too difficult [#140](https://github.com/scipp/sciline/issues/140), | ||||||
parameter type handling is too inflexible in general [#144](https://github.com/scipp/sciline/issues/144), | ||||||
and the mechanism is broken with Python 3.12 type aliases [#145](https://github.com/scipp/sciline/issues/145). | ||||||
In short, the mechanism gets in the way of the user, since it causes false positives. | ||||||
|
||||||
Considering the bigger picture, we can think of this mechanism as a poor man's form of *validation*. | ||||||
Validation of input parameters is very important when running workflows, but it should be done in a more explicit way. | ||||||
Validating the type is only a fraction of what we want to do when validating parameters. | ||||||
Therefore, we should remove this mechanism and replace it with a more general validation mechanism. | ||||||
The more general validation mechanism can be considered out of scope for Sciline, and should be implemented in the user code or using other common libraries such as `pydantic`. | ||||||
|
||||||
Finally, we can think of this mechanism as a form of runtime type checking. | ||||||
We should ask ourselves if this is the intended scope of Sciline. | ||||||
If it is, shouldn't we also check that each provider actually returns the correct type? | ||||||
Comment on lines
+26
to
+27
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you reformulate to turn it into a statement that this is not the scope of sciline? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess I should put that statement into the "Decision" section? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||||||
|
||||||
The main problem with not checking value types when setting parameters is that it is not possible to catch such errors with `mypy`, in contrast to return values of providers, which `mypy` *can* check. | ||||||
|
||||||
Consider the following example of setting $Q$ bins for a workflow, given by a `scipp.Variable`, which would then be passed to `scipp.hist` to create a histogram: | ||||||
|
||||||
```python | ||||||
pipeline[QBins] = sc.linspace(...) | ||||||
pipeline[QBins] = 1000 # error in current implementation | ||||||
pipeline[QBins] = sc.linspace(..., unit='m') # no error, but wrong unit | ||||||
``` | ||||||
|
||||||
Checking the type catches the first error, but not the second. | ||||||
Paradoxically, setting an integer would often be a valid operation in the example, since `scipp.hist` can handle this case, whereas the wrong unit would not be valid. | ||||||
This may indicate that defining `QBins` as an alias of `scipp.Variable` is actually an anti-pattern. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
It's a too specific case to call it a pattern. |
||||||
Instead, imagine we have defined a specific `class QBins`, which performs validation in its constructor, and defines `__call__` so it can be used as a provider: | ||||||
|
||||||
```python | ||||||
pipeline.insert(QBins(sc.linspace(...))) | ||||||
pipeline.insert(QBins(1000)) # ok | ||||||
pipeline.insert(QBins(sc.linspace(..., unit='m'))) # error constructing QBins | ||||||
``` | ||||||
|
||||||
This example illustrates that a clearer and more specific expression of intent can avoid the need for relying on checking the type of the value when setting a parameter. | ||||||
|
||||||
## Decision | ||||||
|
||||||
- Remove the mechanism that checks if a value is an instance of the key when setting it as a parameter. | ||||||
- Encourage users to validate inputs in providers, which can also be tested in unit tests without setting up the full workflow. | ||||||
- Encourage users to use a more general parameter validation mechanism using other libraries. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we can also provide a small framework to help users carry out validation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My plan is to add something basic in |
||||||
- Consider adding a mechanism to inject a callable to use for parameter validation as a argument when creating a `Pipeline`. | ||||||
|
||||||
## Consequences | ||||||
|
||||||
### Positive | ||||||
|
||||||
- The mechanism will no longer get in the way of the user. | ||||||
- The code will be simplified slightly. | ||||||
|
||||||
### Negative | ||||||
|
||||||
- `sciline.Pipeline` will support duck-typing for parameters, in a way that cannot be checked with `mypy`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Architecture Decision Records | ||
|
||
```{toctree} | ||
--- | ||
maxdepth: 1 | ||
glob: true | ||
--- | ||
|
||
adr/* | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is a good idea. It would not solve #140 and #144. E.g., this
would work but it would fail during computation.
As I see it, this introduces duck-typing for functions that are annotated with strict type annotations and thus violate the assumptions of the author of the function.
To give another example, a function annotated to take a
DataGroup
may be able to work with anyMapping
. But that would be coincidental and an implementation detail because the function is allowed to use the full interface ofDataGroup
.So I think this proposal only solves #145.
You can argue that at runtime, we strictly speaking always have duck-typing. But that can be checked by a type checker. What you are proposing is uncheckable, as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it violates the assumptions of the author. Unless we explicitly promise to validate types, users should assume the same level of type checking as anywhere else in python land (that is, duck typing).
I think you're right that setting parameter values would be difficult to check by a type checker.Not sure.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the argument. Don't you have the same problem with providers that lie about their return type? Is your concern that providers can be checked with
mypy
butPipeline.__setitem__
cannot?Regarding
DataGroup
, in general I would argue that the use of domain types (type aliases) is a way of encoding a potentially very complex contract into a brief name. Does this absolve the consumer from checking relevant parts of the contract? Is it not allowed to fail if the contract is violated? I would see the type as part of the contract.Not sure why you think #144 is not solved. We can argue about #140, but wait for ADR-0002 where I will propose to remove special handling of
Optional
andUnion
.