Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose ADR 0001: Remove isinstance checks when setting parameters #153

Merged
merged 4 commits into from
Apr 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is a good idea. It would not solve #140 and #144. E.g., this

def foo(p: Path) -> str:
  return p.read_text()

pl = Pipeline([foo], params={Path: "my/file"})
tg = pl.get(str)

would work but it would fail during computation.

As I see it, this introduces duck-typing for functions that are annotated with strict type annotations and thus violate the assumptions of the author of the function.
To give another example, a function annotated to take a DataGroup may be able to work with any Mapping. But that would be coincidental and an implementation detail because the function is allowed to use the full interface of DataGroup.

So I think this proposal only solves #145.

You can argue that at runtime, we strictly speaking always have duck-typing. But that can be checked by a type checker. What you are proposing is uncheckable, as far as I can tell.

Copy link
Contributor

@jokasimr jokasimr Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it violates the assumptions of the author. Unless we explicitly promise to validate types, users should assume the same level of type checking as anywhere else in python land (that is, duck typing).

I think you're right that setting parameter values would be difficult to check by a type checker. Not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, this introduces duck-typing for functions that are annotated with strict type annotations and thus violate the assumptions of the author of the function.

Not sure I understand the argument. Don't you have the same problem with providers that lie about their return type? Is your concern that providers can be checked with mypy but Pipeline.__setitem__ cannot?

Regarding DataGroup, in general I would argue that the use of domain types (type aliases) is a way of encoding a potentially very complex contract into a brief name. Does this absolve the consumer from checking relevant parts of the contract? Is it not allowed to fail if the contract is violated? I would see the type as part of the contract.

Not sure why you think #144 is not solved. We can argue about #140, but wait for ADR-0002 where I will propose to remove special handling of Optional and Union.

Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
# ADR 0001: Remove isinstance checks when setting parameters

- Status: proposed
- Deciders: Jan-Lukas, Neil, Simon
- Date: 2024-04-15

## Context

Sciline builds a data dependency graph based on type hints of callables.
Dependencies can be fulfilled by setting values (instances of classes) as so called *parameters*.
In an attempt to extend the correctness guarantees of the dependency graph, Sciline's `__setitem__` checks if the value is instance of the key (a type) when setting a parameter.

This has led to a number of problems.
For example, supporting different file handles types is too difficult [#140](https://github.com/scipp/sciline/issues/140),
parameter type handling is too inflexible in general [#144](https://github.com/scipp/sciline/issues/144),
and the mechanism is broken with Python 3.12 type aliases [#145](https://github.com/scipp/sciline/issues/145).
In short, the mechanism gets in the way of the user, since it causes false positives.

Considering the bigger picture, we can think of this mechanism as a poor man's form of *validation*.
Validation of input parameters is very important when running workflows, but it should be done in a more explicit way.
Validating the type is only a fraction of what we want to do when validating parameters.
Therefore, we should remove this mechanism and replace it with a more general validation mechanism.
The more general validation mechanism can be considered out of scope for Sciline, and should be implemented in the user code or using other common libraries such as `pydantic`.

Finally, we can think of this mechanism as a form of runtime type checking.
We should ask ourselves if this is the intended scope of Sciline.
If it is, shouldn't we also check that each provider actually returns the correct type?
Comment on lines +26 to +27
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reformulate to turn it into a statement that this is not the scope of sciline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should put that statement into the "Decision" section?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


The main problem with not checking value types when setting parameters is that it is not possible to catch such errors with `mypy`, in contrast to return values of providers, which `mypy` *can* check.

Consider the following example of setting $Q$ bins for a workflow, given by a `scipp.Variable`, which would then be passed to `scipp.hist` to create a histogram:

```python
pipeline[QBins] = sc.linspace(...)
pipeline[QBins] = 1000 # error in current implementation
pipeline[QBins] = sc.linspace(..., unit='m') # no error, but wrong unit
```

Checking the type catches the first error, but not the second.
Paradoxically, setting an integer would often be a valid operation in the example, since `scipp.hist` can handle this case, whereas the wrong unit would not be valid.
This may indicate that defining `QBins` as an alias of `scipp.Variable` is actually an instance of an anti-pattern.
Instead, imagine we have defined a specific `class QBins`, which performs validation in its constructor, and defines `__call__` so it can be used as a provider:

```python
pipeline.insert(QBins(sc.linspace(...)))
pipeline.insert(QBins(1000)) # ok
pipeline.insert(QBins(sc.linspace(..., unit='m'))) # error constructing QBins
```

This example illustrates that a clearer and more specific expression of intent can avoid the need for relying on checking the type of the value when setting a parameter.

## Decision

- The core scope of Sciline is the definition of task graphs.
Type validation is not.
- Remove the mechanism that checks if a value is an instance of the key when setting it as a parameter.
- Encourage users to validate inputs in providers, which can also be tested in unit tests without setting up the full workflow.
- Encourage users to use a more general parameter validation mechanism using other libraries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also provide a small framework to help users carry out validation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is to add something basic in essreduce, which will also be used to auto-generate widgets for setup up workflows.

- Consider adding a mechanism to inject a callable to use for parameter validation as a argument when creating a `Pipeline`.

## Consequences

### Positive

- The mechanism will no longer get in the way of the user.
- The code will be simplified slightly.

### Negative

- `sciline.Pipeline` will support duck-typing for parameters, in a way that cannot be checked with `mypy`.
10 changes: 10 additions & 0 deletions docs/developer/architecture-decision-records.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Architecture Decision Records

```{toctree}
---
maxdepth: 1
glob: true
---

adr/*
```
1 change: 1 addition & 0 deletions docs/developer/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,4 +14,5 @@ getting-started
coding-conventions
dependency-management
architecture-and-design/index
architecture-decision-records
```
Loading