Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting different file handle types is too difficult #140

Open
jl-wynen opened this issue Mar 4, 2024 · 12 comments
Open

Supporting different file handle types is too difficult #140

jl-wynen opened this issue Mar 4, 2024 · 12 comments

Comments

@jl-wynen
Copy link
Member

jl-wynen commented Mar 4, 2024

In the context of scipp/essreduce#8 we considered supporting opening files once and passing the file handle to the various loaders. Supporting this is tricky because there are many different file handle types:

open('..', 'rb', buffering=True) -> io.BufferedReader
open('..', 'rb', buffering=False) -> io.FileIO
io.BytesIO(...) -> io.BytesIO

So supporting these with the current system requires

File = NewType('File', io.FileIO)
BufferedFile = NewType('BufferedFile', io.BufferedReader)
Buffer = NewType('Buffer', io.BytesIO)

def load(file: Union[File, BufferedFile, Buffer]) -> X: ...

with open('..', 'rb') as f:
   sciline.Pipeline([load], params={BufferedFile: BufferedFile(f)

where the param key needs to be adjusted for the concrete type of file handle. This is bad UX and Python has a workaround. For normal functions, we would annotate them as

def ideal_load(file: typing.BinaryIO) -> X: ...

but that is not possible because Pipeline requires an exact type match for parameters, i.e.

File = NewType('File', typing.BinaryIO)
with open('..', 'rb') as f:
   sciline.Pipeline([load], params={File: File(f)

fails.

This gets exasperated when load needs to also support a file path and maybe an open h5py.File or scippnexus.File.

I don't know a good solution at this point. But automatic type conversion in parameters might help (as proposed as part of #139).

@SimonHeybrock
Copy link
Member

Maybe the type check in __setitem__ is not a good idea after all? Or would it be possible to relax the check to a base class or protocol?

I have also seen trouble with Python 3.12 type aliases in relation to that, but I need to try that again as I do not remember the details.

@YooSunYoung
Copy link
Member

YooSunYoung commented Mar 5, 2024

Maybe the type check in setitem is not a good idea after all? Or would it be possible to relax the check to a base class or protocol?

I think checking if the type if a subclass of the desired type should be enough.
It's also compatible with static type checkers.
Like...

NewInt = NewType("NewInt", int)

a: int = NewInt(1)  # pass the static type check
b: NewInt = 1  # does not pass the static type check

Hmm... but then the Union still won't work...

@jl-wynen
Copy link
Member Author

jl-wynen commented Mar 5, 2024

The IO objects are a bit strange. typing.BinaryIO is an abstract class. But the types in io don't inherit from it. I don't know how type checkers handle them. But a simple check for subclasses or implementations of protocols is not enough. Unless we define our own protocol.

@YooSunYoung
Copy link
Member

The IO objects are a bit strange. typing.BinaryIO is an abstract class. But the types in io don't inherit from it. I don't know how type checkers handle them. But a simple check for subclasses or implementations of protocols is not enough. Unless we define our own protocol.

Hmm.... Maybe we can define our own protocol/meta class in this particular case....

@MridulS
Copy link
Member

MridulS commented Apr 15, 2024

Some reading material: python/typing#829

@SimonHeybrock
Copy link
Member

The isinstance checks have been removed. Should this be closed, or is there more relevant discussion here (e.g., relating to passing mypy?).

@jl-wynen
Copy link
Member Author

I would appreciate some advice on how to implement the affected functions. Should we support str | os.PathLike[str] | BytesIO | StringIO | ... but define the domain type as an alias of str? If so, then the loader provider will not pass mypy. And mypy won't inform us if we use the 'path' incorrectly, e.g., call open(filename) which does not work with file-like objects.

@SimonHeybrock
Copy link
Member

I would appreciate some advice on how to implement the affected functions. Should we support str | os.PathLike[str] | BytesIO | StringIO | ...

Can't we define that as the domain type?

but define the domain type as an alias of str? If so, then the loader provider will not pass mypy. And mypy won't inform us if we use the 'path' incorrectly, e.g., call open(filename) which does not work with file-like objects.

@jl-wynen
Copy link
Member Author

I would appreciate some advice on how to implement the affected functions. Should we support str | os.PathLike[str] | BytesIO | StringIO | ...

Can't we define that as the domain type?

This is not allowed:

from typing import NewType
from pathlib import Path
Filename = NewType("Filename", str | Path)

gives this error with mypy:

Argument 2 to NewType(...) must be subclassable (got "str | Path")  [valid-newtype]

See https://mypy.readthedocs.io/en/stable/error_code_list.html#check-the-target-of-newtype-valid-newtype

Using a type alias does not work either:

import sciline as sl
from typing import TypeAlias
from pathlib import Path
Filename: TypeAlias = str | Path
pl = sl.Pipeline([], params={Filename: "test.txt"})

mypy complains with (for the params of Pipeline)

Dict entry 0 has incompatible type "<typing special form>": "str"; expected "type[Any]": "Any"  [dict-item]

@SimonHeybrock
Copy link
Member

Does it work with Python 3.12 type aliases?

@jl-wynen
Copy link
Member Author

Not yet:

error: PEP 695 type aliases are not yet supported  [valid-type]

@jl-wynen
Copy link
Member Author

But in any case, waiting for that would mean that all our projects fail type checks for the next 2 years until 3.12 becomes our minimum Python version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Triage
Development

No branches or pull requests

4 participants