bug: type error when accessing length() without schema information #10651
Comments
I can't reproduce this on
|
The code works. It's the static type checking that outputs an error while it should not. |
Schema information generally can't be made static in Python. A type checker would need access to information from a running database. |
Of course, but in this case I think the DX would be much better if, instead of assuming that the column can't be a list, we assume it could be, and prevent the type checker from erroring.
Right now I have to do
from typing import cast
from ibis.expr.types import ArrayColumn
t.select(t).filter(
    cast(ArrayColumn, t.my_list_column).length() > ibis.literal(0)
)
(type checking also requires me to write
More and more projects use type checking, and not having it in ibis will prevent all of them from having a good DX and reduce ibis' adoption. |
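The situation described in this comment can be reproduced without a database. The sketch below is only an illustration: `Table`, `Column`, and `ArrayColumn` are hypothetical stand-ins written for this example, not the ibis classes, and the way the stand-in table picks a column class from its data is an assumption made to keep the example small. It shows why attribute access can only be annotated with the base class, and how `typing.cast` quiets the checker without changing runtime behavior.

```python
from typing import cast


class Column:
    """Stand-in for a generic column expression (hypothetical, not ibis code)."""

    def __init__(self, values: list) -> None:
        self.values = values


class ArrayColumn(Column):
    """Stand-in for an array-typed column; only this subclass has length()."""

    def length(self) -> list[int]:
        return [len(v) for v in self.values]


class Table:
    """Stand-in table whose column types are only known from runtime data."""

    def __init__(self, data: dict[str, list]) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Column:
        # The annotation has to be the base class: which subclass comes back
        # depends on the data, which a static checker never sees.
        try:
            values = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        if values and all(isinstance(v, list) for v in values):
            return ArrayColumn(values)
        return Column(values)


t = Table({"my_list_column": [[1, 2], [], [3]]})

# t.my_list_column.length() works at runtime, but a checker reports that
# Column has no attribute "length". The cast tells the checker what we know.
lengths = cast(ArrayColumn, t.my_list_column).length()
print(lengths)  # [2, 0, 1]
```

The cast is a no-op at runtime, so it does not protect against the column actually having a different type; it only moves that knowledge from the database schema into the source code.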
It sounds like you effectively want every type-specific method to exist on |
That could be a solution. Maybe there's another one which does not imply modifying the |
In that case a giant If this pattern shows up in a bunch of places maybe we can make a type alias to simplify our lives. |
Wouldn't pyright still complain in this case? if you try to do I started #10682 as an attempt to slightly mitigate this pain point for some workflows. eg if/after that lands, then |
It's been merged into
That is by design.
I think this is true of any type system that supports recursive types, because there are infinitely many types in such systems. |
Are you sure? |
Yes. By implementation detail here, I mean that we don't expose those objects as inputs to expression APIs. Casting is an in-database operation, and our backends don't and probably won't (and shouldn't) ever have knowledge of how Ibis models expressions. |
Hi! I'm hitting similar static type checking issues with many expressions. Long text ahead - I can open a separate issue if more appropriate. I am also willing to open a PR if there's an agreement on changes.
Consider the following expressions and their types as inferred by Pyright.
import typing
import ibis
t = ibis.memtable({"a": [1, 2]})
x = t.a.name("b").min() # Typing error
x = t.a.log(t.a) # Typing error
x = t.a.log(t.a).min().log(ibis.literal(2)) # Typing error
a = typing.cast("ibis.expr.types.IntegerColumn", t.a)
y = a.name("y") # Typed as Value - is IntegerColumn
y = a.log() # Typed as NumericValue - is NumericColumn
y = a.isnull() # Typed as BooleanValue - is BooleanColumn
Note that
Nonetheless
Even when the type checker is informed of the precise shape and dtype of
Why this is a problem
Static type checking with tools like Pyright is heavily relied on by many Python projects, and only gets more popular with time. I agree with @choucavalier: valid Ibis basic operations being flagged as invalid is painful for DX and a real obstacle to wider adoption. Users must choose between living with a good chunk of their Ibis code being highlighted as invalid, or disabling type checking on Ibis altogether. Neither option is particularly appealing. Note that this is completely separate from Ibis runtime operations type safety (which took me some time to figure out).
Suggestions
I believe that, in this context and at this scale, false positives (statically typed as valid, actually invalid at runtime) are better than false negatives (statically typed as invalid, actually valid at runtime).
Dtype-agnostic declarations
Given that value dtype usually can't be inferred statically, I see no other option than declaring dtype-specific methods (e.g.
A similar choice was made in pandas for Series, or Polars for Expr.
Shape-aware typing
Contrary to dtype, shape can usually be inferred statically, and it's valuable typing information. Most (if not all?)
Proof of concept
Trying out on some methods, see how it looks in romaingd#1
T = TypeVar("T", "Value", "Column", "Scalar")
class Value(Expr):
    def name(self: Self, name: str, /) -> Self: ...
    def isnull(self: T) -> T: ...
    @abstractmethod  # Implemented in `NumericValue`
    def log(self: T, base: Value | None = None, /) -> T: ...
Results:
t = ibis.memtable({"a": [1, 2]})
x = t.a.name("b").min() # Typed as Scalar
x = t.a.log(t.a) # Typed as Column
x = t.a.log(t.a).min().log(ibis.literal(2)) # Typed as Scalar
a = typing.cast("ibis.expr.types.IntegerColumn", t.a)
y = a.name("y") # Typed as IntegerColumn
y = a.log() # Typed as Column
y = a.isnull() # Typed as Column
What we gained:
What we lost:
Other options
|
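The two suggestions in the comment above (declare dtype-specific methods on the base class, and keep the Column/Scalar shape through a self-typed `TypeVar`) can be tried out without ibis. The following is a minimal sketch under stated assumptions: every class here, including `StringColumn` and the `_rebuild` helper, is a hypothetical stand-in invented for the example, and the string labels only exist so the result can be inspected. It is not how ibis implements these methods.

```python
from typing import Optional, TypeVar, cast

# Constrained TypeVar: a checker resolves it to exactly one of these classes,
# so a NumericColumn receiver is reported as Column, not NumericColumn.
T = TypeVar("T", "Value", "Column", "Scalar")


class Value:
    """Stand-in base expression; hypothetical, not the ibis implementation."""

    def __init__(self, label: str) -> None:
        self.label = label

    def _rebuild(self: T, label: str) -> T:
        # Keep the shape (Column or Scalar) and drop the dtype-specific subclass.
        shape = Column if isinstance(self, Column) else Scalar
        return cast(T, shape(label))

    def isnull(self: T) -> T:
        return self._rebuild(f"isnull({self.label})")

    def log(self: T, base: Optional["Value"] = None) -> T:
        # Declared on the base class so a checker accepts the call on any
        # value; calling it on a non-numeric value fails only at runtime.
        raise NotImplementedError(f"log() is not defined for {type(self).__name__}")


class Column(Value):
    """Stand-in for the column shape."""


class Scalar(Value):
    """Stand-in for the scalar shape."""


class NumericColumn(Column):
    def log(self, base: Optional[Value] = None) -> "Column":
        suffix = "" if base is None else f", {base.label}"
        return self._rebuild(f"log({self.label}{suffix})")


class StringColumn(Column):
    """Stand-in for a dtype that has no meaningful log()."""


a = NumericColumn("a")
s = StringColumn("s")

print(type(a.isnull()).__name__)  # Column
print(a.log(a).label)             # log(a, a)

try:
    s.log()  # statically accepted, rejected at runtime
except NotImplementedError as exc:
    print(exc)  # log() is not defined for StringColumn
```

The last call is the trade-off the comment argues for: the checker no longer flags valid code, at the cost of letting some invalid calls through until they run.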
What happened?
The following example is self-explanatory
By default, it's assumed that t.some_col is of type Column, while it could be of type ArrayColumn (or other more specific types for that matter). This results in typing issues.
What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response