-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG(string dtype): groupby/resampler.min/max returns float on all NA strings #60985
base: main
Are you sure you want to change the base?
Conversation
…groupby_all_na_min_max
expected_dtype, expected_value = dtype, pd.NA | ||
if reduction_func in ["all", "any"]: | ||
expected_dtype = "bool" | ||
# TODO: For skipna=False, bool(pd.NA) raises; should groupby? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there are a few TODOs / inconsistencies in our interface here. I think rather than branching and trying to document all of them, it would help to simplify this test and just skip/xfail the cases where things are not consistent. It may even be helpful to split this up into multiple tests that are more focused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than branching and trying to document all of them, it would help to simplify this test and just skip/xfail the cases where things are not consistent.
If they added significant complexity I would agree. However the complexity added seems minimal to me, and testing the current behavior tells us when it changes. So even if it's not the final behavior we'd desire, testing it seems better than skipping or xfailing.
It may even be helpful to split this up into multiple tests that are more focused
If there was a good way of doing this while ensuring we are going through all the reduction funcs, I'd definitely be on board. However to my knowledge there is not.
pandas/core/groupby/groupby.py
Outdated
res_values = res_values.astype(object, copy=False) | ||
elif is_string_dtype(dtype) and how in ["min", "max"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid special-casing these functions here? Where is the return value from other functions being handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea - good call. We only get here with min/max today. If we do end up here with other ops at some point in the future, either (a) the dtype is already correct in which case _from_sequence
is O(1) or (b) we want to cast. So I've removed the condition.
…groupby_all_na_min_max
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Built on top of #60936