BUG(string dtype): groupby/resampler.min/max returns float on all NA strings #60985

rhshadrach · 2025-02-22T15:30:08Z

closes BUG: Inconsistent dtype with GroupBy for StrDtype and all missing values #60810
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Built on top of #60936
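Here is a minimal reproduction of the reported behavior. It assumes a DataFrame whose string-dtype column is entirely missing; the column names key and val are illustrative. Per the PR title, affected versions return a float64 result here. With this fix, the result is expected to keep the input string dtype. The checks only assert that every group comes back missing, because the printed dtype depends on the installed pandas version.

```python
import pandas as pd

# Every value in the string column is missing, so each group's min is NA.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "val": pd.array([None, None, None], dtype="string"),
    }
)

result = df.groupby("key")["val"].min()

# Affected versions report float64 here (the bug); with the fix the result is
# expected to keep the input string dtype.
print(result.dtype)
print(result)
```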

WillAyd · 2025-03-24T13:32:14Z

pandas/tests/groupby/test_reductions.py

+    expected_dtype, expected_value = dtype, pd.NA
+    if reduction_func in ["all", "any"]:
+        expected_dtype = "bool"
+        # TODO: For skipna=False, bool(pd.NA) raises; should groupby?


It looks like there are a few TODOs / inconsistencies in our interface here. I think rather than branching and trying to document all of them, it would help to simplify this test and just skip/xfail the cases where things are not consistent. It may even be helpful to split this up into multiple tests that are more focused.

rhshadrach · Mar 26, 2025

> I think rather than branching and trying to document all of them, it would help to simplify this test and just skip/xfail the cases where things are not consistent.

If they added significant complexity, I would agree. However, the added complexity seems minimal to me, and testing the current behavior tells us when it changes. So even if it's not the final behavior we'd want, testing it seems better than skipping or xfailing.

> It may even be helpful to split this up into multiple tests that are more focused

If there were a good way of doing this while still ensuring we go through all the reduction funcs, I'd definitely be on board. However, to my knowledge there is not.
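The following sketch shows the single parametrized pass argued for above. It uses a hypothetical subset of reduction names; the real test walks every reduction through its reduction_func parameter. The helper expected_dtype_for is also hypothetical and reproduces only the branch visible in the diff excerpt. The sketch also shows the fact behind the TODO: truth-testing pd.NA raises TypeError.

```python
import pandas as pd

# Hypothetical subset of reduction names; the real test walks every
# reduction through its reduction_func parameter.
REDUCTIONS = ["min", "max", "any", "all"]


def expected_dtype_for(reduction_func: str, dtype: str) -> str:
    """Hypothetical helper reproducing the branch shown in the diff excerpt:
    the expected dtype defaults to the input dtype, except any/all give bool."""
    if reduction_func in ["all", "any"]:
        return "bool"
    return dtype


# One pass covers every reduction; the known exceptions sit in one branch
# instead of being split across narrower tests.
expected = {func: expected_dtype_for(func, "string") for func in REDUCTIONS}

# The TODO in the diff: truth-testing pd.NA raises TypeError, which is why it
# asks whether groupby any/all with skipna=False should raise too.
try:
    bool(pd.NA)
    na_truth_raises = False
except TypeError:
    na_truth_raises = True

print(expected)
print("bool(pd.NA) raises:", na_truth_raises)
```

Keeping the exceptions in one explicit branch means that if the behavior for any/all changes, this test fails loudly rather than silently passing under an xfail.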

WillAyd · 2025-03-24T13:32:59Z

pandas/core/groupby/groupby.py

            res_values = res_values.astype(object, copy=False)
+        elif is_string_dtype(dtype) and how in ["min", "max"]:


Is there a way to avoid special-casing these functions here? Where is the return value from other functions being handled?

rhshadrach · Mar 26, 2025

Yeah, good call. We only get here with min/max today. If we do end up here with other ops in the future, either (a) the dtype is already correct, in which case _from_sequence is O(1), or (b) we want to cast. So I've removed the condition.
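The cast-back reasoning in the reply above can be sketched as follows. The change relies on the extension array's private _from_sequence. This sketch instead uses the public pd.array with copy=False, and the helper name restore_dtype is hypothetical. Case (a) is an input that already has the target dtype. Case (b) is an all-NaN float64 result like the one in the bug, which is cast back so the missing values become pd.NA.

```python
import numpy as np
import pandas as pd


def restore_dtype(res_values, dtype):
    """Hypothetical helper: cast a groupby result back to the input dtype.

    With copy=False, pd.array can hand back the input unchanged when its dtype
    already matches (case a); otherwise it converts the values (case b).
    """
    return pd.array(res_values, dtype=dtype, copy=False)


dtype = pd.StringDtype()

# (a) Values that already carry the target dtype.
already_string = pd.array(["x", None], dtype=dtype)
kept = restore_dtype(already_string, dtype)

# (b) An all-NaN float64 result, as in the reported bug.
float_result = np.array([np.nan, np.nan])
restored = restore_dtype(float_result, dtype)

print(kept.dtype, restored.dtype, list(restored))
```

Dropping the min/max condition costs little under this reasoning. When the dtype already matches, the call is cheap. When it does not, casting is the desired outcome.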

rhshadrach added 4 commits February 22, 2025 10:29

BUG(string dtype): groupby/resampler.min/max returns float on all NA strings

9254a10

Merge cleanup

fbf2d11

whatsnew

898a9bc

Merge main

f90a268

rhshadrach added the Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Strings (String extension data type and string data), Reduction Operations (sum, mean, min, max, etc.), and Bug labels Mar 19, 2025

rhshadrach added this to the 2.3 milestone Mar 19, 2025

rhshadrach added 2 commits March 22, 2025 11:15

Merge branch 'main' of https://github.com/pandas-dev/pandas into bug_groupby_all_na_min_max

ffcff02

Add type-ignore

ba0fba4

rhshadrach marked this pull request as ready for review March 23, 2025 12:07

rhshadrach requested review from WillAyd and jorisvandenbossche March 23, 2025 12:07

WillAyd requested changes Mar 24, 2025

rhshadrach added 2 commits March 26, 2025 09:57

Merge branch 'main' of https://github.com/pandas-dev/pandas into bug_groupby_all_na_min_max

1337ccb

Remove condition

4ec7ac4
