Fix binary operations on attrs for Series and DataFrame #59636
fbourgey
commented
Aug 28, 2024
- closes BUG: binary operations don't propogate attrs depending on order with Series and/or DataFrame/Series #51607
- Test
- Test
Small change to prefer fixtures to writing out our own binop implementations, but generally lgtm. I don't think current CI failures are related.
@mroeschke any thoughts here?
pandas/tests/frame/test_api.py
Outdated
df_2 = DataFrame({"A": [-3, 9]})
attrs = {"info": "DataFrame"}
df_1.attrs = attrs
assert (df_1 + df_2).attrs == attrs
Rather than doing this you can just use the all_binary_operators fixture from conftest.py (I think)
I made the change.
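As a rough illustration of the fixture-based approach suggested above, the sketch below loops over a handful of operator functions instead of writing `+` by hand. The `BINARY_OPERATORS` list and the `left_attrs_survive` helper are hypothetical stand-ins, not the actual fixture from conftest.py, and only forward operators from the standard library are included. It assumes a recent pandas version, where attrs set on the left operand are expected to carry over.

```python
import operator

import pandas as pd

# Hypothetical stand-in for what a fixture such as all_binary_operators
# would supply. Assumption: only forward stdlib operators are listed here;
# the real fixture may cover more cases (for example reversed variants).
BINARY_OPERATORS = [
    operator.add,
    operator.sub,
    operator.mul,
    operator.truediv,
    operator.floordiv,
    operator.mod,
]


def left_attrs_survive(op):
    """Return True if op(df_1, df_2) carries the attrs set on df_1."""
    df_1 = pd.DataFrame({"A": [2, 3]})
    df_2 = pd.DataFrame({"A": [1, 9]})
    attrs = {"info": "DataFrame"}
    df_1.attrs = attrs
    return op(df_1, df_2).attrs == attrs


# One entry per operator, keyed by the operator's name.
results = {op.__name__: left_attrs_survive(op) for op in BINARY_OPERATORS}
```

In a real test module the loop would be replaced by the fixture argument, so each operator shows up as its own test case.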
I think attrs propagation logic should only be handled by __finalize__, so these binary operations should dispatch to that method
@mroeschke should everything be rewritten using
Yes, or
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
@mroeschke, @WillAyd, I tried using
I think it looks good but will defer to @mroeschke
I don't think the CI failures are related. This still lgtm - @mroeschke can you take another look?
@WillAyd @mroeschke, any chance I could get an update?
It appears that _construct_result is the common denominator method that is called after these methods. I think other should be passed along there where __finalize__(other) is ultimately called
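To make the role of `__finalize__` in the comment above concrete, here is a minimal sketch of what calling it on a freshly built result does. It only uses the public behaviour of `__finalize__` on a Series; the variable names are illustrative and this is not the code path inside `_construct_result`.

```python
import pandas as pd

# The operand whose metadata should be carried over.
other = pd.Series([1, 2])
other.attrs = {"source": "other"}

# A freshly constructed result starts out with empty attrs.
result = pd.Series([10, 20])
attrs_before = dict(result.attrs)

# __finalize__ copies metadata from `other` onto `result` and returns `result`.
finalized = result.__finalize__(other)
attrs_after = dict(finalized.attrs)
```

Routing every binary operation through one such call would keep the propagation rule in a single place, which seems to be the point of the suggestion.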
… improved attribute handling
@mroeschke I suggested something. I kept some checks for attrs in
This looks like a good job. However, I noticed some conditional logic added in
if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
    self.__finalize__(other)
This logic seems to attempt copying attributes from the
Could we simplify the implementation by removing these conditional
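For readers unfamiliar with the guard quoted above, the following sketch spells out when it evaluates to true. The helper name `should_copy_attrs` is hypothetical; it simply wraps the same expression so the cases can be listed side by side.

```python
import pandas as pd


def should_copy_attrs(left, right):
    """Hypothetical wrapper around the guard quoted in the thread.

    True only when `left` has no (or empty) attrs and `right` has
    non-empty attrs. Objects without an `attrs` attribute, such as
    scalars, are treated as having none.
    """
    return bool(
        not getattr(left, "attrs", None) and getattr(right, "attrs", None)
    )


plain = pd.Series([1, 2])
tagged = pd.Series([3, 4])
tagged.attrs = {"a": 1}

cases = {
    "plain_vs_tagged": should_copy_attrs(plain, tagged),
    "tagged_vs_plain": should_copy_attrs(tagged, plain),
    "plain_vs_scalar": should_copy_attrs(plain, 5),
    "tagged_vs_tagged": should_copy_attrs(tagged, tagged),
}
```

Only the first case copies anything, so the guard acts as a fallback for an unannotated left operand rather than a general merge rule.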
pandas/core/series.py
Outdated
self,
result: ArrayLike | tuple[ArrayLike, ArrayLike],
name: Hashable,
other=None,
Could you type this argument?
other=None,
other: Series | None = None,
like this?
I'm guessing ArrayLike but I'm also not familiar with this code path
pandas/core/series.py
Outdated
@@ -5949,11 +5953,13 @@ def _construct_result(
Series
In the case of __divmod__ or __rdivmod__, a 2-tuple of Series.
"""
if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
    self.__finalize__(other)
Shouldn't this be result.__finalize__(other) (after the if isinstance(result, tuple) is evaluated)?
Doing
if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
    result.__finalize__(other)
after the block
if isinstance(result, tuple):
    ...
breaks as result is an ArrayLike
Or rather out.__finalize__(other)?
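The distinction drawn in this exchange can be checked directly: a raw array has no `__finalize__` method, while the Series built from it does. In the sketch below, `raw_result` and `out` are illustrative names standing in for the array-like input and the wrapped return value; a NumPy array is used as one example of an ArrayLike.

```python
import numpy as np
import pandas as pd

# Stand-in for the ArrayLike handed to the result-construction step.
raw_result = np.array([3, 5])

# Stand-in for the wrapped object that is eventually returned.
out = pd.Series(raw_result)

# Only the pandas object exposes __finalize__.
has_finalize_raw = hasattr(raw_result, "__finalize__")
has_finalize_out = hasattr(out, "__finalize__")
```

This is consistent with the observation that the call has to happen on the constructed object rather than on the raw result.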
pandas/core/frame.py
Outdated
@@ -7875,13 +7875,19 @@ class diet
def _cmp_method(self, other, op):
    axis: Literal[1] = 1  # only relevant for Series other case

    if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
I don't think we should need these anymore here since this should be handled in _construct_result
It seems that sometimes
self, other = self._align_for_op(other, axis, flex=False, level=None)
resets other.attrs to {}.
This is why I kept it.
Is it because other is getting overridden here? Otherwise, _align_for_op should also preserve the attrs of other.
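The question about `other` being overridden comes down to name rebinding. The sketch below uses the public `Series.align` method as a stand-in for the internal alignment step (an assumption made for illustration) and keeps a separate handle on the original operand, so its attrs stay reachable whatever the aligned copy ends up carrying.

```python
import pandas as pd

left = pd.Series([1, 2], index=["a", "b"])
other = pd.Series([10, 20], index=["b", "c"])
other.attrs = {"unit": "m"}

# Keep a handle on the original operand before the name is rebound.
attrs_source = other

# After this line `other` names the aligned copy, not the original object.
left, other = left.align(other)

same_object = other is attrs_source
aligned_index = list(other.index)
```

Whether the aligned copy keeps its attrs is left unasserted here, since that is the behaviour under discussion; the original object referenced by `attrs_source` is unaffected either way.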
@@ -497,6 +450,22 @@ def test_binops(request, args, annotate, all_binary_operators):
    assert result.attrs == {"a": 1}


@pytest.mark.parametrize(
I think this could be simplified to
@pytest.mark.parametrize("left", [pd.Series, pd.DataFrame])
@pytest.mark.parametrize("right", [pd.Series, pd.DataFrame])
def test_attrs_binary_operations(...):
    left = left([1])
    left.attrs = {"a": 1}
    right = right([2])
    ...
done, thanks.
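Stacking the two parametrize decorators suggested above expands to four left/right type combinations. The sketch below enumerates the same combinations with `itertools.product` so it can run outside pytest. The use of `+` and the `outcomes` dictionary are illustrative choices, since the body of the suggested test is elided in the thread.

```python
import itertools

import pandas as pd

# Maps (left type name, right type name) to the attrs found on the result.
outcomes = {}

for left_type, right_type in itertools.product(
    [pd.Series, pd.DataFrame], repeat=2
):
    left = left_type([1])
    left.attrs = {"a": 1}
    right = right_type([2])

    # Assumption: addition stands in for whichever operator is under test.
    result = left + right
    outcomes[(left_type.__name__, right_type.__name__)] = result.attrs
```

Same-type pairs are expected to carry the left operand's attrs. Whether the mixed Series and DataFrame pairs do as well may depend on the pandas version, which is the behaviour this pull request targets.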