Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix binary operations on attrs for Series and DataFrame #59636

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small change to prefer fixtures to writing out our own binop implementations, but generally lgtm. I don't think current CI failures are related.

@mroeschke any thoughts here?

df_2 = DataFrame({"A": [-3, 9]})
attrs = {"info": "DataFrame"}
df_1.attrs = attrs
assert (df_1 + df_2).attrs == attrs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this you can just use the all_binary_operators fixture from conftest.py (I think)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think attrs propagation logic should should only be handled by __finalize__, so these binary operations should dispatch to that method

@mroeschke mroeschke added metadata _metadata, .attrs Numeric Operations Arithmetic, Comparison, and Logical operations labels Sep 25, 2024
@fbourgey
Copy link
Contributor Author

@mroeschke should everything be rewritten using finalize then?

@mroeschke
Copy link
Member

Yes, or __finalize__ needs to be probably be called somewhere

Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Nov 29, 2024
@fbourgey
Copy link
Contributor Author

fbourgey commented Dec 2, 2024

@mroeschke, @WillAyd, I tried using __finalize__ instead. What do you think?

@WillAyd WillAyd removed the Stale label Dec 3, 2024
@WillAyd
Copy link
Member

WillAyd commented Dec 3, 2024

I think it looks good but will defer to @mroeschke

@WillAyd
Copy link
Member

WillAyd commented Feb 3, 2025

I don't think the CI failures are related. This still lgtm - @mroeschke can you take another look?

@fbourgey
Copy link
Contributor Author

@WillAyd @mroeschke, any chance I could get an update?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that _construct_result is the common denominator method that is called after these methods. I think other should be passed along there where __finalize__(other) is ultimately called

@fbourgey
Copy link
Contributor Author

@mroeschke I suggested something. I kept some checks for attrs in _cmp_method and _arithm_method in pandas/core/frame.py

@a-holm
Copy link

a-holm commented Mar 31, 2025

This looks like a good job. However, I noticed some conditional logic added in frame.py (_cmp_method, _arith_method, _construct_result) and series.py (_construct_result):

if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
    self.__finalize__(other)

This logic seems to attempt copying attributes from the other (right) operand if self (left) doesn't initially have attributes. This appears potentially incorrect, as standard pandas behavior (and the new test) suggests the result should only inherit attributes from the left operand (self). The final out.__finalize__(self) call in _construct_result should already handle this correctly.

Could we simplify the implementation by removing these conditional self.__finalize__(other) calls and relying solely on out.__finalize__(self)? This seems like it would align better with the intended behavior and the added test case. However I might be wrong, just wanted to point this out in case something there was unintentional.

self,
result: ArrayLike | tuple[ArrayLike, ArrayLike],
name: Hashable,
other=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you type this argument?

Copy link
Contributor Author

@fbourgey fbourgey Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
other=None,
other: Series | None = None,

like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing ArrayLike but I'm also not familiar with this code path

@@ -5949,11 +5953,13 @@ def _construct_result(
Series
In the case of __divmod__ or __rdivmod__, a 2-tuple of Series.
"""
if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
self.__finalize__(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be result.__finalize__(other) (after the if isinstance(result, tuple) is evaluated)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing

if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
   result.__finalize__(other)

after the block

if isinstance(result, tuple):
   ...

breaks as result is an ArrayLike

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather out.__finalize__(other)?

@@ -7875,13 +7875,19 @@ class diet
def _cmp_method(self, other, op):
axis: Literal[1] = 1 # only relevant for Series other case

if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should need these anymore here since this should be handled in _construct_result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that sometimes

self, other = self._align_for_op(other, axis, flex=False, level=None)

resets other.attrs to {}.

This is why I kept it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because other is getting overridden here? Otherwise, _align_for_op should also preserve the attrs of other.

@@ -497,6 +450,22 @@ def test_binops(request, args, annotate, all_binary_operators):
assert result.attrs == {"a": 1}


@pytest.mark.parametrize(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simplified to

@pytest.mark.parametrize("left", [pd.Series, pd.DataFrame])
@pytest.mark.parametrize("right", [pd.Series, pd.DataFrame])
def test_attrs_binary_operations(...):
    left = left([1])
    left.attrs = {"a": 1}
    right = right([2])
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
metadata _metadata, .attrs Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: binary operations don't propogate attrs depending on order with Series and/or DataFrame/Series
4 participants