BUG: Clip corr edge cases between -1.0 and 1.0 #61154
Thanks, does `nancorr_spearman` also need this change?
I do not think |
Thanks, could you add a whatsnew note to `v3.0.0.rst`?
Added a whatsnew note under "Notable Bug Fixes".
doc/source/whatsnew/v3.0.0.rst (outdated)

notable_bug_fix2
^^^^^^^^^^^^^^^^
Improved handling of numerical precision errors in ``DataFrame.corr``
A single line entry under the Numeric section would be more appropriate.
Thanks again @j-hendricks
pandas-dev/pandas#61154 changed the behavior of pandas' correlation methods for series whose `r` is close to -1 or 1. This updates our test to adapt to that change by increasing the tolerance within which we consider results equal.
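As a rough illustration of why a looser tolerance can be needed, the sketch below compares an unclipped and a clipped coefficient. The numeric value and the helper name `corr_close` are hypothetical and are not taken from the downstream test; only the standard library is used.

```python
import math

# Hypothetical coefficient that overshoots 1.0 by one unit in the last place.
r_unclipped = 1.0000000000000002
# The same value after clipping into [-1.0, 1.0].
r_clipped = min(1.0, max(-1.0, r_unclipped))


def corr_close(actual, expected, abs_tol=1e-9):
    """Compare two coefficients with an absolute tolerance (assumed value)."""
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)


# Exact equality fails, while the tolerance-based comparison succeeds.
print(r_unclipped == r_clipped)
print(corr_close(r_unclipped, r_clipped))
```

An absolute tolerance is used here because the coefficient is bounded, so a fixed margin near -1 or 1 is easy to reason about.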
@j-hendricks I'm seeing different behavior, depending on whether missing values are present. Can you confirm whether the following behavior is expected?

At 83979d6:

    In [50]: x = pd.DataFrame({"A": [1, 2, None, 4], "B": [2, 4, None, 9]})

    In [51]: x.cov()
    Out[51]:
         A    B
    A  1.0  1.0
    B  1.0  1.0

    In [52]: x.dropna().cov()
    Out[52]:
              A     B
    A  2.333333   5.5
    B  5.500000  13.0

    In [53]: pd.__git_version__
    Out[53]: '83979d6a0c5a223bac2af8ef706c5ff8d432bcca'

With 2.2.3:

    In [67]: x = pd.DataFrame({"A": [1, 2, None, 4], "B": [2, 4, None, 9]})

    In [68]: x.cov()
    Out[68]:
              A     B
    A  2.333333   5.5
    B  5.500000  13.0

    In [69]: x.dropna().cov()
    Out[69]:
              A     B
    A  2.333333   5.5
    B  5.500000  13.0

    In [70]: pd.__version__
    Out[70]: '2.2.3'

Depending on my digging in dask/dask#11857, there might be some more differences, but this seemed to be the largest.
It appears this clipping should have only happened if |
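The report above shows covariance values collapsing to 1.0, which is what one would see if clipping were applied to covariances as well as correlations. The following is a minimal pure-Python sketch of keeping the clip on the correlation branch only. It is not the pandas implementation; the helper name `pairwise_stat` and its `cov` flag are assumptions made for illustration.

```python
import math


def pairwise_stat(xs, ys, cov=False):
    """Hypothetical helper: sample covariance if cov=True, else Pearson r.

    Pairs where either value is None are skipped, mimicking pairwise
    handling of missing values.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if cov:
        # Covariance is unbounded, so it is returned without clipping.
        return sxy / (n - 1)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    # Only the correlation coefficient is clipped into [-1.0, 1.0].
    return max(-1.0, min(1.0, r))


a = [1, 2, None, 4]
b = [2, 4, None, 9]
print(pairwise_stat(a, b, cov=True))  # covariance of A and B
print(pairwise_stat(a, b))            # correlation of A and B
```

With the data from the report, the covariance branch reproduces the 2.2.3 values (5.5 for A and B, 13.0 for B with itself), while the correlation branch stays within bounds.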
Closes #61120
- Clips correlation coefficient in `DataFrame.corr()` between `-1.0` and `1.0`
- Adds `test_corr_within_bounds` to ensure coefficient within bounds
- Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
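The idea behind the change can be sketched in a few lines of plain Python. This is an illustration only, not the code merged in the PR: the function `pearson_clipped` and the sample data are hypothetical, and the check at the end only mirrors the intent of `test_corr_within_bounds`.

```python
import math


def pearson_clipped(xs, ys):
    """Pearson correlation with the result clipped to [-1.0, 1.0]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    # Rounding error can push r marginally outside the mathematical range;
    # clipping restores the bound.
    return max(-1.0, min(1.0, r))


# Hypothetical, nearly collinear inputs where rounding error is most likely.
xs = [0.1 * i for i in range(10)]
ys_up = [3.0 * x + 1e-3 for x in xs]
ys_down = [-3.0 * x + 1e-3 for x in xs]

for ys in (ys_up, ys_down):
    r = pearson_clipped(xs, ys)
    assert -1.0 <= r <= 1.0
```

Clipping after the division keeps the computation itself unchanged and only guards the final value, which is why it is a low-risk fix for results such as 1.0000000000000002.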