-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: DataFrame.lookup #35224
DEPR: DataFrame.lookup #35224
Conversation
This probably needs more discussion, but thought I open this PR as a starting point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to change the existing tests to assert that there is a FutureWarning otherwise tests will fail
In the xref you mention "Maybe a standalone function somewhere", is this still the idea or are we going to mention |
no just put a nice example in the indexing docs - reference this in the deprecation don’t need a standalone function |
Hello @erfannariman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-09-16 17:58:12 UTC |
@WillAyd I put |
doc/source/user_guide/indexing.rst
Outdated
'B': [80, 55, 76, 67]}) | ||
df | ||
melt = df.melt('col') | ||
df['lookup'] = melt.query('col == variable')['value'].to_numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than query, use .loc here as its more idiomatic (and plus its a single statement). Do we actually need to convert to numpy? (e.g. to avoid alignment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I used to_numpy
is indeed for the index alignment. With .loc
and without to_numpy
, I would need reset_index
. So that would look like:
melt = df.melt('col')
melt = melt.loc[melt['col'] == melt['variable'], 'value']
df['lookup'] = melt.reset_index(drop=True)
print(df)
col A B lookup
0 A 80.0 80 80.0
1 A 23.0 55 23.0
2 B NaN 76 76.0
3 B 22.0 67 67.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which one do you prefer? I can understand that this is more idiomatic, althought it's a bit more code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw now I'm typing it, I realize this method would fail if the index was not 0, .., n-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a horrible choice as an alternative for lookup
. Essentially, melt = df.melt('col')
duplicates the data (not to mention the extra variable
column) while one can resolve to numpy indexing.
I'm not sure what's the implementation of lookup
, but it's a real request emerging from what I could see on SO. To me pivot
and pivot_table
are more redundant than lookup
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quanghm my response is to the tone of the argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not averse to the function of lookup but it's not going to be a top level api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not averse to the function of lookup but it's not going to be a top level api
Yet another reason why I don't go and make a pull request. I don't understand what's other than top level api that I can implement lookup
functionality, and still don't understand performant.
@quanghm my response is to the tone of the argument
Can you please be more specific? The way I see it, @erfannariman was basically saying that unless I made a pull request, I cannot criticize the suggested alternative as horrible even when I spent time to analyze, test and show that it is, in fact, horrible. OK, if horrible is an offensive word, my apology, English is not my first language. Let me re-phrase it is very bad.
I do feel that my effort to contribute here is not valued. That was the best I can afford to better Pandas. Not everyone can afford to make a pull request, modify the code, run and pass all the unit tests.
Thanks for all the work maintaining and developing Pandas anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I learned from the other discussion, you are not the one that has a say on bringing back
lookup
, and those that do don't seem to be willing to review the issue.
Opening a ticket with everything you wrote here has more chance to get reviewed by the core devs than to write all this on a closed PR. That way the other devs can see it as well, since I am quite sure they are not seeing this whole discussion.
If you don't feel comfortable open an issue, please let me know and I wil open one, since Jeff mentioned he's open for a (discussion) for re-implementing lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
thanks @erfannariman very nice! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff