Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#386 LogisticRegression.predict_proba should return (n, 2) for binary #760

Merged
merged 3 commits into from
Apr 17, 2021

Conversation

rikturr
Copy link
Contributor

@rikturr rikturr commented Nov 25, 2020

This PR addresses #386 by making sure LogisticRegression.predict_proba() returns an array of shape (n, 2) to match what scikit-learn does. There are some things currently hard-coded for binary classification only, but it is clear that multi-class is currently not supported so that can be changed when multi-class is implemented.

There is some risk here that if users have downstream code that depends on the current functionality (return shape of (n,)), the changes from this PR could break their code.

@rikturr
Copy link
Contributor Author

rikturr commented Nov 25, 2020

I'm not quite sure why some of the checks are failing. It seems one of them is due to:

##[warning]numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject

but I'm not quite sure what I can change to make the check pass?

@TomAugspurger
Copy link
Member

TomAugspurger commented Nov 26, 2020 via email

Base automatically changed from master to main February 2, 2021 03:43
@rikturr
Copy link
Contributor Author

rikturr commented Feb 3, 2021

Looks like test failures have been addressed, besides linting (thanks @jsignell 🎉 ). I think this should be ready to merge if the implementation looks good

@jsignell
Copy link
Member

jsignell commented Feb 5, 2021

Looks like test failures have been addressed

Might not be quite done yet, but I'm chipping away at it.

@jsignell
Copy link
Member

Seems like only the keras test is broken and that'll be fixed by #794, so good to merge?

@rikturr
Copy link
Contributor Author

rikturr commented Apr 15, 2021

Triggered the tests to run again, so I think all is good now! Can this be merged?

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This has dragged on long enough (sorry) so I'm going to merge. But if you want to make a cleanup PR you could simplify things to just call np.vstack on the array. Thanks to NEP-18 it works on Dask Arrays too.

@@ -59,3 +59,13 @@ def add_intercept(X): # noqa: F811
if "intercept" in columns:
raise ValueError("'intercept' column already in 'X'")
return X.assign(intercept=1)[["intercept"] + list(columns)]


@dispatch(np.ndarray) # noqa: F811
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually just use np.vstack now, since Dask implemented NEP 18.

@TomAugspurger TomAugspurger merged commit 090dd27 into dask:main Apr 17, 2021
@rikturr rikturr deleted the predict-proba-shape branch April 19, 2021 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants