TST: compare to sklearn.neural_network #155

Merged 10 commits into develop from mlp-input-output-tests on Jan 16, 2021

Conversation

adriangb (Owner)

Closes #119.

An overhaul of test_input_outputs.py that combines tests and adds comparisons to MLPRegressor/MLPClassifier.

A couple of interesting notes:

  • sklearn always returns np.int64 for multilabel-indicator tasks. This makes some sense, since the data is always 0/1, but there is something to be said for returning the same dtype as the input.
  • sklearn regressors always return np.float64, even if the input is int or float32 (see the snippet below). Returning floats even when the input is int makes sense, but returning higher-precision floats than the input probably does not, especially since Keras' backend dtype is usually float32. We might want to diverge from sklearn here and just return the raw data from Keras.
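
A minimal repro of the second bullet (this reflects the sklearn behavior observed at the time of this PR; the values are irrelevant, only the dtypes matter):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Tiny random regression problem with float32 inputs.
X = np.random.rand(50, 3).astype(np.float32)
y = np.random.rand(50).astype(np.float32)

reg = MLPRegressor(hidden_layer_sizes=(5,), max_iter=10).fit(X, y)
print(X.dtype)               # float32
print(reg.predict(X).dtype)  # float64: the output is upcast relative to the input
```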

@codecov-io commented Dec 28, 2020

Codecov Report

❗ No coverage uploaded for pull request base (develop@1c1e17b).
The diff coverage is n/a.


@@            Coverage Diff             @@
##             develop     #155   +/-   ##
==========================================
  Coverage           ?   99.35%           
==========================================
  Files              ?        5           
  Lines              ?      618           
  Branches           ?        0           
==========================================
  Hits               ?      614           
  Misses             ?        4           
  Partials           ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

- tf-version: [2.2.0]
+ tf-version: [2.4.0]
adriangb (Owner, Author)

These version bumps are needed to get rid of _windows_upcast_ints and to run the same tests on all platforms (otherwise there would have been a mess of pytest.skipif markers).

This also coincides with the same version bump in #128, so I think it makes sense to do. We could perhaps do the version bump in a separate PR from those two to keep the diffs saner.

@stsievert (Collaborator) commented Dec 29, 2020

What will fail when this PR is run with TF v2.2 on Windows? Will Windows users notice a difference?

adriangb (Owner, Author)

test_input_dtype_conversion will fail, because on Windows _windows_upcast_ints converts all ints to int64 before passing them to Keras.

This could be worked around by checking the dtype at a point before that conversion (though the fact that this would work is itself an argument against doing it), or by marking the test with something like pytest.mark.skipif(os.name == "nt"), as sketched below.

Windows users won't notice any difference (I doubt the type conversion mattered for performance).
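
The skip-based workaround would look roughly like this (a sketch; the test body is elided):

```python
import os

import pytest

@pytest.mark.skipif(
    os.name == "nt",
    reason="TF 2.2 on Windows requires upcasting ints to int64 before fitting",
)
def test_input_dtype_conversion():
    ...  # assert that the dtype Keras receives matches the input dtype
```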

I'm going to make a develop branch where I'll do the version bump separately and then redirect this PR, #128 and #143 to develop.

stsievert (Collaborator)

So the only difference a Windows user will notice with TF 2.2 is that the input and output dtypes aren't exactly the same? They'll input int32 and get out int64 or vice versa?

Is that fixed under TF 2.4?

@adriangb (Owner, Author) commented Dec 29, 2020

Previously, when users or the data transformer handed us data of dtype int32, we transformed it to int64 because TF crashed on int32. Since we never returned that data to the users, they did not see the difference. That seems to be fixed in 2.4.0, so we can remove the extra conversion without users noticing any difference.

This conversion never affected the output dtypes; that is completely separate. The reason it was originally in this PR (I have since moved it to a separate commit to keep this PR focused) is that this PR implements a test that was not compatible with the ad-hoc dtype conversion done in _windows_upcast_ints (reconstructed below).
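
For context, a rough reconstruction of the removed helper, based on the description above (a sketch, not the actual SciKeras source):

```python
import os

import numpy as np

def _windows_upcast_ints(arr: np.ndarray) -> np.ndarray:
    # Upcast integer arrays to int64 on Windows, where TF 2.2 crashed on int32.
    if os.name == "nt" and np.issubdtype(arr.dtype, np.integer):
        return arr.astype(np.int64)
    return arr
```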

@stsievert (Collaborator) commented Jan 1, 2021

Oh, good. If there was never a test for _windows_upcast_ints, don't worry about it. I can't parse all the test changes, which is why I asked.

@stsievert (Collaborator) commented Jan 1, 2021

I'm not sure if this answers your question on test deletion.

> Why not delete a test if the [private function] has been removed? (paraphrased)

Why do you write tests? I write tests to ensure the software works as expected for the end user when they pick it up, and in every corner case too.

There's an argument for testing private behavior, or unit testing. But that's only appropriate for behemoth projects with many developers.

One corner case for SciKeras is Windows. The fact that code has been deleted is irrelevant to test deletion. Again, why do you write tests? To ensure SciKeras works well for Windows users.

adriangb (Owner, Author)

I do realize the test diff is pretty large, and not particularly easy to parse. I am happy to rework this into multiple PRs, write out the new tests in pseudocode, or do anything else that might make review easier; just let me know what you think might help.

Regarding the question about test deletion, thank you for answering it even though it was a pretty poor question on my part. I think we both agree that writing unit tests for private functions like _windows_upcast_ints is not a good approach. Luckily, you already convinced me that that approach to testing is not optimal, so we never implemented test_windows_upcast_ints.

Because of your help and suggestions, I was able to remove _windows_upcast_ints in 1c1e17b (after bumping the TF version in 294183a) without having to change any tests. Thus all of the test changes in this PR are purely to re-organize our tests and implement comparisons to sklearn MLP{Classifier,Regressor}; the pattern is sketched below.
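
The comparison pattern is roughly the following (a hypothetical sketch; check_output_matches_sklearn and scikeras_estimator are illustrative names, not the actual test code):

```python
from sklearn.neural_network import MLPClassifier

def check_output_matches_sklearn(scikeras_estimator, X, y):
    # Fit the sklearn reference and the SciKeras estimator on the same data,
    # then compare the shape and dtype of their predictions.
    ref = MLPClassifier(hidden_layer_sizes=(5,), max_iter=10).fit(X, y)
    est = scikeras_estimator.fit(X, y)
    y_ref, y_pred = ref.predict(X), est.predict(X)
    assert y_pred.shape == y_ref.shape
    assert y_pred.dtype == y_ref.dtype
```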

@stsievert (Collaborator) commented Jan 1, 2021

> I do realize the test diff is pretty large, and not particularly easy to parse.

Don't worry about the diff. I only wrote that to let you know I only skimmed the tests. It sure sounds like you've got the testing under control.

adriangb (Owner, Author)

Just checking, would you like me to leave this open for a more in-depth review of the tests, or should I move forward with this and/or #128, which are both internal refactors? No problem either way, just let me know.

@adriangb adriangb marked this pull request as ready for review December 28, 2020 22:50
@adriangb adriangb changed the base branch from master to develop December 29, 2020 04:56
@adriangb adriangb merged commit 01c60ec into develop Jan 16, 2021
@adriangb adriangb deleted the mlp-input-output-tests branch January 16, 2021 06:21
adriangb added a commit that referenced this pull request Jan 16, 2021
* TST: compare to sklearn.neural_network (#155)

* MAINT/ENH: SaveModel based serialization (#128)

* Bump min TensorFlow version to 2.4.0 to accommodate other changes

Co-authored-by: Scott <[email protected]>

Successfully merging this pull request may close these issues:

  • TST: Add comparisons to MLPClassifier/MLPRegressor in terms of output shapes and dtypes (#119)