TST: compare to sklearn.neural_network #155

Merged 10 commits into develop from mlp-input-output-tests on Jan 16, 2021

Conversation

adriangb (Owner)

Closes #119.

An overhaul of test_input_outputs.py that combines tests and adds comparisons to MLPRegressor/MLPClassifier.

A couple of interesting notes:

  • sklearn always returns np.int64 for multilabel-indicator tasks. This makes some sense, since the data is always 0/1, but there is something to be said for returning the same dtype as the input.
  • sklearn regressors always return np.float64, even if the input is int or float32 (see the snippet below). Returning floats even when the input is int makes sense, but returning higher-precision floats than the input probably does not, especially since Keras' backend dtype is usually float32. We might want to diverge from sklearn here and just return the raw data from Keras.
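
A minimal repro of the second bullet (this reflects the sklearn behavior observed at the time of this PR; the values are irrelevant, only the dtypes matter):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Tiny random regression problem with float32 inputs.
X = np.random.rand(50, 3).astype(np.float32)
y = np.random.rand(50).astype(np.float32)

reg = MLPRegressor(hidden_layer_sizes=(5,), max_iter=10).fit(X, y)
print(X.dtype)               # float32
print(reg.predict(X).dtype)  # float64: the output is upcast relative to the input
```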

@codecov-io commented Dec 28, 2020

Codecov Report

❗ No coverage uploaded for pull request base (develop@1c1e17b).
The diff coverage is n/a.


@@            Coverage Diff             @@
##             develop     #155   +/-   ##
==========================================
  Coverage           ?   99.35%           
==========================================
  Files              ?        5           
  Lines              ?      618           
  Branches           ?        0           
==========================================
  Hits               ?      614           
  Misses             ?        4           
  Partials           ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

- tf-version: [2.2.0]
+ tf-version: [2.4.0]
adriangb (Owner, Author)

These version bumps are needed to get rid of _windows_upcast_ints and to run the same tests on all platforms (otherwise there would have been a mess of pytest.skipif markers).

This also coincides with the same version bump in #128, so I think it makes sense to do. We could perhaps do the version bump in a separate PR from those two to keep the diffs saner.

@stsievert (Collaborator) commented Dec 29, 2020

What will fail when this PR is run with TF v2.2 on Windows? Will Windows users notice a difference?

adriangb (Owner, Author)

test_input_dtype_conversion will fail, because on Windows _windows_upcast_ints converts all ints to int64 before passing them to Keras.

This could be worked around by checking the dtype at a point before that conversion (though the fact that this would work is itself an argument against doing it), or by marking the test with something like pytest.mark.skipif(os.name == "nt"), as sketched below.

Windows users won't notice any difference (I doubt the type conversion mattered for performance).
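
The skip-based workaround would look roughly like this (a sketch; the test body is elided):

```python
import os

import pytest

@pytest.mark.skipif(
    os.name == "nt",
    reason="TF 2.2 on Windows requires upcasting ints to int64 before fitting",
)
def test_input_dtype_conversion():
    ...  # assert that the dtype Keras receives matches the input dtype
```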

I'm going to make a develop branch where I'll do the version bump separately and then redirect this PR, #128 and #143 to develop.

stsievert (Collaborator)

So the only difference a Windows user will notice with TF 2.2 is that the input and output dtypes aren't exactly the same? They'll input int32 and get out int64 or vice versa?

Is that fixed under TF 2.4?

@adriangb (Owner, Author) commented Dec 29, 2020

Previously, when users or the data transformer handed us data of dtype int32, we transformed it to int64 because TF crashed on int32. Since we never returned that data to the users, they did not see the difference. That seems to be fixed in 2.4.0, so we can remove the extra conversion without users noticing any difference.

This conversion never affected the output dtypes; that is completely separate. The reason it was originally in this PR (I have since moved it to a separate commit to keep this PR focused) is that this PR implements a test that was not compatible with the ad-hoc dtype conversion done in _windows_upcast_ints (reconstructed below).
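
For context, a rough reconstruction of the removed helper, based on the description above (a sketch, not the actual SciKeras source):

```python
import os

import numpy as np

def _windows_upcast_ints(arr: np.ndarray) -> np.ndarray:
    # Upcast integer arrays to int64 on Windows, where TF 2.2 crashed on int32.
    if os.name == "nt" and np.issubdtype(arr.dtype, np.integer):
        return arr.astype(np.int64)
    return arr
```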

@stsievert (Collaborator) commented Jan 1, 2021

Oh, good. If there was never a test for _windows_upcast_ints, don't worry about it. I can't parse all the test changes, which is why I asked.

@stsievert (Collaborator) commented Jan 1, 2021

I'm not sure if this answers your question on test deletion.

> Why not delete a test if the [private function] has been removed? (paraphrased)

Why do you write tests? I write tests to ensure the software works as expected for the end user when they pick it up, and in every corner case too.

There's an argument for testing private behavior, or unit testing. But that's only appropriate for behemoth projects with many developers.

One corner case for SciKeras is Windows. The fact that code has been deleted is irrelevant to test deletion. Again, why do you write tests? To ensure SciKeras works well for Windows users.

adriangb (Owner, Author)

I do realize the test diff is pretty large, and not particularly easy to parse. I am happy to rework this into multiple PRs, write out the new tests in pseudocode, or do anything else that might make review easier; just let me know what you think might help.

Regarding the question about test deletion, thank you for answering it even though it was a pretty poor question on my part. I think we both agree that writing unit tests for private functions like _windows_upcast_ints is not a good approach. Luckily, you already convinced me that that approach to testing is not optimal, so we never implemented test_windows_upcast_ints.

Because of your help and suggestions, I was able to remove _windows_upcast_ints in 1c1e17b (after bumping the TF version in 294183a) without having to change any tests. Thus all of the test changes in this PR are purely to re-organize our tests and implement comparisons to sklearn MLP{Classifier,Regressor}; the pattern is sketched below.
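
The comparison pattern is roughly the following (a hypothetical sketch; check_output_matches_sklearn and scikeras_estimator are illustrative names, not the actual test code):

```python
from sklearn.neural_network import MLPClassifier

def check_output_matches_sklearn(scikeras_estimator, X, y):
    # Fit the sklearn reference and the SciKeras estimator on the same data,
    # then compare the shape and dtype of their predictions.
    ref = MLPClassifier(hidden_layer_sizes=(5,), max_iter=10).fit(X, y)
    est = scikeras_estimator.fit(X, y)
    y_ref, y_pred = ref.predict(X), est.predict(X)
    assert y_pred.shape == y_ref.shape
    assert y_pred.dtype == y_ref.dtype
```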

@stsievert (Collaborator) commented Jan 1, 2021

> I do realize the test diff is pretty large, and not particularly easy to parse.

Don't worry about the diff. I only wrote that to let you know I only skimmed the tests. It sure sounds like you've got the testing under control.

adriangb (Owner, Author)

Just checking, would you like me to leave this open for a more in-depth review of the tests, or should I move forward with this and/or #128, which are both internal refactors? No problem either way, just let me know.

@adriangb adriangb marked this pull request as ready for review December 28, 2020 22:50
@adriangb adriangb changed the base branch from master to develop December 29, 2020 04:56
@adriangb adriangb merged commit 01c60ec into develop Jan 16, 2021
@adriangb adriangb deleted the mlp-input-output-tests branch January 16, 2021 06:21
adriangb added a commit that referenced this pull request Jan 16, 2021
* TST: compare to sklearn.neural_network (#155)

* MAINT/ENH: SaveModel based serialization (#128)

* Bump min TensorFlow version to 2.4.0 to accommodate other changes

Co-authored-by: Scott <[email protected]>

Successfully merging this pull request may close these issues:

  • TST: Add comparisons to MLPClassifier/MLPRegressor in terms of output shapes and dtypes (#119)