
[MRG] Uniformize initialization for all algorithms #195

Conversation

Member

@wdevazelhes wdevazelhes commented Apr 23, 2019

Fixes #124
Fixes #202

TODO:

  • Uniformize all algorithms that learn a transformer, except MLKR
  • Try to refactor MLKR to benefit as much as possible from the changes already made
  • Warning: run the tests for all algorithms that learn a transformer, not only the supervised ones (currently not the case; right now it does not change anything, but it could, and semantically that's how the test should be)
  • Uniformize all the algorithms that learn a Mahalanobis matrix
  • [ ] Remove the remaining num_dims, if any, after merging with Dimensionality reduction for algorithms learning Mahalanobis matrix M #167 (we should merge this PR first)
  • Check whether some algos can have both an init and a prior

@wdevazelhes
Member Author

I am thinking that for algorithms that learn a transformation L, we could do the same initialization as in scikit-learn's NCA: https://scikit-learn.org/dev/modules/generated/sklearn.neighbors.NeighborhoodComponentsAnalysis.html
But the problem is that algorithms that learn the metric matrix don't have an n_components argument. I was thinking we could do an initialization where we take L from LDA with n_components=None (by the way, PCA in this case wouldn't make sense because it would just rotate the data, wouldn't it? (if we assume it's already centered)), and set A_0 = L.T.dot(L). What do you think? But then what should 'auto' choose? Maybe LDA then?
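For concreteness, here is a rough sketch of that idea; it assumes sklearn's LinearDiscriminantAnalysis exposes the learned directions in scalings_, and the names and data are only illustrative, not the metric-learn API:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)
L = lda.scalings_.T              # rows are the discriminant directions
A_0 = L.T.dot(L)                 # candidate Mahalanobis init, PSD by construction
print(np.all(np.linalg.eigvalsh(A_0) >= -1e-10))  # True (though possibly rank-deficient)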

@wdevazelhes
Member Author

By the way, I was thinking it could be cool to have an option to give any estimator as init, as long as it has a components_ attribute after fitting, so that it would allow using TruncatedSVD for instance, or any other custom estimator. But this is out of the scope of this PR I guess.
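As a purely hypothetical illustration of what that could look like (not part of this PR):

from sklearn.datasets import load_iris
from sklearn.decomposition import TruncatedSVD

X, y = load_iris(return_X_y=True)
# any estimator exposing a components_ attribute after fitting could provide the init
svd = TruncatedSVD(n_components=2).fit(X)
L_init = svd.components_         # shape (n_components, n_features)
# a metric learner could then start its optimization from L_init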

@bellet
Member

bellet commented Apr 23, 2019

For algorithms learning the transformation, it seems like a good idea to re-use as much as possible of the code you wrote for NCA. These algorithms are typically nonconvex, so a good initialization can make a large difference. I remember that some benchmarks you had done for NCA seemed to show that LDA init was good, so let's go for that.

For algorithms learning the metric matrix: these are typically convex, so the init is not as important (it mostly affects convergence speed). We also have to be careful with initializations that would lead to singular matrices (such as LDA), which could potentially make some solvers fail. My 2 cents: I would focus on 'identity', 'covariance' (corresponding to the inverse covariance), 'random' and a numpy array, with the default set to 'identity' unless some algorithms have a more natural init (e.g., SDML could be initialized to 'covariance'). I would not introduce LDA without a proper benchmark showing some concrete benefits on convergence speed and ensuring that it does not cause any issue.
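A minimal sketch of the mapping suggested above (the helper name and exact behaviour are illustrative, not the implementation added in this PR):

import numpy as np
from scipy.linalg import pinvh
from sklearn.datasets import make_spd_matrix
from sklearn.utils import check_random_state

def _metric_matrix_init(init, X, random_state=None):
  # Map an `init` spec to an initial Mahalanobis matrix.
  n_features = X.shape[1]
  if isinstance(init, np.ndarray):
    return init
  if init == 'identity':
    return np.eye(n_features)
  if init == 'covariance':
    return pinvh(np.cov(X, rowvar=False))  # (pseudo-)inverse of the covariance
  if init == 'random':
    return make_spd_matrix(n_features, random_state=check_random_state(random_state))
  raise ValueError("Unknown init: %r" % (init,))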

@bellet
Member

bellet commented Apr 23, 2019

By the way, I was thinking it could be cool to have an option to give any estimator as init, as long as it has a components_ attribute after fitting, so that it would allow using TruncatedSVD for instance, or any other custom estimator. But this is out of the scope of this PR I guess.

I guess you mean that this estimator would be fitted within our own fit method? Could be convenient indeed (and PCA/LDA init could be obtained that way) but I think not a high priority and there may be some implementation subtleties. Better keep this for later

William de Vazelhes added 2 commits April 24, 2019 15:52
@@ -138,9 +139,6 @@ def test_embed_dim(estimator, build_dataset):
assert str(raised_error.value) == err_msg
# we test that the shape is also OK when doing dimensionality reduction
if type(model).__name__ in {'LFDA', 'MLKR', 'NCA', 'RCA'}:
# TODO:

@@ -475,9 +475,7 @@ def test_singleton_class(self):

EPS = np.finfo(float).eps
A = np.zeros((X.shape[1], X.shape[1]))
np.fill_diagonal(A,
Member Author

I have to double check here; I don't think that's what I should have done

Member Author
@wdevazelhes wdevazelhes May 20, 2019

done, I updated with a random matrix as init

np.fill_diagonal(A,
1. / (np.maximum(X.max(axis=0) - X.min(axis=0), EPS)))
nca = NCA(max_iter=30, num_dims=X.shape[1])
nca = NCA(init=A, max_iter=30, num_dims=X.shape[1])
Member Author

Same here

Member Author

done
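For reference, a self-contained version of what the updated test in the diff above does, i.e. passing an explicit numpy array as init (this assumes the post-PR NCA signature; per the thread, the array was later swapped for a random init):

import numpy as np
from sklearn.datasets import load_iris
from metric_learn import NCA

X, y = load_iris(return_X_y=True)
EPS = np.finfo(float).eps
A = np.zeros((X.shape[1], X.shape[1]))
np.fill_diagonal(A, 1. / np.maximum(X.max(axis=0) - X.min(axis=0), EPS))
nca = NCA(init=A, max_iter=30, num_dims=X.shape[1])
nca.fit(X, y)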

@@ -488,9 +486,7 @@ def test_one_class(self):
y = self.iris_labels[self.iris_labels == 0]
EPS = np.finfo(float).eps
A = np.zeros((X.shape[1], X.shape[1]))
np.fill_diagonal(A,
1. / (np.maximum(X.max(axis=0) - X.min(axis=0), EPS)))
nca = NCA(max_iter=30, num_dims=X.shape[1])
Member Author

Same here

Member Author

done

mmc = MMC(convergence_threshold=0.01)
mmc.fit(*wrap_pairs(self.iris_points, [a,b,c,d]))
n_features = self.iris_points.shape[1]
mmc = MMC(convergence_threshold=0.01, init=np.eye(n_features) / 10)
Member Author

The previous default init was identity divided by 10

@@ -366,7 +366,7 @@ def test_sdml_works_on_non_spd_pb_with_skggm(self):
X, y = load_iris(return_X_y=True)
sdml = SDML_Supervised(balance_param=0.5, sparsity_param=0.01,
init='covariance')
sdml.fit(X, y)
sdml.fit(X, y, random_state=np.random.RandomState(42))
Member Author

Fixing this random state makes the test pass (otherwise there is the error RuntimeError: There was a problem in SDML when using skggm's graphical lasso solver.). I tried to understand why, but I couldn't even reproduce the "no-bug" from this travis build (https://travis-ci.org/metric-learn/metric-learn/jobs/527783562) locally, so I must be doing something wrong (I created a virtualenv with the same versions of python and packages but it still fails, so I don't know what could go wrong).

@wdevazelhes
Member Author

I just pushed a commit that warns the user with a ChangedBehaviorWarning when the init is not set, in the cases where the new default init is not the same as it was before. Let me know what you think. This happens in the following cases (a rough sketch of the pattern follows the list):

  • lsml: it used the inverse covariance by default (instead of identity now)
  • mmc: it used identity/10 (instead of just identity)
  • mlkr: it used pca (instead of 'auto')
  • nca: it used np.eye(X.shape[1]) / np.maximum(X.max(axis=0) - X.min(axis=0), EPS) (instead of 'auto')
  • sdml: it used the inverse covariance by default (since use_cov was True by default), instead of identity
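A rough sketch of the warning pattern (illustrative helper; each algorithm's fit method carries its own message):

import warnings
from sklearn.exceptions import ChangedBehaviorWarning

def _warn_if_init_not_set(init, old_default, new_default):
  # Only warn when the user left `init` unset (None); explicit values pass through.
  if init is not None:
    return init
  warnings.warn("Warning, as of version 0.5.0, the default init is now '%s' "
                "instead of '%s'. Set init explicitly to silence this warning; "
                "it will disappear in v0.6.0." % (new_default, old_default),
                ChangedBehaviorWarning)
  return new_default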

# Conflicts:
#	metric_learn/itml.py
#	test/metric_learn_test.py
@wdevazelhes wdevazelhes added this to the v0.5.0 milestone Jun 5, 2019
Member
@bellet bellet left a comment

A few minor corrections (I did not have time to look at the tests). Thanks for the huge work; it will be very nice to initialize in a uniform way, and it will also facilitate the development of future algorithms.

The only remaining point (besides making the tests pass) is to have both prior and init for LSML and SDML (with default init set to the value of prior)

@@ -335,28 +341,37 @@ def check_collapsed_pairs(pairs):
def _check_sdp_from_eigen(w, tol=None):
"""Checks if some of the eigenvalues given are negative, up to a tolerance
level, with a default value of the tolerance depending on the eigenvalues.
It also returns whether the matrix is definite.
Member

should say "positive definite", and in fact this is also up to tolerance?

Member Author

That's right, done. Yes, I used the same tolerance as for checking whether the eigenvalues are negative (the tolerance is a sort of threshold to detect zero eigenvalues, which can be used on either side of 0, + or -). Do you think I should allow setting a different threshold for returning whether the matrix is PSD?

Member

No I think it is fine like this
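For readers following along, a minimal sketch of the behaviour being discussed (the real helper lives in metric_learn/_util.py; the default tolerance below is an assumption):

import numpy as np

def _check_sdp_from_eigen(w, tol=None):
  # Raise if the eigenvalues `w` show the matrix is not PSD (up to `tol`),
  # and return whether it is positive definite (up to the same `tol`).
  if tol is None:
    tol = np.abs(w).max() * len(w) * np.finfo(w.dtype).eps  # assumed default
  if tol < 0:
    raise ValueError("tol should be positive.")
  if np.any(w < -tol):
    raise ValueError("Matrix is not positive semidefinite (PSD).")
  return not np.any(w < tol)  # definite iff no (near-)zero eigenvalue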

Returns
-------
is_definite : bool
Whether the matrix is definite or not.
Member

same here

Member Author

Thanks, done


'auto'
Depending on ``num_dims``, the most reasonable initialization will
be chosen. If ``num_dims <= n_classes`` we use 'lda' (if possible,
Member

remove "if possible, "?

Member Author

Yes, done

be chosen. If ``num_dims <= n_classes`` we use 'lda' (if possible,
see the description of 'lda' init), as it uses labels information.
If not, but ``num_dims < min(n_features, n_samples)``, we use
'pca', as it projects data in meaningful directions (those of
Member

in --> onto

Member Author

thanks, done

This initialization is possible only if `has_classes == True`.

'identity'
If ``num_dims`` is strictly smaller than the
Member

although it should be obvious, maybe start by clearly saying that this uses the identity matrix

Member Author

I agree, done

The (pseudo-)inverse of the covariance matrix.

'random'
The initial transformation will be a random SPD matrix of shape
Member

again

Member Author
@wdevazelhes wdevazelhes Jun 6, 2019

done

random_state : int or numpy.RandomState or None, optional (default=None)
A pseudo random number generator object or a seed for it if int. If
``init='random'``, ``random_state`` is used to initialize the random
transformation.
Member

again

Member Author

done

The inverse covariance matrix.

'random'
The initial transformation will be a random PD matrix of shape
Member

again

Member Author

done

msg = ("Warning, as of version 0.5.0, the default prior is now "
"'identity', instead of 'covariance'. If you still want to use "
"the inverse of the covariance matrix as a prior, "
"set 'prior'=='covariance' (it was the default in previous "
Member

the content of the parenthesis "it was the default..." is redundant, you can remove it

Member Author

that's right, done

self.A_ /= 10.0
else:
self.A_ = check_array(self.A0)
msg = ("Warning, as of version 0.5.0, the default prior is now "
Member

init instead of prior

Member Author

thanks, done

@wdevazelhes
Member Author

@bellet Regarding the init that defaults to the prior if possible: I think I have to parse the algorithms a bit to understand which part of the computations is the actual prior and which part is the initial matrix for the iterative algorithms. I'd be more comfortable making a guess at that in a new PR, so that it's easier to comment on and discuss without polluting this PR. What do you think, shall we merge this PR first and do this in a next PR? Also, the PR has become quite big now, so I guess that would improve the readability of the next changes.

@wdevazelhes
Member Author

I addressed all your comments, so if you agree, I guess we can merge this PR

if tol < 0:
raise ValueError("tol should be positive.")
if any(w < - tol):
raise ValueError("Matrix is not positive semidefinite (PSD).")
raise NonPSDError
Contributor

You're missing parens here: raise NonPSDError()

Member Author
@wdevazelhes wdevazelhes Jun 7, 2019

I tried with a command-line Python 2 and 3 and it worked without the parentheses; is it a best practice to put the parentheses? Done

Contributor

Without parens you're raise-ing the class itself, rather than an instance of the class.

Member Author

That's right, it makes sense

random_state = check_random_state(random_state)
if isinstance(init, np.ndarray):
return init
else:
Contributor

Especially for long functions with lots of nesting like this one, I prefer the "no else" style:

if condition:
  return foo
# everything else at original indent

Member Author

That's right, it's better, done

return init
else:
n_samples = input.shape[0]
if init == 'auto':
Contributor

This might be simpler to test if we broke out pieces into standalone functions. For example, the "auto-select" logic could be its own function.

Member Author

I agree, done, and tested the function
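Roughly, the broken-out auto-selection helper could look like this (illustrative name; the thresholds follow the 'auto' docstring quoted earlier, with 'identity' as the assumed fallback):

def _auto_select_init(num_dims, n_features, n_samples, n_classes, has_classes=True):
  # Pick the most reasonable init for the requested dimensionality.
  if has_classes and num_dims <= n_classes:
    return 'lda'
  if num_dims < min(n_features, n_samples):
    return 'pca'
  return 'identity'

print(_auto_select_init(num_dims=2, n_features=4, n_samples=150, n_classes=3))  # 'lda'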

@@ -19,18 +19,61 @@
from six.moves import xrange
from sklearn.metrics import euclidean_distances
from sklearn.base import TransformerMixin

from metric_learn._util import _initialize_transformer
Contributor

from ._util import ...

Member Author

Thanks, done

@@ -102,6 +156,8 @@ def fit(self, X, y):
self._loss_grad(X, L, dfG, impostors, 1, k, reg, target_neighbors, df,
a1, a2))

it = 1 # we already made one iteration
Contributor

This seems like a no-op line. Maybe just update the "main loop" comment?

Member Author

I am not sure I understand; I think the it = 1 is still useful for coherence: if one puts max_iter=1, it would break otherwise (variable not defined)

Contributor

Ah, I didn't see that we're referencing it after the loop.
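A toy version of the pattern, showing why the line matters when max_iter == 1 (names are illustrative):

def run(max_iter):
  it = 1  # one iteration was already performed before entering the loop
  for it in range(2, max_iter + 1):
    pass  # gradient step would go here
  return it  # read after the loop; undefined for max_iter == 1 without the line above

assert run(1) == 1
assert run(5) == 5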

"the inverse of the covariance matrix as a prior, "
"set 'prior'=='covariance'. This warning will disappear in "
"v0.6.0.")
warnings.warn(msg, ChangedBehaviorWarning)
Contributor

This warning will be annoying for users that want to use identity initialization intentionally. We could instead keep using None as the default and only warn when the user hasn't set the prior explicitly.

Member Author

I agree, let me know what you think of the new warning messages (they say that init was not set because it is None, that in the next version it will be set to 'identity' for instance, and that the warning will disappear)

Contributor

Yes, this is good. One small nit: instead of "set 'prior'=='covariance'.", we should do "set prior='covariance'."

Member Author

Agreed, will do

Member Author

done

"'auto', instead of 'pca'. If you still want to use "
"PCA as an init, set 'init'=='pca'. This warning will "
"disappear in v0.6.0.")
warnings.warn(msg, ChangedBehaviorWarning)
Contributor

Same issue here with intentional / explicit parameter vs default value.

Member Author

Agreed, done

Contributor

set init='pca'

Member Author

Agreed, will do

Member Author

done

@wdevazelhes
Member Author

@perimosocordiae thanks for your review! I addressed all your comments (except #195 (comment))
If you all agree, I think it's good to merge

@bellet
Member

bellet commented Jun 7, 2019

@bellet Regarding the init that defaults to the prior if possible: I think I have to parse the algorithms a bit to understand which part of the computations is the actual prior and which part is the initial matrix for the iterative algorithms. I'd be more comfortable making a guess at that in a new PR, so that it's easier to comment on and discuss without polluting this PR. What do you think, shall we merge this PR first and do this in a next PR? Also, the PR has become quite big now, so I guess that would improve the readability of the next changes.

Sounds good. This is not a major feature anyway (in most cases it is natural to use the prior as the init)

Member
@bellet bellet left a comment

LGTM, once the conflicts from merging #193 are fixed

# Conflicts:
#	metric_learn/_util.py
#	metric_learn/covariance.py
#	metric_learn/itml.py
#	metric_learn/lmnn.py
#	metric_learn/lsml.py
#	metric_learn/mlkr.py
#	metric_learn/mmc.py
#	metric_learn/nca.py
#	metric_learn/sdml.py
#	test/metric_learn_test.py
#	test/test_base_metric.py
#	test/test_utils.py
Contributor
@perimosocordiae perimosocordiae left a comment

+1 to merge once minor docstring tweaks are made.

@wdevazelhes
Member Author

@bellet @perimosocordiae I addressed all your comments so I guess the PR is ready to merge now :)

@perimosocordiae perimosocordiae merged commit 130cbad into scikit-learn-contrib:master Jun 7, 2019
@perimosocordiae
Contributor

Merged!
