[MRG] FIX: make proposal for sdml formulation #162
Conversation
I think this is correct except for the initialization: M_0 should be either the identity or the empirical covariance matrix (see the SDML paper, beginning of page 7, where these two possible initializations are mentioned). Hence in this case we should compute … Or more generally (this is probably how things should be written): … Once this has been updated, if it does not fix the problem, here are some next things to try: …
Ah that's right, I didn't see that. But isn't it weird to use the covariance matrix as a prior? I thought we would usually take the inverse covariance matrix (the Mahalanobis matrix)? For instance, if we take the balance parameter to zero, the learned metric would be a sparse covariance matrix; wouldn't we want a sparse precision matrix (Mahalanobis matrix) instead?

Yes, I agree, I should have written things like that.

I agree, I'll test this and report the result here.

Yes, in fact you are right, it must be a typo. M_0 should be the inverse covariance matrix so that both terms of …

An update on this topic: using the package http://github.com/skggm/skggm, replacing …

Until the sklearn version is made more robust (as intended in scikit-learn/scikit-learn#12228), I would vote for temporarily switching to skggm so that we have a working implementation.

One of the skggm developers here. Do let us know how using skggm works out for you.
test/metric_learn_test.py
Outdated
@@ -141,7 +141,7 @@ def test_iris(self):
     sdml = SDML_Supervised(num_constraints=1500)
     sdml.fit(self.iris_points, self.iris_labels, random_state=rs)
     csep = class_separation(sdml.transform(self.iris_points), self.iris_labels)
-    self.assertLess(csep, 0.25)
+    self.assertLess(csep, 0.20)
The score that I had when running the example was 0.1933487414312185, so I thought let's be a bit more conservative than 0.25, so that if it fails we would be warned.
# Conflicts:
#	metric_learn/sdml.py
…matrix) as initialization
Note that Cython is also needed as a dependency for skggm.

We shouldn't need to specify Cython as a dependency unless we have Cython code in metric-learn itself.

Ah yes, that's right: I could indeed install it without explicitly installing Cython, even though skggm installs Cython itself (among other packages).

Oh, but it worked locally with Python 3.7. With Python 3.4 (like Travis), it fails indeed. Maybe that's because skggm says it is compatible with Python 3.6.x? Maybe that's the occasion to move to Python 3.6.x for metric-learn? According to the pip message I get when installing on Python 3.4: "Please upgrade your Python as Python 3.4 won't be maintained after March 2019 (cf PEP 429)".
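If the move to 3.6+ happens, one hypothetical way to declare the floor would be setuptools' standard python_requires option (a sketch, not what the PR actually does; the placement in metric-learn's setup.py is an assumption):

from setuptools import setup

# hypothetical sketch: enforce the suggested minimum Python version so pip
# refuses to install on 3.4/3.5
setup(
    name='metric-learn',
    python_requires='>=3.6',
)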
metric_learn/constraints.py
Outdated
@@ -96,6 +96,6 @@ def wrap_pairs(X, constraints):
   c = np.array(constraints[2])
   d = np.array(constraints[3])
   constraints = np.vstack((np.column_stack((a, b)), np.column_stack((c, d))))
-  y = np.vstack([np.ones((len(a), 1)), - np.ones((len(c), 1))])
+  y = np.hstack([np.ones((len(a),)), - np.ones((len(c),))])
Maybe simpler to do:
y = np.concatenate([np.ones_like(a), -np.ones_like(c)])
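For illustration, the shape difference between the three versions (the values of a and c here are made up; ones_like keeps the integer dtype of the index arrays, which is fine for labels):

import numpy as np

a = np.array([0, 1, 2])  # indices of similar pairs (made-up values)
c = np.array([3, 4])     # indices of dissimilar pairs (made-up values)
print(np.vstack([np.ones((len(a), 1)), -np.ones((len(c), 1))]).shape)  # (5, 1): 2-D column
print(np.hstack([np.ones((len(a),)), -np.ones((len(c),))]).shape)      # (5,): 1-D, as wanted
print(np.concatenate([np.ones_like(a), -np.ones_like(c)]))             # [ 1  1  1 -1 -1]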
metric_learn/_util.py
Outdated
@@ -15,6 +15,14 @@ def vector_norm(X):
     return np.linalg.norm(X, axis=1)


+def has_installed_skggm():
I don't think this needs to be a function, as the answer isn't going to change during execution.
That's right, thanks.
metric_learn/sdml.py
Outdated
 from .base_metric import MahalanobisMixin, _PairsClassifierMixin
 from .constraints import Constraints, wrap_pairs
-from ._util import transformer_from_metric
+from ._util import transformer_from_metric, has_installed_skggm
+if has_installed_skggm():
I'd prefer a simpler conditional import here:
try:
    from inverse_covariance import quic
except ImportError:
    HAS_SKGGM = False
else:
    HAS_SKGGM = True
We'd need to duplicate the logic in the tests, but I'm fine with that.
Agreed
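A sketch of what that duplication might look like on the test side (the test name is hypothetical; the guard mirrors the module-level one above):

import pytest

try:
    from inverse_covariance import quic  # noqa: F401
except ImportError:
    HAS_SKGGM = False
else:
    HAS_SKGGM = True

@pytest.mark.skipif(not HAS_SKGGM, reason='skggm is not installed')
def test_sdml_with_skggm():
    ...  # exercise the skggm code path here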
metric_learn/sdml.py
Outdated
"version.)") | ||
warnings.warn(msg) | ||
else: | ||
print("SDML will use skggm's solver.") |
We should only print this if self.verbose is true.
And maybe clarify: skggm's graphical lasso solver
And maybe print something similar when the sklearn solver is used.
Agreed
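Something like this, as a sketch (the message wording follows what appears later in this thread):

# inside SDML's fit, gate the solver announcement on self.verbose
if self.verbose:
    if HAS_SKGGM:
        print("SDML will use skggm's graphical lasso solver.")
    else:
        print("SDML will use scikit-learn's graphical lasso solver.")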
metric_learn/sdml.py
Outdated
"scikit-learn's graphical_lasso method. It can fail to converge" | ||
"on some non SPD matrices where skggm would converge. If so, " | ||
"try to install skggm. (see the README.md for the right " | ||
"version.)") |
Perhaps we can catch the case where scikit-learn's version fails and emit the warning then?
I agree, here is the new version I'll commit. When using scikit-learn's graphical lasso, we try it, and if an error is raised or the result is not finite, we emit a warning that will be printed before the error (if there is an error) or before returning M (if there is no error but there are NaNs). Tell me what you think:
if HAS_SKGGM:
    theta0 = pinvh(sigma0)
    M, _, _, _, _, _ = quic(emp_cov, lam=self.sparsity_param,
                            msg=self.verbose,
                            Theta0=theta0, Sigma0=sigma0)
else:
    try:
        _, M = graphical_lasso(emp_cov, alpha=self.sparsity_param,
                               verbose=self.verbose,
                               cov_init=sigma0)
        error = None
    except FloatingPointError as e:
        error = e
    # check the error first so that M is not referenced when the solver failed
    if error is not None or not np.isfinite(M).all():
        msg = ("Scikit-learn's graphical lasso has failed to converge. "
               "Package skggm's graphical lasso can sometimes converge on "
               "non SPD cases where scikit-learn's graphical lasso fails to "
               "converge. Try to install skggm and rerun the algorithm. (see "
               "the README.md for the right version.)")
        warnings.warn(msg)
        if error is not None:
            raise error
EDIT:
if HAS_SKGGM:
    theta0 = pinvh(sigma0)
    M, _, _, _, _, _ = quic(emp_cov, lam=self.sparsity_param,
                            msg=self.verbose,
                            Theta0=theta0, Sigma0=sigma0)
else:
    try:
        _, M = graphical_lasso(emp_cov, alpha=self.sparsity_param,
                               verbose=self.verbose,
                               cov_init=sigma0)
    except FloatingPointError as e:
        msg = ("Scikit-learn's graphical lasso has failed to converge. "
               "Package skggm's graphical lasso can sometimes converge on "
               "non SPD cases where scikit-learn's graphical lasso fails to "
               "converge. Try to install skggm and rerun the algorithm. (see "
               "the README.md for the right version of skggm.)")
        warnings.warn(msg)
        raise e
(In fact it's skggm's graphical lasso that produces NaNs; scikit-learn's graphical lasso raises a FloatingPointError in case of error (I didn't find cases where it would give NaNs), so it's better to stick to the case we know.)
Actually, trying to come up with the right tests, I realized the following: in pathological diagonal cases (see above), scikit-learn's graphical lasso can return no error and yet return a result that is not SPD (in the example below, a diagonal matrix with a negative coefficient on the diagonal). But then a NaN will appear when computing transformer_from_metric. So maybe, when using scikit-learn, I should additionally check that the result is indeed SPD, and if not raise the error "SDML has failed to converge to an SPD matrix using scikit-learn's graphical lasso".
I don't know if skggm guarantees to return an SPD matrix, so maybe I should do the check there too? (And later, if we dig deeper into this, we could remove the check if it turns out not to be needed, or no longer needed.)
Example (go into debug mode or put a print statement in SDML to see the result of the graphical lasso):
from metric_learn import SDML
import numpy as np

pairs = np.array([[[-10., 0.], [10., 0.]], [[0., 50.], [0., -60.]]])
y_pairs = [1, -1]
sdml = SDML(use_cov=False, balance_param=100, verbose=True)
diff = pairs[:, 0] - pairs[:, 1]
emp_cov = np.identity(pairs.shape[2]) + 100 * (diff.T * y_pairs).dot(diff)
print(emp_cov)
sdml.fit(pairs, y_pairs)
print(sdml.get_mahalanobis_matrix())
Returns:
[[ 40001. 0.]
[ 0. -1209999.]]
SDML will use scikit-learn's graphical lasso solver.
/home/will/Code/metric-learn/metric_learn/sdml.py:92: ConvergenceWarning: Warning, the input matrix of graphical lasso is not positive semi-definite (PSD). The algorithm may diverge, and lead to degenerate solutions. To prevent that, try to decrease the balance parameter `balance_param` and/or to set use_covariance=False.
ConvergenceWarning)
[graphical_lasso] Iteration 0, cost inf, dual gap 0.000e+00
/home/will/Code/metric-learn/metric_learn/_util.py:346: RuntimeWarning: invalid value encountered in sqrt
return np.sqrt(metric)
[[2.4999375e-05 nan]
[ nan nan]]
And if we print the result of graphical lasso (note that it's the inverse of the initial matrix):
[[ 2.49993750e-05 0.00000000e+00]
[ 0.00000000e+00 -8.26446964e-07]]
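As a sanity check, that graphical-lasso output is indeed just the matrix inverse of the printed emp_cov:

import numpy as np

emp_cov = np.array([[40001., 0.], [0., -1209999.]])
print(np.linalg.inv(emp_cov))
# [[ 2.49993750e-05  0.00000000e+00]
#  [ 0.00000000e+00 -8.26446964e-07]]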
How about this:
1- First check if the matrix given as input is PSD. If not, show the warning that the algorithm may diverge, etc. (see the sketch after this list).
2- Regardless of the solver: if an error is thrown by the solver, or the solution is not a PSD matrix, show an error and abort (as there is either no result, or the result does not correspond to a valid distance).
3- If we fall into case 2 and we use the sklearn solver, say that the user could try to install skggm, as its graphical lasso solver is more stable.
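A sketch of step 1 (the warning text mirrors the one already shown in this thread; emp_cov is the graphical-lasso input from the snippets above):

import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning

# eigendecomposition of the symmetric input; negative eigenvalues mean not PSD
w, V = np.linalg.eigh(emp_cov)
if w.min() < 0.:
    warnings.warn("Warning, the input matrix of graphical lasso is not "
                  "positive semi-definite (PSD). The algorithm may diverge, "
                  "and lead to degenerate solutions. To prevent that, try to "
                  "decrease the balance parameter `balance_param` and/or to "
                  "set use_covariance=False.", ConvergenceWarning)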
I agree, I'll go for something like this:
try:
    if HAS_SKGGM:
        theta0 = pinvh(sigma0)
        M, _, _, _, _, _ = quic(emp_cov, lam=self.sparsity_param,
                                msg=self.verbose,
                                Theta0=theta0, Sigma0=sigma0)
    else:
        _, M = graphical_lasso(emp_cov, alpha=self.sparsity_param,
                               verbose=self.verbose,
                               cov_init=sigma0)
    raised_error = None
    w_mahalanobis, _ = np.linalg.eigh(M)
    not_spd = any(w_mahalanobis < 0.)
except Exception as e:
    raised_error = e
    not_spd = False  # not_spd not applicable so we set to False
if raised_error is not None or not_spd:
    msg = ("There was a problem in SDML when using {}'s graphical "
           "lasso.").format("skggm" if HAS_SKGGM else "scikit-learn")
    if not HAS_SKGGM:
        skggm_advice = (" skggm's graphical lasso can sometimes converge "
                        "on non SPD cases where scikit-learn's graphical "
                        "lasso fails to converge. Try to install skggm and "
                        "rerun the algorithm. (See the README.md for the "
                        "right version of skggm.)")
        msg += skggm_advice
    if raised_error is not None:
        msg += " The following error message was thrown: {}.".format(
            raised_error)
    raise RuntimeError(msg)
I'll add infinite values that can be returned by skggm as a failure case too
Done, in commit eb95719
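Presumably the added check looks something like this (a sketch; the actual content of commit eb95719 may differ). In the context of the block above:

# in the try block, alongside the SPD check:
w_mahalanobis, _ = np.linalg.eigh(M)
not_spd = any(w_mahalanobis < 0.)
not_finite = not np.isfinite(M).all()  # flag inf/nan that skggm's quic can return
# and the failure condition becomes:
if raised_error is not None or not_spd or not_finite:
    raise RuntimeError(msg)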
metric_learn/sdml.py
Outdated
    _, M = graph_lasso(emp_cov, self.sparsity_param, verbose=self.verbose)

    self.transformer_ = transformer_from_metric(M)
    emp_cov = pinvh(prior) + self.balance_param * loss_matrix
Seems odd to round-trip the covariance matrix through two pinvh calls in the self.use_cov case.
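In code terms, the suggestion amounts to skipping the double inversion (a sketch; the prior/cov naming is inferred from the snippets above, not the committed fix):

# instead of prior = pinvh(cov) followed by pinvh(prior), which just
# round-trips back to (approximately) cov, use the covariance directly:
emp_cov = cov + self.balance_param * loss_matrix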
I agree
Agreed, done
"and lead to degenerate solutions. " | ||
"To prevent that, try to decrease the balance parameter " | ||
"`balance_param` and/or to set use_covariance=False.", | ||
ConvergenceWarning) |
I'm fine with it.
metric_learn/sdml.py
Outdated
"To prevent that, try to decrease the balance parameter " | ||
"`balance_param` and/or to set use_covariance=False.", | ||
ConvergenceWarning) | ||
sigma0 = (V * (w - min(0, np.min(w)) + 1e-10)).dot(V.T) |
Maybe simpler:
min_eigval = w.min()
if min_eigval < 0:
    warnings.warn(...)
    min_eigval = 0
w += 1e-10 - min_eigval
sigma0 = (V * w).dot(V.T)
In the heuristic, if min_eigval > 0 we would want to keep the matrix untouched (with just an epsilon added), so I don't think that would be the case with your solution, since we would subtract the min eigval. But I agree something simpler would be better, so something like:
min_eigval = w.min()
if min_eigval < 0:
    warnings.warn(...)
    w -= min_eigval  # we translate the eigenvalues to make them all positive
w += 1e-10  # we add a small offset to avoid definiteness problems
sigma0 = (V * w).dot(V.T)
done
README.rst
Outdated
@@ -20,7 +20,7 @@ Metric Learning algorithms in Python.
 **Dependencies**

 - Python 2.7+, 3.4+
-- numpy, scipy, scikit-learn
+- numpy, scipy, scikit-learn, and skggm (commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_) for `SDML`
I'd call out skggm separately as an optional dependency.
Thanks, I agree, I forgot to change that after making skggm optional
setup.py
Outdated
@@ -38,6 +38,7 @@
   extras_require=dict(
       docs=['sphinx', 'shinx_rtd_theme', 'numpydoc'],
       demo=['matplotlib'],
+      sdml=['skggm']
Can we specify a commit hash here? Or maybe, since their latest release is 0.2.8, we could specify skggm>=0.2.9.
I tried indeed but didn't manage to make it work. But good idea, skggm>=0.2.9 is better than nothing here.
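So the extras entry would become something like this (a sketch, assuming a 0.2.9 release of skggm exists):

  extras_require=dict(
      docs=['sphinx', 'shinx_rtd_theme', 'numpydoc'],
      demo=['matplotlib'],
      sdml=['skggm>=0.2.9'],  # pin a minimum release instead of a commit hash
  ),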
There are a few minor changes to make in addition to those requested by @perimosocordiae, but otherwise it looks good! Nice work.
Have you verified that it fixes #138 and #154? If so, please indicate this in the PR text.
doc/getting_started.rst
Outdated
@@ -15,7 +15,7 @@ Alternately, download the source repository and run:
 **Dependencies**

 - Python 2.7+, 3.4+
-- numpy, scipy, scikit-learn
+- numpy, scipy, scikit-learn, and skggm (commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_) for `SDML`
same comment here as https://github.com/metric-learn/metric-learn/pull/162/files#r264683385
test/test_mahalanobis_mixin.py
Outdated
    for slice_idx in slices[trunc_data.shape[1]]:
        pairs = trunc_data[:, slice_idx, :]
        diffs = pairs[:, 1, :] - pairs[:, 0, :]
        to_keep = np.nonzero(diffs.ravel())
This is a bit difficult to parse. Why do we need these slices? I am a bit lazy to check, but maybe it should be made clearer even if it is less efficient (this is a small 1D dataset anyway, so we don't care).
Also, removing things that are very close to being the same (as opposed to exactly the same) would maybe be more robust.
I agree it's difficult to parse; I'll change the test, copying/pasting it for the quadruplets/pairs cases. A sketch of the more robust filtering is below.
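A sketch of the near-duplicate filtering suggested above (the tolerance is an illustrative choice, not what was committed):

# drop pairs whose two points are nearly identical, not just exactly identical
diffs = pairs[:, 1, :] - pairs[:, 0, :]
to_keep = ~np.all(np.isclose(diffs, 0., atol=1e-7), axis=1)
pairs = pairs[to_keep]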
Done, in commit 31072d3. What do you think?
@bellet I just checked and it does not directly fix them, since the two issues are in fact cases where the matrix is not SPD and there are convergence problems. So I think to solve them we should change the balance parameter. I'll open a different PR for that.

Travis is failing because of a weird bug, I think related to pytest: when I run …
…nd not just the test
This can be fixed as soon as #186 is merged (i.e. if we go to the latest pytest).
…t file and not just the test" This reverts commit 6f5666b.
I think it should be ready to merge now! :)
Digging into SDML's code and paper, I don't understand some parts of the implementation. I think it should be as proposed in this PR. Tell me if I'm wrong.
Looking at this paper about SDML, https://icml.cc/Conferences/2009/papers/46.pdf, and this paper about Graphical Lasso, http://statweb.stanford.edu/~tibs/ftp/graph.pdf: it seems that in SDML we want to optimize equation 8, which can indeed be done with Graphical Lasso according to the Graphical Lasso paper. In fact, equation 8 in SDML's paper is the same as equation 1 in the Graphical Lasso paper (up to a minus sign). The variables are equivalent, where in SDML's paper

P = M_0^-1 + eta * X.L.X^T

But note that in SDML's paper, M_0^-1 is the inverse of the a priori Mahalanobis matrix, which can indeed be initialized to the inverse of the covariance matrix. In that case M_0^-1 is the inverse of the inverse of the covariance matrix, hence the covariance matrix itself.

So we should just compute

P = empirical covariance matrix + self.balance_param * loss_matrix

and do Graphical Lasso on this (and not: invert the covariance matrix, compute P = this_inverse + self.balance_param * loss_matrix, invert the result, and run Graphical Lasso on that, as is done currently). And in both cases we want to evaluate the sparse inverse of M/theta, so things are OK on that front.
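As a minimal sketch of the proposed computation (the standalone function and its name are mine, and loss_matrix stands for X.L.X^T; the graphical_lasso call matches the snippets above):

import numpy as np
from sklearn.covariance import graphical_lasso

def sdml_metric_proposed(X, loss_matrix, balance_param, sparsity_param):
    # With the prior Mahalanobis matrix M_0 set to the inverse covariance,
    # M_0^-1 is the empirical covariance itself: no pinvh round-trip needed.
    emp_cov = np.cov(X, rowvar=False)
    P = emp_cov + balance_param * loss_matrix
    # graphical_lasso returns (covariance, precision); the sparse precision
    # is the learned Mahalanobis matrix.
    _, M = graphical_lasso(P, alpha=sparsity_param)
    return M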
Also, I didn't get the hack to ensure positive semidefiniteness; doesn't it change the result?
This PR's modification does not fix the issues we had (for instance, plot_sandwich.py does not give better results with it), so maybe let's not merge it until we have a whole fix for SDML. There are other things that could explain why SDML doesn't work, like the choice of the optimization parameter alpha, and also that graph_lasso seems to sometimes work badly (see scikit-learn/scikit-learn#6887 and scikit-learn/scikit-learn#11417). So first tell me if you agree with this modification (not on merging it or not, just whether it's the right formula or not), so that we can look elsewhere to see how to fix SDML's error.
TODO:
SDML