[MRG] Remove preprocessing the data for RCA #194
Conversation
```python
if self.pca_comps is not None:
  pca = decomposition.PCA(n_components=self.pca_comps)
  X_t = pca.fit_transform(X)
  M_pca = pca.components_
```
Note that this code was also providing a PCA initialization at the same time, so for now we'll remove it. But I think I'll do the PR about initialization before merging this PR into master, and then we can merge it into this PR to keep the same possibility of initialization with PCA.
There is still a warning that mentions pca_comps to update. If the matrix is low-rank (due to a high dimension compared to the number of points), we can suggest that the user apply a dimensionality reduction technique such as PCA as preprocessing, unless we have a better idea.
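A minimal sketch of the user-side PCA preprocessing such a warning could suggest (the shapes here are illustrative, not from the PR):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
# 20 samples in 50 dimensions: the data covariance is low-rank,
# which is the situation where the warning would suggest PCA.
X = rng.randn(20, 50)

# Reduce dimension before fitting RCA, instead of relying on pca_comps.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (20, 10)
```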
# Conflicts:
#   metric_learn/rca.py
#   test/metric_learn_test.py
#   test/test_base_metric.py
I think this PR is ready to merge as soon as the initialization PR is merged (#195)
# Conflicts:
#   metric_learn/rca.py
#   test/metric_learn_test.py
#   test/test_base_metric.py
@bellet @perimosocordiae I guess this PR is ready to merge now |
```python
  X_t = pca.fit_transform(X)
  M_pca = pca.components_
else:
  X_t = X - X.mean(axis=0)
```
why is this centering step gone?
I guess because we should remove any preprocessing step, but I agree I didn't mention it at all. Maybe we should keep the ChangedBehaviorWarning message below, but replace "no longer trained on a preprocessed version" with "no longer trained on centered data by default", and encourage using a StandardScaler if needed?
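For reference, a small sketch (names and data are illustrative) of how a user could reproduce the removed centering step with StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(100, 5) * 10 + 3  # hypothetical uncentered data

# with_std=False removes only the mean, matching the old
# X - X.mean(axis=0) step that RCA used to do internally.
scaler = StandardScaler(with_mean=True, with_std=False)
X_centered = scaler.fit_transform(X)

print(np.allclose(X_centered, X - X.mean(axis=0)))  # True
```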
Fair enough (I double-checked, and this centering is not part of standard RCA).
Maybe keep the ChangedBehaviorWarning but change it to "no longer center the data before training RCA" (no need to mention a scaler, I think).
And in the deprecation warning, add that PCA preprocessing should now be done by the user.
Finally, have you checked the influence of removing the centering step on the examples?
metric_learn/rca.py
```python
else:
  X_t = X - X.mean(axis=0)
  M_pca = None
  warnings.warn("RCA will no longer be trained on a preprocessed version "
```
Do we need a ChangedBehaviorWarning? The default behavior was pca_comps=None.
Maybe the information in this warning should rather go into the deprecation warning?
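A hypothetical sketch (not the actual metric-learn code) of folding the preprocessing message into the pca_comps deprecation warning:

```python
import warnings

def fit_rca(X, pca_comps=None):
    # Illustrative only: raise a single DeprecationWarning that also tells
    # the user to do PCA preprocessing themselves from now on.
    if pca_comps is not None:
        warnings.warn(
            "'pca_comps' is deprecated; RCA will no longer preprocess the "
            "data itself. Apply PCA (e.g. sklearn.decomposition.PCA) before "
            "fitting if you need dimensionality reduction.",
            DeprecationWarning)
    return X

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit_rca([[0.0, 1.0]], pca_comps=2)
print(caught[0].category.__name__)  # DeprecationWarning
```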
see above
I think we should address #194 (comment), and then we can merge.
Done. I'll just quickly check what it changes on iris, for instance, as a sanity check, as you recommended.
With the same example as before:

```python
from metric_learn import RCA_Supervised
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=100, n_classes=3, n_clusters_per_class=2,
                           n_informative=3, class_sep=4., n_features=5,
                           n_redundant=0, shuffle=True, random_state=42,
                           scale=[1, 1, 20, 20, 20])

tsne = TSNE()
X_e = tsne.fit_transform(X)
plt.figure()
plt.scatter(X_e[:, 0], X_e[:, 1], c=y)
plt.show()

rca = RCA_Supervised(num_chunks=30, chunk_size=2)
X_r = rca.fit_transform(X, y)
X_er = tsne.fit_transform(X_r)
plt.figure()
plt.scatter(X_er[:, 0], X_er[:, 1], c=y)
plt.show()
```

Here are the t-SNE plots after RCA transformation:
So I guess we're OK to merge now |
Fixes #125
This PR removes the preprocessing of the data in the algorithms.
TODO: