1. score_pairs refactor #333

Merged
merged 22 commits into from
Oct 21, 2021
Changes from 5 commits
23 changes: 0 additions & 23 deletions doc/introduction.rst
Original file line number Diff line number Diff line change
@@ -123,26 +123,3 @@ to the following resources:
Survey <http://dx.doi.org/10.1561/2200000019>`_ (2012)
- **Book:** `Metric Learning
<http://dx.doi.org/10.2200/S00626ED1V01Y201501AIM030>`_ (2015)

.. Methods [TO MOVE TO SUPERVISED/WEAK SECTIONS]
.. =============================================

.. Currently, each metric learning algorithm supports the following methods:

.. - ``fit(...)``, which learns the model.
.. - ``get_mahalanobis_matrix()``, which returns a Mahalanobis matrix
.. - ``get_metric()``, which returns a function that takes as input two 1D
arrays and outputs the learned metric score on these two points
.. :math:`M = L^{\top}L` such that distance between vectors ``x`` and
.. ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
.. - ``components_from_metric(metric)``, which returns a transformation matrix
.. :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
.. data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
.. :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
.. in which standard Euclidean distances may be used.
.. - ``transform(X)``, which applies the aforementioned transformation.
.. - ``pair_distance(pairs)`` which returns the distance between pairs of
.. points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
.. 2, n_features)``, or it can be a 2D array-like of pairs indicators of
.. shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
.. details).
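The relation stated in the removed section, that the distance computed with :math:`M = L^{\top}L` equals the Euclidean distance after mapping the points with :math:`L`, can be checked numerically. This is only a minimal sketch in plain Python: the matrix `L`, the two points, and the helper names (`matvec`, `mahalanobis_from_M`, `euclidean_after_transform`) are made up for illustration and are not part of metric-learn.

```python
import math

def matvec(A, v):
    """Multiply matrix A (given as a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def mahalanobis_from_M(x, y, M):
    """Compute sqrt((x - y)^T M (x - y)) for a PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(di * mi for di, mi in zip(d, matvec(M, d))))

def euclidean_after_transform(x, y, L):
    """Euclidean distance between the transformed points L x and L y."""
    return math.dist(matvec(L, x), matvec(L, y))

# Made-up transformation L of shape (D, d) = (2, 3).
L = [[1.0, 0.5, 0.0],
     [0.0, 2.0, -1.0]]
n_features = 3
# M = L^T L, of shape (d, d).
M = [[sum(L[k][i] * L[k][j] for k in range(len(L)))
      for j in range(n_features)]
     for i in range(n_features)]

x, y = [3.5, 3.6, 5.2], [5.6, 2.4, 6.7]
d_M = mahalanobis_from_M(x, y, M)
d_L = euclidean_after_transform(x, y, L)
# Both values agree up to floating-point rounding.
```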
29 changes: 9 additions & 20 deletions doc/supervised.rst
@@ -82,28 +82,18 @@ array([0.49627072, 3.65287282, 6.06079877])
>>> metric_fun([3.5, 3.6], [5.6, 2.4])
0.4962707194621285

- Alternatively, you can use `pair_similarity` to return the **score** between
points, the more the **score**, the closer the pairs and vice-versa. For
Mahalanobis learners, it is equal to the inverse of the distance.
- Alternatively, you can use `pair_score` to return the **score** between
pairs of points: the larger the **score**, the more similar the pair,
and vice versa. For Mahalanobis learners, it is equal to the opposite
(negative) of the distance.

>>> score = nca.pair_similarity([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
>>> score = nca.pair_score([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
>>> score
array([-0.49627072, -3.65287282, -6.06079877])

This is useful because `pair_similarity` matches the **score** sematic of
scikit-learn's `Classification matrics <https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics>`_.
For instance, given a labeled data, you can pass the labels and the
**score** of your data to get the ROC curve.

>>> from sklearn.metrics import roc_curve
>>> fpr, tpr, thresholds = roc_curve(['dog', 'cat', 'dog'], score, pos_label='dog')
>>> fpr
array([0., 0., 1., 1.])
>>> tpr
array([0. , 0.5, 0.5, 1. ])
>>>
>>> thresholds
array([ 0.50372928, -0.49627072, -3.65287282, -6.06079877])
This is useful because `pair_score` matches the **score** semantics of
scikit-learn's `Classification metrics
<https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics>`_.
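To make the sign convention concrete, here is a minimal sketch, assuming a plain Euclidean distance as a stand-in for a learned Mahalanobis distance. The functions `pair_distance` and `pair_score` below only mirror the method names discussed in this PR; they are hypothetical plain-Python helpers, not the library's implementation.

```python
import math

def pair_distance(pairs):
    """Euclidean distance for each pair (stand-in for a learned metric)."""
    return [math.dist(a, b) for a, b in pairs]

def pair_score(pairs):
    """Score = opposite of the distance, so larger means more similar."""
    return [-d for d in pair_distance(pairs)]

# Made-up pairs of 2D points.
pairs = [([0.0, 0.0], [3.0, 4.0]),   # far apart
         ([1.0, 1.0], [1.0, 2.0])]   # close together
distances = pair_distance(pairs)
scores = pair_score(pairs)

# The closest pair has the smallest distance and the largest score.
closest = min(range(len(pairs)), key=distances.__getitem__)
most_similar = max(range(len(pairs)), key=scores.__getitem__)
```

Because the ordering is simply reversed, any routine that expects "higher is better" can consume the scores directly, while the distances would rank pairs the wrong way round.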

.. note::

@@ -116,7 +106,6 @@ array([ 0.50372928, -0.49627072, -3.65287282, -6.06079877])
array([[0.43680409, 0.89169412],
[0.89169412, 1.9542479 ]])

.. TODO: remove the "like it is the case etc..." if it's not the case anymore

Scikit-learn compatibility
--------------------------
@@ -128,7 +117,7 @@ All supervised algorithms are scikit-learn estimators
scikit-learn model selection routines
(`sklearn.model_selection.cross_val_score`,
`sklearn.model_selection.GridSearchCV`, etc).
You can also use methods from `sklearn.metrics` that rely on y_scores.
You can also use some of the scoring functions from `sklearn.metrics`.

Algorithms
==========
34 changes: 11 additions & 23 deletions doc/weakly_supervised.rst
@@ -175,28 +175,18 @@ array([7.27607365, 0.88853014])
>>> metric_fun([3.5, 3.6, 5.2], [5.6, 2.4, 6.7])
7.276073646278203

- Alternatively, you can use `pair_similarity` to return the **score** between
points, the more the **score**, the closer the pairs and vice-versa. For
Mahalanobis learners, it is equal to the inverse of the distance.
- Alternatively, you can use `pair_score` to return the **score** between
pairs of points: the larger the **score**, the more similar the pair,
and vice versa. For Mahalanobis learners, it is equal to the opposite
(negative) of the distance.

>>> score = mmc.pair_similarity([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
>>> score = mmc.pair_score([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
>>> score
array([-0.49627072, -3.65287282, -6.06079877])

This is useful because `pair_similarity` matches the **score** sematic of
scikit-learn's `Classification matrics <https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics>`_.
For instance, given a labeled data, you can pass the labels and the
**score** of your data to get the ROC curve.

>>> from sklearn.metrics import roc_curve
>>> fpr, tpr, thresholds = roc_curve(['dog', 'cat', 'dog'], score, pos_label='dog')
>>> fpr
array([0., 0., 1., 1.])
>>> tpr
array([0. , 0.5, 0.5, 1. ])
>>>
>>> thresholds
array([ 0.50372928, -0.49627072, -3.65287282, -6.06079877])
This is useful because `pair_score` matches the **score** semantics of
scikit-learn's `Classification metrics
<https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics>`_.
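One way to see why the score convention matters for threshold-independent metrics is to compute an AUC by hand. The sketch below is a plain-Python restatement of the usual AUC definition (the probability that a random positive outranks a random negative); it is not scikit-learn's implementation, and the labels and distances are made-up values.

```python
def auc_from_scores(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up pair labels: 1 = similar pair, -1 = dissimilar pair.
labels = [1, -1, 1, -1]
# Made-up distances: similar pairs are close, dissimilar pairs are far.
distances = [0.5, 3.7, 1.2, 6.1]
scores = [-d for d in distances]

auc_scores = auc_from_scores(labels, scores)        # scores rank correctly
auc_distances = auc_from_scores(labels, distances)  # distances rank inverted
```

With these values the scores give a perfect ranking, whereas feeding raw distances to the same function inverts it, which is the situation the score convention avoids.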

.. note::

@@ -210,8 +200,6 @@ array([[ 0.58603894, -5.69883982, -1.66614919],
[-5.69883982, 55.41743549, 16.20219519],
[-1.66614919, 16.20219519, 4.73697721]])

.. TODO: remove the "like it is the case etc..." if it's not the case anymore

.. _sklearn_compat_ws:

Prediction and scoring
@@ -368,7 +356,7 @@ returns the `sklearn.metrics.roc_auc_score` (which is threshold-independent).
.. note::
See :ref:`fit_ws` for more details on metric learners functions that are
not specific to learning on pairs, like `transform`, `pair_distance`,
`get_metric` and `get_mahalanobis_matrix`.
`pair_score`, `get_metric` and `get_mahalanobis_matrix`.

Algorithms
----------
@@ -715,7 +703,7 @@ of triplets that have the right predicted ordering.
.. note::
See :ref:`fit_ws` for more details on metric learners functions that are
not specific to learning on pairs, like `transform`, `pair_distance`,
`get_metric` and `get_mahalanobis_matrix`.
`pair_score`, `get_metric` and `get_mahalanobis_matrix`.



@@ -883,7 +871,7 @@ of quadruplets have the right predicted ordering.
.. note::
See :ref:`fit_ws` for more details on metric learners functions that are
not specific to learning on pairs, like `transform`, `pair_distance`,
`get_metric` and `get_mahalanobis_matrix`.
`pair_score`, `get_metric` and `get_mahalanobis_matrix`.



97 changes: 51 additions & 46 deletions metric_learn/base_metric.py
@@ -29,20 +29,22 @@ def __init__(self, preprocessor=None):
@abstractmethod
def score_pairs(self, pairs):
"""
.. deprecated:: 0.6.3 Refer to `pair_distance` and `pair_similarity`.
.. deprecated:: 0.7.0 Refer to `pair_distance` and `pair_score`.

.. warning::
This method will be deleted in 0.6.4. Please refer to `pair_distance`
This method will be removed in 0.8.0. Please refer to `pair_distance`
or `pair_score`. This change will occur in order to add learners
that don't necessarly learn a Mahalanobis distance.
that don't necessarily learn a Mahalanobis distance.

Returns the score between pairs
(can be a similarity, or a distance/metric depending on the algorithm)

Parameters
----------
pairs : `numpy.ndarray`, shape=(n_samples, 2, n_features)
3D array of pairs.
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to score, with each row corresponding to two points,
or 2D array of indices of pairs if the metric learner uses a
preprocessor.

Returns
-------
@@ -52,27 +54,29 @@ def score_pairs(self, pairs):
See Also
--------
get_metric : a method that returns a function to compute the metric between
two points. The difference with `score_pairs` is that it works on two
1D arrays and cannot use a preprocessor. Besides, the returned function
is independent of the metric learner and hence is not modified if the
metric learner is.
two points. The difference between `pair_score` and `pair_distance` is
Member

my bad, this is in the "see also" of pair_score, so it should only be "The difference with pair_score"

Contributor Author

It's actually of the old score_pairs, so I changed it for "The difference with score_pairs".

pair_score and pair_distance's "See Also" are ok

that it works on two 1D arrays and cannot use a preprocessor. Besides,
the returned function is independent of the metric learner and hence is
not modified if the metric learner is.
"""

@abstractmethod
def pair_similarity(self, pairs):
def pair_score(self, pairs):
"""
.. versionadded:: 0.6.3 Compute the similarity score bewteen pairs
.. versionadded:: 0.7.0 Compute the similarity score between pairs

Returns the similarity score between pairs. Depending on the algorithm,
this method can return the learned similarity score between pairs,
or the inverse of the distance learned between two pairs. The more the
score, the more similar the pairs. All learners have access to this
Returns the similarity score between pairs of points. Depending on the
algorithm, this method can return the learned similarity score between
pairs, or the opposite of the distance learned between pairs. The larger
the score, the more similar the pair. All learners have access to this
Member

nitpick: this is a bit heavy. I would recommend simply:
"Returns the similarity score between pairs of points (the larger the score, the more similar the pair). For metric learners that learn a distance, the score is simply the opposite of the distance between pairs."

Contributor Author

Changed and kept the last sentence "All learners have access to this method".

method.

Parameters
----------
pairs : `numpy.ndarray`, shape=(n_samples, 2, n_features)
3D array of pairs.
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to score, with each row corresponding to two points,
or 2D array of indices of pairs if the metric learner uses a
preprocessor.

Returns
-------
@@ -82,7 +86,7 @@ def pair_similarity(self, pairs):
See Also
--------
get_metric : a method that returns a function to compute the metric between
two points. The difference with `pair_similarity` is that it works on two
two points. The difference with `pair_score` is that it works on two
1D arrays and cannot use a preprocessor. Besides, the returned function
is independent of the metric learner and hence is not modified if the
metric learner is.
@@ -91,17 +95,18 @@ def pair_similarity(self, pairs):
@abstractmethod
def pair_distance(self, pairs):
"""
.. versionadded:: 0.6.3 Compute the distance score between pairs
.. versionadded:: 0.7.0 Compute the distance between pairs

Returns the distance score between pairs. For Mahalanobis learners, it
returns the pseudo-distance bewtween pairs. It is not available for
learners that does not learn a distance or pseudo-distance, an error
will be shown instead.
Returns the (pseudo) distance between pairs, when available. For metric
learners that do not learn a (pseudo) distance, an error is raised
instead.

Parameters
----------
pairs : `numpy.ndarray`, shape=(n_samples, 2, n_features)
3D array of pairs.
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to score, with each row corresponding to two points,
or 2D array of indices of pairs if the metric learner uses a
preprocessor.

Returns
-------
@@ -170,10 +175,10 @@ def _prepare_inputs(self, X, y=None, type_of_inputs='classic',

@abstractmethod
def get_metric(self):
"""Returns a function that takes as input two 1D arrays and outputs the
learned metric score on these two points. Depending on the algorithm, it
can return the distance or similarity function between pairs. It always
returns what the specific algorithm learns.
"""Returns a function that takes as input two 1D arrays and outputs
the value of the learned metric on these two points. Depending on the
algorithm, it can return a distance or a score function between pairs.
It always returns what the specific algorithm learns.

This function will be independent from the metric learner that learned it
(it will not be modified if the initial metric learner is modified),
@@ -206,13 +211,13 @@ def get_metric(self):

See Also
--------
pair_distance : a method that returns the distance score between several
pair_distance : a method that returns the distance between several
pairs of points. Unlike `get_metric`, this is a method of the metric
learner and therefore can change if the metric learner changes. Besides,
it can use the metric learner's preprocessor, and works on concatenated
arrays.

pair_similarity : a method that returns the similarity score between
pair_score : a method that returns the similarity score between
several pairs of points. Unlike `get_metric`, this is a method of the
metric learner and therefore can change if the metric learner changes.
Besides, it can use the metric learner's preprocessor, and works on
@@ -260,13 +265,13 @@ class MahalanobisMixin(BaseMetricLearner, MetricTransformer,

def score_pairs(self, pairs):
r"""
.. deprecated:: 0.6.3
.. deprecated:: 0.7.0
This method is deprecated. Please use `pair_distance` instead.

.. warning::
This method will be deleted in 0.6.4. Please refer to `pair_distance`
or `pair_similarity`. This change will occur in order to add learners
that don't necessarly learn a Mahalanobis distance.
This method will be removed in 0.8.0. Please refer to `pair_distance`
or `pair_score`. This change will occur in order to add learners
that don't necessarily learn a Mahalanobis distance.

Returns the learned Mahalanobis distance between pairs.

@@ -301,15 +306,15 @@ def score_pairs(self, pairs):
:ref:`mahalanobis_distances` : The section of the project documentation
that describes Mahalanobis Distances.
"""
dpr_msg = ("score_pairs will be deprecated in release 0.6.3. "
"Use pair_similarity to compute similarities, or "
dpr_msg = ("score_pairs will be deprecated in release 0.7.0. "
"Use pair_score to compute similarity scores, or "
"pair_distances to compute distances.")
warnings.warn(dpr_msg, category=FutureWarning)
return self.pair_distance(pairs)

def pair_similarity(self, pairs):
def pair_score(self, pairs):
"""
Returns the inverse of the learned Mahalanobis distance between pairs.
Returns the opposite of the learned Mahalanobis distance between pairs.

Parameters
----------
@@ -321,12 +326,12 @@ def pair_similarity(self, pairs):
Returns
-------
scores : `numpy.ndarray` of shape=(n_pairs,)
The inverse of the learned Mahalanobis distance for every pair.
The opposite of the learned Mahalanobis distance for every pair.

See Also
--------
get_metric : a method that returns a function to compute the metric between
two points. The difference with `pair_similarity` is that it works on two
two points. The difference with `pair_score` is that it works on two
1D arrays and cannot use a preprocessor. Besides, the returned function
is independent of the metric learner and hence is not modified if the
metric learner is.
@@ -517,7 +522,7 @@ def decision_function(self, pairs):
pairs = check_input(pairs, type_of_inputs='tuples',
preprocessor=self.preprocessor_,
estimator=self, tuple_size=self._tuple_size)
return self.pair_similarity(pairs)
return self.pair_score(pairs)

def score(self, pairs, y):
"""Computes score of pairs similarity prediction.
@@ -787,8 +792,8 @@ def decision_function(self, triplets):
triplets = check_input(triplets, type_of_inputs='tuples',
preprocessor=self.preprocessor_,
estimator=self, tuple_size=self._tuple_size)
return (self.pair_similarity(triplets[:, :2]) -
self.pair_similarity(triplets[:, [0, 2]]))
return (self.pair_score(triplets[:, :2]) -
self.pair_score(triplets[:, [0, 2]]))

def score(self, triplets):
"""Computes score on input triplets.
@@ -872,8 +877,8 @@ def decision_function(self, quadruplets):
quadruplets = check_input(quadruplets, type_of_inputs='tuples',
preprocessor=self.preprocessor_,
estimator=self, tuple_size=self._tuple_size)
return (self.pair_similarity(quadruplets[:, :2]) -
self.pair_similarity(quadruplets[:, 2:]))
return (self.pair_score(quadruplets[:, :2]) -
self.pair_score(quadruplets[:, 2:]))
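The triplet and quadruplet decision functions in this file both reduce to a difference of two pair scores. The following is a minimal sketch of that idea under stated assumptions: `pair_score` here is a hypothetical stand-in that negates a Euclidean distance, the tuples are plain Python sequences rather than arrays, and no input checking or preprocessor is involved.

```python
import math

def pair_score(pairs):
    """Stand-in score: opposite of the Euclidean distance of each pair."""
    return [-math.dist(a, b) for a, b in pairs]

def triplet_decision(triplets):
    """score(anchor, positive) - score(anchor, negative) for each triplet."""
    pos = pair_score([(t[0], t[1]) for t in triplets])
    neg = pair_score([(t[0], t[2]) for t in triplets])
    return [p - n for p, n in zip(pos, neg)]

def quadruplet_decision(quadruplets):
    """score(first pair) - score(second pair) for each quadruplet."""
    first = pair_score([(q[0], q[1]) for q in quadruplets])
    second = pair_score([(q[2], q[3]) for q in quadruplets])
    return [f - s for f, s in zip(first, second)]

# Made-up points: an anchor at the origin, a near point and a far point.
triplet = ([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
quadruplet = ([0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [3.0, 4.0])

t_val = triplet_decision([triplet])[0]
q_val = quadruplet_decision([quadruplet])[0]
# A positive value means the first pair is scored as more similar.
```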

def score(self, quadruplets):
"""Computes score on input quadruplets
6 changes: 3 additions & 3 deletions test/test_base_metric.py
@@ -284,12 +284,12 @@ def test_score_pairs_warning(estimator, build_dataset):
model = clone(estimator)
set_random_state(model)

# we fit the metric learner on it and then we call score_apirs on some
# We fit the metric learner on it and then we call score_pairs on some
# points
model.fit(*remove_y(model, input_data, labels))

msg = ("score_pairs will be deprecated in release 0.6.3. "
"Use pair_similarity to compute similarities, or "
msg = ("score_pairs will be deprecated in release 0.7.0. "
"Use pair_score to compute similarity scores, or "
"pair_distances to compute distances.")
with pytest.warns(FutureWarning) as raised_warning:
score = model.score_pairs([[X[0], X[1]], ])
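The deprecation path exercised by this test (warn with `FutureWarning`, then forward to `pair_distance`) can be reproduced without pytest or a fitted estimator. This is only an illustrative sketch: `ToyLearner` is a hypothetical stand-in class, its warning text is shortened on purpose, and it uses the standard-library `warnings.catch_warnings` rather than `pytest.warns`.

```python
import warnings

class ToyLearner:
    """Hypothetical stand-in for a metric learner (not metric-learn code)."""

    def pair_distance(self, pairs):
        # Absolute difference between 1D points, for illustration only.
        return [abs(a - b) for a, b in pairs]

    def pair_score(self, pairs):
        # Opposite of the distance: larger means more similar.
        return [-d for d in self.pair_distance(pairs)]

    def score_pairs(self, pairs):
        # Deprecated entry point: warn, then forward to the new method.
        warnings.warn("score_pairs is deprecated; use pair_score or "
                      "pair_distance instead.", category=FutureWarning)
        return self.pair_distance(pairs)

learner = ToyLearner()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = learner.score_pairs([(1.0, 4.0)])

warned = [w for w in caught if issubclass(w.category, FutureWarning)]
```

Forwarding to the new method keeps the old call returning the same values during the deprecation window, so existing callers only see the warning.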