
Add MRR, MAP, DCG, nDCG #46


Open · wants to merge 20 commits into base: main

Conversation

@abheesht17 (Collaborator) commented Apr 9, 2025

TODO:

  • Add UTs.
  • Add documentation.
  • Verify docstrings.
  • Some small TODOs in the code, which can be decided after code review.
  • Add get_config() for DCG and nDCG.

@abheesht17 closed this Apr 9, 2025
@abheesht17 reopened this Apr 9, 2025
@abheesht17 changed the title from "Add MRR" to "Add MRR, MAP, DCG, nDCG" Apr 10, 2025
@abheesht17 marked this pull request as ready for review April 11, 2025 06:56
@abheesht17 requested a review from hertschuh April 11, 2025 06:56
@abheesht17 (Collaborator, Author)

MRR ready for review

@hertschuh (Collaborator) left a comment

Thanks!

Partial review... to be continued

from keras_rs.src.utils.ranking_metrics_utils import sort_by_scores


@keras_rs_export("keras_rs.losses.MeanReciprocalRank")

Should be keras_rs.metrics.MeanReciprocalRank.

Returns:
List of sorted tensors (`tensors_to_sort`), sorted using `scores`.
"""
# TODO: Consider exposing `shuffle_ties` to the user.

It looks like it's done; remove the TODO for shuffle_ties.

shape `(list_size)` or `(batch_size, list_size)`. Defaults to
`None`.
"""
# TODO (abheesht): Should `y_true` be a dict, with `"mask"` as one key

You mean as an option? Right now, there's no way to pass a mask, right?

if isinstance(y_pred, list):
    y_pred = ops.convert_to_tensor(y_pred)
# `sample_weight` can be a scalar too.
if isinstance(sample_weight, (list, float, int)):

Remove all 3 if isinstance(...) checks and just call ops.convert_to_tensor; it does the if for you.
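
A minimal sketch of the suggested simplification (variable names follow the diff; the toy values are just for illustration):

    from keras import ops

    y_pred = [0.2, 0.7, 0.1]  # a plain Python list
    sample_weight = 2.0       # a Python scalar

    # ops.convert_to_tensor accepts lists, Python scalars and tensors alike,
    # so the three isinstance(...) guards can simply be dropped.
    y_pred = ops.convert_to_tensor(y_pred)
    sample_weight = ops.convert_to_tensor(sample_weight)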

elif sample_weight_rank == 2:
    check_shapes_compatible(sample_weight_shape, y_true_shape)

# Want to make sure `sample_weight` is of the same shape as

Meaning you should add a check here?

def check_rank(
    x_rank: int,
    allowed_ranks: tuple[int, ...] = (1, 2),
    tensor_name: Optional[str] = None,

It looks like tensor_name is not really optional.

# Check ranks and shapes.
def check_rank(
    x_rank: int,
    allowed_ranks: tuple[int, ...] = (1, 2),

I wouldn't provide a default for allowed_ranks. When I'm reading the code, I don't want to have to go to the definition to know what ranks I'm checking; I should have them right there where it's called.
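
For illustration, a minimal sketch of check_rank without the default (signature taken from the diff; the error message body is assumed):

    from typing import Optional

    def check_rank(
        x_rank: int,
        allowed_ranks: tuple[int, ...],  # no default: callers spell ranks out
        tensor_name: Optional[str] = None,
    ) -> None:
        if x_rank not in allowed_ranks:
            raise ValueError(
                f"`{tensor_name}` should have a rank in {allowed_ranks}, "
                f"received: {x_rank}."
            )

    # The allowed ranks are now visible right at the call site:
    check_rank(2, allowed_ranks=(1, 2), tensor_name="y_true")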

from keras_rs.src.utils.keras_utils import check_shapes_compatible


def process_inputs(

This should have a more specific name, maybe something like standardize_ranks or check_ranks_and_shapes.

"""
Utility function for processing inputs for losses and metrics.

This utility function does three things:

Add Args and Returns sections.

if k is None:
    k = max_possible_k
else:
    k = ops.minimum(k, max_possible_k)

I would split this in 2, because JAX will want a static int value and TensorFlow may need a dynamic value when max_possible_k is a scalar tensor:

    if k is None:
        k = max_possible_k
    elif isinstance(max_possible_k, int):
        k = min(k, max_possible_k)
    else:
        k = ops.minimum(k, max_possible_k)

@hertschuh (Collaborator) left a comment

Thanks for this big PR. A lot of work obviously went into this!

More comments...

everywhere, even for lists without any relevant examples because
`sum(per_list_weights) == num(sum(relevance) != 0)`. This handles the
standard ranking metrics where the weights are all
1.0.

nitpick: move to the previous line.

num(sum(relevance) != 0) / num(lists)

The rest have weights 1.0 / num(lists).

nit: put formula between single quotes.

nonzero_relevance = ops.where(
    nonzero_weights,
    ops.cast(nonzero_relevance_condition, "float32"),
    ops.zeros_like(per_list_relevance),

Is there a reason to do zeros_like rather than just 0.0? It should give the same result. But is it for ragged tensors or something?


# Identify lists where both weights and relevance sums are non-zero.
nonzero_relevance_condition = ops.greater(per_list_relevance, 0.0)
nonzero_relevance = ops.where(

It looks like this is just an ops.logical_and between nonzero_relevance and nonzero_weights.

A logical_and or even a multiply should be faster than a where.
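
A small sketch of that rewrite (tensor names assumed from the diff, with toy values):

    from keras import ops

    # Hypothetical per-list aggregates, shaped (batch_size, 1).
    per_list_relevance = ops.convert_to_tensor([[3.0], [0.0], [1.0]])
    nonzero_weights = ops.convert_to_tensor([[True], [True], [False]])

    # Combine the two boolean masks directly and cast once, instead of
    # routing one of them through ops.where with a zeros_like branch.
    nonzero_relevance = ops.cast(
        ops.logical_and(ops.greater(per_list_relevance, 0.0), nonzero_weights),
        "float32",
    )  # -> [[1.], [0.], [0.]]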


# Calculate the per-list weights using the core formula
# Numerator: sum(weights * relevance) per list
numerator = ops.sum(weights * relevance, axis=1, keepdims=True)

ops.multiply(weights, relevance)



def default_rank_discount_fn(rank: types.Tensor) -> types.Tensor:
    return ops.divide(ops.log(2.0), ops.log1p(rank))

The docstring above says it's equivalent to lambda rank: log2(rank + 1).

So that would be reversing the arguments of the divide. Also, it would be clearer if written as ops.log2(rank + 1.0).
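
For comparison, a sketch of the two readings (which one is intended is for the docstring and code to settle):

    from keras import ops

    def current_fn(rank):
        # What the code computes: log(2) / log(rank + 1) == 1 / log2(rank + 1).
        return ops.divide(ops.log(2.0), ops.log1p(rank))

    def suggested_fn(rank):
        # What the docstring describes, in the clearer ops.log2 form.
        return ops.log2(rank + 1.0)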



@keras_rs_export("keras_rs.metrics.nDCG")
class nDCG(RankingMetric):

The convention is to always use an upper case letter for the first letter of a class, so it should be NDCG, even if it's written as nDCG in papers.

Also the file should be ndcg.py.

I don't actually understand why it's written nDCG since all the letters form an acronym anyway.

@@ -0,0 +1,240 @@
from typing import Callable, Optional

I'm not a huge fan of the concept of a utils folder in general, and I typically prefer the utils files to be in the same folder as where it's used. So for instance, I think this could be in keras_rs/src/metrics.

What do you think? Note that keras itself is not consistent on this.

@@ -1,9 +1,8 @@
-from typing import Callable, Optional
+from typing import Callable

I'm not a huge fan of the concept of a utils folder in general, and I typically prefer the utils files to be in the same folder as where it's used. So for instance, I think this could be in keras_rs/src/losses.

What do you think? Note that keras itself is not consistent on this.

@@ -0,0 +1,62 @@
from typing import Optional

I'm not a huge fan of the concept of a utils folder in general, and I typically prefer the utils files to be in the same folder as where it's used. This one is a bit funny because it's for both metrics and losses, but I think it wouldn't be too shocking to put it in keras_rs/src/metrics since a loss is just a kind of metric.

What do you think? Note that keras itself is not consistent on this.
