
Metric Card for nDCG

Metric Description

Discounted Cumulative Gain (DCG) is a measure of ranking quality. It is used to evaluate information retrieval systems under two assumptions:

  1. Highly relevant documents/labels are more useful when they appear earlier in the results.
  2. Documents/labels are relevant to different degrees.

DCG is the sum of the relevances of the retrieved documents, each discounted logarithmically according to the position at which it was retrieved. Normalized DCG (nDCG) divides this value by the best DCG achievable for the same documents, giving a score between 0 and 1 such that a perfect ranking achieves an nDCG of 1.0.
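As an illustration of the definition above, here is a minimal pure-Python sketch. It assumes linear gains and a log2(rank + 1) discount, which are the conventions used by scikit-learn's `ndcg_score` (cited at the end of this card). The function names `dcg` and `ndcg` are illustrative and not part of this metric's API, and tied predictions get no special treatment here.

```python
import math

def dcg(relevances):
    """Sum of relevances, each divided by log2(rank + 1), with ranks starting at 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(references, predictions):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    # Order the true relevances by descending predicted score (ties keep input order).
    order = sorted(range(len(predictions)), key=lambda i: predictions[i], reverse=True)
    ranked = [references[i] for i in order]
    ideal_dcg = dcg(sorted(references, reverse=True))
    # Assumption: a sample without any relevant item scores 0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

score = ndcg([10, 0, 0, 1, 5], [0.1, 0.2, 0.3, 4, 70])
print(round(score, 4))  # 0.6957
```

The inputs are the ones used in Example 1 further down, and the sketch reproduces that score up to rounding.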

How to Use

At minimum, this metric takes as input two lists of lists, each containing floats: predictions and references.

    >>> import evaluate
    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[0, 1]], predictions=[[0, 1]])
    >>> print(results)
    {'nDCG': 1.0}

Inputs:

references ('list' of 'list' of 'float'): True relevance scores, one inner list per sample.

predictions ('list' of 'list' of 'float'): Predicted relevance, probability estimates, or confidence values, one inner list per sample.

k ('int'): If set, only the k highest-scoring items in the ranking are considered; otherwise all outputs are considered. Defaults to None.

sample_weight ('list' of 'float'): Sample weights. Defaults to None.

ignore_ties ('boolean'): If set to True, assumes that there are no ties in the predictions (likely when predictions are continuous), which allows a more efficient computation. Defaults to False.
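To make the roles of `k` and `sample_weight` more concrete, the sketch below truncates each ranking to its top k positions and then takes a weighted mean of the per-sample scores. This is an assumption about the behaviour, modelled on how scikit-learn's `ndcg_score` treats these parameters; it is not this metric's actual code. The helper names `ndcg_at_k` and `mean_ndcg` are hypothetical, and ties are not handled.

```python
import math

def ndcg_at_k(references, predictions, k=None):
    """nDCG for one sample, counting only the top-k ranked positions (all if k is None)."""
    def dcg(relevances):
        top = relevances if k is None else relevances[:k]
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(top, start=1))

    order = sorted(range(len(predictions)), key=lambda i: predictions[i], reverse=True)
    ideal_dcg = dcg(sorted(references, reverse=True))
    return dcg([references[i] for i in order]) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg(references, predictions, k=None, sample_weight=None):
    """Weighted mean of per-sample nDCG; uniform weights when sample_weight is None."""
    scores = [ndcg_at_k(r, p, k) for r, p in zip(references, predictions)]
    weights = sample_weight if sample_weight is not None else [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Two samples: the ranking from Example 2 and a (made-up) perfectly ranked sample.
refs = [[10, 0, 0, 1, 5], [3, 2, 1, 0, 0]]
preds = [[0.1, 0.2, 0.3, 4, 70], [0.9, 0.8, 0.7, 0.2, 0.1]]
print(round(mean_ndcg(refs, preds, k=3), 4))                            # 0.7062
print(round(mean_ndcg(refs, preds, k=3, sample_weight=[3.0, 1.0]), 4))  # 0.5593
```

Giving the first sample three times the weight pulls the average toward its lower score.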

Output:

normalized_discounted_cumulative_gain ('float'): The nDCG score averaged over all samples. The minimum possible value is 0.0 and the maximum possible value is 1.0.

Output Example(s):

{'nDCG@5': 1.0}

This metric outputs a dictionary containing the nDCG score.

Examples:

Example 1-A simple example
    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]])
    >>> print(results)
    {'nDCG': 0.6956940443813076}
Example 2-The same as Example 1, except with k set to 3.
    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]], k=3)
    >>> print(results)
    {'nDCG@3': 0.4123818817534531}
Example 3-There is only one relevant label, but the predictions are tied, so the model cannot decide which candidate is the relevant one.
    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1)
    >>> print(results)
    {'nDCG@1': 0.5}
    >>> # That is, it scores both tied candidates and returns the average.
Example 4-The same as Example 3, except with ignore_ties set to True.
    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1, ignore_ties=True)
    >>> print(results)
    {'nDCG@1': 0.0}
    >>> # Alternative result: {'nDCG@1': 1.0}
    >>> # That is, it picks one of the two tied candidates and scores only that one,
    >>> # so the score may vary depending on which one was chosen.
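The tie behaviour in the last two examples can be sketched as follows. The sketch assumes the tie-averaging scheme used by scikit-learn's `ndcg_score` when `ignore_ties` is False: items that share a predicted score share the mean relevance of their group across the positions the group occupies. The function name `tie_averaged_ndcg` is hypothetical and this is not the metric's actual implementation.

```python
import math
from itertools import groupby

def tie_averaged_ndcg(references, predictions, k=None):
    """nDCG where items sharing a predicted score share their group's mean relevance."""
    n = len(references)
    limit = n if k is None else min(k, n)
    # Discount per 0-based position; positions beyond the cutoff contribute nothing.
    discounts = [1 / math.log2(pos + 2) if pos < limit else 0.0 for pos in range(n)]

    # Sort (prediction, relevance) pairs by descending prediction, then walk tie groups.
    pairs = sorted(zip(predictions, references), key=lambda pr: pr[0], reverse=True)
    dcg, pos = 0.0, 0
    for _, group in groupby(pairs, key=lambda pr: pr[0]):
        rels = [rel for _, rel in group]
        mean_rel = sum(rels) / len(rels)
        # Every position occupied by the tie group receives the group's mean relevance.
        dcg += mean_rel * sum(discounts[pos:pos + len(rels)])
        pos += len(rels)

    ideal = sorted(references, reverse=True)
    ideal_dcg = sum(rel * d for rel, d in zip(ideal, discounts))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0

print(tie_averaged_ndcg([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], k=1))  # 0.5
```

With `k=1`, the two tied candidates share the single counted position, so the relevant one contributes half of its gain. When ties are ignored, one of the two is simply placed first, which gives either 0.0 or 1.0.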

Citation(s)

@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}