---
title: nDCG
emoji: 👁
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 3.9.1
app_file: app.py
pinned: false
license: mit
tags:
- evaluate
- metric
- ranking
description: >-
  The Discounted Cumulative Gain (DCG) is a measure of ranking quality.
  It is used to evaluate Information Retrieval Systems under the following two assumptions:
  1. Highly relevant documents/labels are more useful when appearing earlier in the results
  2. Documents/labels are relevant to different degrees
  It is defined as the sum of the relevances of the retrieved documents, discounted logarithmically
  in proportion to the position at which they were retrieved.
  The Normalized DCG (nDCG) divides this value by the best possible value, yielding a score between
  0 and 1 such that a perfect retrieval achieves an nDCG of 1.
---
# Metric Card for nDCG
## Metric Description
The Discounted Cumulative Gain (DCG) is a measure of ranking quality.
It is used to evaluate Information Retrieval Systems under two assumptions:

1. Highly relevant documents/labels are more useful when appearing earlier in the results
2. Documents/labels are relevant to different degrees

It is defined as the sum of the relevances of the retrieved documents, discounted logarithmically in proportion to
the position at which they were retrieved.
The Normalized DCG (nDCG) divides this value by the best value that can be achieved, yielding a score between
0 and 1 such that a perfect retrieval achieves an nDCG of 1.0.
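
Concretely, writing $rel_i$ for the true relevance of the item ranked at position $i$, the standard log2-discounted form (the one used by scikit-learn's `ndcg_score`, which this card cites) is:

```latex
\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)},
\qquad
\mathrm{nDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}
```

where $\mathrm{IDCG}@k$ is the $\mathrm{DCG}@k$ of the ideal ranking, i.e. the items sorted by true relevance, so a perfect ranking scores 1.0.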
## How to Use
At minimum, this metric takes as input two `list`s of `list`s, each containing `float`s: predictions and references.

```python
import evaluate
nDCG_metric = evaluate.load('JP-SystemsX/nDCG')
results = nDCG_metric.compute(references=[[0, 1]], predictions=[[0, 1]])
print(results)
{'nDCG': 1.0}
```
### Inputs:
- **references** (`list` of `list` of `float`): True relevance scores.
- **predictions** (`list` of `list` of `float`): Predicted relevance scores, probability estimates, or confidence values.
- **k** (`int`): If set, only the `k` highest scores in the ranking are considered; otherwise all outputs are considered. Defaults to `None`.
- **sample_weight** (`list` of `float`): Sample weights. Defaults to `None`.
- **ignore_ties** (`boolean`): If set to `True`, assumes that there are no ties in the predictions (which is likely if the predictions are continuous) for efficiency gains. Defaults to `False`.
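
As a minimal sketch of how these parameters combine (the input values below are illustrative, not taken from the card):

```python
import evaluate

nDCG_metric = evaluate.load("JP-SystemsX/nDCG")

# Illustrative values: two samples, truncated at rank 2, with the first
# sample weighted twice as heavily as the second.
results = nDCG_metric.compute(
    references=[[3, 2, 0], [1, 0, 2]],
    predictions=[[0.9, 0.4, 0.1], [0.2, 0.1, 0.7]],
    k=2,
    sample_weight=[2.0, 1.0],
    ignore_ties=True,  # safe here: all predicted scores are distinct
)
print(results)  # e.g. {'nDCG@2': ...}
```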
### Output:
**normalized_discounted_cumulative_gain** (`float`): The nDCG score averaged over all samples.
The minimum possible value is 0.0 and the maximum possible value is 1.0.

Output Example(s):
```python
{'nDCG@5': 1.0}
{'nDCG': 0.876}
```

This metric outputs a dictionary containing the nDCG score. The key is `'nDCG'` when `k` is not set and `'nDCG@k'` when it is.
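
For instance, to pull the float out of the returned dictionary (reusing the inputs from Example 2 below):

```python
results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]], k=3)
score = results["nDCG@3"]  # the key carries the k that was used
```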
### Examples:
Example 1: A simple example

>>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
>>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]])
>>> print(results)
{'nDCG': 0.6956940443813076}

Example 2: The same as Example 1, except with `k` set to 3.

>>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
>>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]], k=3)
>>> print(results)
{'nDCG@3': 0.4123818817534531}

Example 3: There is only one relevant label, but there is a tie and the model cannot decide which of the two candidates it is.

>>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
>>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1)
>>> print(results)
{'nDCG@1': 0.5}
>>> # That is, it calculates the score for both tied candidates and returns their average

Example 4: The same as Example 3, except `ignore_ties` is set to `True`.

>>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
>>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1, ignore_ties=True)
>>> print(results)
{'nDCG@1': 0.0}
>>> # Alternative result: {'nDCG@1': 1.0}
>>> # That is, it picks one of the 2 tied candidates and calculates the score for that one only,
>>> # so the score may vary depending on which one was chosen
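
Since this card cites scikit-learn, the values above should be reproducible directly with `sklearn.metrics.ndcg_score`; a sketch under that assumption:

```python
# Cross-check against scikit-learn's ndcg_score, which this metric is
# assumed (per the citation below) to wrap.
import numpy as np
from sklearn.metrics import ndcg_score

references = np.asarray([[10, 0, 0, 1, 5]])
predictions = np.asarray([[.1, .2, .3, 4, 70]])

print(ndcg_score(references, predictions))       # ~0.6957, matches Example 1
print(ndcg_score(references, predictions, k=3))  # ~0.4124, matches Example 2
```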
## Citation(s)
```bibtex
@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}
```