---
title: Test ParaScore
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  ParaScore is a metric for scoring the performance of paraphrase generation systems.
  See the project at https://github.com/shadowkiller33/ParaScore for more information.
---
# Metric Card for ParaScore

## Metric description

ParaScore is a metric for scoring the performance of paraphrase generation systems. It was introduced in [On the Evaluation Metrics for Paraphrase Generation](https://arxiv.org/abs/2202.08479) (Shen et al., EMNLP 2022).
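At a high level, and as a rough sketch rather than the paper's exact formulation, ParaScore combines a semantic-similarity term with a lexical-divergence bonus:

$$\mathrm{ParaScore}(x, c) = \mathrm{Sim}(x, c) + \omega \cdot \mathrm{DS}(x, c)$$

where $x$ is the source sentence, $c$ is the candidate paraphrase, $\mathrm{Sim}$ is a BERTScore-style semantic similarity, and $\mathrm{DS}$ is an edit-distance-based divergence term weighted by $\omega$. See the paper for the precise definitions.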
## How to use

```python
from evaluate import load

# Load the metric from the Hugging Face Hub.
parascore = load("transZ/test_parascore")

predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]

# `lang` specifies the language of the inputs.
results = parascore.compute(predictions=predictions, references=references, lang="en")
```
## Output values

ParaScore outputs a dictionary with the following value:

`score`: a float ranging from 0.0 to 1.0, where a higher value indicates a better paraphrase.
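For example, printing the result of the snippet above yields a dictionary (the number shown here is illustrative only and depends on the underlying model):

```python
# Inspect the returned dictionary; the exact value is illustrative only.
print(results)
# {'score': 0.98}
```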
## Limitations and bias

The [original ParaScore paper](https://arxiv.org/abs/2202.08479) showed that ParaScore correlates well with human judgment on sentence-level and system-level evaluation, but this depends on the model and language pair selected.
## Citation

```bibtex
@inproceedings{Shen2022,
  author        = {Shen, Lingfeng and Liu, Lemao and Jiang, Haiyun and Shi, Shuming},
  title         = {{On the Evaluation Metrics for Paraphrase Generation}},
  booktitle     = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages         = {3178--3190},
  year          = {2022},
  eprint        = {2202.08479},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2202.08479}
}
```
## Further References

- [Official implementation](https://github.com/shadowkiller33/parascore_toolkit)
- [Original ParaScore paper](https://arxiv.org/abs/2202.08479)