## Small print

<p style="background-color: #fff9f9; border: 1px solid #ff0000; padding: 10px;">
Warning: This demo is highly experimental and not ready for production use.
</p>

This demo is a proof of concept for visualizing the semantic differences between two text documents.
The two documents do not have to be written in the same language.

In our paper, we evaluate three simple, unsupervised approaches based on BERT-like encoder models.
This demo implements two of them, `DiffAlign` and `DiffDel`, using the model [ZurichNLP/unsup-simcse-xlm-roberta-base](https://huggingface.co/ZurichNLP/unsup-simcse-xlm-roberta-base). See the model's tags for a list of the ~100 supported languages.

- `DiffAlign` aligns the words of the two documents using the cosine similarity between their word embeddings (cf. [SimAlign](http://dx.doi.org/10.18653/v1/2020.findings-emnlp.147), [BERTScore](https://openreview.net/forum?id=SkeHuCVFDr)). Words with low similarity are highlighted.
- `DiffDel` calculates the sentence similarity between the two input documents (cf. [SimCSE](http://dx.doi.org/10.18653/v1/2021.emnlp-main.552)). The algorithm highlights words whose deletion increases the similarity score.

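
The two scoring ideas above can be sketched in plain Python. This is only an illustration under stated assumptions, not the demo's implementation: the four-dimensional word vectors are made up, mean pooling stands in for the encoder model (the real approach would re-encode the text with the model after each deletion), and the names `diff_align` and `diff_del` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Toy stand-in for a sentence encoder: average the word vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def diff_align(emb_a, emb_b):
    """Score each word of A by 1 minus its best cosine match among the words of B."""
    return [1.0 - max(cosine(u, v) for v in emb_b) for u in emb_a]


def diff_del(emb_a, emb_b):
    """Score each word of A by how much deleting it raises the similarity of A and B."""
    doc_b = mean_vector(emb_b)
    base = cosine(mean_vector(emb_a), doc_b)
    scores = []
    for i in range(len(emb_a)):
        rest = emb_a[:i] + emb_a[i + 1:]
        # A one-word document cannot be shortened further; score it as neutral.
        scores.append(cosine(mean_vector(rest), doc_b) - base if rest else 0.0)
    return scores


# Made-up word vectors (assumption): each distinct word gets its own axis.
toy_vectors = {
    "the": [1.0, 0.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0, 0.0],
    "sleeps": [0.0, 0.0, 1.0, 0.0],
    "dog": [0.0, 0.0, 0.0, 1.0],
}
words_a = ["the", "cat", "sleeps"]
words_b = ["the", "dog", "sleeps"]
emb_a = [toy_vectors[w] for w in words_a]
emb_b = [toy_vectors[w] for w in words_b]

align_scores = diff_align(emb_a, emb_b)
del_scores = diff_del(emb_a, emb_b)
for word, s_align, s_del in zip(words_a, align_scores, del_scores):
    print(f"{word:>7}  DiffAlign={s_align:.2f}  DiffDel={s_del:+.2f}")
```

On this toy input, both functions assign their highest score to `cat`, the only word in the first document without a counterpart in the second.
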
More resources:
- Paper: https://arxiv.org/abs/2305.13303
- Code: https://github.com/ZurichNLP/recognizing-semantic-differences

## Citation
```bibtex
@inproceedings{vamvas-sennrich-2023-rsd,
    title = "Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents",
    author = "Jannis Vamvas and Rico Sennrich",
    month = dec,
    year = "2023",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
}
```