	## Small print

	<p style="background-color: #fff9f9; border: 1px solid #ff0000; padding: 10px;">
	Warning: This demo is highly experimental and not ready for production use.
	</p>

	This demo is a proof of concept for visualizing the semantic differences between two text documents.
	The input documents do not need to be written in the same language.

	In our paper, we evaluate three simple, unsupervised approaches based on BERT-like encoder models.
	This demo implements the approaches `DiffAlign` and `DiffDel` using the model [ZurichNLP/unsup-simcse-xlm-roberta-base](https://huggingface.co/ZurichNLP/unsup-simcse-xlm-roberta-base). See the model tags for a list of the ~100 supported languages.

	- `DiffAlign` aligns the words of the two documents using cosine similarity between the word embeddings (cf. [SimAlign](http://dx.doi.org/10.18653/v1/2020.findings-emnlp.147), [BERTScore](https://openreview.net/forum?id=SkeHuCVFDr)). Words with low similarity are highlighted.
	- `DiffDel` calculates sentence similarity between the two input documents (cf. [SimCSE](http://dx.doi.org/10.18653/v1/2021.emnlp-main.552)). Words whose deletion increases the similarity score are highlighted.
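
The two scoring ideas above can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation used in the demo or the paper. It assumes static toy word vectors (`TOY_EMBEDDINGS`, invented for this example) and mean pooling as the sentence embedding. The demo instead uses contextual embeddings from the encoder model, which would have to be recomputed after each deletion. All function names here are hypothetical.

```python
import math

# Hypothetical 4-dimensional word vectors, for illustration only.
TOY_EMBEDDINGS = {
    "the": [1.0, 0.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0, 0.0],
    "dog": [0.0, 0.0, 1.0, 0.0],
    "sleeps": [0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Average a list of vectors into one vector (assumed sentence embedding)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def diff_align(emb_a, emb_b):
    """DiffAlign-style score: 1 minus the best cosine match in the other document."""
    return [1.0 - max(cosine(u, v) for v in emb_b) for u in emb_a]


def diff_del(emb_a, emb_b):
    """DiffDel-style score: change in document similarity when one word is deleted.

    Assumes document A has at least two words, so that something remains
    after the deletion.
    """
    target = mean_pool(emb_b)
    baseline = cosine(mean_pool(emb_a), target)
    scores = []
    for i in range(len(emb_a)):
        remaining = emb_a[:i] + emb_a[i + 1:]
        scores.append(cosine(mean_pool(remaining), target) - baseline)
    return scores


doc_a = ["the", "cat", "sleeps"]
doc_b = ["the", "dog", "sleeps"]
emb_a = [TOY_EMBEDDINGS[w] for w in doc_a]
emb_b = [TOY_EMBEDDINGS[w] for w in doc_b]

align_scores = diff_align(emb_a, emb_b)
del_scores = diff_del(emb_a, emb_b)

print(align_scores)  # [0.0, 1.0, 0.0]
# Words with a positive DiffDel-style score would be highlighted.
print([w for w, s in zip(doc_a, del_scores) if s > 0])  # ['cat']
```

In this toy setup both scores single out "cat", the only word in the first document without a counterpart in the second. A higher alignment score means a weaker match, and a positive deletion score means that removing the word brings the two documents closer together.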

	More resources:
	- Paper: https://arxiv.org/abs/2305.13303
	- Code: https://github.com/ZurichNLP/recognizing-semantic-differences

	## Citation
	```bibtex
	@inproceedings{vamvas-sennrich-2023-rsd,
	    title = "Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents",
	    author = "Jannis Vamvas and Rico Sennrich",
	    month = dec,
	    year = "2023",
	    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
	    address = "Singapore",
	    publisher = "Association for Computational Linguistics",
	}
	```