---
pipeline_tag: sentence-similarity
language: fr
datasets:
- stsb_multi_mt
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: mit
model-index:
- name: sentence-croissant-llm-base by Wissam Siblini
  results:
  - task:
      name: Sentence-Embedding
      type: Text Similarity
    dataset:
      name: Text Similarity fr
      type: stsb_multi_mt
      args: fr
    metrics:
    - name: Test Pearson correlation coefficient
      type: Pearson_correlation_coefficient
      value: xx.xx
---

# Overview

The model [sentence-croissant-llm-base](https://huggingface.co/Wissam42/sentence-croissant-llm-base) generates French text embeddings. It was fine-tuned from the pre-trained LLM [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) using the Siamese-BERT (Sentence-BERT) strategy implemented in the [sentence-transformers](https://www.sbert.net/) library. The fine-tuning dataset is the French training split of [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt/viewer/fr/train).
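
The metadata of this model card names the Pearson correlation coefficient on the French STSb test split as the evaluation metric. As a rough illustration of what that number measures, the sketch below computes the correlation between gold similarity scores and predicted similarities. The function name `pearson_correlation` and the toy values are hypothetical and are not taken from the model's actual evaluation.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: gold scores on the 0-5 STS scale and
# made-up cosine similarities standing in for model predictions.
gold_scores = [0.0, 1.5, 3.0, 5.0]
predicted_similarities = [0.10, 0.35, 0.60, 0.95]

score = pearson_correlation(gold_scores, predicted_similarities)
print(f"Pearson correlation: {score:.4f}")
```

Because the coefficient is invariant to linear rescaling, the gold scores (0 to 5) and the cosine similarities do not need to be brought onto a common scale before comparing them.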

## Usage (Sentence-Transformers)

The model is straightforward to use once [sentence-transformers](https://www.SBERT.net) is installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub.
model = SentenceTransformer("Wissam42/sentence-croissant-llm-base")
# Two pairs of French sentences with similar meanings.
sentences = ["Le chat mange la souris", "Un felin devore un rongeur", "Je travaille sur un ordinateur", "Je developpe sur mon pc"]
# Encode each sentence into one embedding vector.
embeddings = model.encode(sentences)
print(embeddings)
```
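
Embeddings are usually compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison; the helper names `cosine_similarity` and `similarity_matrix` and the toy vectors are illustrative assumptions, not part of the model or the library.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all rows of `embeddings`."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
toy_embeddings = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

With real data, the array returned by `model.encode(sentences)` can be passed in place of `toy_embeddings`. The sentence-transformers library also provides `sentence_transformers.util.cos_sim`, which computes the same quantity and is likely the better choice for large batches.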

## Citing & Authors

@article{faysse2024croissantllm,
  title={CroissantLLM: A Truly Bilingual French-English Language Model},
  author={Faysse, Manuel and Fernandes, Patrick and Guerreiro, Nuno and Loison, Ant{\'o}nio and Alves, Duarte and Corro, Caio and Boizard, Nicolas and Alves, Jo{\~a}o and Rei, Ricardo and Martins, Pedro and others},
  journal={arXiv preprint arXiv:2402.00786},
  year={2024}
}

@article{reimers2019sentence,
  title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author={Reimers, Nils and Gurevych, Iryna},
  journal={arXiv preprint arXiv:1908.10084},
  year={2019}
}