---
pipeline_tag: sentence-similarity
language: fr
datasets:
- stsb_multi_mt
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: mit
model-index:
- name: sentence-croissant-llm-base by Wissam Siblini
  results:
  - task:
      name: Sentence-Embedding
      type: Text Similarity
    dataset:
      name: Text Similarity fr
      type: stsb_multi_mt
      args: fr
    metrics:
    - name: Test Pearson correlation coefficient
      type: Pearson_correlation_coefficient
      value: xx.xx
---

# Overview

The model [sentence-croissant-llm-base](https://huggingface.co/Wissam42/sentence-croissant-llm-base) generates French text embeddings. It was fine-tuned from the pre-trained LLM [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) using the Siamese-BERT (Sentence-BERT) strategy implemented in the [sentence-transformers](https://www.sbert.net/) library. The fine-tuning dataset is the French training split of [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt/viewer/fr/train).

## Usage (Sentence-Transformers)

Using this model is straightforward once [sentence-transformers](https://www.SBERT.net) is installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("Wissam42/sentence-croissant-llm-base")

sentences = ["Le chat mange la souris", "Un félin dévore un rongeur", "Je travaille sur un ordinateur", "Je développe sur mon PC"]

# encode() returns one embedding vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)
```
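
Since the model is tagged for sentence similarity, the embeddings are typically compared with cosine similarity. The sketch below is a minimal, dependency-free illustration of that comparison. The helper names `cosine_similarity` and `similarity_matrix` and the toy vectors are hypothetical stand-ins, not part of this model or of sentence-transformers.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all rows of `embeddings`."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
scores = similarity_matrix(toy_embeddings)
print(scores[0][1])  # vectors pointing in the same direction
print(scores[0][2])  # orthogonal vectors
```

With the real model, the same helper should accept the output of `model.encode(sentences)` directly, since it only iterates over rows and values. In practice, the `cos_sim` utility in `sentence_transformers.util` is likely the more convenient choice for large batches.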

## Citing & Authors

    @article{faysse2024croissantllm,
        title={CroissantLLM: A Truly Bilingual French-English Language Model},
        author={Faysse, Manuel and Fernandes, Patrick and Guerreiro, Nuno and Loison, Ant{\'o}nio and Alves, Duarte and Corro, Caio and Boizard, Nicolas and Alves, Jo{\~a}o and Rei, Ricardo and Martins, Pedro and others},
        journal={arXiv preprint arXiv:2402.00786},
        year={2024}
    }

    @article{reimers2019sentence,
        title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
        author={Reimers, Nils and Gurevych, Iryna},
        journal={arXiv preprint arXiv:1908.10084},
        year={2019}
    }