Commit 07920fc by cecilemacaire (parent: bca387e): Update README.md
### Datasets

We used the [Propicto-orféo dataset](https://www.ortolang.fr/market/corpora/propicto), which was created from the CEFC-Orféo corpus.
This dataset was presented in the research paper ["A Multimodal French Corpus of Aligned Speech, Text, and Pictogram Sequences for Speech-to-Pictogram Machine Translation"](https://aclanthology.org/2024.lrec-main.76/) at LREC-Coling 2024. The dataset was split into training, validation, and test sets.

| **Split** | **Number of utterances** |
|-----------|:------------------------:|
| train     |         231,374          |
| valid     |          28,796          |
| test      |          29,009          |
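
For orientation, the split sizes above correspond to roughly an 80/10/10 partition of the corpus; a quick check (the counts come from the table, everything else is illustrative):

```python
# Utterance counts from the table above (Propicto-orféo splits).
splits = {"train": 231_374, "valid": 28_796, "test": 29_009}

total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 289179 utterances overall
print(shares)  # {'train': 80.0, 'valid': 10.0, 'test': 10.0}
```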
### Parameters

A full list of the parameters is available in the `config.json` file. We specified the following arguments in the training pipeline:

```python
training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints_orfeo/",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=40,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True
)
```
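
Given the batch size above and the 231,374 training utterances from the Datasets section, one can estimate the length of a training run; the sketch below assumes a single device and no gradient accumulation (neither is stated in the card):

```python
import math

train_utterances = 231_374  # from the Datasets table
batch_size = 32             # per_device_train_batch_size above
epochs = 40                 # num_train_epochs above

# Optimizer steps, assuming one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_utterances / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 7231
print(total_steps)      # 289240
```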
### Evaluation

The model was evaluated with [sacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu/blob/d94719691d29f7adf7151c8b1471de579a78a280/sacrebleu.py), comparing the reference pictogram translations against the model hypotheses.
### Results

Comparison with other translation models:

| **Model** | **validation** | **test** |
|-----------|:--------------:|:--------:|
| **t2p-t5-large-orféo** | 85.2 | 85.8 |
| t2p-nmt-orféo | **87.2** | **87.4** |
| t2p-mbart-large-cc25-orfeo | 75.2 | 75.6 |
| t2p-nllb-200-distilled-600M-orfeo | 86.3 | 86.9 |
### Environmental Impact