cecilemacaire committed 07920fc (parent bca387e): Update README.md
### Datasets

We used the [Propicto-orféo dataset](https://www.ortolang.fr/market/corpora/propicto), which was created from the CEFC-Orféo corpus.
This dataset was presented in the research paper ["A Multimodal French Corpus of Aligned Speech, Text, and Pictogram Sequences for Speech-to-Pictogram Machine Translation"](https://aclanthology.org/2024.lrec-main.76/) at LREC-COLING 2024. The dataset was split into training, validation, and test sets.

| **Split** | **Number of utterances** |
|-----------|:------------------------:|
| train     | 231,374                  |
| valid     | 28,796                   |
| test      | 29,009                   |

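The split sizes correspond to roughly an 80/10/10 partition of the corpus. As a quick sanity check (the counts come from the table above; the snippet itself is only illustrative):

```python
# Utterance counts taken from the split table above.
splits = {"train": 231_374, "valid": 28_796, "test": 29_009}

total = sum(splits.values())  # 289,179 utterances overall

# Share of each split, as a percentage of the full dataset.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(shares)  # {'train': 80.0, 'valid': 10.0, 'test': 10.0}
```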
### Parameters

A full list of the parameters is available in the `config.json` file. We specified the following arguments in the training pipeline:

```python
training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints_orfeo/",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=40,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True
)
```
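With these hyperparameters, the number of optimization steps per epoch can be estimated from the training-set size. The sketch below assumes a single GPU and no gradient accumulation (neither is stated in the card, so treat the totals as an estimate):

```python
import math

# Values taken from the training arguments and the dataset table above.
train_utterances = 231_374
per_device_train_batch_size = 32
num_train_epochs = 40

# Assuming a single GPU and no gradient accumulation (an assumption,
# not stated in the model card), one epoch is one pass over the data.
steps_per_epoch = math.ceil(train_utterances / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

print(steps_per_epoch, total_steps)  # 7231 steps/epoch, 289240 total
```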
### Evaluation

The model was evaluated with [sacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu/blob/d94719691d29f7adf7151c8b1471de579a78a280/sacrebleu.py), comparing the reference pictogram translation against the model hypothesis.

### Results

Comparison to other translation models:

| **Model** | **validation** | **test** |
|-----------|:--------------:|:--------:|
| **t2p-t5-large-orféo** | 85.2 | 85.8 |
| t2p-nmt-orféo | **87.2** | **87.4** |
| t2p-mbart-large-cc25-orfeo | 75.2 | 75.6 |
| t2p-nllb-200-distilled-600M-orfeo | 86.3 | 86.9 |

### Environmental Impact
