cecilemacaire committed
Commit a63edff • 1 Parent(s): 0e1dea6

Update README.md

README.md CHANGED
@@ -23,6 +23,7 @@ widget:
 # t2p-t5-large-orféo
 
 *t2p-t5-large-orféo* is a text-to-pictograms translation model built by fine-tuning the [t5-large](https://huggingface.co/google-t5/t5-large) model on a dataset of pairs of transcriptions / pictogram token sequence (each token is linked to a pictogram image from [ARASAAC](https://arasaac.org/)).
+The model is used only for **inference**.
 
 ## Training details
 
@@ -30,7 +31,6 @@ widget:
 
 The [Propicto-orféo dataset](https://www.ortolang.fr/market/corpora/propicto) is used, which was created from the CEFC-Orféo corpus.
 This dataset was presented in the research paper titled ["A Multimodal French Corpus of Aligned Speech, Text, and Pictogram Sequences for Speech-to-Pictogram Machine Translation](https://aclanthology.org/2024.lrec-main.76/)" at LREC-Coling 2024. The dataset was split into training, validation, and test sets.
-
 | **Split** | **Number of utterances** |
 |:-----------:|:-----------------------:|
 | train | 231,374 |