Update README.md

README.md

@@ -1,3 +1,32 @@
 ---
+language: fr
 license: mit
+tags:
+- bert
+- language-model
+- flaubert
+- french
+- flaubert-base
+- uncased
+- asr
+- speech
+- oral
+- natural language understanding
+- NLU
+- spoken language understanding
+- SLU
+- understanding
 ---
+# FlauBERT-Oral models: Using ASR-Generated Text for Spoken Language Modeling
+**FlauBERT-Oral** models are French BERT models trained on a very large corpus of automatically transcribed speech, drawn from 350,000 hours of diverse French TV shows. They were trained with the [**FlauBERT software**](https://github.com/getalp/Flaubert) using the same parameters as the [flaubert-base-uncased](https://huggingface.co/flaubert/flaubert_base_uncased) model (12 layers, 12 attention heads, 768 dimensions, 137M parameters, uncased).
+## Available FlauBERT-Oral models
+- `flaubert-oral-asr`: trained from scratch on ASR data, keeping the BPE tokenizer and vocabulary of flaubert-base-uncased
+- `flaubert-oral-asr_nb`: trained from scratch on ASR data, with a BPE tokenizer trained on the same corpus
+- `flaubert-oral-mixed`: trained from scratch on a mixed corpus of ASR and text data, with a BPE tokenizer trained on the same corpus
+- `flaubert-oral-ft`: flaubert-base-uncased fine-tuned for a few epochs on ASR data
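
The four variants listed above could be loaded with the Hugging Face `transformers` library roughly as sketched below. This is a minimal sketch under stated assumptions, not something the README itself specifies: the Hub namespace that hosts the models is not given here, so `namespace` is a placeholder the reader must supply, and the helper names `repo_id` and `load` are hypothetical. The FlauBERT tokenizer in `transformers` typically also needs the `sacremoses` package installed.

```python
# Variant names exactly as listed in the README.
VARIANTS = (
    "flaubert-oral-asr",
    "flaubert-oral-asr_nb",
    "flaubert-oral-mixed",
    "flaubert-oral-ft",
)


def repo_id(variant: str, namespace: str) -> str:
    """Build a Hub repository id of the form '<namespace>/<variant>'.

    `namespace` is an assumption: the README does not say which
    account or organization hosts the models.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown FlauBERT-Oral variant: {variant!r}")
    return f"{namespace}/{variant}"


def load(variant: str, namespace: str):
    """Download the tokenizer and model from the Hub (needs network access)."""
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    name = repo_id(variant, namespace)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    return tokenizer, model


# Example usage (not run here; replace the placeholder namespace first):
#   tokenizer, model = load("flaubert-oral-asr", namespace="<namespace>")
#   inputs = tokenizer("bonjour à tous", return_tensors="pt")
#   hidden = model(**inputs).last_hidden_state
```

Because the models are uncased and trained on ASR output, lowercase input with little or no punctuation is probably closest to what they saw during training.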