nherve committed on
Commit
c853a3e
1 Parent(s): f134806

Update README.md

Files changed (1)
  1. README.md +29 -0
README.md CHANGED
@@ -1,3 +1,32 @@
  ---
+ language: fr
  license: mit
+ tags:
+ - bert
+ - language-model
+ - flaubert
+ - french
+ - flaubert-base
+ - uncased
+ - asr
+ - speech
+ - oral
+ - natural language understanding
+ - NLU
+ - spoken language understanding
+ - SLU
+ - understanding
  ---
+
+ # FlauBERT-Oral models: Using ASR-Generated Text for Spoken Language Modeling
+
+ The **FlauBERT-Oral** models are French BERT models trained on a very large corpus of automatic speech transcriptions drawn from 350,000 hours of diverse French TV shows. They were trained with the [**FlauBERT software**](https://github.com/getalp/Flaubert) using the same configuration as the [flaubert-base-uncased](https://huggingface.co/flaubert/flaubert_base_uncased) model (12 layers, 12 attention heads, 768 dimensions, 137M parameters, uncased).
+
+ ## Available FlauBERT-Oral models
+
+ - `flaubert-oral-asr`: trained from scratch on ASR data, keeping the BPE tokenizer and vocabulary of flaubert-base-uncased
+ - `flaubert-oral-asr_nb`: trained from scratch on ASR data, with a BPE tokenizer trained on the same corpus
+ - `flaubert-oral-mixed`: trained from scratch on a mixed corpus of ASR and text data, with a BPE tokenizer trained on the same corpus
+ - `flaubert-oral-ft`: flaubert-base-uncased fine-tuned for a few epochs on ASR data
+
+
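The four variants listed in the README differ only in initialization, training corpus, and tokenizer, so they can be summarized in a small lookup table. The sketch below is illustrative and not part of the commit. It assumes the checkpoints are published on the Hugging Face Hub under the committer's namespace (`nherve`), which the README does not state, and that they load with the standard `FlaubertModel` and `FlaubertTokenizer` classes from `transformers`. The names `Variant`, `VARIANTS`, `hub_id`, and `load` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass

# Assumption: models live under the committer's Hub namespace; the README
# only gives bare model names.
HUB_NAMESPACE = "nherve"


@dataclass(frozen=True)
class Variant:
    name: str
    init: str       # "scratch" or "fine-tuned"
    corpus: str     # "asr" or "asr+text"
    tokenizer: str  # "flaubert-base-uncased" or "retrained"


VARIANTS = {
    v.name: v
    for v in (
        Variant("flaubert-oral-asr", "scratch", "asr", "flaubert-base-uncased"),
        Variant("flaubert-oral-asr_nb", "scratch", "asr", "retrained"),
        Variant("flaubert-oral-mixed", "scratch", "asr+text", "retrained"),
        # The README does not name the tokenizer for the fine-tuned model;
        # reusing the base model's tokenizer is inferred, not stated.
        Variant("flaubert-oral-ft", "fine-tuned", "asr", "flaubert-base-uncased"),
    )
}


def hub_id(name: str, namespace: str = HUB_NAMESPACE) -> str:
    """Build a Hub repository id for one of the listed variants."""
    if name not in VARIANTS:
        raise KeyError(f"unknown FlauBERT-Oral variant: {name!r}")
    return f"{namespace}/{name}"


def load(name: str):
    """Download and return (tokenizer, model) for a variant.

    transformers is imported lazily so the table above stays usable
    without it installed. Requires network access.
    """
    from transformers import FlaubertModel, FlaubertTokenizer

    repo = hub_id(name)
    tokenizer = FlaubertTokenizer.from_pretrained(repo)
    model = FlaubertModel.from_pretrained(repo)
    return tokenizer, model
```

Under these assumptions, `tokenizer, model = load("flaubert-oral-asr")` would fetch the variant that keeps the flaubert-base-uncased vocabulary. If the checkpoints are hosted elsewhere, pass a different `namespace` to `hub_id`.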