Text-to-Speech
speechbrain
English
TTS
speech-synthesis
Tacotron2
Mirco committed on
Commit 56bef66
1 Parent(s): cd2eec6

Update README.md

Files changed (1)
  1. README.md +1 -70
README.md CHANGED
@@ -19,77 +19,8 @@ metrics:
  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # wav2vec 2.0 with CTC/Attention trained on CommonVoice Italian (No LM)
+ # Work-in-Progress

- This repository provides all the necessary tools to perform automatic speech
- recognition from an end-to-end system pretrained on CommonVoice (Italian Language) within
- SpeechBrain. For a better experience, we encourage you to learn more about
- [SpeechBrain](https://speechbrain.github.io).
-
- The performance of the model is the following:
-
- | Release | Test WER | GPUs |
- |:--------------:|:--------------:| :--------:|
- | 03-06-21 | 9.86 | 2xV100 32GB |
-
- ## Pipeline description
-
- This ASR system is composed of 2 different but linked blocks:
- - Tokenizer (unigram) that transforms words into subword units and trained with
- the train transcriptions (train.tsv) of CommonVoice (EN).
- - Acoustic model (wav2vec2.0 + CTC/Attention). A pretrained wav2vec 2.0 model ([facebook/wav2vec2-large-it-voxpopuli](https://huggingface.co/facebook/wav2vec2-large-it-voxpopuli)) is combined with two DNN layers and finetuned on CommonVoice En.
- The obtained final acoustic representation is given to the CTC and attention decoders.
-
- The system is trained with recordings sampled at 16kHz (single channel).
- The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.
-
- ## Install SpeechBrain
-
- First of all, please install tranformers and SpeechBrain with the following command:
-
- ```
- pip install speechbrain transformers
- ```
-
- Please notice that we encourage you to read our tutorials and learn more about
- [SpeechBrain](https://speechbrain.github.io).
-
- ### Transcribing your own audio files (in Italian)
-
- ```python
- from speechbrain.pretrained import EncoderDecoderASR
-
- asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-it", savedir="pretrained_models/asr-wav2vec2-commonvoice-it")
- asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-it/example-it.wav")
-
- ```
- ### Inference on GPU
- To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
-
- ## Parallel Inference on a Batch
- Please, [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to figure out how to transcribe in parallel a batch of input sentences using a pre-trained model.
-
- ### Training
- The model was trained with SpeechBrain.
- To train it from scratch follow these steps:
- 1. Clone SpeechBrain:
- ```bash
- git clone https://github.com/speechbrain/speechbrain/
- ```
- 2. Install it:
- ```bash
- cd speechbrain
- pip install -r requirements.txt
- pip install -e .
- ```
-
- 3. Run Training:
- ```bash
- cd recipes/CommonVoice/ASR/seq2seq
- python train_with_wav2vec.py hparams/train_it_with_wav2vec.yaml --data_folder=your_data_folder
- ```
-
- You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1tjz6IZmVRkuRE97E7h1cXFoGTer7pT73?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.