johnBamma committed on
Commit
55c8e68
1 Parent(s): 8e2fd3a

Update README.md

Files changed (1)
  1. README.md +52 -1
README.md CHANGED
@@ -5,4 +5,55 @@ language:
  pipeline_tag: automatic-speech-recognition
  tags:
  - icefall
- ---
+ ---
+
+ See https://github.com/k2-fsa/icefall/pull/1651
+
+ # icefall-asr-ksponspeech-pruned-transducer-stateless7-streaming-2024-06-12
+
+ KsponSpeech is a large-scale spontaneous speech corpus of Korean.
+ This corpus contains 969 hours of open-domain dialog utterances,
+ spoken by about 2,000 native Korean speakers in a clean environment.
+
+ All data were collected by recording two people conversing freely on a variety of topics
+ and then manually transcribing their utterances.
+
+ The transcripts are dual transcriptions, giving both the orthographic and the pronunciation form,
+ and include disfluency tags that mark spontaneous-speech phenomena such as filler words, repeated words, and word fragments.
+
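Before training, a dual transcription has to be reduced to a single target text. The sketch below keeps one side of each pair. Several things in it are assumptions for illustration and are not taken from this README or from the icefall recipe: the inline `(orthographic)/(pronunciation)` notation, the example sentence, and the helper name `select_transcription`.

```python
import re

# Hypothetical helper. ASSUMPTION (not stated in this README): each dual
# transcription is written inline as "(orthographic form)/(pronunciation form)".
DUAL_PATTERN = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def select_transcription(text: str, mode: str = "orthographic") -> str:
    """Replace every "(A)/(B)" pair with A (orthographic) or B (pronunciation)."""
    if mode not in ("orthographic", "pronunciation"):
        raise ValueError(f"unknown mode: {mode}")
    group = 1 if mode == "orthographic" else 2
    return DUAL_PATTERN.sub(lambda m: m.group(group), text)


line = "나는 (3학년)/(삼 학년) 이야"  # illustrative sentence, not from the corpus
print(select_transcription(line))                   # 나는 3학년 이야
print(select_transcription(line, "pronunciation"))  # 나는 삼 학년 이야
```

Which side to keep is a modelling choice. The orthographic form matches written Korean, while the pronunciation form is closer to what was actually spoken.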
+ The original audio files use the `.pcm` extension.
+ During preprocessing, they are converted to FLAC (`.flac`) and saved as new files.
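The first half of that conversion is decoding the raw bytes into samples, and it needs only the standard library. The sketch below assumes the `.pcm` files are headerless 16 kHz, 16-bit signed little-endian mono audio; this format is an assumption, not something this README states. The helper names are hypothetical. The FLAC encoding itself would be left to an external encoder (for example the `soundfile` package or `ffmpeg`).

```python
import struct

# ASSUMPTION (not stated in this README): headerless, 16 kHz,
# 16-bit signed little-endian, mono PCM.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2


def pcm_to_samples(data: bytes) -> list:
    """Decode raw 16-bit little-endian PCM bytes into integer samples."""
    if len(data) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of the sample width")
    count = len(data) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", data))


def pcm_duration_seconds(data: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration implied by the byte count under the format assumed above."""
    return len(data) / (BYTES_PER_SAMPLE * sample_rate)


raw = struct.pack("<3h", 0, -1, 32767)
print(pcm_to_samples(raw))  # [0, -1, 32767]
```

Under these assumptions, one second of audio is 32,000 bytes. If the byte count is not a multiple of the sample width, that is a quick sign the assumed format does not match the file.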
+
+ KsponSpeech is publicly available from AI Hub, an open data hub operated by the Korean government.
+ The dataset must be downloaded manually.
+
+ For more details, please visit:
+
+ - Dataset: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123
+ - Paper: https://www.mdpi.com/2076-3417/10/19/6936
+
+ ### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer)
+
+ Number of model parameters: 79,022,891, i.e., 79.02 M
+
+ #### Training on KsponSpeech (with MUSAN)
+
+ The CERs (%) are:
+
+ | decoding method | chunk size | eval_clean | eval_other | comment | decoding mode |
+ |----------------------|------------|------------|------------|---------------------|----------------------|
+ | greedy search | 320ms | 10.21 | 11.07 | --epoch 30 --avg 9 | simulated streaming |
+ | greedy search | 320ms | 10.22 | 11.07 | --epoch 30 --avg 9 | chunk-wise |
+ | fast beam search | 320ms | 10.21 | 11.04 | --epoch 30 --avg 9 | simulated streaming |
+ | fast beam search | 320ms | 10.25 | 11.08 | --epoch 30 --avg 9 | chunk-wise |
+ | modified beam search | 320ms | 10.13 | 10.88 | --epoch 30 --avg 9 | simulated streaming |
+ | modified beam search | 320ms | 10.1 | 10.93 | --epoch 30 --avg 9 | chunk-wise |
+ | greedy search | 640ms | 9.94 | 10.82 | --epoch 30 --avg 9 | simulated streaming |
+ | greedy search | 640ms | 10.04 | 10.85 | --epoch 30 --avg 9 | chunk-wise |
+ | fast beam search | 640ms | 10.01 | 10.81 | --epoch 30 --avg 9 | simulated streaming |
+ | fast beam search | 640ms | 10.04 | 10.7 | --epoch 30 --avg 9 | chunk-wise |
+ | modified beam search | 640ms | 9.91 | 10.72 | --epoch 30 --avg 9 | simulated streaming |
+ | modified beam search | 640ms | 9.92 | 10.72 | --epoch 30 --avg 9 | chunk-wise |
+
+ Note: `simulated streaming` means the full utterance is fed to the model at once during decoding, using `decode.py`,
+ while `chunk-wise` means a fixed number of frames is fed at a time, using `streaming_decode.py`.
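The two decoding modes differ in how input is delivered, and the sketch below models only that delivery; it runs no model. It assumes a 10 ms feature frame shift, so a 320 ms chunk is 32 frames and a 640 ms chunk is 64 frames. It also ignores the padding or right-context frames a real streaming encoder may need. All function names are hypothetical and are not icefall APIs.

```python
# Hypothetical helpers illustrating how input is fed in the two decoding modes.
# ASSUMPTION (not stated in this README): features use a 10 ms frame shift.
FRAME_SHIFT_MS = 10


def chunk_len_frames(chunk_ms: int, frame_shift_ms: int = FRAME_SHIFT_MS) -> int:
    """Convert a chunk size in milliseconds into a number of feature frames."""
    if chunk_ms % frame_shift_ms != 0:
        raise ValueError("chunk size must be a multiple of the frame shift")
    return chunk_ms // frame_shift_ms


def simulated_streaming_input(frames: list) -> list:
    """Simulated streaming: the whole utterance is passed in a single call."""
    return [frames]


def chunk_wise_input(frames: list, chunk_frames: int) -> list:
    """Chunk-wise: consecutive fixed-size pieces; the last one may be shorter."""
    return [frames[i:i + chunk_frames] for i in range(0, len(frames), chunk_frames)]


frames = list(range(1000))  # stand-in for 1000 feature frames (10 s at 10 ms)
chunks = chunk_wise_input(frames, chunk_len_frames(320))
print(len(simulated_streaming_input(frames)))  # 1
print(len(chunks), len(chunks[-1]))           # 32 8
```

The CERs for the two modes in the table above are close for every decoding method. That suggests feeding the input piece by piece costs little accuracy compared with handing over the whole utterance.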