---
title: CL-KWS_202408_v1
app_file: demo.py
sdk: gradio
sdk_version: 3.34.0
---
### Datasets

* [LibriPhrase]
  LibriSpeech corpus: https://www.openslr.org/12
  Recipe for LibriPhrase: https://github.com/gusrud1103/LibriPhrase

* [Google Speech Commands]
  http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
  http://download.tensorflow.org/data/speech_commands_test_set_v0.02.tar.gz
  https://www.tensorflow.org/datasets/catalog/speech_commands

* [Qualcomm Keyword Speech]
  https://www.qualcomm.com/developer/software/keyword-speech-dataset

* [Noise: MUSAN]
  https://www.openslr.org/17/

## Getting started

### Environment

```bash
# Python 3.7 environment with CUDA 11.6 / cuDNN 8.2
conda create --name [name] python=3.7
conda install -c "nvidia/label/cuda-11.6.0" cuda-nvcc
conda install -c conda-forge cudnn=8.2.1.32
pip install -r requirements.txt
pip install numpy==1.18.5
pip install tensorflow-model-optimization==0.6.0
# Work around the missing libcusolver.so.10 by linking the installed version:
cd /miniconda3/envs/[name]/lib
ln -s libcusolver.so.11 libcusolver.so.10
# If CUDA libraries are not found at runtime, extend the library path:
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/miniconda3/envs/[name]/lib
```

### Training
```bash
python train_guided_CTC.py \
        --epoch 23 \
        --lr 1e-3 \
        --loss_weight 1.0 1.0 0.2 \
        --audio_input both \
        --text_input phoneme \
        --comment 'user comments for each experiment'
```

```bash
python train.py \
        --epoch 18 \
        --lr 1e-3 \
        --loss_weight 1.0 1.0 \
        --audio_input both \
        --text_input phoneme \
        --comment 'user comments for each experiment'
```

### Fine-tuning
Pre-trained checkpoint: ./checkpoint_results/checkpoint_guided_ctc/20240725-011006
```bash
python train_guided_ctc_clap.py \
        --epoch 5 \
        --lr 1e-3 \
        --loss_weight 1.0 1.0 0.01 0.01 \
        --audio_input both \
        --text_input phoneme \
        --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/date-time' \
        --comment 'user comments for each experiment'
```

```bash
python train_CLKWS.py \
        --epoch 4 \
        --lr 1e-3 \
        --loss_weight 1.0 1.0 \
        --audio_input both \
        --text_input phoneme \
        --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/date-time' \
        --comment 'user comments for each experiment'
```

### Inference
The keyword list is defined as `target_list` in `google_infe202405.py`.

```bash
python inference.py \
        --audio_input both \
        --text_input phoneme \
        --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/20240515-111757'
```
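The exact contents of `target_list` are not shown here. As a hypothetical sketch (the entries below are illustrative Google Speech Commands keywords, not the project's actual list), it could be a plain Python list that inference matches transcribed words against:

```python
# Hypothetical keyword list in the style of google_infe202405.py's target_list.
# These entries are illustrative only; edit the real target_list to change
# which keywords inference.py scores.
target_list = [
    "yes", "no", "up", "down",
    "left", "right", "on", "off",
]

def is_target(word, targets=target_list):
    """Case-insensitive membership check against the keyword list (sketch)."""
    return word.lower() in targets
```

Editing `target_list` in `google_infe202405.py` is the intended way to change the evaluated keywords; no command-line flag for the list is documented.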


### Demo
checkpoints: ./checkpoint_results/checkpoint_guided_ctc/20240725-011006
             ./checkpoint_results/checkpoint_gctc_clap/20240725-154258

```bash
python demo.py \
        --audio_input both \
        --text_input phoneme \
        --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/20240725-011006' \
        --keyword_list_length 8
```

Demo website: the demo runs on a public Gradio URL.
Upload file format: mono WAV, 256 kbps, 22050 Hz.
The maximum audio length is set in dataset/dataloader_demo.py: `self.maxlen_a = 56000`.
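The `maxlen_a = 56000` cap corresponds to roughly 2.5 seconds at 22050 Hz. A minimal sketch of that length normalization (a hypothetical helper, assuming the loader truncates long clips and zero-pads short ones; this is not the project's actual dataloader code):

```python
def fix_length(samples, maxlen=56000, pad_value=0.0):
    """Truncate or zero-pad a 1-D sample sequence to exactly maxlen samples.

    Mirrors the assumed behavior behind self.maxlen_a in
    dataset/dataloader_demo.py: long uploads are clipped, short ones padded.
    """
    samples = list(samples)
    if len(samples) >= maxlen:
        return samples[:maxlen]
    return samples + [pad_value] * (maxlen - len(samples))
```

Uploads longer than `maxlen_a` samples would therefore be silently clipped, so keep demo recordings short.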


### Monitoring

```bash
tensorboard --logdir ./log/ --bind_all
```

### Acknowledgements
This work builds on the following code repository:
https://github.com/ncsoft/PhonMatchNet