---
title: CL-KWS_202408_v1
app_file: demo.py
sdk: gradio
sdk_version: 3.34.0
---
### Datasets
* [LibriPhrase]
  * LibriSpeech corpus: https://www.openslr.org/12
  * Recipe for LibriPhrase: https://github.com/gusrud1103/LibriPhrase
* [Google Speech Commands]
  * http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
  * http://download.tensorflow.org/data/speech_commands_test_set_v0.02.tar.gz
  * https://www.tensorflow.org/datasets/catalog/speech_commands
* [Qualcomm Keyword Speech]
  * https://www.qualcomm.com/developer/software/keyword-speech-dataset
* [Noise (MUSAN)]
  * https://www.openslr.org/17/
## Getting started
### Environment
```bash
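# Create and activate a Python 3.7 environment (replace [name] with your environment name)
conda create --name [name] python=3.7
conda activate [name]
conda install -c "nvidia/label/cuda-11.6.0" cuda-nvcc
conda install -c conda-forge cudnn=8.2.1.32
pip install -r requirements.txt
pip install numpy==1.18.5
pip install tensorflow-model-optimization==0.6.0
# Link libcusolver.so.11 under the name libcusolver.so.10 inside the environment's lib directory
cd /miniconda3/envs/[name]/lib
ln -s libcusolver.so.11 libcusolver.so.10
# If needed, add the environment's lib directory to the library path, e.g.:
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/homes/yiting/miniconda3/envs/pho/lib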
```
### Training
```bash
python train_guided_CTC.py \
  --epoch 23 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 0.2 \
  --audio_input both \
  --text_input phoneme \
  --comment 'user comments for each experiment'
```
```bash
python train.py \
--epoch 18 \
--lr 1e-3 \
--loss_weight 1.0 1.0 \
--audio_input both \
--text_input phoneme \
--comment 'user comments for each experiment'
```
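The commands above pass three values to `--loss_weight` for `train_guided_CTC.py` and two for `train.py`. The sketch below shows one way such a flag could be parsed and applied as a weighted sum of loss terms. It is an illustration only: `build_parser` and `combine_losses` are hypothetical names, and the actual training scripts may parse and use the weights differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epoch", type=int, default=1)
    parser.add_argument("--lr", type=float, default=1e-3)
    # nargs="+" accepts a variable number of weights (two or three in the README).
    parser.add_argument("--loss_weight", type=float, nargs="+", default=[1.0, 1.0])
    parser.add_argument("--audio_input", type=str, default="both")
    parser.add_argument("--text_input", type=str, default="phoneme")
    parser.add_argument("--comment", type=str, default="")
    return parser


def combine_losses(losses, weights):
    """Return the weighted sum of loss terms (assumed usage of --loss_weight)."""
    if len(losses) != len(weights):
        raise ValueError(
            "expected %d weights, got %d" % (len(losses), len(weights))
        )
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--epoch", "23", "--loss_weight", "1.0", "1.0", "0.2"]
    )
    # Placeholder loss values, used only to demonstrate the weighting.
    print(combine_losses([0.5, 0.25, 2.0], args.loss_weight))
```

Checking that the number of weights matches the number of loss terms catches the case where a two-weight command line is used with a three-loss script.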
### Fine-tuning
Checkpoint: `./checkpoint_results/checkpoint_guided_ctc/20240725-011006`
```bash
python train_guided_ctc_clap.py \
--epoch 5 \
--lr 1e-3 \
--loss_weight 1.0 1.0 0.01 0.01 \
--audio_input both \
--text_input phoneme \
--load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/date-time' \
--comment 'user comments for each experiment'
```
```bash
python train_CLKWS.py \
--epoch 4 \
--lr 1e-3 \
--loss_weight 1.0 1.0 \
--audio_input both \
--text_input phoneme \
--load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/date-time' \
--comment 'user comments for each experiment'
```
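The `date-time` placeholder in `--load_checkpoint_path` stands for a run directory such as `20240725-011006`. As a convenience, the following sketch picks the most recent run directory under a checkpoint root. It assumes directory names follow the `YYYYMMDD-HHMMSS` pattern seen in this README; `latest_checkpoint` is a hypothetical helper and not part of the repository.

```python
import os
import re
from datetime import datetime

# Assumed naming pattern, based on names such as 20240725-011006.
STAMP_PATTERN = re.compile(r"^\d{8}-\d{6}$")


def latest_checkpoint(root):
    """Return the path of the newest timestamp-named subdirectory, or None."""
    candidates = []
    for name in os.listdir(root):
        full_path = os.path.join(root, name)
        if not (os.path.isdir(full_path) and STAMP_PATTERN.match(name)):
            continue
        try:
            stamp = datetime.strptime(name, "%Y%m%d-%H%M%S")
        except ValueError:
            # Digits matched the pattern but do not form a valid date or time.
            continue
        candidates.append((stamp, full_path))
    if not candidates:
        return None
    # Tuples compare by timestamp first, so max() yields the newest run.
    return max(candidates)[1]
```

The returned path can then be passed as the value of `--load_checkpoint_path`.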
### Inference
The keyword list is defined as `target_list` in `google_infe202405.py`.
```bash
python inference.py --audio_input both --text_input phoneme --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/20240515-111757'
```
### Demo
Checkpoints:
* `./checkpoint_results/checkpoint_guided_ctc/20240725-011006`
* `./checkpoint_results/checkpoint_gctc_clap/20240725-154258`
```bash
python demo.py --audio_input both --text_input phoneme --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/20240725-011006' --keyword_list_length 8
```
* Demo website: open the link printed after `Running on public URL` in the console output.
* Upload file format: mono WAV, 256 kbps, 22050 Hz.
* `dataset/dataloader_demo.py` sets `self.maxlen_a = 56000`.
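
Before uploading a file to the demo, it can help to confirm that it matches the expected format. The sketch below uses Python's standard `wave` module to check the channel count and sample rate. It also compares the frame count with `56000`, under the assumption that `maxlen_a` counts audio samples; that interpretation is not confirmed here, and `describe_wav` is a hypothetical helper.

```python
import wave

EXPECTED_CHANNELS = 1      # mono
EXPECTED_RATE = 22050      # Hz
MAXLEN_A = 56000           # from dataset/dataloader_demo.py; assumed to be a sample count


def describe_wav(path):
    """Read WAV header fields and compare them with the demo's expected format."""
    with wave.open(path, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "num_frames": wav.getnframes(),
        }
    info["format_ok"] = (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_rate"] == EXPECTED_RATE
    )
    # Assumption: clips longer than MAXLEN_A samples exceed the demo's limit.
    info["within_maxlen"] = info["num_frames"] <= MAXLEN_A
    return info
```

For example, `describe_wav("keyword.wav")["format_ok"]` is `True` only for a mono file sampled at 22050 Hz (the file name is a placeholder).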
### Monitoring
```bash
tensorboard --logdir ./log/ --bind_all
```
### Acknowledgements
We acknowledge the following code repository:
https://github.com/ncsoft/PhonMatchNet