title: CL-KWS_202408_v1
app_file: demo.py
sdk: gradio
sdk_version: 3.34.0
### Datasets
* [LibriPhrase]
  * LibriSpeech corpus: https://www.openslr.org/12
  * Recipe for LibriPhrase: https://github.com/gusrud1103/LibriPhrase
* [Google Speech Commands]
  * http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
  * http://download.tensorflow.org/data/speech_commands_test_set_v0.02.tar.gz
  * https://www.tensorflow.org/datasets/catalog/speech_commands
* [Qualcomm Keyword Speech]
  * https://www.qualcomm.com/developer/software/keyword-speech-dataset
* [Noise: MUSAN]
  * https://www.openslr.org/17/
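Fetching and unpacking one of the archives listed above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the helper names `download` and `extract` and the target directory in the usage comment are hypothetical and not part of this repository.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the dataset list above.
SPEECH_COMMANDS_URL = "http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz"


def download(url: str, archive_path: Path) -> Path:
    """Download url to archive_path unless the file already exists."""
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract(archive_path: Path, dest_dir: Path) -> Path:
    """Unpack a .tar.gz archive into dest_dir and return dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        tar.extractall(str(dest_dir))
    return dest_dir


# Example (hypothetical target directory):
# root = Path("DB/google_speech_commands")
# extract(download(SPEECH_COMMANDS_URL, root / "speech_commands_v0.02.tar.gz"), root)
```

The download is skipped when the archive is already on disk, so the script can be re-run after an interrupted extraction without fetching the file again.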
## Getting started
### Environment
```bash
# Python 3.7
conda create --name [name] python=3.7
conda activate [name]
conda install -c "nvidia/label/cuda-11.6.0" cuda-nvcc
conda install -c conda-forge cudnn=8.2.1.32
pip install -r requirements.txt
pip install numpy==1.18.5
pip install tensorflow-model-optimization==0.6.0
# Expose libcusolver.so.11 under the name libcusolver.so.10
cd /miniconda3/envs/[name]/lib
ln -s libcusolver.so.11 libcusolver.so.10
# If needed, add the environment's lib directory to the library search path:
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/homes/yiting/miniconda3/envs/pho/lib
```
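After the setup steps above, a quick sanity check of the pinned versions may save a failed training run. The sketch below is not part of this repository; it assumes that `requirements.txt` installs TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available), and the helper names are hypothetical.

```python
import importlib
import sys


def installed_version(module_name: str):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # missing package or a broken CUDA/cuDNN setup
        return None
    return getattr(module, "__version__", "unknown")


def environment_report() -> dict:
    """Collect the versions that the setup steps above pin."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "numpy": installed_version("numpy"),
        "tensorflow": installed_version("tensorflow"),
        "gpus": [],
    }
    if report["tensorflow"] is not None:
        import tensorflow as tf  # assumption: TensorFlow >= 2.1

        # Lists the GPUs TensorFlow can actually use; empty means CPU only.
        report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


for key, value in environment_report().items():
    print(key, ":", value)
```

An empty `gpus` list with TensorFlow installed usually points at the CUDA or cuDNN libraries not being found, which is what the symlink and `LD_LIBRARY_PATH` lines above are meant to address.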
### Training
```bash
python train_guided_CTC.py \
  --epoch 23 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 0.2 \
  --audio_input both \
  --text_input phoneme \
  --comment 'user comments for each experiment'
```
```bash
python train.py \
  --epoch 18 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 \
  --audio_input both \
  --text_input phoneme \
  --comment 'user comments for each experiment'
```
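The `--loss_weight` flag takes a different number of values in each command above (three for `train_guided_CTC.py`, two for `train.py`). The sketch below shows one common way such flags are parsed and applied; it is an illustration only, not the code in the training scripts. The defaults, the `weighted_total` helper, and the assumption that the weights are combined as a plain weighted sum are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing only")
    parser.add_argument("--epoch", type=int, default=1)
    parser.add_argument("--lr", type=float, default=1e-3)
    # nargs="+" collects "1.0 1.0 0.2" into [1.0, 1.0, 0.2]
    parser.add_argument("--loss_weight", type=float, nargs="+", default=[1.0, 1.0])
    parser.add_argument("--audio_input", type=str, default="both")
    parser.add_argument("--text_input", type=str, default="phoneme")
    parser.add_argument("--comment", type=str, default="")
    return parser


def weighted_total(losses, weights):
    """Weighted sum of loss terms; one weight is expected per term."""
    if len(losses) != len(weights):
        raise ValueError("expected {} weights, got {}".format(len(losses), len(weights)))
    return sum(w * l for w, l in zip(weights, losses))


args = build_parser().parse_args(
    ["--epoch", "23", "--lr", "1e-3", "--loss_weight", "1.0", "1.0", "0.2"]
)
# Placeholder loss values, purely for illustration.
print(weighted_total([0.5, 0.25, 2.0], args.loss_weight))
```

Checking the number of weights against the number of loss terms catches the easy mistake of passing the two-value form to a script that expects three.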
### Fine-tuning
Checkpoint: `./checkpoint_results/checkpoint_guided_ctc/20240725-011006`
```bash
python train_guided_ctc_clap.py \
  --epoch 5 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 0.01 0.01 \
  --audio_input both \
  --text_input phoneme \
  --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/date-time' \
  --comment 'user comments for each experiment'
```
```bash
python train_CLKWS.py \
  --epoch 4 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 \
  --audio_input both \
  --text_input phoneme \
  --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/date-time' \
  --comment 'user comments for each experiment'
```
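The `date-time` part of `--load_checkpoint_path` has to be replaced with an actual run directory such as `20240725-011006`. A small helper along these lines could pick the most recent run automatically. It is a sketch, not part of this repository, and it assumes every run directory follows the `YYYYMMDD-HHMMSS` naming seen in this README; `latest_run` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Optional

# Run directories in this README are named like 20240725-011006 (YYYYMMDD-HHMMSS).
RUN_NAME = re.compile(r"^\d{8}-\d{6}$")


def latest_run(checkpoint_root: str) -> Optional[Path]:
    """Return the newest timestamp-named subdirectory, or None if there is none."""
    root = Path(checkpoint_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir() and RUN_NAME.match(p.name)]
    # Fixed-width timestamps sort chronologically as plain strings.
    return max(runs, key=lambda p: p.name, default=None)


# Example:
# print(latest_run("./checkpoint_results/checkpoint_guided_ctc"))
```

The returned path can then be passed to `--load_checkpoint_path`; directories that do not match the timestamp pattern are ignored.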
### Inference
The keyword list is defined as `target_list` in `google_infe202405.py`.
```bash
python inference.py --audio_input both --text_input phoneme --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/20240515-111757'
```
### Demo
Checkpoints:
* `./checkpoint_results/checkpoint_guided_ctc/20240725-011006`
* `./checkpoint_results/checkpoint_gctc_clap/20240725-154258`
```bash
python demo.py --audio_input both --text_input phoneme --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/20240725-011006' --keyword_list_length 8
```
* Demo website: open the link printed after `Running on public URL` when the demo starts.
* Upload file: mono WAV, 256 kbps, 22050 Hz.
* Audio length setting in `dataset/dataloader_demo.py`: `self.maxlen_a = 56000`
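Before uploading a file to the demo, its header can be checked against the format listed above. The following is a minimal sketch using the standard-library `wave` module; the helper names are hypothetical and not part of this repository. Only the channel count and sample rate are checked; the bitrate and the `maxlen_a` setting are not.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the header of a PCM WAV file with the standard-library wave module."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bits_per_sample": 8 * wav.getsampwidth(),
            "frames": wav.getnframes(),
        }


def upload_problems(path: str, expected_rate: int = 22050) -> list:
    """Return a list of mismatches against the upload format listed above."""
    try:
        info = describe_wav(path)
    except (wave.Error, EOFError) as err:
        return ["not a readable PCM WAV file: {}".format(err)]
    problems = []
    if info["channels"] != 1:
        problems.append("expected mono, found {} channels".format(info["channels"]))
    if info["sample_rate"] != expected_rate:
        problems.append(
            "expected {} Hz, found {} Hz".format(expected_rate, info["sample_rate"])
        )
    return problems


# Example:
# print(upload_problems("my_keyword.wav"))
```

An empty list means the file matches the mono and sample-rate requirements; otherwise each entry names one mismatch to fix before uploading.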
### Monitoring
```bash
tensorboard --logdir ./log/ --bind_all
```
### Acknowledgements
We acknowledge the following code repository:
https://github.com/ncsoft/PhonMatchNet