---
title: CL-KWS_202408_v1
app_file: demo.py
sdk: gradio
sdk_version: 3.34.0
---

### Datasets

* [LibriPhrase]
  * LibriSpeech corpus: https://www.openslr.org/12
  * Recipe for LibriPhrase: https://github.com/gusrud1103/LibriPhrase
* [Google Speech Commands]
  * http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
  * http://download.tensorflow.org/data/speech_commands_test_set_v0.02.tar.gz
  * https://www.tensorflow.org/datasets/catalog/speech_commands
* [Qualcomm Keyword Speech]
  * https://www.qualcomm.com/developer/software/keyword-speech-dataset
* [noise] [musan]
  * https://www.openslr.org/17/

## Getting started

### Environment

```bash
# Create and activate a Python 3.7 environment ([name] is a placeholder)
conda create --name [name] python=3.7
conda activate [name]

# CUDA compiler and cuDNN
conda install -c "nvidia/label/cuda-11.6.0" cuda-nvcc
conda install -c conda-forge cudnn=8.2.1.32

# Python dependencies
pip install -r requirements.txt
pip install numpy==1.18.5
pip install tensorflow-model-optimization==0.6.0

# Expose libcusolver.so.11 under the libcusolver.so.10 name
cd /miniconda3/envs/[name]/lib
ln -s libcusolver.so.11 libcusolver.so.10

# Add the environment's lib directory to the library path
# (adjust this path to your own installation)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/homes/yiting/miniconda3/envs/pho/lib
```

### Training

```bash
python train_guided_CTC.py \
  --epoch 23 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 0.2 \
  --audio_input both \
  --text_input phoneme \
  --comment 'user comments for each experiment'
```

```bash
python train.py \
  --epoch 18 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 \
  --audio_input both \
  --text_input phoneme \
  --comment 'user comments for each experiment'
```

### Fine-tuning

checkpoint: ./checkpoint_results/checkpoint_guided_ctc/20240725-011006

```bash
python train_guided_ctc_clap.py \
  --epoch 5 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 0.01 0.01 \
  --audio_input both \
  --text_input phoneme \
  --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/date-time' \
  --comment 'user comments for each experiment'
```

```bash
python train_CLKWS.py \
  --epoch 4 \
  --lr 1e-3 \
  --loss_weight 1.0 1.0 \
  --audio_input both \
  --text_input phoneme \
  --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint/date-time' \
  --comment 'user comments for each experiment'
```

### Inference

The keyword list is target_list in google_infe202405.py.

```bash
python inference.py \
  --audio_input both \
  --text_input phoneme \
  --load_checkpoint_path 'home/DB/checkpoint_results/checkpoint/20240515-111757'
```

### Demo

checkpoints:

* ./checkpoint_results/checkpoint_guided_ctc/20240725-011006
* ./checkpoint_results/checkpoint_gctc_clap/20240725-154258

```bash
python demo.py \
  --audio_input both \
  --text_input phoneme \
  --load_checkpoint_path '/home/DB/checkpoint_results/checkpoint_guided_ctc/20240725-011006' \
  --keyword_list_length 8
```

* Demo website: the address printed after "Running on public URL"
* Upload file: mono WAV, 256 kbps, 22050 Hz
* dataset/dataloader_demo.py: self.maxlen_a = 56000

### Monitoring

```bash
tensorboard --logdir ./log/ --bind_all
```

### Acknowledgement

We acknowledge the following code repository:

https://github.com/ncsoft/PhonMatchNet
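
### Checking a demo upload file

The demo notes above ask for a mono WAV file at 22050 Hz and mention self.maxlen_a = 56000 in dataset/dataloader_demo.py. A file can be checked against these values before uploading. The following is a minimal sketch using only the Python standard library. The function name check_demo_wav and the two constants are hypothetical and not part of this repository. It also assumes that maxlen_a is a sample count measured at the file's own sample rate, which the notes do not state, and it does not check the 256 kbps figure.

```python
import wave

# Values taken from the demo notes; the names are illustrative only.
EXPECTED_RATE = 22050  # Hz, from "upload file: MONO, WAV, 256kbps, 22050hz"
MAX_SAMPLES = 56000    # self.maxlen_a in dataset/dataloader_demo.py (assumed to be samples)


def check_demo_wav(path):
    """Return a list of problems found in a PCM WAV file (empty list means none)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != EXPECTED_RATE:
        problems.append(f"expected {EXPECTED_RATE} Hz, found {rate} Hz")
    if frames > MAX_SAMPLES:
        # How the demo loader treats longer audio is not stated in the notes.
        problems.append(f"{frames} samples exceeds maxlen_a={MAX_SAMPLES}")
    return problems
```

Calling check_demo_wav('my_keyword.wav') (a placeholder file name) returns an empty list when none of the three checks fail, and one message per failed check otherwise.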