---
license: mit
language:
- ko
metrics:
- wer
- cer
tags:
- transcribe
- whisper
---

# Fine-tune Whisper-small for Korean Speech Recognition sample data (PoC)

Fine-tuning was performed on sample voices recorded from the metadata in this CSV file (https://github.com/hyeonsangjeon/job-transcribe/blob/main/meta_voice_data_3922.csv).
We do not publish the sample voices, so if you want to fine-tune from scratch yourself, please record your own audio or use a public dataset.

Training follows the fine-tuning guide at https://huggingface.co/blog/fine-tune-whisper.

## Training

### Base model

OpenAI's `whisper-small` (https://huggingface.co/openai/whisper-small)

### Parameters

We used heuristic parameters without separate hyperparameter tuning; the sampling rate is set to 16,000 Hz. The values map onto `Seq2SeqTrainingArguments` as sketched after this list.
- learning_rate = 2e-5
- epochs = 5
- gradient_accumulation_steps = 4
- per_device_train_batch_size = 4
- fp16 = True
- gradient_checkpointing = True
- generation_max_length = 225

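As a rough sketch, these parameters correspond to the following `Seq2SeqTrainingArguments` from the fine-tuning guide. The `output_dir`, logging, and evaluation settings are illustrative assumptions, not the exact training script:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ko",  # illustrative; not the actual output path
    learning_rate=2e-5,
    num_train_epochs=5,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,
    fp16=True,
    gradient_checkpointing=True,
    generation_max_length=225,
    predict_with_generate=True,       # needed to decode predictions for WER/CER
    evaluation_strategy="epoch",      # assumed; the guide evaluates by steps
)
```

With `predict_with_generate=True`, the WER and CER declared in the model card metadata can be computed from the decoded predictions with the `evaluate` library (`evaluate.load("wer")`, `evaluate.load("cer")`), as in the guide.
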
## Usage

You need to install the `librosa` package to load the waveform and resample it to 16 kHz (`pip install librosa`); the processor then converts it to a log-Mel spectrogram.

### inference.py

```python
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the audio and resample it to 16 kHz, the rate Whisper expects
file = "nlp-voice-3922/data/0002d3428f0ddfa5a48eec5cc351daa8.wav"
arr, sampling_rate = librosa.load(file, sr=16000)

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples")

# Convert the waveform to log-Mel spectrogram input features
input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features

# Force decoding to Korean transcription, then generate and decode
forced_decoder_ids = processor.get_decoder_prompt_ids(language="ko", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)
```
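
For quick tests, the same inference can also be done with the high-level `pipeline` API. This is a sketch assuming a reasonably recent `transformers` version; it reuses `librosa` for decoding so no extra audio backend is needed:

```python
import librosa
from transformers import pipeline

# Load the audio at 16 kHz and run it through the ASR pipeline
arr, _ = librosa.load("nlp-voice-3922/data/0002d3428f0ddfa5a48eec5cc351daa8.wav", sr=16000)

asr = pipeline(
    "automatic-speech-recognition",
    model="daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples",
)
result = asr(arr, generate_kwargs={"language": "korean", "task": "transcribe"})
print(result["text"])
```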