daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples

Automatic Speech Recognition

daekeun-ml committed on Jan 11, 2023

Commit 6ab8bd8 · 1 Parent(s): 9933717

Create README.md

Files changed (1)

README.md ADDED +60 -0

	@@ -0,0 +1,60 @@

+---
+license: mit
+language:
+- ko
+metrics:
+- wer
+- cer
+tags:
+- transcribe
+- whisper
+---
+# Fine-tuning Whisper-small on Korean speech recognition sample data (PoC)
+Fine-tuning was performed on sample voice recordings made from this CSV data (https://github.com/hyeonsangjeon/job-transcribe/blob/main/meta_voice_data_3922.csv).
+The sample recordings are not published, so to fine-tune from scratch yourself, record your own audio or use a public dataset.
+Training follows the fine-tuning guide at https://huggingface.co/blog/fine-tune-whisper
+## Training
+### Base model
+OpenAI's `whisper-small` (https://huggingface.co/openai/whisper-small)
+### Parameters
+The parameters were chosen heuristically, without a separate hyperparameter search. Audio is sampled at 16,000 Hz.
+- learning_rate = 2e-5
+- epochs = 5
+- gradient_accumulation_steps = 4
+- per_device_train_batch_size = 4
+- fp16 = True
+- gradient_checkpointing = True
+- generation_max_length = 225
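Note on the parameters above: with gradient accumulation, each optimizer update sees more samples than `per_device_train_batch_size` alone suggests. The sketch below works through that arithmetic. The batch size, accumulation steps, and epoch count come from the list. The device count and the assumption that all 3,922 samples are used for training are not stated in the source and are illustrative only, so the update count is a rough estimate rather than a figure from the actual run.

```python
import math

# Values taken from the parameter list above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
epochs = 5

# Assumptions (not stated in the source): a single device, and all
# 3,922 samples used for training with no held-out split.
num_devices = 1
num_train_samples = 3922

# Samples contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Rough number of optimizer updates; a trainer's exact rounding may differ.
approx_updates_per_epoch = math.ceil(num_train_samples / effective_batch_size)
approx_total_updates = approx_updates_per_epoch * epochs

print(effective_batch_size)   # 16
print(approx_total_updates)   # 1230
```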
+## Usage
+Install the librosa package (`pip install librosa`) to load the WAV file and resample it to 16 kHz. The Whisper processor then converts the waveform to a log-Mel spectrogram.
+### inference.py
+```python
+import librosa
+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+
+# Load the audio file and resample it to 16 kHz
+file = "nlp-voice-3922/data/0002d3428f0ddfa5a48eec5cc351daa8.wav"
+arr, sampling_rate = librosa.load(file, sr=16000)
+
+# Load the base model's processor and the fine-tuned model
+processor = WhisperProcessor.from_pretrained("openai/whisper-small")
+model = WhisperForConditionalGeneration.from_pretrained("daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples")
+
+# Convert the waveform to log-Mel input features
+input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features
+
+# Force Korean transcription, then decode the generated token ids to text
+forced_decoder_ids = processor.get_decoder_prompt_ids(language="ko", task="transcribe")
+predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+print(transcription)
+```
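
Note on the metrics: the README front matter lists `wer` and `cer`, but the commit does not show how they are computed. The block below is a minimal, dependency-free sketch of both metrics based on Levenshtein edit distance. It is not the author's evaluation code. The helper names (`edit_distance`, `wer`, `cer`) and the example sentences are illustrative assumptions, and ignoring spaces in CER is one possible convention among several. Both functions assume a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumed convention)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Illustrative sentences, not taken from the training data.
reference = "안녕하세요 만나서 반갑습니다"
hypothesis = "안녕하세요 만나서 반갑슴니다"
print(wer(reference, hypothesis))  # one of three words differs
print(cer(reference, hypothesis))  # one of thirteen characters differs
```

For Korean, CER is often the more forgiving of the two, since a single wrong syllable makes the whole word count as an error under WER.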