Model Details
Model Description
Whisper large-v3 fine-tuned with LoRA on the Hindi portion of the Common Voice 13.0 dataset.
Model Sources
- Base Repository: openai/whisper-large-v3
Uses
- Automatic Speech Recognition (ASR)
Direct Use
import torch
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

peft_model_id = "kasunw/whisper-large-v3-hindi"
peft_config = PeftConfig.from_pretrained(peft_model_id)

# Use half precision on GPU and full precision on CPU.
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the base Whisper model, then attach the LoRA adapter weights.
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path, device_map="auto", torch_dtype=torch_dtype
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.config.use_cache = True

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language="Hindi", task="transcribe"
)

# device_map="auto" has already placed the model, so no device argument is passed here.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
)

path_to_audio = "audio.mp3"
result = pipe(path_to_audio)
print(result["text"])
Training Details
Training Data
Hindi portion of Common Voice 13.0
Training Procedure
Followed the instructions given in this notebook
Training Hyperparameters
- per_device_train_batch_size=16
- gradient_accumulation_steps=1
- learning_rate=1e-5
- warmup_steps=50
- fp16=True
- max_steps=1000
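To make these numbers concrete, here is a minimal sketch of what they imply for the effective batch size, the number of training examples seen, and the learning rate during warmup. The single-device setting and the linear-warmup-then-linear-decay schedule are assumptions (the card states neither), and `effective_batch_size` and `lr_at_step` are hypothetical helper names used only for illustration.

```python
# Values copied from the hyperparameter list above.
hparams = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "warmup_steps": 50,
    "max_steps": 1000,
}

NUM_DEVICES = 1  # assumption: the card does not say how many GPUs were used


def effective_batch_size(hp, num_devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


def lr_at_step(step, hp):
    """Learning rate under an assumed linear warmup followed by linear decay to zero."""
    peak = hp["learning_rate"]
    warmup = hp["warmup_steps"]
    total = hp["max_steps"]
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))


examples_seen = effective_batch_size(hparams) * hparams["max_steps"]
print("effective batch size:", effective_batch_size(hparams))
print("examples seen over training:", examples_seen)
print("lr at steps 25, 50, 1000:", [lr_at_step(s, hparams) for s in (25, 50, 1000)])
```

Under these assumptions the run processes 16 examples per step and 16,000 examples in total, with the learning rate reaching its peak of 1e-5 at step 50.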
Metrics
- word error rate (WER)
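For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration under that standard definition, not the evaluation code used for this model; `word_error_rate` is a hypothetical helper name, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("मैं घर जा रहा हूँ", "मैं घर जा रहा है"))  # 0.2
```

In practice, transcripts are usually normalized (punctuation, casing, Unicode form) before scoring, which can change the reported WER noticeably.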