Model Card for openai-whisper-large-v2-LORA-ja

A LoRA adapter for Japanese transcription with Whisper. Testing is still in progress to see how the results hold up; my main personal use case is transcribing Japanese comedy.

VRAM usage is 9 GB with this LoRA.

Model Details

Model Description

openai-whisper-large-v2-LORA-ja

Developed by: FZNX
Model type: PEFT LoRA
Language(s) (NLP): Japanese (fine-tuned on Common Voice 16)
License: [More Information Needed]
Finetuned from model: Whisper Large V2

How to Get Started with the Model

import torch
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperTokenizer,
    WhisperProcessor,
)
from peft import PeftModel, PeftConfig

peft_model_id = "fznx92/openai-whisper-large-v2-ja-transcribe-colab"
sample = "insert mp3 file location here"  # path to the audio file to transcribe

language = "japanese"
task = "transcribe"

# Load the base model named in the adapter config, then attach the LoRA weights.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.to("cuda").half()  # move to the GPU and run in fp16

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language=language, task=task
)

pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    batch_size=8,
    torch_dtype=torch.float16,
    device="cuda:0",
)

def transcribe(audio, return_timestamps=False):
    # Process the audio in 30-second chunks and return only the transcribed text.
    text = pipe(
        audio,
        chunk_length_s=30,
        return_timestamps=return_timestamps,
        generate_kwargs={"language": language, "task": task},
    )["text"]
    return text

transcript = transcribe(sample)
print(transcript)
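
The transcribe function above accepts return_timestamps but returns only the "text" field. When timestamps are requested, the pipeline result also carries a "chunks" list whose entries hold a (start, end) tuple in seconds and the chunk text. The following is a minimal sketch of turning such a list into readable lines. The helper names format_timestamp and chunks_to_lines and the sample data are hypothetical and not part of this model's code.

```python
from typing import Optional


def format_timestamp(seconds: Optional[float]) -> str:
    """Format seconds as HH:MM:SS.mmm; a missing timestamp (None) gets a placeholder."""
    if seconds is None:
        return "??:??:??.???"
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chunks_to_lines(chunks):
    """Turn a list of {"timestamp": (start, end), "text": ...} dicts into display lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
    return lines


# Hypothetical sample shaped like pipe(..., return_timestamps=True)["chunks"].
# The end time of the final chunk can be None.
example_chunks = [
    {"timestamp": (0.0, 3.5), "text": " こんにちは"},
    {"timestamp": (3.5, None), "text": " どうも"},
]

for line in chunks_to_lines(example_chunks):
    print(line)
```

With the real pipeline, the list would come from calling pipe with return_timestamps=True and reading result["chunks"] instead of result["text"].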

Training Data

Common Voice 16 dataset

Training Procedure

Trained on Google Colab (T4 GPU) for 6 hours.
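
To give a sense of why a LoRA fine-tune of Whisper Large V2 fits in a single Colab session, here is a rough parameter count. This card does not state the adapter's rank or target modules, so the rank of 32 and the choice of the query and value projections in every attention block are assumptions for illustration only. The architecture sizes (model width 1280, 32 encoder layers, 32 decoder layers) are those of Whisper large.

```python
def lora_trainable_params(
    rank,
    d_model=1280,             # Whisper large hidden size
    encoder_layers=32,
    decoder_layers=32,
    targets_per_attention=2,  # assumption: adapt the query and value projections only
):
    """Count LoRA parameters when adapting square d_model x d_model projections."""
    # Each encoder layer has self-attention; each decoder layer has
    # self-attention plus cross-attention.
    attention_blocks = encoder_layers + 2 * decoder_layers
    adapted_matrices = attention_blocks * targets_per_attention
    # LoRA adds two low-rank factors per matrix: (rank x d_model) and (d_model x rank).
    params_per_matrix = rank * (d_model + d_model)
    return adapted_matrices * params_per_matrix


# Hypothetical setting: rank 32 (not confirmed for this adapter).
print(lora_trainable_params(32))
```

Under these assumptions only a small fraction of the roughly 1.5 billion base weights is trained, which keeps the optimizer state small.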

Evaluation

Framework versions

PEFT 0.7.1
