How to output timestamps?
I used the example code from kotoba-whisper and added generate_kwargs={ "return_timestamps": True }, but I still get plain text.
This model performs better than kotoba-whisper-v2.2 and anime-whisper on long audio, but without timestamps.
Below is my code:
import torch
from transformers import pipeline

# config
model_id = "E:/AI/VoiceRecognition/models/whisper-ja-anime-v0.1"
torch_dtype = torch.float16  # torch.float16 if torch.cuda.is_available() else torch.float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_kwargs = {"attn_implementation": "sdpa"}  # if torch.cuda.is_available() else {}
generate_kwargs = {
    "language": "Japanese",
    "no_repeat_ngram_size": 5,
    "repetition_penalty": 1.0,
    "return_timestamps": True,
}

# load model
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch_dtype,
    device=device,
    model_kwargs=model_kwargs,
    batch_size=8,
    trust_remote_code=True,
)

# run inference
result = pipe("7.mp3", chunk_length_s=30, generate_kwargs=generate_kwargs)  # add_punctuation=True
print(result)
Put it here, not in generate_kwargs:
result = pipe("7.mp3", chunk_length_s=30, return_timestamps=True, generate_kwargs=generate_kwargs)
The timestamp frequency is lower than OpenAI's; I hope to fix this in the next version. If this is an issue, use return_timestamps='word' and build segments manually.
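The "build segments manually" workaround can be sketched like this. With return_timestamps='word', the pipeline output contains a "chunks" list of per-word entries with "text" and "timestamp" (start, end) pairs; the function below (words_to_segments and its gap/length thresholds are my own hypothetical helpers, not part of the pipeline API) groups those words into segments whenever a silence gap or a maximum segment length is exceeded:

```python
def words_to_segments(word_chunks, max_gap=0.8, max_len=30.0):
    """Group word-level chunks into segment-level chunks.

    word_chunks: list of {"text": str, "timestamp": (start, end)}.
    A new segment starts when the silence before a word exceeds
    max_gap seconds, or the running segment would exceed max_len
    seconds. Thresholds are illustrative; tune them for your audio.
    """
    segments = []
    current = None
    for w in word_chunks:
        start, end = w["timestamp"]
        if (current is None
                or start - current["timestamp"][1] > max_gap
                or end - current["timestamp"][0] > max_len):
            # open a new segment
            current = {"text": w["text"], "timestamp": (start, end)}
            segments.append(current)
        else:
            # extend the current segment
            current["text"] += w["text"]
            current["timestamp"] = (current["timestamp"][0], end)
    return segments

# Hypothetical word-level output in the pipeline's "chunks" shape:
words = [
    {"text": "こん", "timestamp": (0.0, 0.4)},
    {"text": "にちは", "timestamp": (0.4, 0.9)},
    {"text": "元気", "timestamp": (2.5, 3.0)},
]
print(words_to_segments(words))
```

In real use you would pass result["chunks"] from pipe(..., return_timestamps='word') into this function.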
Thanks for your answer.
By the way, are you uploading a Japanese corpus to your dataset these days?
Yes, for cloud training.