How to output timestamps?

#1
by RoadToNowhere - opened

I used the example code from kotoba-whisper and added generate_kwargs={ "return_timestamps": True }, but I still get plain text.
This model performs better than kotoba-whisper-v2.2 and anime-whisper on long audio when run without timestamps.

Below is my code:

import torch
from transformers import pipeline

# config
model_id = "E:/AI/VoiceRecognition/models/whisper-ja-anime-v0.1"
torch_dtype = torch.float16 # torch.float16 if torch.cuda.is_available() else torch.float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_kwargs = {"attn_implementation": "sdpa"} #if torch.cuda.is_available() else {}

generate_kwargs = {
    "language": "Japanese",
    "no_repeat_ngram_size": 5,
    "repetition_penalty": 1.0,
    "return_timestamps": True,
}

# load model
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch_dtype,
    device=device,
    model_kwargs=model_kwargs,
    batch_size=8,
    trust_remote_code=True,
)

# run inference
result = pipe("7.mp3", chunk_length_s=30, generate_kwargs=generate_kwargs) #add_punctuation=True, 
print(result)
RoadToNowhere changed discussion status to closed
RoadToNowhere changed discussion status to open

Pass return_timestamps to the pipeline call itself, not inside generate_kwargs:
result = pipe("7.mp3", chunk_length_s=30, return_timestamps=True, generate_kwargs=generate_kwargs)
Timestamps are less frequent than with OpenAI's models; I hope to fix this in the next version. If this is an issue, use return_timestamps='word' and build the segments manually.
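
For anyone wondering what "make segments manually" could look like, here is a minimal sketch. It assumes the usual pipeline output for return_timestamps='word', i.e. a "chunks" list of {"text": ..., "timestamp": (start, end)} entries. The helper name words_to_segments and the thresholds max_gap_s and max_len_s are made up for illustration and are not part of transformers or this model; tune them for your audio.

```python
def words_to_segments(chunks, max_gap_s=0.5, max_len_s=10.0):
    """Group word-level chunks into segments (hypothetical helper).

    A new segment starts when the silence before a word exceeds max_gap_s,
    or when adding the word would make the segment longer than max_len_s.
    Both thresholds are arbitrary illustration values.
    """
    segments = []
    current = None
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if start is None:
            continue  # skip words without a usable start time
        if end is None:
            end = start  # the last word may have no end time
        if current is not None:
            gap = start - current["end"]
            too_long = end - current["start"] > max_len_s
            if gap > max_gap_s or too_long:
                segments.append(current)
                current = None
        if current is None:
            current = {"start": start, "end": end, "text": chunk["text"].strip()}
        else:
            current["text"] += chunk["text"]
            current["end"] = end
    if current is not None:
        segments.append(current)
    return segments


# Usage with the pipeline from the question (not executed here):
# result = pipe("7.mp3", chunk_length_s=30, return_timestamps="word",
#               generate_kwargs=generate_kwargs)
# for seg in words_to_segments(result["chunks"]):
#     print(f'{seg["start"]:.2f} - {seg["end"]:.2f}: {seg["text"]}')
```

Splitting on pauses tends to give subtitle-like segments; if the result is too choppy or too coarse, adjust the two thresholds.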


Thanks for your answer.
By the way, are you uploading a Japanese corpus to your dataset these days?

RoadToNowhere changed discussion status to closed
RoadToNowhere changed discussion status to open

Yes, for cloud training.
