How to output timestamps?
I used the example code from kotoba-whisper and added generate_kwargs={ "return_timestamps": True }, but I still get plain text.
This model performs better than kotoba-whisper-v2.2 and anime-whisper on long audio, but without timestamps.
Below is my code:
import torch
from transformers import pipeline

# config
model_id = "E:/AI/VoiceRecognition/models/whisper-ja-anime-v0.1"
torch_dtype = torch.float16  # torch.float16 if torch.cuda.is_available() else torch.float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_kwargs = {"attn_implementation": "sdpa"}  # if torch.cuda.is_available() else {}
generate_kwargs = {
    "language": "Japanese",
    "no_repeat_ngram_size": 5,
    "repetition_penalty": 1.0,
    "return_timestamps": True,
}

# load model
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch_dtype,
    device=device,
    model_kwargs=model_kwargs,
    batch_size=8,
    trust_remote_code=True,
)

# run inference
result = pipe("7.mp3", chunk_length_s=30, generate_kwargs=generate_kwargs)  # add_punctuation=True
print(result)
Put it here, not in generate_kwargs:
result = pipe("7.mp3", chunk_length_s=30, return_timestamps=True, generate_kwargs=generate_kwargs)
The timestamp frequency is lower than OpenAI's; I hope to fix this in the next version. If this is an issue, use return_timestamps='word' and build segments manually.
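The "build segments manually" workaround can be sketched like this. With return_timestamps='word', the pipeline output contains a "chunks" list of per-word entries with "text" and "timestamp" (start, end) pairs; the function below (words_to_segments and its gap/length thresholds are my own hypothetical helpers, not part of the pipeline API) groups those words into segments whenever a silence gap or a maximum segment length is exceeded:

```python
def words_to_segments(word_chunks, max_gap=0.8, max_len=30.0):
    """Group word-level chunks into segment-level chunks.

    word_chunks: list of {"text": str, "timestamp": (start, end)}.
    A new segment starts when the silence before a word exceeds
    max_gap seconds, or the running segment would exceed max_len
    seconds. Thresholds are illustrative; tune them for your audio.
    """
    segments = []
    current = None
    for w in word_chunks:
        start, end = w["timestamp"]
        if (current is None
                or start - current["timestamp"][1] > max_gap
                or end - current["timestamp"][0] > max_len):
            # open a new segment
            current = {"text": w["text"], "timestamp": (start, end)}
            segments.append(current)
        else:
            # extend the current segment
            current["text"] += w["text"]
            current["timestamp"] = (current["timestamp"][0], end)
    return segments

# Hypothetical word-level output in the pipeline's "chunks" shape:
words = [
    {"text": "こん", "timestamp": (0.0, 0.4)},
    {"text": "にちは", "timestamp": (0.4, 0.9)},
    {"text": "元気", "timestamp": (2.5, 3.0)},
]
print(words_to_segments(words))
```

In real use you would pass result["chunks"] from pipe(..., return_timestamps='word') into this function.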
Thanks for your answer.
By the way, are you uploading a Japanese corpus to your dataset these days?
Yes, for cloud training.