WhisperX-Thai
Whisper-th at 70x speed, compatible with WhisperX.
whisper-th-large-ct2 is the CTranslate2 format of biodatlab/whisper-th-large-combined, compatible with WhisperX and faster-whisper. The following example transcribes an audio file with WhisperX and assigns speaker labels:
!pip install git+https://github.com/m-bain/whisperx.git
import whisperx
import time
# Settings
device = "cuda"
audio_file = "audio.mp3"
batch_size = 16
compute_type = "float16"
# Your Hugging Face token is required for the diarization model.
# You also need to accept the model's terms and conditions before use.
# Model page: https://huggingface.co/pyannote/segmentation-3.0
HF_TOKEN = ""
# Load the model and transcribe
model = whisperx.load_model("Thaweewat/whisper-th-large-ct2", device, compute_type=compute_type)
st_time = time.time()
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
# Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)
# Combine pure text if needed
combined_text = ' '.join(segment['text'] for segment in result['segments'])
print(f"Response time: {time.time() - st_time} seconds")
print(diarize_segments)
print(result)
print(combined_text)
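If a speaker-by-speaker transcript is more useful than one combined string, the segments can be grouped into turns. The following is a minimal sketch, not part of WhisperX: `speaker_transcript` is a hypothetical helper, and it assumes each segment in `result["segments"]` carries a `text` key and, where diarization found a match, a `speaker` key. Segments without a speaker are labeled `UNKNOWN`. The `sample` dictionary is made-up data standing in for a real `result`.

```python
from itertools import groupby


def speaker_transcript(result, unknown="UNKNOWN"):
    """Merge consecutive segments from the same speaker into (speaker, text) turns.

    Assumes `result` looks like the transcription output after speaker
    assignment: a dict with a "segments" list of dicts, each with "text"
    and optionally "speaker".
    """
    segments = result.get("segments", [])
    turns = []
    # groupby only merges *adjacent* segments, which preserves turn order.
    for speaker, group in groupby(segments, key=lambda s: s.get("speaker", unknown)):
        text = " ".join(seg["text"].strip() for seg in group)
        turns.append((speaker, text))
    return turns


# Hypothetical sample standing in for a real `result`.
sample = {
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " สวัสดีครับ", "speaker": "SPEAKER_00"},
        {"start": 1.5, "end": 3.0, "text": " วันนี้เป็นอย่างไรบ้าง", "speaker": "SPEAKER_00"},
        {"start": 3.0, "end": 4.2, "text": " สบายดีค่ะ", "speaker": "SPEAKER_01"},
        {"start": 4.2, "end": 5.0, "text": " ขอบคุณ"},  # no speaker assigned
    ]
}

turns = speaker_transcript(sample)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

With a real transcription, call `speaker_transcript(result)` after the speaker-assignment step.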
Base model: biodatlab/whisper-th-large-combined