Mel mismatch with faster-whisper
I have a very limited understanding and am mainly experimenting. My understanding is that OpenAI Whisper large-v3 uses 128 mel bins.
When I convert this model from safetensors to CT2 with a Python script like this:
import ctranslate2  # type: ignore
from ctranslate2.converters import TransformersConverter  # type: ignore

model_name_or_path = "Finnish-NLP/whisper-large-finnish-v3"
output_dir = "ct2/whisper-large-finnish-v3"

converter = TransformersConverter(model_name_or_path)
converter.convert(
    output_dir,
    quantization="float16",
)
Using the resulting model with faster-whisper fails with a mel mismatch:
ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
Does something go wrong during the conversion, or what else could cause this?
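One way to narrow it down is to check what feature size the converted model directory actually advertises. The helper below is hypothetical; it just reads `feature_size` (the number of mel bins) from `preprocessor_config.json`, if that file made it into the output directory at all:

```python
import json
import os

def mel_bins_from_preprocessor(model_dir: str):
    # Hypothetical helper: report the mel bin count stored in
    # preprocessor_config.json, or None if the file is missing
    # (in which case an 80-bin default may end up being used).
    path = os.path.join(model_dir, "preprocessor_config.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f).get("feature_size")
```

If this returns None or 80 for a large-v3 model, the conversion most likely dropped or never produced the preprocessor config.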
Edit: using the -ct2 model you had available seems to have fixed this, so perhaps the converter just needs more parameters when run manually.
I think I used the CLI version of the converter, following some instructions.
It might be that we first needed to convert our fine-tuned model from the Hugging Face format to the original OpenAI format, and then to CT2.
But yeah, there is an already-converted CT2 version available.