The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1

#18

by Mr1gh - opened May 10, 2023

This problem occurs for WAV files longer than 6 s. When I searched, I found that "Whisper decoder uses a learned position embedding which has the max length of 448 tokens. Therefore it cannot decode any transcription of more than 448 label ids." Does that mean Whisper can only be trained with a fixed maximum number of tokens, and that this limit can't be changed?
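The error in the title (449 vs. 448) is consistent with a label sequence one token longer than the limit quoted above. A common workaround is to drop, or split, training examples whose tokenized transcription exceeds that limit before they reach the model. Below is a minimal sketch in plain Python, not a definitive fix. The names `MAX_LABEL_LENGTH`, `within_label_limit`, and `filter_long_examples` are hypothetical, and the `"labels"` field is an assumption about how the examples are stored. The value 448 is taken from the quote; in Hugging Face Transformers it should correspond to `model.config.max_target_positions`, which is worth checking for your checkpoint.

```python
# Assumed limit, taken from the quote above. Check your checkpoint's
# config (max_target_positions) rather than relying on this constant.
MAX_LABEL_LENGTH = 448


def within_label_limit(label_ids, max_length=MAX_LABEL_LENGTH):
    """Return True if the tokenized transcription fits the decoder limit."""
    return len(label_ids) <= max_length


def filter_long_examples(examples, max_length=MAX_LABEL_LENGTH):
    """Keep only examples whose "labels" list is within the limit."""
    return [ex for ex in examples if within_label_limit(ex["labels"], max_length)]


# Toy data standing in for tokenized transcriptions (hypothetical).
examples = [
    {"id": "short", "labels": list(range(10))},
    {"id": "at_limit", "labels": list(range(448))},
    {"id": "too_long", "labels": list(range(449))},  # would trigger the error
]

kept = filter_long_examples(examples)
print([ex["id"] for ex in kept])  # ['short', 'at_limit']
```

With a Hugging Face `datasets` dataset, the same predicate could be passed to `dataset.filter(...)`. Filtering discards data, so splitting long recordings into shorter segments may be preferable when many examples exceed the limit.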

SabaKhupenia

Nov 28, 2024

Hi, did you find an answer?
