"Token indices sequence length is longer than the specified maximum sequence length" when using HHEM-2.1-open
#13 · opened by ytangccc
I see the message "Token indices sequence length is longer than the specified maximum sequence length for this model (624 > 512). Running this sequence through the model will result in indexing errors" when using HHEM-2.1-open. It still ran through, but I'm wondering about it, since HHEM-2.1 is supposed to have an unlimited context length.
Don't worry about it. This warning is inherited from the foundation model, T5-base; it does not apply to HHEM-2.1-open.
Is there a restriction on using a different tokenizer?
You have to use the same tokenizer that Google T5 uses. Otherwise, tokens will be mapped to different integer indices and then to the wrong token embeddings. This is a limitation of any Transformer-based model, or of any model that relies on an embedding layer.
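To illustrate the point above, here is a minimal, self-contained sketch (with toy vocabularies and a toy embedding matrix, not the real T5 vocabulary) showing why swapping tokenizers silently breaks a model: token ids index directly into the embedding matrix, so a different id mapping retrieves the wrong rows without raising any error.

```python
# Vocabulary the model was trained with (standing in for the T5 tokenizer's vocab).
train_vocab = {"the": 0, "cat": 1, "sat": 2}

# A different tokenizer assigns different integer ids to the same strings.
other_vocab = {"sat": 0, "the": 1, "cat": 2}

# Toy embedding matrix: row i is the learned vector for the token whose id is i
# under the *training* vocabulary.
embedding_matrix = [
    [0.1, 0.2],  # learned for "the"
    [0.3, 0.4],  # learned for "cat"
    [0.5, 0.6],  # learned for "sat"
]

def embed(tokens, vocab):
    """Map each token to an id via `vocab`, then fetch that row of the matrix."""
    return [embedding_matrix[vocab[t]] for t in tokens]

sentence = ["the", "cat", "sat"]
correct = embed(sentence, train_vocab)  # rows 0, 1, 2 — the intended vectors
wrong = embed(sentence, other_vocab)    # rows 1, 2, 0 — valid lookups, wrong meanings

print(correct)  # [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(wrong)    # [[0.3, 0.4], [0.5, 0.6], [0.1, 0.2]]
```

Every lookup succeeds, so nothing crashes; the model just receives embeddings for the wrong tokens, which is why the mismatch is easy to miss.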
forrest-vectara changed discussion status to closed