use_cache=False changes behavior
#14
by edmond · opened
Hello, generating sequences token by token changes the result (it finally gives the right result, the argmax of each new token). Setting use_cache=False also produces the proper result.
I know this was already mentioned here: https://github.com/huggingface/transformers/issues/31425, but I was wondering if anyone already knows a quick fix, since I really need to generate proper sequences without sacrificing inference speed.
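For reference, a minimal sketch of the workaround being discussed, assuming a standard transformers causal LM (the checkpoint name below is a placeholder, not taken from this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my name is", return_tensors="pt")

with torch.no_grad():
    # Default path: KV cache enabled (fast, but exhibits the
    # behavior reported above).
    cached = model.generate(**inputs, max_new_tokens=20, do_sample=False)

    # Workaround: disable the KV cache. Slower, but matches the
    # token-by-token argmax result described in the post.
    uncached = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                              use_cache=False)

print(tokenizer.decode(cached[0], skip_special_tokens=True))
print(tokenizer.decode(uncached[0], skip_special_tokens=True))
```

Comparing the two decoded outputs makes the discrepancy easy to reproduce, at the cost of recomputing all past key/value states at every step when the cache is off.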
It's being taken good care of now:
https://github.com/huggingface/transformers/issues/31425#ref-pullrequest-2372177926
edmond changed discussion status to closed