use_cache=False changes behavior
#14
by edmond · opened
Hello, generating sequences token by token changes the result (it finally gives the right result, the argmax of each new token). Setting use_cache=False also produces the proper result.
I know this was already mentioned here: https://github.com/huggingface/transformers/issues/31425, but I was wondering if anyone already knows a quick fix, since I really need to generate proper sequences without sacrificing inference speed.
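For reference, a minimal sketch of the workaround being discussed, assuming a standard transformers causal LM (the checkpoint name below is a placeholder, not taken from this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my name is", return_tensors="pt")

with torch.no_grad():
    # Default path: KV cache enabled (fast, but exhibits the
    # behavior reported above).
    cached = model.generate(**inputs, max_new_tokens=20, do_sample=False)

    # Workaround: disable the KV cache. Slower, but matches the
    # token-by-token argmax result described in the post.
    uncached = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                              use_cache=False)

print(tokenizer.decode(cached[0], skip_special_tokens=True))
print(tokenizer.decode(uncached[0], skip_special_tokens=True))
```

Comparing the two decoded outputs makes the discrepancy easy to reproduce, at the cost of recomputing all past key/value states at every step when the cache is off.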
It's being taken good care of now:
https://github.com/huggingface/transformers/issues/31425#ref-pullrequest-2372177926
edmond changed discussion status to closed