Default eos_token_id=2 is incorrect, needs to be 11
This issue affects all the Falcon repositories.
Default eos_token_id=2 is specified here: https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/blob/main/configuration_RW.py#L41
Looking at https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/raw/main/tokenizer.json, token ID 2 is >>INTRODUCTION<<,
whereas the end-of-text token we actually want, <|endoftext|>, is token ID 11.
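
For anyone who wants to double-check those IDs locally, here is a minimal sketch that looks a token up in a downloaded tokenizer.json. The helper names (`load_tokenizer_json`, `find_token_id`) are made up for illustration, and it assumes the usual Hugging Face tokenizers layout: special tokens under "added_tokens" and a BPE-style token-to-ID dict under "model" / "vocab".

```python
import json


def load_tokenizer_json(path):
    """Parse a tokenizer.json file from disk into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_token_id(tokenizer_json, token):
    """Return the ID of `token`, or None if it is not present.

    Assumption: special tokens are listed under "added_tokens" as
    {"id": ..., "content": ...} entries, and the main vocabulary is a
    token -> id dict (as in BPE models).
    """
    for entry in tokenizer_json.get("added_tokens", []):
        if entry.get("content") == token:
            return entry.get("id")
    vocab = tokenizer_json.get("model", {}).get("vocab", {})
    if isinstance(vocab, dict):
        return vocab.get(token)
    return None  # non-dict vocab layouts are not handled in this sketch


# Hypothetical usage, with tokenizer.json saved next to the script:
# data = load_tokenizer_json("tokenizer.json")
# print(find_token_id(data, "<|endoftext|>"))
# print(find_token_id(data, ">>INTRODUCTION<<"))
```

If the reading above is right, the first lookup should give 11 and the second 2.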
If we don't want to update the default (this is upstream code, right?), the correct eos_token_id
can instead be passed explicitly at generation time:
model.generate(input_ids=tokens, max_new_tokens=512, do_sample=True, eos_token_id=11, temperature=0.8)
This solves the issue of the model output continuing right past <|endoftext|> tokens :D
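
To avoid hard-coding 11, the ID can also be looked up from the tokenizer before calling generate. This is only a sketch: `generate_with_eos` is a hypothetical helper, and it assumes `tokenizer` behaves like a transformers tokenizer (with `convert_tokens_to_ids` and `unk_token_id`) and `model` has the usual `generate` method.

```python
def generate_with_eos(model, tokenizer, input_ids, eos_token="<|endoftext|>", **gen_kwargs):
    """Call model.generate with eos_token_id resolved from the tokenizer.

    Hypothetical helper: raises ValueError if the tokenizer does not know
    `eos_token`, instead of silently falling back to a wrong default.
    """
    eos_id = tokenizer.convert_tokens_to_ids(eos_token)
    unk_id = getattr(tokenizer, "unk_token_id", None)
    # Unknown tokens come back as None or as the unknown-token ID.
    if eos_id is None or (unk_id is not None and eos_id == unk_id):
        raise ValueError(f"tokenizer has no token {eos_token!r}")
    return model.generate(input_ids=input_ids, eos_token_id=eos_id, **gen_kwargs)


# Hypothetical usage, mirroring the call above:
# output = generate_with_eos(model, tokenizer, tokens,
#                            max_new_tokens=512, do_sample=True, temperature=0.8)
```

This keeps the workaround valid even if a given repository uses a different ID for <|endoftext|>.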
Oh interesting! I assume you've told them about it too?
If this is materially affecting inference and has been reported upstream, then I'll change it.
Would you mind PRing the fix to this repo?