Switch from PreTrainedTokenizerFast to GPT2TokenizerFast and add eos_token & bos_token
#15 opened by loubnabnl (HF staff)
`PreTrainedTokenizerFast` returns `token_type_ids` by default, but SantaCoder was not trained with them, so calling `model(**tokenizer(text))` can produce unexpected outputs in some cases. We'll use `GPT2TokenizerFast` instead.
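The extra field comes from each tokenizer class's `model_input_names`, which lists the inputs a tokenizer returns by default. The minimal sketch below assumes a recent `transformers` release and compares that attribute on the two classes. It only reads class attributes, so no checkpoint is downloaded.

```python
# Minimal sketch (assumes a recent `transformers` install): compare the inputs
# each tokenizer class emits by default. Only class attributes are read, so
# nothing is fetched from the Hub.
from transformers import GPT2TokenizerFast, PreTrainedTokenizerFast

# The generic fast tokenizer inherits the base default, which includes token_type_ids.
base_inputs = list(PreTrainedTokenizerFast.model_input_names)

# GPT2TokenizerFast overrides the default and only returns input_ids and attention_mask.
gpt2_inputs = list(GPT2TokenizerFast.model_input_names)

print("PreTrainedTokenizerFast:", base_inputs)
print("GPT2TokenizerFast:", gpt2_inputs)
```

If you are pinned to an older revision of the tokenizer files, passing `return_token_type_ids=False` when calling the tokenizer should suppress the field for that call. Switching the class in the tokenizer config, as this PR does, fixes it for every caller instead.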
loubnabnl changed pull request status to merged