Tokens overrides (added_tokens_decoder)
#1 opened by dranger003
Hey there, have you been able to apply the token overrides for tokens 106/107 (i.e. <|im_start|> and <|im_end|>)?
What does it look like when you print the token IDs from tokenizing the template?
EDIT: I was able to get them mapped and properly decoded, but I had to edit convert-hf-to-gguf.py to use a different _set_vocab().
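For anyone following along, here is a rough sketch of what the override amounts to. It is not the actual change to convert-hf-to-gguf.py or its _set_vocab(). It assumes the usual tokenizer_config.json layout, where added_tokens_decoder maps a token ID (as a string) to an entry with a "content" field. The helper names, the base vocab placeholders, and the example config are made up for illustration.

```python
import json


def read_token_overrides(tokenizer_config: dict) -> dict[int, str]:
    """Return {token_id: token_text} from the added_tokens_decoder section.

    Assumption: keys are token IDs stored as strings and each entry
    carries the token text under "content".
    """
    decoder = tokenizer_config.get("added_tokens_decoder", {})
    return {int(token_id): entry["content"] for token_id, entry in decoder.items()}


def apply_token_overrides(vocab: list[str], overrides: dict[int, str]) -> list[str]:
    """Return a copy of vocab with the overridden IDs replaced (hypothetical helper)."""
    patched = list(vocab)
    for token_id, text in overrides.items():
        # Skip IDs that fall outside the base vocab instead of growing it.
        if 0 <= token_id < len(patched):
            patched[token_id] = text
    return patched


# Made-up config fragment; a real one would be read from tokenizer_config.json.
example_config = json.loads(
    '{"added_tokens_decoder": {'
    '"106": {"content": "<|im_start|>", "special": true}, '
    '"107": {"content": "<|im_end|>", "special": true}}}'
)

# Placeholder base vocab; the real token strings at these IDs are not shown here.
base_vocab = [f"<placeholder{i}>" for i in range(110)]
patched_vocab = apply_token_overrides(base_vocab, read_token_overrides(example_config))

print(patched_vocab[106], patched_vocab[107])
```

If the overrides are applied, IDs 106 and 107 should decode to the chat template markers rather than whatever the base vocab holds at those positions.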
If you have a diff or PR I can apply and rerun the quants, I'd appreciate it. I have not followed the token override issue you've mentioned above.
I just opened an issue with the details. I don't think this is hard to fix, and with some guidance I can craft a PR.