There are doubts about the settings of eos_token, bos_token, pad_token
#7 opened by cl-modelcloud
The three token settings in the tokenizer_config.json file are as follows:
"eos_token": "<|end_of_text|>",
"bos_token": "<|begin_of_text|>",
"pad_token": "<|end_of_text|>",
but in the config.json file they are set as IDs:
"bos_token_id": 0,
"eos_token_id": 11,
"pad_token_id": 0,
In the vocabulary, these three token IDs correspond to the following tokens:
"bos_token_id": ">>TITLE<<",
"eos_token_id": "<|end_of_text|>",
"pad_token_id": ">>TITLE<<",
Which setting is correct?
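For anyone wanting to reproduce the check, here is a minimal sketch of how the two files can be compared. It assumes the special tokens are stored as plain strings, as quoted above, and that a token-to-ID vocabulary is available as a dict. The helper name `find_token_mismatches` is hypothetical, and the `vocab` below contains only the two entries mentioned in this post, so it is deliberately incomplete.

```python
def find_token_mismatches(tokenizer_cfg, model_cfg, vocab):
    """Compare special-token strings with the IDs in the model config.

    vocab maps token string -> integer ID. Returns a dict keyed by token
    role ("bos", "eos", "pad") for every role where the two files disagree.
    """
    id_to_token = {idx: tok for tok, idx in vocab.items()}
    mismatches = {}
    for role in ("bos", "eos", "pad"):
        token = tokenizer_cfg.get(f"{role}_token")
        token_id = model_cfg.get(f"{role}_token_id")
        # A role is consistent only if the tokenizer's string maps to the
        # same ID that the model config declares for that role.
        if vocab.get(token) != token_id:
            mismatches[role] = {
                "tokenizer_token": token,
                "config_id": token_id,
                "config_id_decodes_to": id_to_token.get(token_id),
            }
    return mismatches


# Values quoted in this post. The vocab is an assumption for illustration:
# it holds only the two token/ID pairs stated above.
tokenizer_cfg = {
    "eos_token": "<|end_of_text|>",
    "bos_token": "<|begin_of_text|>",
    "pad_token": "<|end_of_text|>",
}
model_cfg = {"bos_token_id": 0, "eos_token_id": 11, "pad_token_id": 0}
vocab = {">>TITLE<<": 0, "<|end_of_text|>": 11}

for role, info in find_token_mismatches(tokenizer_cfg, model_cfg, vocab).items():
    print(role, info)
```

With these values the bos and pad roles are reported as inconsistent, while eos agrees in both files. In practice the vocabulary would come from the loaded tokenizer rather than a hand-written dict.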
Hi,
Thanks for spotting this ambiguity. It has been corrected now.
Gkunsch changed discussion status to closed