I think this is actually just 0.1
#1 by bartowski - opened
According to the git repo which includes a link to this model's tar download, they added earlier today:
- Important: mixtral-8x22B-Instruct-v0.3.tar is exactly the same as Mixtral-8x22B-Instruct-v0.1, only stored in .safetensors format.
- mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32756 tokens.
https://github.com/mistralai/mistral-inference?tab=readme-ov-file#model-download
mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32756 tokens.
Then it isn't the same, is it?
For the non-instruct model, yes. But for the instruct model it's completely identical: the original v0.1 model on HF already has 32756 tokens in the vocab.
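One way to sanity-check a claim like "completely identical" is to compare checksums of the downloaded weight files: if the shards are byte-for-byte the same, the hashes match. This is only a sketch; the file paths below are placeholders for wherever the two downloads live, and note that a format change (e.g. repacking into .safetensors) would change the bytes even if the tensors inside are identical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight shards never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Placeholder paths -- point these at the corresponding shards of the
# v0.1 and v0.3 downloads, then compare:
# a = sha256_of("Mixtral-8x22B-Instruct-v0.1/model-00001.safetensors")
# b = sha256_of("mixtral-8x22B-Instruct-v0.3/model-00001.safetensors")
# print("identical shard" if a == b else "shards differ")
```

If the files differ only because of repacking, a tensor-level comparison (loading both and checking each tensor for equality) would be the next step.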
Yes, that's true. But Mistral gets to decide if they release a v0.3 that's exactly the same as v0.1. I think we are more like historians who record such things, not creators. This v0.3 matches theirs, which also happens to be the same as v0.1. I think it does no harm, and it does some good.