conversion to HF
I'm aware of the script.
How to use it to convert 8x22B is far from self-evident.
@ehartford https://huggingface.co/v2ray/Mixtral-8x22B-v0.1/blob/main/convert.py
python convert.py --input-dir /path/to/original --model-size 22B --output-dir /path/to/save
Thanks!
I will do this immediately
max_position_embeddings = params["max_seq_len"]
~~~~~~^^^^^^^^^^^^^^^
It wants "max_seq_len"
I see there isn't one in params.json
{
  "dim": 6144,
  "n_layers": 56,
  "head_dim": 128,
  "hidden_dim": 16384,
  "n_heads": 48,
  "n_kv_heads": 8,
  "norm_eps": 1e-05,
  "vocab_size": 32768,
  "rope_theta": 1000000.0,
  "moe": {
    "num_experts": 8,
    "num_experts_per_tok": 2
  }
}
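For reference, here is a minimal sketch of how the `params.json` fields above plausibly map onto the keyword names of Hugging Face's `MixtralConfig`. It is an assumption about what a converter like `convert.py` has to do, not that script's actual code, and `params_to_hf_config` is a hypothetical helper. `max_position_embeddings` is deliberately absent because this `params.json` has no `max_seq_len` to fill it from.

```python
import json

# The params.json shipped with the checkpoint, as quoted in this thread.
PARAMS_JSON = """
{
  "dim": 6144, "n_layers": 56, "head_dim": 128, "hidden_dim": 16384,
  "n_heads": 48, "n_kv_heads": 8, "norm_eps": 1e-05, "vocab_size": 32768,
  "rope_theta": 1000000.0,
  "moe": {"num_experts": 8, "num_experts_per_tok": 2}
}
"""


def params_to_hf_config(params: dict) -> dict:
    """Hypothetical helper: rename Mistral params.json keys to MixtralConfig kwargs.

    Illustrates the kind of mapping a converter needs; not copied from convert.py.
    """
    # Sanity check: head_dim is given explicitly but should equal
    # dim // n_heads (6144 // 48 = 128 for this checkpoint).
    if params["dim"] // params["n_heads"] != params["head_dim"]:
        raise ValueError("head_dim does not match dim // n_heads")
    moe = params["moe"]
    return {
        "hidden_size": params["dim"],
        "num_hidden_layers": params["n_layers"],
        "intermediate_size": params["hidden_dim"],
        "num_attention_heads": params["n_heads"],
        "num_key_value_heads": params["n_kv_heads"],
        "rms_norm_eps": params["norm_eps"],
        "vocab_size": params["vocab_size"],
        "rope_theta": params["rope_theta"],
        "num_local_experts": moe["num_experts"],
        "num_experts_per_tok": moe["num_experts_per_tok"],
    }


config = params_to_hf_config(json.loads(PARAMS_JSON))
print(config["num_local_experts"], config["num_experts_per_tok"])  # 8 2
```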
I will try setting it to 32768
I thought it was 64k?
Ok thank you 😊
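The conversion failed because the script reads `params["max_seq_len"]` and this `params.json` has no such key. A minimal sketch of the workaround, assuming you would rather patch `params.json` than edit `convert.py`: add the key only if it is missing. The default of 32768 is the value tried here; 64k (65536) was also raised and the thread does not settle which is correct, so it is left as a parameter. `ensure_max_seq_len` is a hypothetical helper. Changing that line of the script to `params.get("max_seq_len", 32768)` would have the same effect.

```python
import json
import tempfile
from pathlib import Path


def ensure_max_seq_len(params_path: Path, max_seq_len: int = 32768) -> bool:
    """Hypothetical helper: add max_seq_len to params.json if it is missing.

    Returns True if the file was modified, False if the key already existed.
    """
    params = json.loads(params_path.read_text())
    if "max_seq_len" in params:
        return False  # never overwrite a value that is already present
    params["max_seq_len"] = max_seq_len
    params_path.write_text(json.dumps(params, indent=2) + "\n")
    return True


# Demo on a throwaway copy; point it at the real params.json in practice.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "params.json"
    path.write_text(json.dumps({"dim": 6144, "vocab_size": 32768}))
    changed = ensure_max_seq_len(path)
    print(changed, json.loads(path.read_text())["max_seq_len"])  # True 32768
```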
Ok, that worked, but it didn't create a tokenizer.
it came with this file
tokenizer.model.v3
and no tokenizer.config file
Ok, it looks like I may need to rename that to tokenizer.model and then rerun.
@ehartford
I just copied the tokenizer from 8x7B when I did the conversion for 8x22B v0.1, since it's the same one.
Wait a minute, v0.3?!
nope that didn't do it
oh yeah I could copy the tokenizer from mistral-7b-v0.3
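Per the messages above, renaming `tokenizer.model.v3` to `tokenizer.model` was not enough on its own, and the fix was to copy the tokenizer from mistral-7b-v0.3. A minimal sketch of that copy step, assuming a local download of mistral-7b-v0.3 in Hugging Face format; the file list and the `copy_tokenizer` helper are hypothetical, and any file missing from the source is skipped rather than created. Checking the copied tokenizer's vocabulary size against `vocab_size` (32768) in `params.json` before uploading is a sensible extra step.

```python
import shutil
from pathlib import Path

# Assumed tokenizer file names; adjust to what the source checkout actually contains.
TOKENIZER_FILES = (
    "tokenizer.model",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def copy_tokenizer(src_dir: Path, dst_dir: Path, names=TOKENIZER_FILES) -> list[str]:
    """Hypothetical helper: copy whichever tokenizer files exist in src_dir to dst_dir.

    Returns the names of the files that were copied, in the order of `names`.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = src_dir / name
        if src.is_file():
            shutil.copy2(src, dst_dir / name)  # copy2 also preserves timestamps
            copied.append(name)
    return copied


# Example (hypothetical paths; the output dir matches the convert.py command above):
# copy_tokenizer(Path("/path/to/mistral-7b-v0.3"), Path("/path/to/save"))
```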
ok I think I got it. Uploading
Yeah, they say it's the same but with a new tokenizer.
finished uploading mistral-community/mixtral-8x22B-v0.3