config.json error

#1
by RaccoonOnion - opened

Why does the config.json of this model declare a Mistral architecture for a Phi-3 checkpoint? It contains:

  "architectures": [
    "MistralForCausalLM"
  ], 

And

"model_type": "mistral",

When I load the model through .from_pretrained, I get these errors:

You are using a model of type mistral to instantiate a model of type phi3. This is not supported for all configurations of models and can yield errors.

and

Some weights of Phi3ForSFT were not initialized from the model checkpoint at unsloth/Phi-3-medium-4k-instruct-bnb-4bit and are newly initialized: ['model.layers.0.mlp.gate_up_proj.weight', 'model.layers.0.self_attn.qkv_proj.weight', 'model.layers.1.mlp.gate_up_proj.weight', 'model.layers.1.self_attn.qkv_proj.weight', ..., 'model.layers.39.mlp.gate_up_proj.weight', 'model.layers.39.self_attn.qkv_proj.weight']
(the full list is the same two fused weights, mlp.gate_up_proj.weight and self_attn.qkv_proj.weight, for every layer 0-39)
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Note that Phi3ForSFT is just a wrapper over Phi3Model. Is there any chance the wrong model was uploaded to this repo?

I saw issues on another Phi-3 repo saying that Unsloth "mistralizes" the model, i.e. converts the checkpoint to the Mistral architecture, which would explain the config. It would be helpful to put a notice on the model page so future users aren't confused.
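
For anyone who lands here: a minimal workaround sketch, assuming the checkpoint really is stored in the Mistral layout the config declares (unfused q/k/v and gate/up projections, which is consistent with the fused Phi3 weights being reported as missing above). Letting AutoModelForCausalLM dispatch on config.model_type avoids forcing a Phi3 class; bitsandbytes and accelerate need to be installed, since the repo is a pre-quantized 4-bit checkpoint:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "unsloth/Phi-3-medium-4k-instruct-bnb-4bit"

  # AutoModelForCausalLM reads config.model_type ("mistral") and picks
  # MistralForCausalLM, so the Mistral-layout weights load without the
  # "newly initialized" warning that the Phi3 classes produce.
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.float16,
      device_map="auto",  # requires accelerate
  )
  tokenizer = AutoTokenizer.from_pretrained(model_id)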

RaccoonOnion changed discussion status to closed
