Merged model loading issues?

#78
by adxisme - opened

So when I load the merged model in Colab, I am facing this issue:
/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py in create_quantized_param(self, model, param_value, param_name, target_device, state_dict, unexpected_keys)
189 param_name + ".quant_state.bitsandbytes__nf4" not in state_dict
190 ):
--> 191 raise ValueError(
192 f"Supplied state dict for {param_name} does not contain bitsandbytes__* and possibly other quantized_stats components."
193 )

ValueError: Supplied state dict for model.layers.15.self_attn.k_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components.
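(For context, the exact load call is not shown in the thread; presumably the merged checkpoint is loaded with something roughly like the sketch below, where "/content/merge" is the save directory used in the merge snippet further down.)

from transformers import AutoModelForCausalLM

# Hypothetical reconstruction of the failing load; the arguments are an assumption
model = AutoModelForCausalLM.from_pretrained("/content/merge", device_map="auto")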

Can anyone help out with this?

My target modules for LoRA were:

[
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
    "lm_head",
]
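(A hypothetical reconstruction of that LoRA setup, assuming peft's LoraConfig; only the target_modules list comes from this thread, the rank, alpha, and dropout values are placeholders:)

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # placeholder rank, not stated in the thread
    lora_alpha=32,      # placeholder alpha
    lora_dropout=0.05,  # placeholder dropout
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
)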

Can you share how you merged it?

# Assuming you have already trained your model and have the trainer object
adapter_model = trainer.model
merged_model = adapter_model.merge_and_unload()

# Retrieve the trained tokenizer
trained_tokenizer = trainer.tokenizer

# Define the directory where you want to save the model and tokenizer
save_directory = "/content/merge"

# Save the merged model
merged_model.save_pretrained(save_directory)

# Save the tokenizer
trained_tokenizer.save_pretrained(save_directory)

This is how I merged it.

Also, at times when I try to run inference on the PEFT adapter, I face issues like:

ValueError: Unrecognized model in /content/drive/MyDrive/final. Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava-next-video, llava_next, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, me…

This seems like a very new error, as even yesterday I was able to run inference. My transformers version is 4.42.0.
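(If /content/drive/MyDrive/final contains only the PEFT adapter files, a common way to load it for inference is sketched below; the directory contents and the base model are assumptions based on the config shown later, not something confirmed in the thread.)

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

adapter_dir = "/content/drive/MyDrive/final"

# AutoPeftModelForCausalLM reads adapter_config.json, loads the base model,
# and attaches the trained adapters on top of it
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir, device_map="auto", torch_dtype=torch.bfloat16
)

# If the tokenizer was not saved alongside the adapter, load it from the base model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")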

Can you share what is in your config.json file?

Can you confirm that the file size of the weights makes sense? If you saved it in 4-bit precision, it should be about 6 GB total. If it is 24 GB, then it was saved in fp16 or bf16.
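(As a rough sanity check, assuming Mistral-Nemo has on the order of 12B parameters, which is an assumption rather than something stated in this thread:)

# Back-of-the-envelope checkpoint sizes for ~12B parameters
params = 12e9
print(f"4-bit : ~{params * 0.5 / 1e9:.0f} GB")  # ~6 GB
print(f"8-bit : ~{params * 1.0 / 1e9:.0f} GB")  # ~12 GB
print(f"fp16  : ~{params * 2.0 / 1e9:.0f} GB")  # ~24 GB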

Hey, the model safetensors are around 11 GB combined in total. Here is my config.json:

{
  "_name_or_path": "mistralai/Mistral-Nemo-Instruct-2407",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "llm_int8_enable_fp32_cpu_offload": false,
  "max_position_embeddings": 1024000,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.45.0.dev0",
  "use_cache": false,
  "vocab_size": 131072
}

Thanks for the reply.

  1. It isn't recommended to merge at 4-bit because rounding errors can degrade the results.
  2. If you have the adapters saved, I would first load the base model in 4-bit precision, then add the trained adapters, and then merge and save the model.

Example:


from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM
import torch

# Read the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots")

# Load the base model in 4-bit precision
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path, load_in_4bit=True, device_map="auto"
).eval()

# Attach the trained LoRA adapters
model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots")

# Merge the adapters into the base weights and save the result
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged")

Also make sure to upgrade bitsandbytes and transformers to the newest versions.
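(In Colab that would be something along the lines of the following; the exact package list is a reasonable guess, not a prescription from this thread:)

!pip install -U transformers peft bitsandbytes accelerate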

Hey, thanks for the reply. As you said, I loaded my base model in 4-bit and then merged it with the adapters.
Once I did that, I got the error below when loading the model:
ValueError: Supplied state dict for model.layers.15.self_attn.k_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components.

It makes me feel that this particular version doesn't allow me to add the LoRA adapters to the base model, as there is a change in parameters. Let me know what you think. Thanks.

adxisme changed discussion status to closed

You could open an issue in transformers on GitHub.
