license: apache-2.0
base_model:
  - Qwen/Qwen2.5-7B-Instruct

Converted LLaMA from Qwen2-7B-Instruct

Description

This model is Qwen2-7B-Instruct converted to the LLaMA format. The conversion lets you load and run Qwen2-7B-Instruct as if it were a LLaMA model, which is convenient for some inference use cases. The precision is exactly the same as that of the original model.

Usage

You can load the model using the LlamaForCausalLM class as shown below:

from transformers import AutoTokenizer, LlamaForCausalLM

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# we still use the original tokenizer from Qwen2-7B-Instruct
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the tokenized inputs to the GPU
model_inputs = tokenizer([text], return_tensors="pt").to("cuda")

# Converted LLaMA model
llama_model = LlamaForCausalLM.from_pretrained(
    "silence09/Qwen2-7B-Instruct-Converted-Llama",
    torch_dtype='auto').cuda()
llama_generated_ids = llama_model.generate(model_inputs.input_ids, max_new_tokens=32, do_sample=False)
llama_generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, llama_generated_ids)
]
llama_response = tokenizer.batch_decode(llama_generated_ids, skip_special_tokens=True)[0]
print(llama_response)

Precision Guarantee

To compare results with the original model, you can use this code.
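The linked comparison script is not reproduced here. As a rough illustration only, the sketch below greedy-decodes the same prompt with the original and the converted model and reports the first position where the generated token IDs differ. The helper names (`first_mismatch`, `greedy_token_ids`, `compare_models`) are hypothetical and not part of the author's script, and matching greedy tokens is a weaker check than comparing logits directly. It assumes a CUDA GPU with enough memory to load one 7B model at a time.

```python
from itertools import zip_longest


def first_mismatch(ids_a, ids_b):
    """Return the index of the first differing token ID, or None if identical."""
    # zip_longest pads the shorter sequence with None, so a length
    # difference is reported as a mismatch at the first missing position.
    for i, (a, b) in enumerate(zip_longest(ids_a, ids_b)):
        if a != b:
            return i
    return None


def greedy_token_ids(model_cls, repo_id, input_ids, max_new_tokens=32):
    """Load a model, greedy-decode, and return only the new token IDs."""
    model = model_cls.from_pretrained(repo_id, torch_dtype="auto").cuda()
    output = model.generate(
        input_ids.cuda(), max_new_tokens=max_new_tokens, do_sample=False
    )
    # Strip the prompt tokens so only the generated continuation remains.
    return output[0, input_ids.shape[1]:].tolist()


def compare_models(prompt="Give me a short introduction to large language model."):
    """Hypothetical helper: None means both models produced the same tokens."""
    # Heavy imports stay local so first_mismatch is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
    )
    input_ids = tokenizer([text], return_tensors="pt").input_ids

    original = greedy_token_ids(
        AutoModelForCausalLM, "Qwen/Qwen2-7B-Instruct", input_ids
    )
    converted = greedy_token_ids(
        LlamaForCausalLM, "silence09/Qwen2-7B-Instruct-Converted-Llama", input_ids
    )
    return first_mismatch(original, converted)
```

Calling `compare_models()` returns `None` when both models generate identical token IDs for the prompt, and otherwise the index of the first token that differs.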

More Info

The model was converted using the Python script available at this repository.