---
license: apache-2.0
base_model:
- internlm/internlm2_5-7b-chat
---

# Converted LLaMA from InternLM2.5-7B-Chat

## Description

This is [InternLM2.5-7B-Chat](https://huggingface.co/internlm/internlm2_5-7b-chat) converted to the __LLaMA__ format. The conversion lets you use InternLM2.5-7B-Chat as if it were a standard LLaMA model, which is convenient for *inference use cases* that expect the LLaMA architecture. The __precision__ is __exactly the same__ as the original model.
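
Because the converted checkpoint uses the plain LLaMA architecture, it should load with the stock `transformers` classes and without `trust_remote_code`. A minimal sketch (assuming the converted repo ships a standard LLaMA `config.json`):

```python
from transformers import AutoConfig, LlamaForCausalLM

repo = "silence09/InternLM2.5-7B-Chat-Converted-LlaMA"

# The converted checkpoint advertises the LLaMA architecture,
# so no custom modeling code is required for the model itself.
config = AutoConfig.from_pretrained(repo)
print(config.architectures)  # expected: ['LlamaForCausalLM']

model = LlamaForCausalLM.from_pretrained(repo, torch_dtype="auto")
```

Note that the tokenizer in the full example below still comes from the original `internlm/internlm2_5-7b-chat` repository and therefore needs `trust_remote_code=True`.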


## Usage

You can load the model with the standard `LlamaForCausalLM` class (while keeping the official InternLM tokenizer) and chat with it as shown below:

```python
import torch
from typing import List, Optional, Tuple, Union

from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM
from transformers.generation.streamers import BaseStreamer

device = "cpu"  # on cpu the outputs are exactly the same as the original model
attn_impl = 'eager'  # the attention implementation to use

meta_instruction = ("You are an AI assistant whose name is InternLM (书生·浦语).\n"
                    "- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory "
                    "(上海人工智能实验室). It is designed to be helpful, honest, and harmless.\n"
                    "- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such "
                    "as English and 中文."
                    )
prompt1 = "介绍下你自己"  # "Introduce yourself"
prompt2 = "介绍下上海人工智能实验室"  # "Introduce the Shanghai AI Laboratory"


def build_inputs(tokenizer, query: str, history: List[Tuple[str, str]] = None, meta_instruction=meta_instruction):
    """Build the InternLM2 chat prompt (<|im_start|>/<|im_end|> format) and tokenize it."""
    if history is None:
        history = []
    if tokenizer.add_bos_token:
        prompt = ""
    else:
        prompt = tokenizer.bos_token
    if meta_instruction:
        prompt += f"""<|im_start|>system\n{meta_instruction}<|im_end|>\n"""
    for record in history:
        prompt += f"""<|im_start|>user\n{record[0]}<|im_end|>\n<|im_start|>assistant\n{record[1]}<|im_end|>\n"""
    prompt += f"""<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n"""
    return tokenizer([prompt], return_tensors="pt")


@torch.inference_mode()
def chat(
    model: Union[AutoModelForCausalLM, LlamaForCausalLM],
    tokenizer,
    query: str,
    history: Optional[List[Tuple[str, str]]] = None,
    streamer: Optional[BaseStreamer] = None,
    max_new_tokens: int = 1024,
    do_sample: bool = True,
    temperature: float = 0.8,
    top_p: float = 0.8,
    meta_instruction: str = meta_instruction,
    **kwargs,
):
    if history is None:
        history = []
    inputs = build_inputs(tokenizer, query, history, meta_instruction)
    inputs = {k: v.to(model.device) for k, v in inputs.items() if torch.is_tensor(v)}
    # also add the end-of-assistant token (<|im_end|>) to the eos token ids to avoid unnecessary generation
    eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids(["<|im_end|>"])[0]]
    outputs = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        temperature=temperature,
        top_p=top_p,
        eos_token_id=eos_token_id,
        **kwargs,
    )
    # keep only the newly generated tokens and decode them
    outputs = outputs[0].cpu().tolist()[len(inputs["input_ids"][0]):]
    response = tokenizer.decode(outputs, skip_special_tokens=True)
    response = response.split("<|im_end|>")[0]
    history = history + [(query, response)]
    return response, history


# use the official tokenizer
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# use the converted LLaMA model
llama_model = LlamaForCausalLM.from_pretrained(
    "silence09/InternLM2.5-7B-Chat-Converted-LlaMA",
    torch_dtype='auto',
    attn_implementation=attn_impl).to(device)
llama_model.eval()

response_llama_and_splitfunc_1, history = chat(llama_model, tokenizer, prompt1, history=[], do_sample=False)
print(f"User Input: {prompt1}\nConverted LLaMA Response: {response_llama_and_splitfunc_1}")

response_llama_and_splitfunc_2, history = chat(llama_model, tokenizer, prompt2, history=history, do_sample=False)
print(f"User Input: {prompt2}\nConverted LLaMA Response: {response_llama_and_splitfunc_2}")
```


## Precision Guarantee

To compare the converted model's results with the original model, you can use this [comparison code](https://github.com/silencelamb/naked_llama/blob/main/hf_example/hf_internlm_7b_llama_compare.py).
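
If you just want a quick spot check rather than the full script, the sketch below compares the two models' logits on the same input. It is a minimal sketch using the repository names from this card; it loads both 7B models at once, so make sure you have enough memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)

# original model, served by the custom InternLM2 modeling code
original = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2_5-7b-chat", trust_remote_code=True,
    torch_dtype="auto").eval()

# converted checkpoint, loaded with the stock LLaMA implementation
converted = LlamaForCausalLM.from_pretrained(
    "silence09/InternLM2.5-7B-Chat-Converted-LlaMA",
    torch_dtype="auto", attn_implementation="eager").eval()

inputs = tokenizer("介绍下你自己", return_tensors="pt")  # "Introduce yourself"
with torch.inference_mode():
    diff = (original(**inputs).logits - converted(**inputs).logits).abs().max()

# per this card's precision claim, the difference is expected to be 0 on cpu
print("max abs logit difference:", diff.item())
```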


## More Info

The model was converted using the Python script available in [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/convert_internlm_to_llama_hf.py).
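
For intuition, the bulk of such a conversion is renaming InternLM2's weights to their LLaMA equivalents and splitting the fused `attention.wqkv` projection into separate `q_proj`/`k_proj`/`v_proj` matrices. The snippet below is only a rough illustration of that split, based on how InternLM2's attention groups query/key/value rows; the linked script is the authoritative mapping, and the helper name `split_wqkv` is made up for this example.

```python
import torch

def split_wqkv(wqkv: torch.Tensor, num_heads: int, num_kv_heads: int, head_dim: int):
    """Split a fused InternLM2 wqkv weight into LLaMA-style q/k/v projection weights.

    Assumes the fused rows are grouped per KV head as
    [q_1 ... q_{q_per_kv}, k, v], matching InternLM2's attention layout.
    """
    hidden_size = wqkv.shape[-1]
    q_per_kv = num_heads // num_kv_heads
    w = wqkv.reshape(num_kv_heads, q_per_kv + 2, head_dim, hidden_size)
    q_proj = w[:, :q_per_kv].reshape(-1, hidden_size)
    k_proj = w[:, -2].reshape(-1, hidden_size)
    v_proj = w[:, -1].reshape(-1, hidden_size)
    return q_proj, k_proj, v_proj

# Illustrative shapes for InternLM2.5-7B: 32 query heads, 8 KV heads, head_dim 128, hidden size 4096.
wqkv = torch.randn((32 + 2 * 8) * 128, 4096)
q, k, v = split_wqkv(wqkv, num_heads=32, num_kv_heads=8, head_dim=128)
print(q.shape, k.shape, v.shape)  # torch.Size([4096, 4096]) torch.Size([1024, 4096]) torch.Size([1024, 4096])
```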