alibaba-pai/Qwen2-1.5B-Instruct-Refine

	<img src="https://cdn-uploads.huggingface.co/production/uploads/62aba5ebab9ed4f63c36b1e2/47PZcc9QTR_okQIvKeOLn.png" alt="image/png" style="transform: scale(1);">


	## 📖 Introduction

	Qwen2-7B-Instruct-Refine and Qwen2-1.5B-Instruct-Refine are two large language models that act as prompt engineers. They optimize and refine the prompts that users write, and the resulting instructions can significantly improve an LLM's ability to produce better and more informative responses.

	We fine-tuned Qwen2-7B-Instruct and Qwen2-1.5B-Instruct to obtain Qwen2-7B-Instruct-Refine and Qwen2-1.5B-Instruct-Refine.
	We sampled the training data from OpenHermes and the LCCD dataset, ensuring a balanced task distribution. To annotate the training set, we used Qwen-max, with our handwritten examples incorporated as in-context prompts.
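As an illustration of what sampling with a balanced task distribution can look like, here is a minimal sketch. It is not the authors' pipeline: the `task` field, the `per_task` cap, and the function name are all assumptions made for this example.

```python
import random
from collections import defaultdict


def balanced_sample(examples, per_task, key="task", seed=0):
    """Draw up to `per_task` examples from each task group.

    `examples` is a list of dicts; `key` names the (hypothetical) field
    that holds the task label of each example.
    """
    rng = random.Random(seed)

    # Group the examples by their task label
    by_task = defaultdict(list)
    for example in examples:
        by_task[example[key]].append(example)

    # Sample the same number from every group (fewer if a group is small)
    sample = []
    for task in sorted(by_task):
        group = by_task[task]
        sample.extend(rng.sample(group, min(per_task, len(group))))
    return sample
```

Capping every task at the same count keeps a large task from dominating the training set; tasks with fewer than `per_task` examples are simply used in full.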

	## 🚀 Quick Start

	The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate content.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# device_map="auto" places the model on the available device(s)
	model = AutoModelForCausalLM.from_pretrained(
	    "alibaba-pai/Qwen2-1.5B-Instruct-Refine",
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/Qwen2-1.5B-Instruct-Refine")

	# The prompt to be refined
	prompt = "Give me a short introduction to large language model."
	messages = [
	    {"role": "user", "content": prompt}
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	# Move the inputs to the same device as the model
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=2048,
	    eos_token_id=151645,
	)
	# Drop the prompt tokens so that only the newly generated tokens remain
	generated_ids = [
	    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	```
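The refined prompt in `response` is meant to be handed to another LLM. The sketch below shows one way to wire the two stages together, assuming the refiner takes the raw prompt as a single user message, as in the snippet above. The helper names `make_chat_fn` and `refine_then_answer` are hypothetical, and `max_new_tokens=512` is an arbitrary choice.

```python
from typing import Callable, Dict


def make_chat_fn(model_name: str, max_new_tokens: int = 512) -> Callable[[str], str]:
    """Load a chat model once and return a function mapping a prompt to a reply."""
    # Imported lazily so the pipeline logic can be used without transformers installed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )

    def chat(prompt: str) -> str:
        text = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            tokenize=False,
            add_generation_prompt=True,
        )
        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Keep only the tokens generated after the prompt
        new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return chat


def refine_then_answer(
    prompt: str,
    refine: Callable[[str], str],
    answer: Callable[[str], str],
) -> Dict[str, str]:
    """Refine the user's prompt first, then answer the refined prompt."""
    refined = refine(prompt)
    return {"original": prompt, "refined": refined, "response": answer(refined)}
```

A possible usage, which downloads both models on first run:

```python
refine = make_chat_fn("alibaba-pai/Qwen2-1.5B-Instruct-Refine")
answer = make_chat_fn("Qwen/Qwen2-1.5B-Instruct")
result = refine_then_answer("Give me a short introduction to large language model.", refine, answer)
print(result["refined"])
print(result["response"])
```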

	## 🔍 Evaluation

	We used single-turn instructions from MT-Bench as input for Qwen2-1.5B-Instruct and Qwen2-7B-Instruct. We then used GPT-4 Turbo to evaluate how the level of detail and the truthfulness of the responses change when the models answer the instructions revised by our models instead of the original ones.

	| Model | Detail | Truthfulness |
	|:----------------------------:|:------:|:------------:|
	| Qwen2-1.5B-Instruct | 50.00% | 50.00% |
	| + Qwen2-1.5B-Instruct-Refine | 75.63% | 63.75% |
	| + Qwen2-7B-Instruct-Refine | 76.56% | 62.19% |
	| Qwen2-7B-Instruct | 50.00% | 50.00% |
	| + Qwen2-1.5B-Instruct-Refine | 70.94% | 57.19% |
	| + Qwen2-7B-Instruct-Refine | 74.69% | 58.44% |
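
Each base model is listed at 50.00%, so the effect of a refiner can be read as a difference in percentage points from that reference. The snippet below is plain arithmetic on the table's numbers for Qwen2-1.5B-Instruct; the function name is illustrative only.

```python
def gains_over_baseline(scores, baseline=50.0):
    """Return each score minus the baseline, in percentage points."""
    return {name: round(value - baseline, 2) for name, value in scores.items()}


# Scores for Qwen2-1.5B-Instruct + Qwen2-1.5B-Instruct-Refine, copied from the table
with_refine_1_5b = {"detail": 75.63, "truthfulness": 63.75}

gains = gains_over_baseline(with_refine_1_5b)
print(gains)  # {'detail': 25.63, 'truthfulness': 13.75}
```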


	## 📜 Citation

	If you find our work helpful, please cite it!

	```
	@misc{data-augmentation-family,
	title={Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud},
	author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
	year={2024},
	eprint={2412.04871},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2412.04871},
	}
	```