--- license: apache-2.0 language: - zh - en pipeline_tag: text-generation ---

MiniCPM Repo | MiniCPM Paper | MiniCPM-V Repo | Join us in Discord and WeChat

## Introduction MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models. Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function call, along with code interpreter. Please refer to [Advanced Features](https://github.com/OpenBMB/MiniCPM/tree/main?tab=readme-ov-file#%E8%BF%9B%E9%98%B6%E5%8A%9F%E8%83%BD) for usage guidelines. MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can handle infinite context theoretically, without requiring huge amount of memory. ## Usage ### Inference with Transformers ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch path = "openbmb/MiniCPM3-4B-GPTQ-Int4" device = "cuda" tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map=device, trust_remote_code=True) messages = [ {"role": "user", "content": "推荐5个北京的景点。"}, ] model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device) model_outputs = model.generate( model_inputs, max_new_tokens=1024, top_p=0.7, temperature=0.7 ) output_token_ids = [ model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs)) ] responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0] print(responses) ``` ### Inference with [vLLM](https://github.com/vllm-project/vllm) For now, you need to install our forked version of vLLM. ```bash pip install git+https://github.com/OpenBMB/vllm.git@minicpm3 ``` ```python from transformers import AutoTokenizer from vllm import LLM, SamplingParams model_name = "openbmb/MiniCPM3-4B-GPTQ-Int4" prompt = [{"role": "user", "content": "推荐5个北京的景点。"}] tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True) llm = LLM( model=model_name, trust_remote_code=True, tensor_parallel_size=1, quantization='gptq' ) sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02) outputs = llm.generate(prompts=input_text, sampling_params=sampling_params) print(outputs[0].outputs[0].text) ``` ## Evaluation Results

Benchmark	Qwen2-7B-Instruct	GLM-4-9B-Chat	Gemma2-9B-it	Llama3.1-8B-Instruct	GPT-3.5-Turbo-0125	Phi-3.5-mini-Instruct(3.8B)	MiniCPM3-4B
English
MMLU	70.5	72.4	72.6	69.4	69.2	68.4	67.2
BBH	64.9	76.3	65.2	67.8	70.3	68.6	70.2
MT-Bench	8.41	8.35	7.88	8.28	8.17	8.60	8.41
IFEVAL (Prompt Strict-Acc.)	51.0	64.5	71.9	71.5	58.8	49.4	68.4
Chinese
CMMLU	80.9	71.5	59.5	55.8	54.5	46.9	73.3
CEVAL	77.2	75.6	56.7	55.2	52.8	46.1	73.6
AlignBench v1.1	7.10	6.61	7.10	5.68	5.82	5.73	6.74
FollowBench-zh (SSR)	63.0	56.4	57.0	50.6	64.6	58.1	66.8
Math
MATH	49.6	50.6	46.0	51.9	41.8	46.4	46.6
GSM8K	82.3	79.6	79.7	84.5	76.4	82.7	81.1
MathBench	63.4	59.4	45.8	54.3	48.9	54.9	65.6
Code
HumanEval+	70.1	67.1	61.6	62.8	66.5	68.9	68.3
MBPP+	57.1	62.2	64.3	55.3	71.4	55.8	63.2
LiveCodeBench v3	22.2	20.2	19.2	20.4	24.0	19.6	22.6
Function Call
BFCL v2	71.6	70.1	19.2	73.3	75.4	48.4	76.0
Overall
Average	65.3	65.0	57.9	60.8	61.0	57.2	66.3

## Statement * As a language model, MiniCPM3-4B generates content by learning from a vast amount of text. * However, it does not possess the ability to comprehend or express personal opinions or value judgments. * Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers. * Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own. ## LICENSE * This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License. * The usage of MiniCPM3-4B model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md). * The models and weights of MiniCPM3-4B are completely free for academic research. after filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use. ## Citation ``` @article{hu2024minicpm, title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies}, author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others}, journal={arXiv preprint arXiv:2404.06395}, year={2024} } ```