Commit a9a20bd (verified), committed by munish0838
1 Parent(s): bfb8b42

Create README.md

  1. README.md +174 -0
README.md ADDED
@@ -0,0 +1,174 @@
+ ---
+ pipeline_tag: text-generation
+ license: other
+ base_model: internlm/internlm2-chat-7b-sft
+ ---
+ # QuantFactory/internlm2-chat-7b-sft-GGUF
+ This is a quantized version of [internlm/internlm2-chat-7b-sft](https://huggingface.co/internlm/internlm2-chat-7b-sft), created using llama.cpp.
+
+
+ # Model Description
+
+ <div align="center">
+
+ <img src="https://github.com/InternLM/InternLM/assets/22529082/b9788105-8892-4398-8b47-b513a292378e" width="200"/>
+ <div>&nbsp;</div>
+ <div align="center">
+ <b><font size="5">InternLM</font></b>
+ <sup>
+ <a href="https://internlm.intern-ai.org.cn/">
+ <i><font size="4">HOT</font></i>
+ </a>
+ </sup>
+ <div>&nbsp;</div>
+ </div>
+
+ [![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
+
+ </div>
+
+
+ ## Introduction
+
+ InternLM2 has open-sourced a 7-billion-parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
+
+ - **200K context window**: Nearly perfect at finding needles in a haystack across a 200K-long context, with leading performance on long-context tasks such as LongBench and L-Eval. Try it with [LMDeploy](https://github.com/InternLM/lmdeploy) for 200K-context inference.
+
+ - **Outstanding comprehensive performance**: Significantly better than the previous generation in all dimensions, especially reasoning, math, code, chat experience, instruction following, and creative writing, with leading performance among open-source models of similar size. In some evaluations, InternLM2-Chat-20B may match or even surpass ChatGPT (GPT-3.5).
+
+ - **Code interpreter & data analysis**: With a code interpreter, InternLM2-Chat-20B achieves performance comparable to GPT-4 on GSM8K and MATH. InternLM2-Chat also provides data analysis capability.
+
+ - **Stronger tool use**: With improved instruction following, tool selection, and reflection, InternLM2 can support more kinds of agents and multi-step tool calling for complex tasks. See [examples](https://github.com/InternLM/lagent).
+
+
+ ## InternLM2-Chat-7B-SFT
+
+ InternLM2-Chat-7B-SFT is the SFT version based on InternLM2-Base, and InternLM2-Chat-7B is further trained from InternLM2-Chat-7B-SFT by Online RLHF.
+ We release the SFT version so that the community can study the influence of RLHF in depth.
+
+ ### Performance Evaluation
+
+ We conducted a comprehensive evaluation of InternLM2 using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capability: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Some of the results are shown below; visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more.
+
+ | Dataset\Models | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
+ | --- | --- | --- | --- | --- | --- | --- |
+ | MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
+ | AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
+ | BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
+ | GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
+ | MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
+ | HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
+ | MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |
+
+ - The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (entries marked with * are taken from the original papers); the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
+ - The numbers may differ slightly across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).
+
+
+ **Limitations:** Although we have made efforts to ensure the safety of the model during training and to encourage it to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
+
+ ### Import from Transformers
+ To load the InternLM2-Chat-7B-SFT model using Transformers, use the following code:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b-sft", trust_remote_code=True)
+ # Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded as float32 and may cause an OOM error.
+ model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-7b-sft", torch_dtype=torch.float16, trust_remote_code=True).cuda()
+ model = model.eval()
+ response, history = model.chat(tokenizer, "hello", history=[])
+ print(response)
+ # Hello! How can I help you today?
+ response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
+ print(response)
+ ```
+
+ The responses can be streamed using `stream_chat`:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = "internlm/internlm2-chat-7b-sft"
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+ model = model.eval()
+ length = 0
+ for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
+     # Print only the part of the response that has not been shown yet.
+     print(response[length:], flush=True, end="")
+     length = len(response)
+ ```
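
The `response[length:]` slice in the loop above only makes sense if each iteration yields the whole response generated so far, so the loop prints just the unseen suffix. The following is a minimal sketch of that pattern. `fake_stream_chat` and `iter_deltas` are hypothetical helpers written for this illustration (they are not part of InternLM or Transformers), so the snippet runs without loading the model:

```python
def fake_stream_chat(chunks):
    """Hypothetical stand-in for `model.stream_chat`: yields the cumulative
    response so far plus a placeholder history, as the loop above assumes."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text, []


def iter_deltas(stream):
    """Turn cumulative responses into only the newly generated pieces."""
    length = 0
    for response, _history in stream:
        yield response[length:]
        length = len(response)


deltas = list(iter_deltas(fake_stream_chat(["Hel", "lo", "!"])))
print(deltas)  # ['Hel', 'lo', '!']
```

Joining the deltas reproduces the final response, which is why printing them with `end=""` gives a continuous stream.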
+
+ ## Deployment
+
+ ### LMDeploy
+
+ LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.
+
+ ```bash
+ pip install lmdeploy
+ ```
+
+ You can run batch inference locally with the following Python code:
+
+ ```python
+ import lmdeploy
+ pipe = lmdeploy.pipeline("internlm/internlm2-chat-7b-sft")
+ response = pipe(["Hi, pls intro yourself", "Shanghai is"])
+ print(response)
+ ```
+
+ Or you can launch an OpenAI-compatible server with the following command:
+
+ ```bash
+ lmdeploy serve api_server internlm/internlm2-chat-7b-sft --model-name internlm2-chat-7b-sft --server-port 23333
+ ```
+
+ Then you can send a chat request to the server:
+
+ ```bash
+ curl http://localhost:23333/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+     "model": "internlm2-chat-7b-sft",
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Introduce deep learning to me."}
+     ]
+     }'
+ ```
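
The same request as the curl call above can be built from Python using only the standard library. This is a sketch rather than an official client: `build_chat_request` is a hypothetical helper name, and the host, port, and model name are simply the values used in the commands above. The request is constructed but not sent, so the snippet runs without a server:

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message,
                       system_message="You are a helpful assistant."):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:23333",
    "internlm2-chat-7b-sft",
    "Introduce deep learning to me.",
)
print(request.full_url)

# Sending requires the server from the previous step to be running:
# with urllib.request.urlopen(request) as reply:
#     print(json.load(reply))
```

Keeping construction separate from sending makes it easy to inspect the payload before pointing it at a live server.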
+
+ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/).
+
+ ### vLLM
+
+ Launch an OpenAI-compatible server with `vLLM>=0.3.2`:
+
+ ```bash
+ pip install vllm
+ ```
+
+ ```bash
+ python -m vllm.entrypoints.openai.api_server --model internlm/internlm2-chat-7b-sft --served-model-name internlm2-chat-7b-sft --trust-remote-code
+ ```
+
+ Then you can send a chat request to the server:
+
+ ```bash
+ curl http://localhost:8000/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+     "model": "internlm2-chat-7b-sft",
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Introduce deep learning to me."}
+     ]
+     }'
+ ```
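
An OpenAI-compatible server is expected to answer with a JSON body in the chat-completions shape, where the generated text sits under `choices[0].message.content`. The sketch below shows one way to pull the reply out of such a body. `extract_reply` is a hypothetical helper, and `sample_body` is a hand-written placeholder for illustration, not real model output:

```python
import json


def extract_reply(response_body):
    """Return the assistant text from a chat-completions JSON response body."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample for illustration only; not real model output.
sample_body = json.dumps({
    "model": "internlm2-chat-7b-sft",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "(assistant reply text)"},
            "finish_reason": "stop",
        }
    ],
})
print(extract_reply(sample_body))
```

Raising on an empty `choices` list surfaces server-side errors early instead of failing later with an `IndexError`.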
+
+ Find more details in the [vLLM documentation](https://docs.vllm.ai/en/latest/index.html).
+
+ ## Open Source License
+
+ The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow **free** commercial use. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <[email protected]>.