---
language:
- en
- zh
license: other
tasks:
- text-generation
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<div align="center">
<h1>
Baichuan 2 RAG-Enhanced AWQ Quantization
</h1>
</div>

# <span id="Start">Quick Start</span>

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer
import time

quant_path = "csdc-atl/buffer-baichuan2-13B-rag-awq-int4"

# Load the AWQ-quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# RAG-style prompt: retrieved context followed by the user question
prompt_template = """\
<s>
{context}
{question}
</s>
"""
context = '''
“温故而知新”有四解:
一为“温故才知新”,温习已学的知识,并且由其中获得新的领悟;
二为“温故及知新”:一方面要温习典章故事,另一方面又努力撷取新的知识。
三为,温故,知新。随着自己阅历的丰富和理解能力的提高,回头再看以前看过的知识,总能从中体会到更多的东西。
第四,是指通过回味历史,而可以预见,以及解决未来的问题。这才是一个真正的大师应该具有的能力。
合并这四种解法,也许更为完整:在能力范围以内,尽量广泛阅览典籍,反复思考其中的涵义,对已经听闻的知识,也要定期复习,能有心得、有领悟;并且也要尽力吸收新知;如此则进可以开拓人类知识的领域,退也可以为先贤的智能赋予时代的意义。像这样融汇新旧、贯通古今方可称是“温故而知新,可以为师矣”。
也有学者以为作“温故及知新”解不太合适,因为按字面上解释,仅做到吸收古今知识而未有领悟心得,只像是知识的买卖者,不足以为师。所以我们就来看看“师”的意义:在论语中师字一共见于14章,其中意义与今日的老师相近者,除本章外还有三章。
'''
question = '''
解释一下‘温故而知新’
'''

start = time.time()
tokens = tokenizer(
    prompt_template.format(context=context, question=question),
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=512
)
end = time.time()
elapsed = end - start
print('Elapsed time is %f seconds.' % elapsed)
```
<hr>
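The `prompt_template` in the quick-start wraps the retrieved context and the user question between `<s>` and `</s>` markers. A minimal, model-free sketch of that assembly step (the placeholder strings below are illustrative, not part of the model's expected input beyond the template shown above):

```python
# Same template as in the quick-start example
prompt_template = """\
<s>
{context}
{question}
</s>
"""

# Hypothetical retrieved passage and question, for illustration only
context = "Retrieved reference passage goes here."
question = "User question goes here."

# Assemble the final prompt passed to the tokenizer
prompt = prompt_template.format(context=context, question=question)
print(prompt)
```

In practice `context` would hold the passages returned by your retriever, concatenated into one string, and `question` the raw user query.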