create model card
README.md
ADDED

---
language:
- zh
- en
tags:
- chatglm
- glm
- onnx
- onnxruntime
---

# ChatGLM-6B + ONNX

This model is exported from [ChatGLM-6b](https://huggingface.co/THUDM/chatglm-6b) with int8 quantization and optimized for [ONNXRuntime](https://onnxruntime.ai/) inference.
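
The export pipeline itself is not part of this repository. As a rough sketch of how a comparable u8s8 (uint8 activations, int8 weights) model can be produced from an fp32 ONNX export, ONNX Runtime's dynamic quantization tool could be used as below; the file names are placeholders, not the actual artifacts in this repo:

```python
# Sketch only (assumption, not the exact export pipeline used for this model):
# apply dynamic int8 quantization to an fp32 ONNX export with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="chatglm-6b-fp32.onnx",   # placeholder name for the fp32 export
    model_output="chatglm-6b-int8.onnx",  # placeholder name for the quantized model
    weight_type=QuantType.QInt8,          # int8 weights + uint8 activations -> u8s8 MatMulInteger
    use_external_data_format=True,        # 6B-parameter weights exceed the 2 GB protobuf limit
)
```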

Inference code for ONNXRuntime is included in this repository. Install the requirements and run `streamlit run web-ui.py` to start chatting. The `MatMulInteger` (for the u8s8 data type) and `DynamicQuantizeLinear` operators are currently only supported on CPU, so inference runs on CPU only.
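
If you want to load the model outside the bundled web UI, a plain ONNX Runtime session can be created on CPU. This is a minimal sketch: the model file name is a placeholder and the actual input/output tensor names depend on the exported graph:

```python
# Minimal sketch for loading the exported model with ONNX Runtime on CPU.
# "chatglm-6b-int8.onnx" is a placeholder file name; check the repository for the real one.
import onnxruntime as ort

session = ort.InferenceSession(
    "chatglm-6b-int8.onnx",
    providers=["CPUExecutionProvider"],  # MatMulInteger / DynamicQuantizeLinear are CPU-only
)
print([i.name for i in session.get_inputs()])   # expected input tensor names
print([o.name for o in session.get_outputs()])  # produced output tensor names
```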

Install the dependencies and run `streamlit run web-ui.py` to preview the model. Because of ONNXRuntime operator support limitations, only CPU inference is currently possible.

The code is released under the MIT license.

Model weights are released under the same license as ChatGLM-6b; see [MODEL LICENSE](https://huggingface.co/THUDM/chatglm-6b/blob/main/MODEL_LICENSE).