create model card
README.md
ADDED

---
language:
- zh
- en
tags:
- chatglm
- glm
- onnx
- onnxruntime
---

# ChatGLM-6B + ONNX

This model is exported from [ChatGLM-6b](https://huggingface.co/THUDM/chatglm-6b) with int8 quantization and optimized for [ONNXRuntime](https://onnxruntime.ai/) inference.
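
The export pipeline itself is not part of this repository. As a rough sketch of how a comparable u8s8 (uint8 activations, int8 weights) model can be produced from an fp32 ONNX export, ONNX Runtime's dynamic quantization tool could be used as below; the file names are placeholders, not the actual artifacts in this repo:

```python
# Sketch only (assumption, not the exact export pipeline used for this model):
# apply dynamic int8 quantization to an fp32 ONNX export with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="chatglm-6b-fp32.onnx",   # placeholder name for the fp32 export
    model_output="chatglm-6b-int8.onnx",  # placeholder name for the quantized model
    weight_type=QuantType.QInt8,          # int8 weights + uint8 activations -> u8s8 MatMulInteger
    use_external_data_format=True,        # 6B-parameter weights exceed the 2 GB protobuf limit
)
```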

Inference code for ONNXRuntime is included in this repository. Install the requirements and run `streamlit run web-ui.py` to start chatting. The `MatMulInteger` (for the u8s8 data type) and `DynamicQuantizeLinear` operators are currently only supported on CPU, so inference runs on CPU only.
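
If you want to load the model outside the bundled web UI, a plain ONNX Runtime session can be created on CPU. This is a minimal sketch: the model file name is a placeholder and the actual input/output tensor names depend on the exported graph:

```python
# Minimal sketch for loading the exported model with ONNX Runtime on CPU.
# "chatglm-6b-int8.onnx" is a placeholder file name; check the repository for the real one.
import onnxruntime as ort

session = ort.InferenceSession(
    "chatglm-6b-int8.onnx",
    providers=["CPUExecutionProvider"],  # MatMulInteger / DynamicQuantizeLinear are CPU-only
)
print([i.name for i in session.get_inputs()])   # expected input tensor names
print([o.name for o in session.get_outputs()])  # produced output tensor names
```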

Install the dependencies and run `streamlit run web-ui.py` to preview the model. Because of ONNXRuntime operator support limitations, only CPU inference is currently possible.

The code is released under the MIT license.

Model weights are released under the same license as ChatGLM-6b; see [MODEL LICENSE](https://huggingface.co/THUDM/chatglm-6b/blob/main/MODEL_LICENSE).