neuralmagic/MiniChat-3B-pruned50-quant-ds

Text Generation


mwitiderrick committed on Dec 5, 2023

Commit 0484553 · 1 Parent(s): 68dc6a4

Update README.md

Files changed (1)

README.md +0 -1


@@ -70,7 +70,6 @@ pip install -e "sparseml[transformers]"
 python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py GeneZC/MiniChat-3B open_platypus --recipe recipe.yaml --save True
 python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment
 cp deployment/model.onnx deployment/model-orig.onnx
-python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
 ```
 Run this kv-cache injection to speed up the model at inference by caching the Key and Value states:
 ```python
