# OpenHermes 2.5 Mistral 7B - DeepSparse

This repo contains model files for [Teknium's OpenHermes 2.5 Mistral 7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

## Inference

Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:

```
pip install deepsparse-nightly[llm]
```
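Once installed, text generation runs through DeepSparse's `TextGeneration` pipeline. A minimal sketch — the model stub and prompt below are illustrative placeholders, not taken from this repo:

```python
from deepsparse import TextGeneration

# Illustrative Hugging Face stub; substitute this repo's actual model stub.
pipeline = TextGeneration(model="hf:path/to/this-repo")

# Run generation on CPU and print the completion text.
output = pipeline("Who is your favorite scientist?", max_new_tokens=100)
print(output.generations[0].text)
```

The `hf:` prefix tells DeepSparse to fetch the deployment files from a Hugging Face repo rather than a local path.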

That's a difficult question as there are many people who inspire me. However, on...

## Sparsification

For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.

```bash
git clone https://github.com/neuralmagic/sparseml
# ... (intermediate setup steps elided in the source)
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task t...
cp deployment/model.onnx deployment/model-orig.onnx
```

Run this kv-cache injection afterwards:

```python
import os
import onnx
```
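The python snippet above is cut off in the source. A sketch of how the kv-cache injection presumably continues, assuming SparseML's `KeyValueCacheInjector` exporter and the `deployment/` paths from the `cp` step above:

```python
import os
import onnx
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

# Paths follow the earlier `cp` step: the original export is read,
# and the injected model is written back to model.onnx.
input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"

# Load without external weight data to limit memory use.
model = onnx.load(input_file, load_external_data=False)

# Inject kv-cache inputs/outputs so the runtime can reuse attention state
# across decoding steps instead of recomputing it.
model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
```

This is a reconstruction, not the repo's verbatim script; consult the SparseML repository for the canonical export flow.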