abhinavkulkarni committed 824da08
Parent(s): 9b13248

Update README.md
README.md
CHANGED
@@ -25,6 +25,8 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 80 or higher.
 
+For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
+
 ## How to Use
 
 ```bash
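As an aside on the context lines above: "compute capability of 80 or higher" is CUDA's integer notation for 8.0 (sm_80, Ampere or newer), and the requirement can be checked at runtime. A minimal sketch using PyTorch's standard device-query API; this check is not part of the diff itself:

```python
# Minimal sketch: verify the local GPU meets AWQ's requirement of
# compute capability 8.0+ ("80 or higher" in CUDA's sm_80-style notation).
import torch

assert torch.cuda.is_available(), "AWQ kernels require an NVIDIA GPU"
major, minor = torch.cuda.get_device_capability()  # e.g. (8, 0) on an A100
if major < 8:
    raise RuntimeError(f"Compute capability {major}.{minor} is below 8.0; "
                       "the AWQ CUDA kernels will not run on this device")
print(f"CUDA runtime {torch.version.cuda}, compute capability {major}.{minor}")
```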
@@ -61,7 +63,7 @@ q_config = {
 load_quant = hf_hub_download('abhinavkulkarni/mpt-7b-instruct-w4-g128-awq', 'pytorch_model.bin')
 
 with init_empty_weights():
-    model = AutoModelForCausalLM.
+    model = AutoModelForCausalLM.from_config(config=config,
         torch_dtype=torch.float16, trust_remote_code=True)
 
 real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
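For readers reconstructing the snippet this hunk patches, here is a hedged sketch of the full loading flow implied by the surrounding context (`q_config`, `load_quant`, `init_empty_weights`, `real_quantize_model_weight`). The `q_config` values are inferred from the `w4-g128` suffix in the repo name, and the `awq.quantize.quantizer` import path and the final `load_checkpoint_and_dispatch` step are assumptions based on the llm-awq repository, not lines shown in this diff:

```python
# Sketch of the AWQ model-loading flow this hunk is fixing. Assumes the
# llm-awq repo is installed; the real_quantize_model_weight import path
# follows that repo's layout and is an assumption, not part of the diff.
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'abhinavkulkarni/mpt-7b-instruct-w4-g128-awq'
w_bit = 4
# Assumed from the "w4-g128" repo-name suffix: 4-bit weights, group size 128.
q_config = {'zero_point': True, 'q_group_size': 128}

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
load_quant = hf_hub_download(model_name, 'pytorch_model.bin')

# Build the model skeleton on the meta device (no real weight allocation),
# using the keyword form from_config(config=config, ...) that this commit adds.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config=config,
        torch_dtype=torch.float16, trust_remote_code=True)

# Swap the linear layers for AWQ quantized stand-ins; init_only=True allocates
# the quantized buffers without running the quantization search.
from awq.quantize.quantizer import real_quantize_model_weight
real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)

# Materialize the downloaded quantized checkpoint onto the available device(s).
model = load_checkpoint_and_dispatch(model, load_quant, device_map='balanced')
```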