abhinavkulkarni/VMware-open-llama-7b-v2-open-instruct-w4-g128-awq

Text Generation

text-generation-inference


Abhinav Kulkarni committed on Jul 14, 2023

Commit 8798142 • 1 Parent(s): 684f85d

Updated README

Files changed (1)

README.md +2 -2

@@ -24,7 +24,7 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 ## CUDA Version
-This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 80 or higher.
+This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of `8.0` or higher.
 For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
@@ -85,7 +85,7 @@ output = model.generate(
     repetition_penalty=1.1,
     eos_token_id=tokenizer.eos_token_id
 )
-print(tokenizer.decode(output[0], skip_special_tokens=True))
+# print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 ## Evaluation
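
The README line changed in this commit states that AWQ needs an NVIDIA GPU with compute capability 8.0 or higher. A minimal sketch of how a user might check that requirement before loading the model is shown below. It is not part of the commit: the helper names `meets_awq_requirement` and `current_gpu_capability` are hypothetical, and the sketch assumes PyTorch is the runtime in use, relying only on `torch.cuda.is_available` and `torch.cuda.get_device_capability`.

```python
# Minimum compute capability stated in the README (8.0).
MIN_CAPABILITY = (8, 0)


def meets_awq_requirement(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) capability tuple is at least `minimum`.

    Hypothetical helper: tuples compare element by element, so (8, 6) >= (8, 0)
    is True and (7, 5) >= (8, 0) is False.
    """
    return tuple(capability) >= tuple(minimum)


def current_gpu_capability():
    """Return the (major, minor) capability of GPU 0, or None if unavailable.

    Assumes PyTorch; returns None when it is not installed or no CUDA device
    is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = current_gpu_capability()
    if cap is None:
        print("No CUDA device detected (or PyTorch is not installed).")
    else:
        supported = meets_awq_requirement(cap)
        print(f"Compute capability {cap[0]}.{cap[1]}: supported = {supported}")
```

Keeping the comparison in a separate function that takes a plain tuple means the threshold logic can be exercised on machines without a GPU.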