mike-ravkine committed
Commit 55dd7fe
1 Parent(s): f2ce496
Thanks for this model!

Thanks for this model — I spent the afternoon working with it.
I've proposed three minor updates to the README:
* Docker compatibility (I was able to confirm it works with CUDA runtime v12.1).
* Because I build on a different machine than the one I run on, the kernel build tried to target CUDA versions the code doesn't support; being specific about which CUDA architectures we target fixes that.
* The config for this model doesn't actually have a tokenizer defined; it just has the same name as the model.
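Setting `TORCH_CUDA_ARCH_LIST` pins the compute capabilities the kernel build compiles for, instead of letting the build auto-detect the GPU on the build machine (which is what goes wrong when the build and run machines differ). A rough sketch of the mapping, as my own illustration — PyTorch's real logic lives in `torch.utils.cpp_extension` and also handles variants such as `8.6+PTX`:

```python
def arch_list_to_gencode(arch_list):
    """Expand a TORCH_CUDA_ARCH_LIST-style string ("8.0 8.6 ...") into
    nvcc -gencode flags, one flag per targeted compute capability."""
    flags = []
    for arch in arch_list.split():
        sm = arch.replace(".", "")  # "8.6" -> "86"
        flags.append(f"-gencode=arch=compute_{sm},code=sm_{sm}")
    return flags

# The value proposed in this commit targets Ampere through Hopper:
print(arch_list_to_gencode("8.0 8.6 8.7 8.9 9.0"))
```

With the list unset, the build would instead target only whatever capability the build machine's GPU reports, producing binaries the runtime machine may not be able to load.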
README.md
CHANGED
````diff
@@ -24,6 +24,8 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 80 or higher.
 
+For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
+
 ## How to Use
 
 ```bash
@@ -32,6 +34,7 @@ git clone https://github.com/mit-han-lab/llm-awq \
 && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
 && pip install -e . \
 && cd awq/kernels \
+&& export TORCH_CUDA_ARCH_LIST='8.0 8.6 8.7 8.9 9.0' \
 && python setup.py install
 ```
 
@@ -48,7 +51,7 @@ model_name = "tiiuae/falcon-7b-instruct"
 config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
 
 # Tokenizer
-tokenizer = AutoTokenizer.from_pretrained(
+tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 # Model
 w_bit = 4
````