abhinavkulkarni/mosaicml-mpt-7b-chat-w4-g128-awq

@@ -11,20 +11,6 @@ inference: false
 This model is a 4-bit 128 group size AWQ quantized model. For more information about AWQ quantization, please click [here](https://github.com/mit-han-lab/llm-awq).
-## Model Date
----
-license: cc-by-sa-3.0
-tags:
-- MosaicML
-- AWQ
-inference: false
----
-# MPT-7B-Chat (4-bit 128g AWQ Quantized)
-[MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat) is a chatbot-like model for dialogue generation.
-This model is a 4-bit 128 group size AWQ quantized model. For more information about AWQ quantization, please click [here](https://github.com/mit-han-lab/llm-awq).
 ## Model Date
 July 5, 2023
@@ -47,7 +33,7 @@ git clone https://github.com/mit-han-lab/llm-awq \
 && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
 && pip install -e . \
 && cd awq/kernels \
-python setup.py install
 ```
 ```python
@@ -120,123 +106,6 @@ This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evalua
 |        |       |bits_per_byte  | 0.7138|   |      |
-## Acknowledgements
-The MPT model was originally finetuned by Sam Havens and the MosaicML NLP team. Please cite this model using the following format:
-```
-@online{MosaicML2023Introducing,
-    author    = {MosaicML NLP Team},
-    title     = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
-    year      = {2023},
-    url       = {www.mosaicml.com/blog/mpt-7b},
-    note      = {Accessed: 2023-03-28}, % change this date
-    urldate   = {2023-03-28} % change this date
-}
-```
-The model was quantized with AWQ technique. If you find AWQ useful or relevant to your research, please kindly cite the paper:
-```
-@article{lin2023awq,
-  title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
-  author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
-  journal={arXiv},
-  year={2023}
-}
-```
-July 5, 2023
-## Model License
-Please refer to original MPT model license ([link](https://huggingface.co/mosaicml/mpt-7b-chat)).
-Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/blob/main/LICENSE)).
-## CUDA Version
-This model was successfully tested on CUDA driver v12.1 and toolkit v11.7 with Python v3.10.11.
-## How to Use
-```bash
-git clone https://github.com/mit-han-lab/llm-awq \
-&& cd llm-awq \
-&& git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
-&& pip install -e .
-```
-```python
-import torch
-from awq.quantize.quantizer import real_quantize_model_weight
-from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
-from accelerate import init_empty_weights, load_checkpoint_and_dispatch
-from huggingface_hub import hf_hub_download
-model_name = "mosaicml/mpt-7b-chat"
-# Config
-config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
-# Tokenizer
-tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)
-# Model
-w_bit = 4
-q_config = {
-    "zero_point": True,
-    "q_group_size": 128,
-}
-load_quant = hf_hub_download('abhinavkulkarni/mpt-7b-chat-w4-g128-awq', 'pytorch_model.bin')
-with init_empty_weights():
-    model = AutoModelForCausalLM.from_pretrained(model_name, config=config,
-                                                 torch_dtype=torch.float16, trust_remote_code=True)
-real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
-model = load_checkpoint_and_dispatch(model, load_quant, device_map="balanced")
-# Inference
-prompt = f'''What is the difference between nuclear fusion and fission?
-###Response:'''
-input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
-output = model.generate(
-    inputs=input_ids,
-    temperature=0.7,
-    max_new_tokens=512,
-    top_p=0.15,
-    top_k=0,
-    repetition_penalty=1.1,
-    eos_token_id=tokenizer.eos_token_id
-)
-print(tokenizer.decode(output[0]))
-```
-## Evaluation
-This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evaluation-harness).
-[MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat)
-|  Task  |Version|    Metric     | Value |   |Stderr|
-|--------|------:|---------------|------:|---|------|
-|wikitext|      1|word_perplexity|13.5936|   |      |
-|        |       |byte_perplexity| 1.6291|   |      |
-|        |       |bits_per_byte  | 0.7040|   |      |
-[MPT-7B-Chat (4-bit 128-group AWQ)](https://huggingface.co/abhinavkulkarni/mpt-7b-chat-w4-g128-awq)
-|  Task  |Version|    Metric     | Value |   |Stderr|
-|--------|------:|---------------|------:|---|------|
-|wikitext|      1|word_perplexity|14.0922|   |      |
-|        |       |byte_perplexity| 1.6401|   |      |
-|        |       |bits_per_byte  | 0.7138|   |      |
 ## Acknowledgements
 The MPT model was originally finetuned by Sam Havens and the MosaicML NLP team. Please cite this model using the following format:

 && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
 && pip install -e . \
 && cd awq/kernels \
+&& python setup.py install
 ```
 ```python