TheBloke committed on
Commit 82379e1
1 Parent(s): 8a7dfcb

Update README.md

Files changed (1)
  1. README.md +14 -34
README.md CHANGED
@@ -1,6 +1,11 @@
 ---
 inference: false
-license: other
+tags:
+- generated_from_trainer
+model-index:
+- name: starchat-beta
+  results: []
+license: bigcode-openrail-m
 ---

 <!-- header start -->
@@ -21,12 +26,12 @@ license: other

 These files are GPTQ 4bit model files for [HuggingFaceH4's Starchat Beta](https://huggingface.co/HuggingFaceH4/starchat-beta).

-It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).

 ## Repositories available

 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/starchat-beta-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/starchat-beta-GGML)
+* [4, 5, and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/starchat-beta-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/HuggingFaceH4/starchat-beta)

 ## How to easily download and use this model in text-generation-webui
@@ -58,47 +63,24 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse

 model_name_or_path = "TheBloke/starchat-beta-GPTQ"
-model_basename = "gptq_model-4bit--1g"

 use_triton = False

 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-        model_basename=model_basename,
         use_safetensors=True,
-        trust_remote_code=True,
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)

-print("\n\n*** Generate:")
-
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
-output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
-print(tokenizer.decode(output[0]))
-
-# Inference can also be done using transformers' pipeline
-
-# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
-logging.set_verbosity(logging.CRITICAL)
-
-prompt = "Tell me about AI"
-prompt_template=f'''### Human: {prompt}
-### Assistant:'''
-
-print("*** Pipeline:")
-pipe = pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    repetition_penalty=1.15
-)
-
-print(pipe(prompt_template)[0]['generated_text'])
+pipe = pipeline("text-generation", model=model)
+
+prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
+prompt = prompt_template.format(query="How do I sort a list in Python?")
+# We use a special <|end|> token with ID 49155 to denote ends of a turn
+outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.2, top_k=50, top_p=0.95, eos_token_id=49155)
+# You can sort a list in Python by using the sort() method. Here's an example:\n\n```\nnumbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]\nnumbers.sort()\nprint(numbers)\n```\n\nThis will sort the list in place and print the sorted list.
 ```

 ## Provided files
@@ -145,8 +127,6 @@ Thank you to all my generous patrons and donaters!

 # Original model card: HuggingFaceH4's Starchat Beta

-
-
 <img src="https://huggingface.co/HuggingFaceH4/starchat-beta/resolve/main/model_logo.png" alt="StarChat Beta Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

 # Model Card for StarChat Beta
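
For reference, the updated usage example from the diff above can be read as one runnable snippet once this commit is applied. This is a minimal sketch assembled from the hunks shown: the `transformers` imports, the explicit `tokenizer=` argument to `pipeline()`, and the final `print()` are not part of the diff and are added here for completeness.

```python
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/starchat-beta-GPTQ"

# Load the tokenizer and the 4-bit GPTQ model (this commit switches the quantisation tooling to AutoGPTQ)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# StarChat Beta's chat format; the special <|end|> token (ID 49155) marks the end of each turn
prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
prompt = prompt_template.format(query="How do I sort a list in Python?")

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.2,
               top_k=50, top_p=0.95, eos_token_id=49155)
print(outputs[0]["generated_text"])
```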