TheBloke committed on
Commit 82379e1
1 Parent(s): 8a7dfcb

Update README.md

Files changed (1)
  1. README.md +14 -34
README.md CHANGED
@@ -1,6 +1,11 @@
 ---
 inference: false
-license: other
+tags:
+- generated_from_trainer
+model-index:
+- name: starchat-beta
+  results: []
+license: bigcode-openrail-m
 ---

 <!-- header start -->
@@ -21,12 +26,12 @@ license: other

 These files are GPTQ 4bit model files for [HuggingFaceH4's Starchat Beta](https://huggingface.co/HuggingFaceH4/starchat-beta).

-It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).

 ## Repositories available

 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/starchat-beta-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/starchat-beta-GGML)
+* [4, 5, and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/starchat-beta-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/HuggingFaceH4/starchat-beta)

 ## How to easily download and use this model in text-generation-webui
@@ -58,47 +63,24 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse

 model_name_or_path = "TheBloke/starchat-beta-GPTQ"
-model_basename = "gptq_model-4bit--1g"

 use_triton = False

 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-        model_basename=model_basename,
         use_safetensors=True,
-        trust_remote_code=True,
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)

-print("\n\n*** Generate:")
-
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
-output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
-print(tokenizer.decode(output[0]))
-
-# Inference can also be done using transformers' pipeline
-
-# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
-logging.set_verbosity(logging.CRITICAL)
-
-prompt = "Tell me about AI"
-prompt_template=f'''### Human: {prompt}
-### Assistant:'''
-
-print("*** Pipeline:")
-pipe = pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    repetition_penalty=1.15
-)
-
-print(pipe(prompt_template)[0]['generated_text'])
+pipe = pipeline("text-generation", model=model)
+
+prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
+prompt = prompt_template.format(query="How do I sort a list in Python?")
+# We use a special <|end|> token with ID 49155 to denote ends of a turn
+outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.2, top_k=50, top_p=0.95, eos_token_id=49155)
+# You can sort a list in Python by using the sort() method. Here's an example:\n\n```\nnumbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]\nnumbers.sort()\nprint(numbers)\n```\n\nThis will sort the list in place and print the sorted list.
 ```

 ## Provided files
@@ -145,8 +127,6 @@ Thank you to all my generous patrons and donaters!

 # Original model card: HuggingFaceH4's Starchat Beta

-
-
 <img src="https://huggingface.co/HuggingFaceH4/starchat-beta/resolve/main/model_logo.png" alt="StarChat Beta Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

 # Model Card for StarChat Beta
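
For reference, the updated usage example from the diff above can be read as one runnable snippet once this commit is applied. This is a minimal sketch assembled from the hunks shown: the `transformers` imports, the explicit `tokenizer=` argument to `pipeline()`, and the final `print()` are not part of the diff and are added here for completeness.

```python
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/starchat-beta-GPTQ"

# Load the tokenizer and the 4-bit GPTQ model (this commit switches the quantisation tooling to AutoGPTQ)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# StarChat Beta's chat format; the special <|end|> token (ID 49155) marks the end of each turn
prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
prompt = prompt_template.format(query="How do I sort a list in Python?")

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.2,
               top_k=50, top_p=0.95, eos_token_id=49155)
print(outputs[0]["generated_text"])
```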