TheBloke committed on
Commit d3d75e1 · 1 Parent(s): 43cbb91

Update README.md

Files changed (1)
  1. README.md +14 -32
README.md CHANGED
@@ -271,13 +271,20 @@ extra_gated_fields:
271
 
272
  These files are GPTQ 4bit model files for [Bigcode's Starcoder](https://huggingface.co/bigcode/starcoder).
273
 
274
- It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
275
 
276
  ## Repositories available
277
 
278
  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/starcoder-GPTQ)
279
- * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/none)
280
- * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/bigcode/starcoder)
281
 
282
  ## How to easily download and use this model in text-generation-webui
283
 
@@ -308,7 +315,6 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
308
  import argparse
309
 
310
  model_name_or_path = "TheBloke/starcoder-GPTQ"
311
- model_basename = "gptq_model-4bit--1g"
312
 
313
  use_triton = False
314
 
@@ -322,33 +328,9 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
322
  use_triton=use_triton,
323
  quantize_config=None)
324
 
325
- print("\n\n*** Generate:")
326
-
327
- input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
328
- output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
329
- print(tokenizer.decode(output[0]))
330
-
331
- # Inference can also be done using transformers' pipeline
332
-
333
- # Prevent printing spurious transformers error when using pipeline with AutoGPTQ
334
- logging.set_verbosity(logging.CRITICAL)
335
-
336
- prompt = "Tell me about AI"
337
- prompt_template=f'''### Human: {prompt}
338
- ### Assistant:'''
339
-
340
- print("*** Pipeline:")
341
- pipe = pipeline(
342
- "text-generation",
343
- model=model,
344
- tokenizer=tokenizer,
345
- max_new_tokens=512,
346
- temperature=0.7,
347
- top_p=0.95,
348
- repetition_penalty=1.15
349
- )
350
-
351
- print(pipe(prompt_template)[0]['generated_text'])
352
  ```
353
 
354
  ## Provided files
@@ -361,7 +343,7 @@ It was created without group_size to lower VRAM requirements, and with --act-ord
361
 
362
  * `gptq_model-4bit--1g.safetensors`
363
  * Works with AutoGPTQ in CUDA or Triton modes.
364
- * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
365
  * Works with text-generation-webui, including one-click-installers.
366
  * Parameters: Groupsize = -1. Act Order / desc_act = True.
367
 
 
271
 
272
  These files are GPTQ 4bit model files for [Bigcode's Starcoder](https://huggingface.co/bigcode/starcoder).
273
 
274
+ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
275
 
276
  ## Repositories available
277
 
278
  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/starcoder-GPTQ)
279
+ * [BigCode's unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/bigcode/starcoder)
280
+
281
+ ## Prompting
282
+
283
+ The model was trained on GitHub code.
284
+
285
+ As such it is _not_ an instruction model and commands like "Write a function that computes the square root." do not work well.
286
+
287
+ However, by using the [Tech Assistant prompt](https://huggingface.co/datasets/bigcode/ta-prompt) you can turn it into a capable technical assistant.
288
 
289
  ## How to easily download and use this model in text-generation-webui
290
 
 
315
  import argparse
316
 
317
  model_name_or_path = "TheBloke/starcoder-GPTQ"
 
318
 
319
  use_triton = False
320
 
 
328
  use_triton=use_triton,
329
  quantize_config=None)
330
 
331
+ inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
332
+ outputs = model.generate(inputs)
333
+ print(tokenizer.decode(outputs[0]))
334
  ```
335
 
336
  ## Provided files
 
343
 
344
  * `gptq_model-4bit--1g.safetensors`
345
  * Works with AutoGPTQ in CUDA or Triton modes.
346
+ * Does not work with GPTQ-for-LLaMa.
347
  * Works with text-generation-webui, including one-click-installers.
348
  * Parameters: Groupsize = -1. Act Order / desc_act = True.
349
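
The Prompting section added in this commit says that a bare instruction does not work well and that the Tech Assistant prompt should be used instead. A minimal sketch of that idea follows: it appends one request to a few-shot preamble and leaves the assistant turn open for the model to complete. The helper name `build_ta_prompt`, the `Human:` / `Assistant:` turn markers, and the `-----` separator are assumptions made for illustration and should be checked against the linked dataset; the preamble string is only a placeholder for the real prompt text.

```python
def build_ta_prompt(
    preamble: str,
    request: str,
    human_tag: str = "Human:",          # assumed turn marker, verify against the dataset
    assistant_tag: str = "Assistant:",  # assumed turn marker, verify against the dataset
    separator: str = "-----",           # assumed dialogue separator
) -> str:
    """Append one request to a few-shot preamble and leave the assistant turn open."""
    return (
        f"{preamble.rstrip()}\n\n"
        f"{separator}\n\n"
        f"{human_tag} {request.strip()}\n\n"
        f"{assistant_tag}"
    )


# Placeholder only: substitute the full Tech Assistant prompt text here.
preamble = "PLACEHOLDER: Tech Assistant prompt text goes here."

prompt = build_ta_prompt(preamble, "Write a function that computes the square root.")
print(prompt)
```

The resulting string would take the place of `"def print_hello_world():"` in the `tokenizer.encode(...)` call shown in the diff. Because the prompt ends on an open assistant turn, the generated continuation is the assistant's reply.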