---
license: mit
tags:
- auto-gptq
- opt
- gptq
- 4bit
---
This model is quantized with [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), so you need the `auto-gptq` package (`pip install auto-gptq`) to load it:
```py
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_id = 'seonglae/opt-125m-4bit-gptq'
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Load the quantized weights; when model_basename is omitted,
# auto-gptq locates the quantized checkpoint file in the repo automatically
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    trust_remote_code=True,
    device='cuda:0',
    use_triton=False,
    use_safetensors=True,
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,  # sampling parameters below only take effect when sampling is enabled
    temperature=0.5,
    top_p=0.95,
    max_new_tokens=100,
    repetition_penalty=1.15,
)

prompt = "USER: Are you AI?\nASSISTANT:"
result = pipe(prompt)
print(result[0]['generated_text'])
```
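
If you prefer not to go through `pipeline`, the quantized model can also be driven with `generate` directly. A minimal sketch, assuming the same `model` and `tokenizer` from above and a CUDA device:

```py
prompt = "USER: Are you AI?\nASSISTANT:"
# Tokenize the prompt and move the input tensors to the model's device
inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')

# auto-gptq forwards generate() to the underlying causal LM
# (and already wraps the call in torch.inference_mode())
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.5,
    top_p=0.95,
    max_new_tokens=100,
    repetition_penalty=1.15,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```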