Help me with a question about a 4-bit model
I recently used AutoGPTQ with GPT-J models and it worked quite well. Now, out of nowhere, I get a Triton error even though it indicates that Triton is turned off. Has this ever happened to you? Do you know of a solution?
error: (base) C:\Users\ReDXeoL\AutoGPTQ\examples\quantization>python basic_usage.py
triton not installed.
Traceback (most recent call last):
##
##
ModuleNotFoundError: No module named 'triton'
(base) C:\Users\ReDXeoL\AutoGPTQ\examples\quantization>
This is my code:
import os
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
pretrained_model_dir = "A:/LLMs_LOCAL/bertin_gpt_j_6B_alpaca/"
quantized_model_dir = "bertin-gpt-j-6B-alpaca-4bit-128g"
os.makedirs(quantized_model_dir, exist_ok=True)
def main():
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

    examples = [
        tokenizer(
            "auto-gptq es una biblioteca de cuantificación de modelos fácil de usar con API amigables para el usuario, basada en el algoritmo GPTQ."
        ),
        tokenizer(
            "La inteligencia artificial ha avanzado significativamente en los últimos años."
        ),
        tokenizer(
            "La cuantificación de modelos puede reducir el tamaño y mejorar la eficiencia del modelo."
        ),
        tokenizer(
            "Los algoritmos de cuantificación pueden reducir la cantidad de memoria y energía requerida."
        ),
        tokenizer(
            "El aprendizaje profundo se utiliza en una variedad de aplicaciones, desde la medicina hasta el marketing."
        ),
        tokenizer(
            "La arquitectura GPT-4 es la base de muchos modelos de lenguaje de última generación."
        ),
        tokenizer(
            "El procesamiento del lenguaje natural permite a las máquinas comprender y comunicarse en lenguajes humanos."
        ),
        tokenizer(
            "Las redes neuronales convolucionales se utilizan comúnmente en la visión por computadora."
        ),
        tokenizer(
            "Los algoritmos de optimización son fundamentales para el entrenamiento de modelos de aprendizaje profundo."
        ),
        tokenizer(
            "El aprendizaje por refuerzo es una técnica de aprendizaje automático en la que los agentes aprenden a través de la interacción con su entorno."
        )
    ]
    quantize_config = BaseQuantizeConfig(
        bits=4,  # quantize model to 4-bit
        group_size=128,  # it is recommended to set the value to 128
        desc_act=False
    )
    # load the un-quantized model; it is always force-loaded onto CPU
    model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

    # quantize the model; examples should be a list of dicts whose keys include "input_ids" and "attention_mask",
    # with values of type torch.LongTensor
    model.quantize(examples, use_triton=False)

    # save quantized model
    model.save_quantized(quantized_model_dir)

    # save quantized model using safetensors
    model.save_quantized(quantized_model_dir, use_safetensors=True)

    # load quantized model; currently only CPU or a single GPU is supported
    model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_triton=False)

    # inference with model.generate
    print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to("cuda:0"))[0]))

    # or you can also use the pipeline
    pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, device="cuda:0")
    print(pipeline("auto-gptq is")[0]["generated_text"])
if __name__ == "__main__":
    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
    )

    main()
Yeah this is a recent bug in AutoGPTQ. I pushed a PR that fixes it: https://github.com/PanQiWei/AutoGPTQ/pull/85
Hopefully it'll be merged into main soon. In the meantime, you can pull my PR and build AutoGPTQ from that.
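If you want to try the fix before it's merged, something along these lines should work from your command prompt (a rough sketch: pr-85 is just an arbitrary local branch name, and the cd path is wherever your AutoGPTQ checkout lives):

cd C:\Users\ReDXeoL\AutoGPTQ
REM fetch PR #85 from GitHub into a local branch
git fetch origin pull/85/head:pr-85
git checkout pr-85
REM rebuild and reinstall AutoGPTQ from the checked-out source
pip install .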
I get the same error when I try to enable it in text-generation-webui with --autogptq.
Did you rebuild with pip install . ?
I'm going to bed now, but if it's still a problem let me know and I'll look tomorrow. Do double-check that the basic example isn't setting Triton to True.
Sorry, it was my fault; I didn't know about pip install . ... It works now. Have a nice night, thank you very much, you are a genius.