bad quantization?

#2
opened by zoldaten

I tried several of the models in a row (LLaMA-Mesh-f16.gguf, LLaMA-Mesh-Q6_K_L.gguf, LLaMA-Mesh-Q8_0.gguf) and none of them returned an appropriate result.
prompt: "Create a 3D obj file using the following description: a lamp"

[screenshots of the broken outputs: 2025-01-20_17h08_26.png, 2025-01-20_17h08_41.png, 2025-01-20_17h10_28.png]

import os
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

model = Llama(
    model_path=hf_hub_download(
        repo_id=os.environ.get("REPO_ID", "bartowski/LLaMA-Mesh-GGUF"),
        filename=os.environ.get("MODEL_FILE", "LLaMA-Mesh-f16.gguf"),
    ),
    n_gpu_layers=-1,
)

message = "Create a 3D obj file using the following description: a lamp"
#message = "Create a 3D model of a table."

response = model.create_chat_completion(
    messages=[{"role": "user", "content": message}],
    temperature=0.9,
    max_tokens=4096,
    top_p=0.96,
    stream=True,
)

# accumulate the streamed chunks into a single string
temp = ""
for streamed in response:
    delta = streamed["choices"][0].get("delta", {})
    text_chunk = delta.get("content", "")
    temp += text_chunk

print(temp)
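
For reference, a minimal sketch of how the output can be inspected (the filename and the prefix filter are illustrative choices, not from the original code): it keeps only OBJ geometry lines and writes them to a file that MeshLab or Blender can open.

# keep only OBJ geometry lines (vertices, normals, UVs, faces) and save them;
# "output.obj" and the prefix list are illustrative, not from the post
with open("output.obj", "w") as f:
    f.write("\n".join(line for line in temp.splitlines()
                      if line.startswith(("v ", "vn ", "vt ", "f "))))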

Odd, there shouldn't be anything wrong with the quantization itself, but I also haven't tried to use it. Is this an expected use case that should work? Can you try the original safetensors?
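
Something like this should work for a quick comparison against the original weights (untested on my end, and I'm assuming the repo id is Zhengyi/LLaMA-Mesh - check the model card), using the same sampling settings as your script:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zhengyi/LLaMA-Mesh"  # assumption: repo id of the original safetensors
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Create a 3D obj file using the following description: a lamp"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.9, top_p=0.96)
# decode only the newly generated portion
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))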

I tried the original on the demo page - it's not ideal sometimes, but it works.

The images above are my results on Windows 10 with llama-cli:
llama-cli -m LLaMA-Mesh-Q6_K_L.gguf -p "Create low poly 3D model of a coffee cup" or llama-cli -m LLaMA-Mesh-Q6_K_L.gguf -p "Create a 3D obj file using the following description: a lamp"
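
One thing I haven't ruled out (an assumption, not verified with this model): with a bare -p prompt, llama-cli runs plain completion and, depending on the build, may not apply the model's chat template, and the default context size can be too small for a full mesh. Conversation mode with a larger context might be a fairer test, e.g.:

llama-cli -m LLaMA-Mesh-Q6_K_L.gguf -cnv -c 8192

and then typing the prompt at the interactive prompt.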

PS:
I also ran the llama-cpp-python code (see above) on Ubuntu, but there the model produces a truncated 3D model and finishes as if everything were OK:

[screenshot of the truncated output: 2025-01-21_10h50_01.png]
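
A minimal sketch of what I plan to check for the truncation (assumptions: llama-cpp-python's Llama() defaults to n_ctx=512, which silently cuts long outputs, and the last chunk's finish_reason distinguishes a natural stop from hitting the token/context limit):

from llama_cpp import Llama
from huggingface_hub import hf_hub_download

model = Llama(
    model_path=hf_hub_download(
        repo_id="bartowski/LLaMA-Mesh-GGUF",
        filename="LLaMA-Mesh-Q8_0.gguf",
    ),
    n_gpu_layers=-1,
    n_ctx=8192,  # illustrative value, large enough for a full OBJ mesh
)

message = "Create a 3D obj file using the following description: a lamp"
response = model.create_chat_completion(
    messages=[{"role": "user", "content": message}],
    temperature=0.9,
    max_tokens=4096,
    top_p=0.96,
    stream=True,
)

temp = ""
finish_reason = None
for streamed in response:
    choice = streamed["choices"][0]
    temp += choice.get("delta", {}).get("content", "") or ""
    finish_reason = choice.get("finish_reason") or finish_reason

print(temp)
print("finish_reason:", finish_reason)  # "length" means the output was cut off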
