Bad quantization?
I tried several models in a row, and none of them (LLaMA-Mesh-f16.gguf, LLaMA-Mesh-Q6_K_L.gguf, LLaMA-Mesh-Q8_0.gguf) returned an appropriate result.

Prompt: "Create a 3D obj file using the following description: a lamp"
```python
import os

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF from the Hub and load it with full GPU offload
model = Llama(
    model_path=hf_hub_download(
        repo_id=os.environ.get("REPO_ID", "bartowski/LLaMA-Mesh-GGUF"),
        filename=os.environ.get("MODEL_FILE", "LLaMA-Mesh-f16.gguf"),
    ),
    n_gpu_layers=-1,
)

message = "Create a 3D obj file using the following description: a lamp"
# message = "Create a 3D model of a table."

response = model.create_chat_completion(
    messages=[{"role": "user", "content": message}],
    temperature=0.9,
    max_tokens=4096,
    top_p=0.96,
    stream=True,
)

# Accumulate the streamed chunks and print the full text at the end
temp = ""
for streamed in response:
    delta = streamed["choices"][0].get("delta", {})
    temp += delta.get("content", "")
print(temp)
```
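For reference, a variant of the loop above that also records `finish_reason` on the last streamed chunk, to distinguish a natural stop from a `max_tokens` cutoff (a minimal sketch; the `lamp.obj` output path is just for illustration):

```python
# Sketch: same stream, but track finish_reason to tell a natural stop
# ("stop") from a max_tokens truncation ("length"), and save the output.
finish_reason = None
temp = ""
for streamed in response:
    choice = streamed["choices"][0]
    temp += choice.get("delta", {}).get("content") or ""
    if choice.get("finish_reason") is not None:
        finish_reason = choice["finish_reason"]

print(f"finish_reason: {finish_reason}")  # "length" => mesh was cut off
with open("lamp.obj", "w") as f:  # hypothetical output path
    f.write(temp)
```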
Odd, there shouldn't be anything wrong with the quantization itself, but I also haven't tried to use it. Is this an expected use case that should work? Can you try the original safetensors?
I tried the original on the demo page; it's not ideal sometimes, but it works.
The images above are my results on Windows 10 with llama-cli:

```sh
llama-cli -m LLaMA-Mesh-Q6_K_L.gguf -p "Create a low poly 3D model of a coffee cup"
llama-cli -m LLaMA-Mesh-Q6_K_L.gguf -p "Create a 3D obj file using the following description: a lamp"
```
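For a closer match to the Python settings above, the same sampling parameters and full GPU offload can be passed on the command line (a sketch; flag spellings are from recent llama.cpp builds and may vary by version):

```sh
llama-cli -m LLaMA-Mesh-Q6_K_L.gguf -ngl 99 --temp 0.9 --top-p 0.96 -n 4096 \
  -p "Create a 3D obj file using the following description: a lamp"
```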
PS: I also ran the llama_cpp_python code above on Ubuntu, but the model produces a cut-off 3D model and finishes as if everything is OK: