llama-2-7b-chat response has too few tokens?
#9 opened by st01cs
Hi,
I deployed llama-2-7b-chat.ggmlv3.q6_K.bin with llama-cpp-python[server] and tried to access it through the OpenAI-compatible API:
curl -X 'POST' \
  'http://llama07.server.com/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "Write a poem for France?",
        "role": "user"
      }
    ]
  }'
The response body:
{
  "id": "chatcmpl-93a635e0-af7a-4b78-8e96-f93c84b59c69",
  "object": "chat.completion",
  "created": 1690286307,
  "model": "/models/llama-2-7b-chat.ggmlv3.q6_K.bin",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Of course! Here is a poem for France:\n\nFrance, the land"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 16,
    "total_tokens": 42
  }
}
It always returns only a few tokens; how can I get the full poem in this case?
Thanks a lot for your work!
By the way, I start llama-cpp-python[server] with the following parameters:
-e USE_MLOCK=0 \
-e N_THREADS=64 \
-e N_BATCH=2048 \
-e N_CTX=8192 \
curl -X 'POST' \
  'http://llama07.server.com/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_tokens": 512,
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "Write a poem for France?",
        "role": "user"
      }
    ]
  }'
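The second request differs from the first only in the explicit "max_tokens": 512 field. In the first response, "finish_reason": "length" together with "completion_tokens": 16 indicates the generation was cut off at the server's small default token budget, so raising max_tokens per request appears to be what gets the full poem back. For reference, a minimal Python sketch of the same call (not from the original thread; the host name is taken from the curl commands above):

import requests

# Same chat completion request as the curl above, with max_tokens raised
# so the reply is not truncated at the server's default limit.
resp = requests.post(
    "http://llama07.server.com/v1/chat/completions",
    json={
        "max_tokens": 512,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a poem for France?"},
        ],
    },
    timeout=300,
)
data = resp.json()
choice = data["choices"][0]

# "finish_reason" is "stop" when the model finished on its own and "length"
# when it hit the token limit; if it is still "length", raise max_tokens further.
print(choice["finish_reason"], data["usage"])
print(choice["message"]["content"])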
st01cs changed discussion status to closed
Did you ever get a solution to this?