Moses25
/

Llama-3-8B-chat-32K

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Moses25 commited on Oct 25, 2024

Commit

df632e1

·

verified ·

1 Parent(s): ff06b08

Update README.md

Files changed (1) hide show

README.md +1 -0

README.md CHANGED Viewed

@@ -77,6 +77,7 @@ model_path = Llama-3-8B-chat-32K
 python  -m vllm.entrypoints.openai.api_server --model=$model_path \
         --trust-remote-code --host 0.0.0.0  --port 7777 \
         --gpu-memory-utilization 0.8 \
         --max-model-len 8192 --chat-template llama3-chat-template.jinja \
         --tensor-parallel-size 1 --served-model-name chatbot
 ```

 python  -m vllm.entrypoints.openai.api_server --model=$model_path \
         --trust-remote-code --host 0.0.0.0  --port 7777 \
         --gpu-memory-utilization 0.8 \
+        --enforce_eager \
         --max-model-len 8192 --chat-template llama3-chat-template.jinja \
         --tensor-parallel-size 1 --served-model-name chatbot
 ```