---
license: mit
---
### environment
- optimum-neuron 0.0.25
- neuron 2.20.0
- transformers-neuronx 0.12.313
- transformers 4.45.2
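
For reference, a minimal sketch of pinning the versions listed above with pip. It assumes the instance already has the Neuron driver and runtime installed (the `neuron 2.20.0` entry refers to the SDK release) and that the AWS Neuron pip repository is used as an extra index:

```
# Hedged example: package names and versions are taken from the listing above;
# the extra index URL is the AWS Neuron pip repository (assumed environment setup).
pip install --extra-index-url https://pip.repos.neuron.amazonaws.com \
  optimum-neuron==0.0.25 \
  transformers-neuronx==0.12.313 \
  transformers==4.45.2
```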
### export
```
optimum-cli export neuron --model meta-llama/Llama-3.2-1B-Instruct --batch_size 1 --sequence_length 1024 --num_cores 2 --auto_cast_type fp16 ./models-hf/meta-llama/Llama-3.2-1B-Instruct
```
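
A quick way to sanity-check the export before serving is to list the output directory passed to `optimum-cli` (the artifact names mentioned below are typical of optimum-neuron exports, not guaranteed):

```
# Inspect the exported model directory from the command above
ls -lh ./models-hf/meta-llama/Llama-3.2-1B-Instruct
# Expect config.json, tokenizer files and the Neuron-compiled checkpoint artifacts
```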
### run
```
docker run -it --name llama-31 --rm \
-p 8080:80 \
-v /home/ec2-user/models-hf/:/models \
-e HF_MODEL_ID=/models/meta-llama/Llama-3.2-1B-Instruct \
-e MAX_INPUT_TOKENS=256 \
-e MAX_TOTAL_TOKENS=1024 \
-e MAX_BATCH_SIZE=1 \
-e LOG_LEVEL="info,text_generation_router=debug,text_generation_launcher=debug" \
--device=/dev/neuron0 \
neuronx-tgi:latest \
--model-id /models/meta-llama/Llama-3.2-1B-Instruct \
--max-batch-size 1 \
--max-input-tokens 256 \
--max-total-tokens 1024
```
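
Before sending requests, it can help to confirm the server is up. A sketch of two checks, assuming the container name and port mapping from the command above; `/health` is TGI's liveness route and returns 200 once the model is loaded:

```
# Follow the launcher/router logs until the server reports it is ready
docker logs -f llama-31
# Liveness probe against the mapped port
curl -i 127.0.0.1:8080/health
```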
### test
```
curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
```
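
The same server also supports token streaming via TGI's `/generate_stream` route, with the same payload shape and server-sent events in the response:

```
# Streaming variant of the request above (responses arrive as SSE chunks)
curl 127.0.0.1:8080/generate_stream -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
  -H 'Content-Type: application/json'
```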