---
license: mit
---
### environment
optimum-neuron  0.0.25

neuron 2.20.0

transformers-neuronx  0.12.313

transformers    4.45.2


### export
```
optimum-cli export neuron \
   --model meta-llama/Llama-3.2-1B-Instruct \
   --batch_size 1 \
   --sequence_length 1024 \
   --num_cores 2 \
   --auto_cast_type fp16 \
   ./models-hf/meta-llama/Llama-3.2-1B-Instruct
```
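
The export compiles the model for fixed shapes (batch size 1, sequence length 1024), so the limits passed to the server later have to fit inside those values. A minimal sketch of that check follows. `check_limits` is a hypothetical helper, not part of optimum-neuron or TGI, and the rule that the total token budget must not exceed the exported sequence length is an assumption about the Neuron backend rather than something stated in this card.

```shell
# Hypothetical helper: sanity-check serving limits against the export settings.
# Usage: check_limits SEQUENCE_LENGTH MAX_INPUT_TOKENS MAX_TOTAL_TOKENS
check_limits() {
    seq_len=$1
    max_input=$2
    max_total=$3
    # The prompt budget has to leave room for at least one generated token.
    if [ "$max_input" -ge "$max_total" ]; then
        echo "max input tokens ($max_input) must be smaller than max total tokens ($max_total)" >&2
        return 1
    fi
    # Assumption: the server cannot handle more tokens than the compiled sequence length.
    if [ "$max_total" -gt "$seq_len" ]; then
        echo "max total tokens ($max_total) exceeds exported sequence length ($seq_len)" >&2
        return 1
    fi
    echo "ok"
}

# Values taken from the export command above and the serving flags in this card.
check_limits 1024 256 1024
```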

### run
```
docker run -it --name llama-32 --rm \
   -p 8080:80 \
   -v /home/ec2-user/models-hf/:/models \
   -e HF_MODEL_ID=/models/meta-llama/Llama-3.2-1B-Instruct \
   -e MAX_INPUT_TOKENS=256 \
   -e MAX_TOTAL_TOKENS=1024 \
   -e MAX_BATCH_SIZE=1 \
   -e LOG_LEVEL="info,text_generation_router=debug,text_generation_launcher=debug" \
   --device=/dev/neuron0 \
   neuronx-tgi:latest \
   --model-id /models/meta-llama/Llama-3.2-1B-Instruct \
   --max-batch-size 1 \
   --max-input-tokens 256 \
   --max-total-tokens 1024
```
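
The container may take a while to load the compiled model before it accepts requests. The sketch below polls the server until it responds. It assumes the image exposes the `/health` route of the standard text-generation-inference router; `wait_for_tgi` and its retry defaults are hypothetical and only for illustration.

```shell
# Hypothetical helper: poll the health route until the server answers
# or the retries run out.
# Usage: wait_for_tgi BASE_URL [RETRIES] [DELAY_SECONDS]
wait_for_tgi() {
    base_url=$1
    retries=${2:-60}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$retries" ]; do
        # -f makes curl return non-zero on HTTP errors; -s silences progress output.
        if curl -sf -o /dev/null "$base_url/health"; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "server not ready after $retries attempts" >&2
    return 1
}

# Example, using the port mapping from the docker command above:
# wait_for_tgi http://127.0.0.1:8080
```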

### test
```
curl 127.0.0.1:8080/generate \
   -X POST \
   -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
   -H 'Content-Type: application/json'
```
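
For prompts that contain quotes, writing the JSON body by hand gets error-prone. The sketch below builds the same request body from a prompt and a token count. `build_payload` is a hypothetical helper; it escapes only backslashes and double quotes, so prompts with newlines or other control characters would need a real JSON encoder.

```shell
# Hypothetical helper: build a /generate request body.
# Usage: build_payload PROMPT MAX_NEW_TOKENS
build_payload() {
    # Escape backslashes first, then double quotes, so the prompt stays valid JSON.
    prompt=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$prompt" "$2"
}

# Example, sending the same request as the curl command above:
# curl 127.0.0.1:8080/generate -X POST \
#    -H 'Content-Type: application/json' \
#    -d "$(build_payload 'What is Deep Learning?' 20)"
```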