# meta-llama

| | Model Configuration |
|---------------------|:---:|
| Source Model | [`meta-llama/Llama-3.3-70B-Instruct`](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| Inference API | `MLC_LLM` |
| Quantization | `q4f16_ft` |
| Model Type | `llama` |
| Vocab Size | `128256` |
| Context Window Size | `131072` |
| Prefill Chunk Size | `8192` |
| Temperature | `0.6` |
| Repetition Penalty | `1.0` |
| top_p | `0.9` |
| pad_token_id | `0` |
| bos_token_id | `128000` |
| eos_token_id | `[128001, 128008, 128009]` |

See [`jetson-ai-lab.com/models.html`](https://jetson-ai-lab.com/models.html) for benchmarks, examples, and containers to deploy local serving and inference for these quantized models.
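As a minimal sketch of how these settings map onto inference, the snippet below runs the model through MLC LLM's OpenAI-compatible Python engine, passing the table's sampling parameters (`temperature=0.6`, `top_p=0.9`) explicitly. The `MODEL` path is a placeholder assumption, not a confirmed repo name; point it at whatever MLC-converted `q4f16_ft` build of the source model you actually deploy.

```python
# Minimal sketch: chat inference with MLC LLM's Python engine.
# Assumptions: `mlc_llm` is installed with a runtime built for your device,
# and MODEL points at an MLC-converted q4f16_ft build of
# meta-llama/Llama-3.3-70B-Instruct (the repo name below is hypothetical).
from mlc_llm import MLCEngine

MODEL = "HF://mlc-ai/Llama-3.3-70B-Instruct-q4f16_ft-MLC"  # placeholder path

engine = MLCEngine(MODEL)

# Sampling parameters mirror the configuration table above.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Explain 4-bit quantization in one paragraph."}],
    model=MODEL,
    temperature=0.6,
    top_p=0.9,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()  # release GPU memory held by the engine
```

If you omit `temperature` and `top_p`, the engine falls back to the defaults baked into the model's chat configuration, which for this build are the same values shown in the table.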