# meta-llama

| | Model Configuration |
|---------------------|:---:|
| Source Model | [`meta-llama/Llama-3.3-70B-Instruct`](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| Inference API | `MLC_LLM` |
| Quantization | `q4f16_ft` |
| Model Type | `llama` |
| Vocab Size | `128256` |
| Context Window Size | `131072` |
| Prefill Chunk Size | `8192` |
| Temperature | `0.6` |
| Repetition Penalty | `1.0` |
| top_p | `0.9` |
| pad_token_id | `0` |
| bos_token_id | `128000` |
| eos_token_id | `[128001, 128008, 128009]` |

See [`jetson-ai-lab.com/models.html`](https://jetson-ai-lab.com/models.html) for benchmarks, examples, and containers to deploy local serving and inference for these quantized models.
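As a minimal sketch of how these settings map onto inference, the snippet below runs the model through MLC LLM's OpenAI-compatible Python engine, passing the table's sampling parameters (`temperature=0.6`, `top_p=0.9`) explicitly. The `MODEL` path is a placeholder assumption, not a confirmed repo name; point it at whatever MLC-converted `q4f16_ft` build of the source model you actually deploy.

```python
# Minimal sketch: chat inference with MLC LLM's Python engine.
# Assumptions: `mlc_llm` is installed with a runtime built for your device,
# and MODEL points at an MLC-converted q4f16_ft build of
# meta-llama/Llama-3.3-70B-Instruct (the repo name below is hypothetical).
from mlc_llm import MLCEngine

MODEL = "HF://mlc-ai/Llama-3.3-70B-Instruct-q4f16_ft-MLC"  # placeholder path

engine = MLCEngine(MODEL)

# Sampling parameters mirror the configuration table above.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Explain 4-bit quantization in one paragraph."}],
    model=MODEL,
    temperature=0.6,
    top_p=0.9,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()  # release GPU memory held by the engine
```

If you omit `temperature` and `top_p`, the engine falls back to the defaults baked into the model's chat configuration, which for this build are the same values shown in the table.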