Introduction
This repository hosts the LLaMa 3.2 models for the React Native ExecuTorch library. It includes both the 1B and 3B versions of the LLaMa model, as well as their quantized variants, in `.pte` format, ready for use in the ExecuTorch runtime.
If you'd like to run these models in your own ExecuTorch runtime, refer to the official documentation for setup instructions.
Compatibility
If you intend to use this model outside of React Native ExecuTorch, make sure your runtime is compatible with the ExecuTorch version used to export the `.pte` files. For more details, see the compatibility note in the ExecuTorch GitHub repository. If you work with React Native ExecuTorch, the constants from the library will guarantee compatibility with the runtime used behind the scenes.
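For instance, here is a minimal sketch of pinning versions through those constants. The constant names below are illustrative, not guaranteed; check the exports of your react-native-executorch release:

```typescript
import { useLLM, LLAMA3_2_1B_QLORA, LLAMA3_2_1B_TOKENIZER } from 'react-native-executorch';

function Chat() {
  // Hypothetical constant names: each resolves to a file exported against the
  // exact ExecuTorch runtime the library bundles, so the two cannot drift apart.
  const llama = useLLM({
    modelSource: LLAMA3_2_1B_QLORA,
    tokenizerSource: LLAMA3_2_1B_TOKENIZER,
  });
  return null; // `llama` exposes loading state and generation methods; see the library docs
}
```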
These models were exported using commit `fe20be98c`, and no forward compatibility is guaranteed: older versions of the runtime may not work with these files.
Repository Structure
The repository is organized into two main directories:
llama-3.2-1B
llama-3.2-3B
Each directory contains different versions of the model, including QLoRa, SpinQuant, and the original models.
- The `.pte` file should be passed to the `modelSource` parameter.
- The corresponding `.bin` file should be used for `tokenizerSource`.
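As an example, a minimal sketch of wiring these two files into the library's `useLLM` hook; the URLs and file names below are placeholders for wherever you host or bundle the downloaded files:

```typescript
import { useLLM } from 'react-native-executorch';

function Llama() {
  // Placeholder URLs: point them at the .pte and .bin files
  // downloaded from this repository.
  const llama = useLLM({
    modelSource: 'https://example.com/llama-3.2-1B/original/llama3_2.pte',
    tokenizerSource: 'https://example.com/llama-3.2-1B/original/tokenizer.bin',
  });
  return null; // see the React Native ExecuTorch docs for the returned API
}
```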
If you wish to export the model yourself, you'll need to obtain the model weights and the `params.json` file from the official repositories, which can be found here.
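As a rough sketch, the export goes through the Llama example in the ExecuTorch repository. The module path and flags below follow that example but differ between ExecuTorch versions, so consult the example's README at the pinned commit before running:

```bash
# Sketch only: module path and flags follow the ExecuTorch Llama example
# and may vary at other commits; check the example's README first.
#   --checkpoint / --params : weights and params.json from the official repo
#   -kv                     : enable the KV cache
#   -X                      : delegate to the XNNPACK backend
#   -qmode 8da4w            : 8-bit activation / 4-bit weight quantization
python -m examples.models.llama2.export_llama \
  --checkpoint consolidated.00.pth \
  --params params.json \
  -kv -X -qmode 8da4w --group_size 128 -d fp32 \
  --output_name llama3_2.pte
```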
For the best performance-to-quality ratio, we highly recommend the QLoRa version, which is optimized for speed without sacrificing too much model quality.