Introduction

This repository hosts the Llama 3.2 models for the React Native ExecuTorch library. It includes the 1B and 3B versions of the model, as well as their quantized variants, in the .pte format ready for use in the ExecuTorch runtime.

If you'd like to run these models in your own ExecuTorch runtime, refer to the official documentation for setup instructions.

Compatibility

If you intend to use this model outside of React Native ExecuTorch, make sure your runtime is compatible with the ExecuTorch version used to export the .pte files. For more details, see the compatibility note in the ExecuTorch GitHub repository. If you work with React Native ExecuTorch, use the model constants exported by the library; these guarantee compatibility with the runtime used behind the scenes.

These models were exported at commit fe20be98c, and no forward compatibility is guaranteed: older versions of the runtime may not work with these files.

Repository Structure

The repository is organized into two main directories:

  • llama-3.2-1B
  • llama-3.2-3B

Each directory contains several versions of the model: QLoRA, SpinQuant, and the original (unquantized) export.

  • The .pte file should be passed to the modelSource parameter.
  • The corresponding .bin file should be used for tokenizerSource (see the usage sketch below).
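
As an illustration, here is a minimal sketch of wiring these two files into React Native ExecuTorch using the useLLM hook. The URLs are hypothetical placeholders, and the exact options and return shape of the hook vary between library versions, so treat the field names below as assumptions and consult the library documentation for the version you use.

```tsx
import React from 'react';
import { Text } from 'react-native';
import { useLLM } from 'react-native-executorch';

// Placeholder URLs: point these at the .pte and tokenizer .bin files
// from this repository (or bundle them as local assets via require()).
const MODEL_URL = 'https://example.com/llama-3.2-1B/QLoRA/llama3_2-1B_qlora.pte';
const TOKENIZER_URL = 'https://example.com/llama-3.2-1B/original/tokenizer.bin';

export function LlamaStatus() {
  const llama = useLLM({
    modelSource: MODEL_URL,         // the exported .pte file
    tokenizerSource: TOKENIZER_URL, // the matching tokenizer .bin file
  });

  // Assumed field: in the library versions we sketch against, `response`
  // holds the text generated so far. Check the docs for your version.
  return <Text>{llama.response}</Text>;
}
```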

If you wish to export the model yourself, you’ll need to obtain the model weights and the params.json file from Meta’s official Llama 3.2 repositories.
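
Exports of this kind are produced with the export_llama entry point from the ExecuTorch repository. The sketch below follows the shape of the ExecuTorch Llama example documentation; the module path and flags differ between commits, so treat them as assumptions and check the documentation for the commit you build against.

```bash
# Sketch of an XNNPACK-targeted export; paths and flags are illustrative
# and must match the ExecuTorch commit you have checked out.
python -m examples.models.llama2.export_llama \
  --checkpoint /path/to/consolidated.00.pth \
  --params /path/to/params.json \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -d fp32 \
  --output_name "llama3_2-1B.pte"
```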

For the best performance-to-quality ratio, we highly recommend the QLoRA version, which is optimized for speed without sacrificing much model quality.
