
# Llamba Models

The Llamba models are part of Cartesia's Edge library, designed for efficient, high-performance machine learning applications. Llamba-1B is a recurrent (Mamba-based) language model distilled from Llama-3.2-1B, which is why it reuses the Llama-3.2 tokenizer.

For more details, refer to the paper.


## Usage

### Llamba on PyTorch

To use Llamba with PyTorch:

1. Install the required package:

   ```shell
   pip install --no-binary :all: cartesia-pytorch
   ```

2. Load and run the model:

   ```python
   from transformers import AutoTokenizer
   from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel

   # Load the Llamba weights; the model uses the Llama-3.2 tokenizer.
   model = LlambaLMHeadModel.from_pretrained("AvivBick/Llamba-1B", strict=True).to("cuda")
   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

   input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cuda")
   output = model.generate(input_ids, max_length=100)[0]
   print(tokenizer.decode(output, skip_special_tokens=True))
   ```
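The `generate` call above covers most use cases. If you need token-by-token control (custom stopping rules, logging per-step logits, etc.), greedy decoding can be sketched as below. Note that `TinyLM` is a hypothetical stand-in so the snippet runs without downloading the checkpoint; any model whose forward pass returns logits of shape `(batch, seq_len, vocab_size)`, including `LlambaLMHeadModel`, plugs into the same loop.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Hypothetical stand-in for LlambaLMHeadModel: returns per-position logits."""
    def __init__(self, vocab_size=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids):
        # (batch, seq_len) -> (batch, seq_len, vocab_size)
        return self.head(self.embed(input_ids))

@torch.no_grad()
def greedy_generate(model, input_ids, max_length=20):
    # Repeatedly append the highest-probability next token until max_length.
    while input_ids.shape[1] < max_length:
        logits = model(input_ids)                              # (batch, seq, vocab)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids

torch.manual_seed(0)
model = TinyLM()
prompt = torch.randint(0, 32, (1, 5))
out = greedy_generate(model, prompt, max_length=12)
print(out.shape)  # torch.Size([1, 12])
```

This is the loop `generate` implements internally (plus features like sampling and early stopping on EOS); swapping the argmax for a multinomial draw over softmaxed logits gives sampled decoding.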

### Llamba on MLX

To run Llamba with the Metal framework:
(Add specific instructions here when available.)


## Evaluations

Details on model performance, benchmarks, and evaluation metrics can be found in the paper.
(Expand on this section if specific results or datasets are available.)