Model Card for mamba2-130m-8bit-mlx

This is an MLX-compatible version of the mamba2-130m model, quantized to 8 bits. It uses the EleutherAI/gpt-neox-20b tokenizer. For more details, see our blog post.

Usage

Installation

This model requires the cartesia-metal and cartesia-mlx packages.

Installation requires Xcode, which can be downloaded from https://developer.apple.com/xcode/. Accept the license agreement with:

sudo xcodebuild -license

Install the dependencies in this order: the pinned nanobind commit first, then cartesia-metal, then cartesia-mlx:

pip install nanobind@git+https://github.com/wjakob/nanobind.git@2f04eac452a6d9142dedb957701bdb20125561e4
pip install git+https://github.com/cartesia-ai/edge.git#subdirectory=cartesia-metal
pip install cartesia-mlx

Note: This package has been tested on macOS Sonoma 14.1 with the M3 chip.
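
Before loading the model, it can help to confirm that the machine matches the setup above. The following is a minimal sketch, not part of the Cartesia packages: the helper names `is_apple_silicon_mac` and `missing_modules` are hypothetical, and the import name `cartesia_metal` is an assumption (only `mlx` and `cartesia_mlx` appear in the example on this page).

```python
import importlib.util
import platform


def is_apple_silicon_mac(system=None, machine=None):
    """Return True for macOS running on an arm64 (Apple silicon) CPU.

    Arguments default to the values reported by the current interpreter.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "cartesia_metal" is an assumed import name; adjust it if the package differs.
required = ["mlx", "cartesia_metal", "cartesia_mlx"]

print("Apple silicon macOS:", is_apple_silicon_mac())
print("Missing modules:", missing_modules(required))
```

If the second line lists any modules, rerun the corresponding `pip install` command from the steps above. The check only looks for the modules on the import path; it does not verify versions or that the Metal kernels compiled correctly.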

Generation example

import mlx.core as mx
import cartesia_mlx as cmx

model = cmx.from_pretrained("cartesia-ai/mamba2-130m-8bit-mlx")
model.set_dtype(mx.float32)

prompt = "Rene Descartes was"

print(prompt, end="", flush=True)
for text in model.generate(
    prompt,
    max_tokens=500,
    eval_every_n=5,
    verbose=True,
    top_p=0.99,
    temperature=0.85,
):
    print(text, end="", flush=True)
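
The loop above prints each chunk as it arrives. To keep the generated text in a variable instead, the chunks can be accumulated. This is a minimal sketch that assumes `model.generate` yields plain strings, as the loop above suggests; `collect_text` and its `stop` parameter are hypothetical helpers, not part of cartesia_mlx, and `fake_generate` is a stand-in so the snippet runs without the model.

```python
def collect_text(chunks, stop=None):
    """Join streamed text chunks into one string.

    If `stop` is a non-empty string, stop consuming chunks once it appears
    and return only the text before its first occurrence.
    """
    text = ""
    for chunk in chunks:
        text += chunk
        if stop and stop in text:
            return text.split(stop, 1)[0]
    return text


def fake_generate():
    """Stand-in for a streaming generator that yields text chunks."""
    yield from [" a French", " philosopher.", "\n\nHe", " wrote"]


completion = collect_text(fake_generate(), stop="\n\n")
print(repr(completion))
```

With the real model, passing `model.generate(prompt, max_tokens=500)` in place of `fake_generate()` should work the same way under the assumption stated above.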

About Cartesia

At Cartesia, we're building real-time multimodal intelligence for every device.
