Rodimus*

Introduction

Rodimus* is a new series of efficient large language models designed to address the computational-complexity challenges of Transformer-based architectures. The Rodimus* series includes the base Rodimus model and its enhanced version, Rodimus+. Rodimus leverages a novel Data-Dependent Tempered Selection (DDTS) mechanism within a purely recurrent, linear attention-based framework, achieving high performance.

Building on this, Rodimus+ combines Rodimus with the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid approach. This combination effectively integrates semantic, token, and head compression techniques, striking a balance between accuracy and efficiency.
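
As a rough intuition only (this is not the actual DDTS or SW-SKA implementation; see the paper for those), the hybrid idea of pairing a fixed-size recurrent linear-attention state with local sliding-window softmax attention can be sketched as follows. The single-head shapes and the additive combination at the end are simplifications for illustration.

import torch

def linear_attention_recurrent(q, k, v):
    # Purely recurrent linear attention: a fixed-size state S accumulates k_t v_t^T,
    # and each output is o_t = q_t @ S. Cost is O(T) in sequence length.
    b, t, d = q.shape
    state = q.new_zeros(b, d, d)
    outputs = []
    for i in range(t):
        state = state + k[:, i].unsqueeze(-1) * v[:, i].unsqueeze(1)  # (b, d, d)
        outputs.append(torch.einsum("bd,bde->be", q[:, i], state))
    return torch.stack(outputs, dim=1)

def sliding_window_attention(q, k, v, window=4):
    # Causal softmax attention restricted to the last `window` tokens.
    b, t, d = q.shape
    scores = torch.einsum("bqd,bkd->bqk", q, k) / d ** 0.5
    idx = torch.arange(t)
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.einsum("bqk,bkd->bqd", scores.softmax(dim=-1), v)

# Toy combination of global recurrent context and local attention;
# the real Rodimus+ block composes these components differently.
x = torch.randn(1, 16, 32)
hybrid = linear_attention_recurrent(x, x, x) + sliding_window_attention(x, x, x)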

For more details, please refer to our paper and GitHub repository.

This repository contains the latest checkpoint of Rodimus+ 1.6B, trained on continuously updated data with a focus on code and math performance.

Usage

We do not recommend using base language models directly for text generation. Instead, consider applying post-training techniques such as SFT, RLHF, or continued pretraining to enhance the model's performance.
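
As an illustration, a minimal SFT sketch with the Hugging Face Trainer might look like the following. The dataset file, field names, and hyperparameters here are placeholders rather than the recipe used for this checkpoint.

import torch
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)

# The collator needs a pad token; fall back to EOS if none is set.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder instruction-tuning data with "prompt"/"response" fields;
# replace with your own SFT corpus.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

def tokenize(example):
    text = example["prompt"] + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="rodimus_plus_sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()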

Installation

  1. The latest version of transformers is recommended (at least 4.42.0).
  2. We evaluate our models with Python 3.8 and torch==2.1.2.
  3. If you use Rodimus, you need to install flash-linear-attention and triton>=2.2.0. If you use Rodimus+, you additionally need to install flash-attention. Example install commands are sketched below.
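
For reference, an installation along these lines might look like the following (package names are the usual PyPI ones; adjust versions and CUDA builds to your environment):

pip install "transformers>=4.42.0" torch==2.1.2
pip install "triton>=2.2.0" flash-linear-attention   # required for Rodimus
pip install flash-attn                               # additionally required for Rodimus+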

Generation

generate API

import os
import torch
from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# load model
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# inference
input_prompt = "你好!你是谁?"  # "Hello! Who are you?"
model_inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=32)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(response)
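
If the custom code shipped with the checkpoint is registered with the Auto classes, loading through the generic transformers API may also work (requires trust_remote_code=True; treat this as an untested variant of the example above):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "model_path"
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda",
    trust_remote_code=True,
).eval()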

Performance

Code Tasks: HumanEval (0-shot), MBPP (0-shot)

Math Tasks: GSM8K (4-shot), MATH (5-shot)

NLP Tasks: C-Eval (5-shot), CMMLU (5-shot), MMLU (5-shot), BBH (3-shot)

Latest update time: 2025/02/15

Dataset     Rodimus+ 1.6B (20250215)
HumanEval   24.39
MBPP        26.60
GSM8K       50.19
MATH        15.06
C-Eval      47.19
CMMLU       43.76
MMLU        45.52
BBH         35.28
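
The exact evaluation pipeline is not included in this card. As a rough pointer (not necessarily the setup used for the numbers above), comparable few-shot runs can be made with EleutherAI's lm-evaluation-harness, assuming the checkpoint loads through AutoModelForCausalLM with trust_remote_code, e.g. for GSM8K (4-shot):

lm_eval --model hf \
    --model_args pretrained=model_path,trust_remote_code=True,dtype=float16 \
    --tasks gsm8k \
    --num_fewshot 4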

Citation

If you find our work helpful, please consider citing us.

@inproceedings{he2025rodimus,
  title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
  author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=IIVYiJ1ggK}
}