Rodimus*

Introduction

Rodimus* is a new series of efficient large language models designed to address the computational-complexity challenges of Transformer-based architectures. The Rodimus* series includes the base Rodimus model and its enhanced version, Rodimus+. Rodimus leverages a novel Data-Dependent Tempered Selection (DDTS) mechanism within a purely recurrent, linear attention-based framework, achieving high performance.

Building on this, Rodimus+ combines Rodimus with the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid approach. This combination effectively integrates semantic, token, and head compression techniques, striking a balance between accuracy and efficiency.
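
As a rough intuition only (this is not the actual DDTS or SW-SKA implementation; see the paper for those), the hybrid idea of pairing a fixed-size recurrent linear-attention state with local sliding-window softmax attention can be sketched as follows. The single-head shapes and the additive combination at the end are simplifications for illustration.

import torch

def linear_attention_recurrent(q, k, v):
    # Purely recurrent linear attention: a fixed-size state S accumulates k_t v_t^T,
    # and each output is o_t = q_t @ S. Cost is O(T) in sequence length.
    b, t, d = q.shape
    state = q.new_zeros(b, d, d)
    outputs = []
    for i in range(t):
        state = state + k[:, i].unsqueeze(-1) * v[:, i].unsqueeze(1)  # (b, d, d)
        outputs.append(torch.einsum("bd,bde->be", q[:, i], state))
    return torch.stack(outputs, dim=1)

def sliding_window_attention(q, k, v, window=4):
    # Causal softmax attention restricted to the last `window` tokens.
    b, t, d = q.shape
    scores = torch.einsum("bqd,bkd->bqk", q, k) / d ** 0.5
    idx = torch.arange(t)
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.einsum("bqk,bkd->bqd", scores.softmax(dim=-1), v)

# Toy combination of global recurrent context and local attention;
# the real Rodimus+ block composes these components differently.
x = torch.randn(1, 16, 32)
hybrid = linear_attention_recurrent(x, x, x) + sliding_window_attention(x, x, x)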

For more details, please refer to our paper and GitHub repository.

This repository contains the latest checkpoint of Rodimus+ 1.6B, trained on continuously updated data with a focus on code and math performance.

Usage

We do not recommend using base language models directly for text generation. Instead, consider applying post-training techniques such as SFT, RLHF, or continued pretraining to enhance the model's performance.
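
As an illustration, a minimal SFT sketch with the Hugging Face Trainer might look like the following. The dataset file, field names, and hyperparameters here are placeholders rather than the recipe used for this checkpoint.

import torch
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)

# The collator needs a pad token; fall back to EOS if none is set.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder instruction-tuning data with "prompt"/"response" fields;
# replace with your own SFT corpus.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

def tokenize(example):
    text = example["prompt"] + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="rodimus_plus_sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()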

Installation

  1. The latest version of transformers is recommended (at least 4.42.0).
  2. We evaluate our models with Python 3.8 and torch==2.1.2.
  3. If you use Rodimus, you need to install flash-linear-attention and triton>=2.2.0. If you use Rodimus+, you additionally need to install flash-attention. Example install commands are sketched below.
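
For reference, an installation along these lines might look like the following (package names are the usual PyPI ones; adjust versions and CUDA builds to your environment):

pip install "transformers>=4.42.0" torch==2.1.2
pip install "triton>=2.2.0" flash-linear-attention   # required for Rodimus
pip install flash-attn                               # additionally required for Rodimus+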

Generation

generate API

import os
import torch
from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# load model
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# inference
input_prompt = "你好!你是谁?"  # "Hello! Who are you?"
model_inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=32)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(response)
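
If the custom code shipped with the checkpoint is registered with the Auto classes, loading through the generic transformers API may also work (requires trust_remote_code=True; treat this as an untested variant of the example above):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "model_path"
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda",
    trust_remote_code=True,
).eval()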

Performance

Code Tasks: HumanEval (0-shot), MBPP (0-shot)

Math Tasks: GSM8K (4-shot), MATH (5-shot)

NLP Tasks: C-Eval (5-shot), CMMLU (5-shot), MMLU (5-shot), BBH (3-shot)

Latest update time: 2025/02/15

Dataset     Rodimus+ 1.6B (20250215)
HumanEval   24.39
MBPP        26.60
GSM8K       50.19
MATH        15.06
C-Eval      47.19
CMMLU       43.76
MMLU        45.52
BBH         35.28
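
The exact evaluation pipeline is not included in this card. As a rough pointer (not necessarily the setup used for the numbers above), comparable few-shot runs can be made with EleutherAI's lm-evaluation-harness, assuming the checkpoint loads through AutoModelForCausalLM with trust_remote_code, e.g. for GSM8K (4-shot):

lm_eval --model hf \
    --model_args pretrained=model_path,trust_remote_code=True,dtype=float16 \
    --tasks gsm8k \
    --num_fewshot 4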

Citation

If you find our work helpful, please consider citing us.

@inproceedings{he2025rodimus,
  title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
  author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=IIVYiJ1ggK}
}