# Rodimus*

## Introduction
Rodimus* is a new series of efficient large language models designed to address the computational-complexity challenges of Transformer-based architectures. The Rodimus* series includes the base Rodimus model and its enhanced version, Rodimus+. Rodimus leverages a novel Data-Dependent Tempered Selection (DDTS) mechanism within a purely recurrent, linear attention-based framework, achieving strong performance while avoiding the quadratic cost of softmax attention.
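To make the recurrent view concrete, here is a toy sketch of a gated linear-attention recurrence of the kind Rodimus builds on. This is only an illustration: the `alpha`/`beta` gates stand in for whatever the DDTS mechanism actually computes (see the paper for its exact parameterization), and the real model uses fused `flash-linear-attention` kernels rather than a Python loop.

```python
import torch

def gated_linear_attention(q, k, v, alpha, beta):
    """Toy recurrence: S_t = diag(alpha_t) S_{t-1} + diag(beta_t) (k_t v_t^T); o_t = S_t^T q_t.

    q, k, alpha, beta: (batch, seq_len, d_key); v: (batch, seq_len, d_value).
    alpha and beta are data-dependent gates in (0, 1); they are placeholders
    for the DDTS gating defined in the paper.
    """
    B, T, Dk = q.shape
    Dv = v.shape[-1]
    S = q.new_zeros(B, Dk, Dv)  # recurrent state: fixed size, independent of T
    outputs = []
    for t in range(T):
        outer = k[:, t, :, None] * v[:, t, None, :]                 # (B, Dk, Dv)
        S = alpha[:, t, :, None] * S + beta[:, t, :, None] * outer  # gated update
        outputs.append(torch.einsum("bkv,bk->bv", S, q[:, t]))      # read-out
    return torch.stack(outputs, dim=1)                              # (B, T, Dv)
```

Because the state `S` has a fixed size, the per-token inference cost is constant in sequence length, which is the efficiency argument behind the purely recurrent design.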
Building on this, Rodimus+ combines Rodimus with the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid architecture, effectively integrating semantic, token, and head compression techniques to balance accuracy and efficiency.
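The following is a toy, dense implementation of the sliding-window shared-key idea, purely for intuition: all query heads attend through a single shared key head (head compression), and each position only sees the last `window` tokens (token compression). The exact sharing/masking details and the fused kernels of the actual model are given in the paper and implemented via `flash-attention`; this sketch is not that implementation.

```python
import torch

def sliding_window_shared_key_attention(q, k, v, window):
    """Causal attention with one key head shared by all query heads and a
    sliding window of `window` positions.

    q, v: (batch, heads, seq_len, d); k: (batch, seq_len, d), shared across heads.
    """
    B, H, T, D = q.shape
    scores = torch.einsum("bhqd,bkd->bhqk", q, k) / D ** 0.5    # shared-key scores
    pos = torch.arange(T, device=q.device)
    causal = pos[None, :] <= pos[:, None]                       # no future tokens
    in_window = (pos[:, None] - pos[None, :]) < window          # last `window` tokens only
    scores = scores.masked_fill(~(causal & in_window), float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", scores.softmax(dim=-1), v)
```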
For more details, please refer to our paper and GitHub repository.
This repository contains the latest checkpoint of Rodimus+ 1.6B, trained on continuously updated data with a focus on code and math performance.
## Usage
We do not recommend using base language models directly for text generation. Instead, consider applying post-training techniques such as SFT, RLHF, or continued pretraining to enhance the model's performance.
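As an illustration of the SFT route, below is a minimal sketch using the standard Hugging Face `Trainer`. The dataset file `sft_data.jsonl` (a JSONL file with a `text` field) and every hyperparameter here are placeholders of ours, not recommendations from the authors; the sketch also assumes the tokenizer defines a pad token.

```python
import torch
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)

# Placeholder corpus: one JSON object per line with a "text" field.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="rodimus_sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # mlm=False gives standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```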
### Installation
- The latest version of `transformers` is recommended (at least 4.42.0).
- We evaluate our models with `python=3.8` and `torch==2.1.2`.
- If you use Rodimus, you need to install `flash-linear-attention` and `triton>=2.2.0`.
- If you use Rodimus+, you need to further install `flash-attention`.
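As a quick sanity check after installation, you can print the installed versions. The distribution names below are assumed to match the package names above; adjust them if your install differs.

```python
from importlib.metadata import PackageNotFoundError, version

# flash-linear-attention and triton are needed for Rodimus;
# flash-attn is additionally needed for Rodimus+.
for pkg in ("transformers", "torch", "triton", "flash-linear-attention", "flash-attn"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```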
### Generation

A simple example of text generation with the `generate` API:
```python
import torch
from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# Load the tokenizer and model from a local checkpoint directory.
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# Run inference on a single prompt.
input_prompt = "你好!你是谁?"  # "Hello! Who are you?"
model_inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=32)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```
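Continuing from the example above, the output can also be streamed token by token with the stock `transformers` `TextStreamer` (assuming the Rodimus tokenizer is compatible with it; untested here):

```python
from transformers import TextStreamer

# Print the completion to stdout as tokens are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_length=32, streamer=streamer)
```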
## Performance

- Code tasks: HumanEval (0-shot), MBPP (0-shot)
- Math tasks: GSM8K (4-shot), MATH (5-shot)
- NLP tasks: C-Eval (5-shot), CMMLU (5-shot), MMLU (5-shot), BBH (3-shot)
Latest update: 2025/02/15
| Dataset | Rodimus+ 1.6B (20250215) |
|---|---|
| HumanEval | 24.39 |
| MBPP | 26.60 |
| GSM8K | 50.19 |
| MATH | 15.06 |
| C-Eval | 47.19 |
| CMMLU | 43.76 |
| MMLU | 45.52 |
| BBH | 35.28 |
## Citation

If you find our work helpful, please consider citing our paper:
```bibtex
@inproceedings{he2025rodimus,
  title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
  author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=IIVYiJ1ggK}
}
```