Yaowei Zheng

hiyouga

https://github.com/hiyouga

AI & ML interests

LLM Knowledge Management

Recent Activity

liked a model 1 day ago

xwen-team/Xwen-7B-Chat

liked a model 1 day ago

xwen-team/Xwen-72B-Chat

liked a dataset 3 days ago

cognitivecomputations/dolphin-r1

Articles

GaLore: Advancing Large Model Training on Consumer-grade Hardware

Mar 20, 2024

• 26

hiyouga's activity

upvoted 4 papers 20 days ago

upvoted a paper 25 days ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 26 days ago • 252

upvoted 2 papers about 1 month ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published Dec 16, 2024 • 54

OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 31

upvoted a paper about 2 months ago

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 79

upvoted a paper 2 months ago

Yi-Lightning Technical Report

Paper • 2412.01253 • Published Dec 2, 2024 • 27

upvoted 3 papers 3 months ago

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Paper • 2411.07975 • Published Nov 12, 2024 • 29

Hyper-Connections

Paper • 2409.19606 • Published Sep 29, 2024 • 21

LLM-based Optimization of Compound AI Systems: A Survey

Paper • 2410.16392 • Published Oct 21, 2024 • 14

upvoted an article 4 months ago

Article

A Short Summary of Chinese AI Global Expansion

Oct 3, 2024

• 21

upvoted a collection 5 months ago

Qwen2.5

Collection

Qwen2.5 language models: pretrained and instruction-tuned models in 7 sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B). • 45 items • Updated Nov 28, 2024 • 496

upvoted an article 5 months ago

Article

Meet Yi-Coder: A Small but Mighty LLM for Code

Sep 4, 2024

• 15

upvoted a paper 5 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 78

upvoted an article 5 months ago

Article

Understanding Vector Quantization in VQ-VAE

Aug 28, 2024

• 15

upvoted 3 papers 5 months ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29, 2024 • 53

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Paper • 2409.00509 • Published Aug 31, 2024 • 38

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Paper • 2408.15664 • Published Aug 28, 2024 • 12