LLM as a Broken Telephone: Iterative Generation Distorts Information Paper • 2502.20258 • Published 11 days ago • 18
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 4 days ago • 16
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published Jan 30 • 56
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 108
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024 • 35
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published Oct 24, 2024 • 18
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published Oct 3, 2024 • 48
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 574
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7, 2024 • 22
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 8 items • Updated Dec 5, 2024 • 20
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 49
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29, 2024 • 57
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22, 2024 • 65
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 46
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18, 2024 • 32