Space: The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters
Article: How to generate text: using different decoding methods for language generation with Transformers • Mar 1, 2020
Model: unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit • Text Generation • Updated 20 days ago
Article: KV Caching Explained: Optimizing Transformer Inference Efficiency • By not-lain • Jan 30
Article: Mastering Long Contexts in LLMs with KVPress • By nvidia and 1 other • Jan 23
Paper: Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models • 2501.09686 • Published Jan 16
Paper: Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps • 2501.09732 • Published Jan 16