Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding Paper • 2402.05109 • Published Feb 7, 2024 • 1
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 55
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 77