Ouroboros: Speculative Decoding with Large Model Enhanced Drafting Paper • 2402.13720 • Published Feb 21, 2024 • 7
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9, 2024 • 22
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Paper • 2410.01805 • Published Oct 2, 2024
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling Paper • 2502.14856 • Published 14 days ago • 7
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs Paper • 2502.12085 • Published 17 days ago • 2