FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling • Paper • 2502.14856 • Published 14 days ago
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs • Paper • 2502.12085 • Published 17 days ago