Parrot: Efficient Serving of LLM-based Applications with Semantic Variable (arXiv:2405.19888, published May 30, 2024)
FlashDecoding++: Faster Large Language Model Inference on GPUs (arXiv:2311.01282, published Nov 2, 2023)