Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published 3 days ago • 37
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • Updated 3 days ago • 86.1k • 488
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 18 days ago • 248
SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published Dec 13, 2024 • 9
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 44
Fine-tuning LLMs to 1.58bit: extreme quantization made easy Article • Sep 18, 2024 • 216
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16, 2024 • 41
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 13
A failed experiment: Infini-Attention, and why we should keep trying? Article • Aug 14, 2024 • 57