Collections
Discover the best community collections!
Collections including paper arxiv:2401.08671
-
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 50 -
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 53 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 143 -
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Paper • 2401.12954 • Published • 28
-
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 258 -
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Paper • 2312.12456 • Published • 41 -
Accelerating LLM Inference with Staged Speculative Decoding
Paper • 2308.04623 • Published • 23 -
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper • 2208.07339 • Published • 4
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
-
FlashDecoding++: Faster Large Language Model Inference on GPUs
Paper • 2311.01282 • Published • 35 -
Co-training and Co-distillation for Quality Improvement and Compression of Language Models
Paper • 2311.02849 • Published • 3 -
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper • 2311.04934 • Published • 28 -
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 118
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 57 -
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Paper • 2311.12229 • Published • 26 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 47 -
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper • 2312.00845 • Published • 36
-
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Paper • 2310.09199 • Published • 24 -
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 14 -
Personality Traits in Large Language Models
Paper • 2307.00184 • Published • 20 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 14