-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 84 -
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper • 2404.10667 • Published • 17 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 24 -
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 26
Collections
Discover the best community collections!
Collections including paper arxiv:2402.09353
-
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 16 -
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Paper • 2311.15127 • Published • 12 -
Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 11 -
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 7
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 4 -
Attention Is All You Need
Paper • 1706.03762 • Published • 44 -
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper • 2005.11401 • Published • 12 -
Language Model Evaluation Beyond Perplexity
Paper • 2106.00085 • Published
-
Efficient Few-Shot Learning Without Prompts
Paper • 2209.11055 • Published • 3 -
Parameter-Efficient Transfer Learning for NLP
Paper • 1902.00751 • Published • 1 -
GPT Understands, Too
Paper • 2103.10385 • Published • 8 -
The Power of Scale for Parameter-Efficient Prompt Tuning
Paper • 2104.08691 • Published • 9
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 602 -
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 182 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
ResLoRA: Identity Residual Mapping in Low-Rank Adaption
Paper • 2402.18039 • Published • 11
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 109
-
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 26 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 126 -
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 22 -
AtP*: An efficient and scalable method for localizing LLM behaviour to components
Paper • 2403.00745 • Published • 11