- Rethinking Optimization and Architecture for Tiny Language Models
  Paper • 2402.02791 • Published • 12
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 51
- Scaling Laws for Forgetting When Fine-Tuning Large Language Models
  Paper • 2401.05605 • Published
- Aligning Large Language Models with Counterfactual DPO
  Paper • 2401.09566 • Published • 2
Collections including paper arxiv:2403.09629
- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 21
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 16
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 27
- The Impact of Reasoning Step Length on Large Language Models
  Paper • 2401.04925 • Published • 15
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 10
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 46
- WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
  Paper • 2312.14187 • Published • 49
- On the Effectiveness of Large Language Models in Domain-Specific Code Generation
  Paper • 2312.01639 • Published • 1
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
  Paper • 2312.04474 • Published • 29
- Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning
  Paper • 2312.08901 • Published
- Learning From Mistakes Makes LLM Better Reasoner
  Paper • 2310.20689 • Published • 28
- Making Large Language Models Better Reasoners with Step-Aware Verifier
  Paper • 2206.02336 • Published • 1
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 140
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 86
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 256
- google/flan-t5-large
  Text2Text Generation • Updated • 594k • 548
- deepseek-ai/deepseek-coder-6.7b-instruct
  Text Generation • Updated • 93k • 335
- Object Recognition as Next Token Prediction
  Paper • 2312.02142 • Published • 11
- colbert-ir/dspy-Oct11-T5-Large-MH-3k-v1
  Text2Text Generation • Updated • 9 • 1
- OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
  Paper • 2402.10176 • Published • 34
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- Matryoshka Representation Learning
  Paper • 2205.13147 • Published • 8
- Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
  Paper • 2310.04484 • Published • 5
- Diversity of Thought Improves Reasoning Abilities of Large Language Models
  Paper • 2310.07088 • Published • 5
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 75
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model
  Paper • 2310.13332 • Published • 14
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 75
- Challenges and Applications of Large Language Models
  Paper • 2307.10169 • Published • 47
- Efficiently Modeling Long Sequences with Structured State Spaces
  Paper • 2111.00396 • Published • 1
- DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
  Paper • 2006.08381 • Published