kaizuberbuehler's Collections
LM Training
Rho-1: Not All Tokens Are What You Need
Paper
•
2404.07965
•
Published
•
90
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper
•
2404.10667
•
Published
•
18
Instruction-tuned Language Models are Better Knowledge Learners
Paper
•
2402.12847
•
Published
•
26
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper
•
2402.09353
•
Published
•
26
QLoRA: Efficient Finetuning of Quantized LLMs
Paper
•
2305.14314
•
Published
•
48
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
•
2403.03507
•
Published
•
184
Reverse Training to Nurse the Reversal Curse
Paper
•
2403.13799
•
Published
•
13
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
•
2312.15166
•
Published
•
57
ReFT: Representation Finetuning for Language Models
Paper
•
2404.03592
•
Published
•
92
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
•
2404.03715
•
Published
•
61
Learn Your Reference Model for Real Good Alignment
Paper
•
2404.09656
•
Published
•
83
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper
•
2404.08801
•
Published
•
65
Pre-training Small Base LMs with Fewer Tokens
Paper
•
2404.08634
•
Published
•
35
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper
•
2404.07413
•
Published
•
37
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
•
2404.06395
•
Published
•
22
SambaLingo: Teaching Large Language Models New Languages
Paper
•
2404.05829
•
Published
•
13
Advancing LLM Reasoning Generalists with Preference Trees
Paper
•
2404.02078
•
Published
•
44
Poro 34B and the Blessing of Multilinguality
Paper
•
2404.01856
•
Published
•
13
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
•
2404.14219
•
Published
•
255
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper
•
2404.13208
•
Published
•
39
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper
•
2404.05892
•
Published
•
33
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
•
2312.00752
•
Published
•
140
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
•
2404.14619
•
Published
•
127
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
•
2403.19887
•
Published
•
107
Make Your LLM Fully Utilize the Context
Paper
•
2404.16811
•
Published
•
53
Tele-FLM Technical Report
Paper
•
2404.16645
•
Published
•
18
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper
•
2404.16994
•
Published
•
36
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper
•
2405.00732
•
Published
•
120
Iterative Reasoning Preference Optimization
Paper
•
2404.19733
•
Published
•
48
What matters when building vision-language models?
Paper
•
2405.02246
•
Published
•
102
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper
•
2405.12130
•
Published
•
47
Your Transformer is Secretly Linear
Paper
•
2405.12250
•
Published
•
151
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper
•
2405.20541
•
Published
•
22
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper
•
2406.11813
•
Published
•
31
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper
•
2406.08464
•
Published
•
67
The Llama 3 Herd of Models
Paper
•
2407.21783
•
Published
•
110
Gemma 2: Improving Open Language Models at a Practical Size
Paper
•
2408.00118
•
Published
•
76
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Paper
•
2407.21770
•
Published
•
22
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper
•
2408.07055
•
Published
•
66
Data curation via joint example selection further accelerates multimodal learning
Paper
•
2406.17711
•
Published
•
3
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper
•
2408.12570
•
Published
•
31
OLMoE: Open Mixture-of-Experts Language Models
Paper
•
2409.02060
•
Published
•
78
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
136
GRIN: GRadient-INformed MoE
Paper
•
2409.12136
•
Published
•
16
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Paper
•
2409.11564
•
Published
•
20
NVLM: Open Frontier-Class Multimodal LLMs
Paper
•
2409.11402
•
Published
•
73
Instruction Following without Instruction Tuning
Paper
•
2409.14254
•
Published
•
28
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
•
2409.17146
•
Published
•
106
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper
•
2409.17115
•
Published
•
61
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Paper
•
2406.14546
•
Published
•
2
Thinking LLMs: General Instruction Following with Thought Generation
Paper
•
2410.10630
•
Published
•
18
Paper
•
2412.08905
•
Published
•
106
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
•
2412.16145
•
Published
•
38
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper
•
2412.14922
•
Published
•
85
Diving into Self-Evolving Training for Multimodal Reasoning
Paper
•
2412.17451
•
Published
•
42
Paper
•
2412.16720
•
Published
•
31
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
•
2411.15124
•
Published
•
58
Natural Language Reinforcement Learning
Paper
•
2411.14251
•
Published
•
28
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
•
2411.04905
•
Published
•
113
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper
•
2411.04996
•
Published
•
50
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper
•
2501.00958
•
Published
•
97
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
•
2501.04682
•
Published
•
89
Scaling Laws for Floating Point Quantization Training
Paper
•
2501.02423
•
Published
•
25
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
•
2501.01904
•
Published
•
31
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
Paper
•
2501.06458
•
Published
•
29
Enhancing Human-Like Responses in Large Language Models
Paper
•
2501.05032
•
Published
•
49
Do generative video models learn physical principles from watching videos?
Paper
•
2501.09038
•
Published
•
30
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
164
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper
•
2501.12599
•
Published
•
47