pretraining - a tyzhu Collection

updated about 14 hours ago

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Paper • 2502.15499 • Published 13 days ago • 13

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Paper • 2502.17422 • Published 10 days ago • 7

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Paper • 2502.17535 • Published 10 days ago • 8

Scaling LLM Pre-training with Vocabulary Curriculum

Paper • 2502.17910 • Published 10 days ago • 1

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Paper • 2502.15007 • Published 14 days ago • 157

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published 4 days ago • 49