SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items • Updated 8 days ago • 163
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published 15 days ago • 29
OPT Collection OPT (Open Pretrained Transformer) is a series of open-source large causal language models with performance similar to GPT-3. • 12 items • Updated 19 days ago • 4
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 326
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Switch-Transformers release Collection This release included various MoE (Mixture of Experts) models based on the T5 architecture. The base models use between 8 and 256 experts. • 9 items • Updated Jul 31 • 16
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention Aug 21 • 22
Article ∞🧙🏼♂️AnyClassifier - Generating Synthetic Data For Text Classification By kenhktsui • Aug 19 • 8
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 26
📈 Scaling Laws with Vocabulary Collection Increase your vocabulary size when you scale up your language model • 5 items • Updated Aug 11 • 4
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17 • 77