Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published 7 days ago • 13
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 6 days ago • 90
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer Paper • 2412.07720 • Published 15 days ago • 30
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B. • 10 items • Updated 15 days ago • 80
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 21 days ago • 43
Toxic Commons Collection Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors. • 3 items • Updated Oct 31 • 5
Common Models Collection The first generation of models pretrained on Common Corpus. • 5 items • Updated 20 days ago • 27
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Paper • 2411.17691 • Published 29 days ago • 9
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22 • 55
HyenaDNA Models Collection HyenaDNA models usable directly with Hugging Face classes like AutoModel. • 8 items • Updated Nov 14, 2023 • 16
Qwen 2.5 Coder Collection Complete collection of the code-specific Qwen2.5 model series in bnb 4-bit, 16-bit, and GGUF formats. • 35 items • Updated about 9 hours ago • 20
Qwen2.5-Math Collection Math-specific model series based on Qwen2.5 • 9 items • Updated 27 days ago • 58
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated 27 days ago • 257