Constraint Back-translation Improves Complex Instruction Following of Large Language Models Paper • 2410.24175 • Published 9 days ago • 15
Language Models can Self-Lengthen to Generate Long Texts Paper • 2410.23933 • Published 10 days ago • 15
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 307
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published Sep 12 • 16
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published Sep 10 • 62
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation Paper • 2409.03271 • Published Sep 5 • 2
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31
Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian Paper • 2405.13929 • Published May 22 • 52
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published May 14 • 19
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21 • 47
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 44
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 72
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8 • 42
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 60
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8 • 39