Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24 • 53
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 34
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24 • 13
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 63
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published May 30 • 20
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published Jun 6 • 36
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6 • 26
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6 • 17
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7 • 55
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7 • 26
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 64
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper • 2406.06469 • Published Jun 10 • 23
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization Paper • 2406.05981 • Published Jun 10 • 11
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published Jun 8 • 14
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published Jun 3 • 17
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering Paper • 2406.06573 • Published Jun 3 • 8
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10 • 36
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10 • 22
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 65
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Paper • 2407.01906 • Published Jul 2 • 34
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published Jul 2 • 22
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1 • 85
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published Jul 2 • 49
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1 • 42
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2 • 23
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published Jul 2 • 21
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1 • 75
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29 • 34
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27 • 41
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1 • 34
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning Paper • 2407.00782 • Published Jun 30 • 23
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP Paper • 2407.00402 • Published Jun 29 • 22
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27 • 60
Direct Preference Knowledge Distillation for Large Language Models Paper • 2406.19774 • Published Jun 28 • 21
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published Jun 26 • 40
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26 • 47
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Paper • 2406.18522 • Published Jun 26 • 40
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219 • Published Jun 26 • 15
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 86
LongIns: A Challenging Long-context Instruction-based Exam for LLMs Paper • 2406.17588 • Published Jun 25 • 20
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26 • 23
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24 • 57
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22 • 45
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24 • 22
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8 • 34
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8 • 20
Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction Paper • 2407.03651 • Published Jul 4 • 15
AgentInstruct: Toward Generative Teaching with Agentic Flows Paper • 2407.03502 • Published Jul 3 • 44
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Paper • 2407.07061 • Published Jul 9 • 26
BM25S: Orders of magnitude faster lexical search via eager sparse scoring Paper • 2407.03618 • Published Jul 4 • 11
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published Jul 11 • 50
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9 • 41
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist Paper • 2407.08733 • Published Jul 11 • 20
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published Jul 11 • 17
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 128
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing Paper • 2407.08770 • Published Jul 11 • 19
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published Jul 15 • 22
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17 • 77
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16 • 54
E5-V: Universal Embeddings with Multimodal Large Language Models Paper • 2407.12580 • Published Jul 17 • 39
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19 • 37
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31 • 22
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1 • 12
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning Paper • 2408.00690 • Published Aug 1 • 21
TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods Paper • 2407.21630 • Published Jul 31 • 8
ShieldGemma: Generative AI Content Moderation Based on Gemma Paper • 2407.21772 • Published Jul 31 • 13
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24 • 31
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework Paper • 2407.20729 • Published Jul 30 • 25
JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources Paper • 2407.20750 • Published Jul 30 • 21
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 61
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 54
Theia: Distilling Diverse Vision Foundation Models for Robot Learning Paper • 2407.20179 • Published Jul 29 • 45
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains Paper • 2407.18961 • Published Jul 18 • 38
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29 • 34
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning Paper • 2407.18248 • Published Jul 25 • 30
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents Paper • 2407.17490 • Published Jul 3 • 30
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? Paper • 2407.16607 • Published Jul 23 • 21
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 68
DDK: Distilling Domain Knowledge for Efficient Large Language Models Paper • 2407.16154 • Published Jul 23 • 20
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19 • 16
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Paper • 2407.17470 • Published Jul 24 • 14
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis Paper • 2407.13301 • Published Jul 18 • 54
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18 • 34
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18 • 16
Fast Inference from Transformers via Speculative Decoding Paper • 2211.17192 • Published Nov 30, 2022 • 4
LLM Augmented LLMs: Expanding Capabilities through Composition Paper • 2401.02412 • Published Jan 4 • 36
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28 • 19
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29 • 37
Improving Retrieval Augmented Language Model with Self-Reasoning Paper • 2407.19813 • Published Jul 29 • 6
Adaptive Retrieval-Augmented Generation for Conversational Systems Paper • 2407.21712 • Published Jul 31 • 2
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5 • 33
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks Paper • 2408.03615 • Published Aug 7 • 30
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Paper • 2408.03837 • Published Aug 7 • 17
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection Paper • 2408.04284 • Published Aug 8 • 22
Better Alignment with Instruction Back-and-Forth Translation Paper • 2408.04614 • Published Aug 8 • 14
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9 • 37
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9 • 31
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12 • 15
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
OpenResearcher: Unleashing AI for Accelerated Scientific Research Paper • 2408.06941 • Published Aug 13 • 30
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15 • 31
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15 • 51
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13 • 20
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 53
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 40
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22 • 62
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 116
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 36
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • 2408.15664 • Published Aug 28 • 11
A Web-Based Solution for Federated Learning with LLM-Based Automation Paper • 2408.13010 • Published Aug 23 • 8
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published Aug 26 • 59
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published Aug 24 • 23
MobileQuant: Mobile-friendly Quantization for On-device Language Models Paper • 2408.13933 • Published Aug 25 • 13
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27 • 52
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28 • 42
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 83
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation Paper • 2409.03271 • Published Sep 5 • 2
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Paper • 2409.02897 • Published Sep 4 • 44
Large Language Model-Based Agents for Software Engineering: A Survey Paper • 2409.02977 • Published Sep 4
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29 • 26
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems Paper • 2408.16293 • Published Aug 29 • 24
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28 • 34
InkubaLM: A small language model for low-resource African languages Paper • 2408.17024 • Published Aug 30 • 12
LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models Paper • 2409.00509 • Published Aug 31 • 38
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4 • 28
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining Paper • 2409.02326 • Published Sep 3 • 18
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published Sep 3 • 44
Building Math Agents with Multi-Turn Iterative Preference Learning Paper • 2409.02392 • Published Sep 4 • 14
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published Sep 5 • 30
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4 • 27
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published Sep 12 • 16
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation Paper • 2409.06957 • Published Sep 11 • 5
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B Paper • 2409.11055 • Published Sep 17 • 16
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 73
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published Sep 18 • 43
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18 • 36
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 134
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 47
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19 • 23
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published Sep 19 • 18
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19 • 36
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19 • 21
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published Sep 23 • 41
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published Sep 23 • 11
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24 • 41
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published Sep 23 • 25
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 99
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published Sep 26 • 33
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30 • 51
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24 • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26 • 36
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published Sep 27 • 26
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1 • 28
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis Paper • 2409.20059 • Published Sep 30 • 15
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published Oct 2 • 30
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents Paper • 2207.01206 • Published Jul 4, 2022 • 2
Upcycling Large Language Models into Mixture of Experts Paper • 2410.07524 • Published about 1 month ago • 3