Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper • 2408.06663 • Published Aug 13 • 15
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published 15 days ago • 27
Llama 3.1 Collection • This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3, and Prompt Guard models • 11 items • Updated Aug 2 • 569
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 30
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
Design Proteins Using Large Language Models: Enhancements and Comparative Analyses Paper • 2408.06396 • Published Aug 12 • 8
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 114
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Paper • 2408.04810 • Published Aug 9 • 22
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31 • 20
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published Jul 15 • 22
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published Jul 11 • 13
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5 • 26
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 42
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published May 24 • 24
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8 • 8
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 73
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 250
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 103
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 63
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Paper • 2403.13064 • Published Mar 19 • 31
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 50
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 47
DrugAssist: A Large Language Model for Molecule Optimization Paper • 2401.10334 • Published Dec 28, 2023 • 4
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 56
Learning and Leveraging World Models in Visual Representation Learning Paper • 2403.00504 • Published Mar 1 • 29
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 590
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 40
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens Paper • 2401.17377 • Published Jan 30 • 34
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 86