Mahdip72 (Mahdi Pourmirzaei)

upvoted a paper about 17 hours ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 1 day ago • 65

upvoted a paper 9 days ago

Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

Paper • 2408.06663 • Published Aug 13 • 15

upvoted a paper 10 days ago

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published 15 days ago • 27

upvoted a paper 20 days ago

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published 24 days ago • 119

upvoted a paper 23 days ago

Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published 29 days ago • 22

upvoted a paper 27 days ago

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published 28 days ago • 84

upvoted a collection 29 days ago

Llama 3.1

Collection

This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Aug 2 • 569

upvoted 9 papers about 1 month ago

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6 • 30

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

Design Proteins Using Large Language Models: Enhancements and Comparative Analyses

Paper • 2408.06396 • Published Aug 12 • 8

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6 • 85

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 74

upvoted 4 papers about 2 months ago

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Paper • 2407.21770 • Published Jul 31 • 20

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 102

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

upvoted 6 papers 2 months ago

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18 • 52

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15 • 22

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 153

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11 • 13

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 80

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Paper • 2407.04620 • Published Jul 5 • 26

upvoted 2 papers 3 months ago

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published Jun 29 • 37

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

upvoted an article 3 months ago

Article

Welcome Gemma 2 - Google's new open LLM

Jun 27

• 115

upvoted 6 papers 3 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 84

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 42

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published May 24 • 24

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 91

upvoted 3 papers 4 months ago

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Paper • 2405.05254 • Published May 8 • 8

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 98

Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 118

upvoted 5 papers 5 months ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 73

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 108

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 38

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 250

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 83

upvoted 9 papers 6 months ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 103

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 63

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 51

SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

Paper • 2403.13064 • Published Mar 19 • 31

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Paper • 2403.14624 • Published Mar 21 • 50

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 123

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13 • 47

DrugAssist: A Large Language Model for Molecule Optimization

Paper • 2401.10334 • Published Dec 28, 2023 • 4

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 182

upvoted 6 papers 7 months ago

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5 • 56

Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1 • 29

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 590

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19 • 40

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94

Grandmaster-Level Chess Without Search

Paper • 2402.04494 • Published Feb 7 • 65

upvoted 2 papers 8 months ago

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

Paper • 2401.17377 • Published Jan 30 • 34

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86

Mahdi Pourmirzaei

AI & ML interests

Organizations

Mahdip72's activity

Welcome Gemma 2 - Google's new open LLM