marinaretik (marinaretikof)

upvoted 6 papers 13 days ago

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

Paper • 2409.03718 • Published 14 days ago • 24

Building Math Agents with Multi-Turn Iterative Preference Learning

Paper • 2409.02392 • Published 16 days ago • 14

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

Paper • 2409.03512 • Published 14 days ago • 25

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Paper • 2409.03420 • Published 14 days ago • 23

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 14 days ago • 83

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 17 days ago • 94

upvoted 8 papers 18 days ago

Towards Realistic Example-based Modeling via 3D Gaussian Stitching

Paper • 2408.15708 • Published 22 days ago • 7

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Paper • 2408.15496 • Published 23 days ago • 10

Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature

Paper • 2408.15836 • Published 22 days ago • 11

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

Paper • 2408.15915 • Published 22 days ago • 19

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published 23 days ago • 41

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Paper • 2408.15079 • Published 23 days ago • 51

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published 22 days ago • 81

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published 22 days ago • 92

upvoted 6 papers 22 days ago

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Paper • 2408.13257 • Published 27 days ago • 25

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published 28 days ago • 109

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Paper • 2408.14819 • Published 24 days ago • 18

upvoted 4 papers about 1 month ago

TurboEdit: Instant text-based image editing

Paper • 2408.08332 • Published Aug 14 • 17

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Paper • 2408.07931 • Published Aug 15 • 17

Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15 • 37

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

upvoted 3 papers about 2 months ago

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Paper • 2408.00754 • Published Aug 1 • 21

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

upvoted a collection about 2 months ago

Gemma 2 2B Release

Collection

The 2.6B parameter version of Gemma 2. • 6 items • Updated Jul 31 • 76

upvoted 22 papers about 2 months ago

Very Large-Scale Multi-Agent Simulation in AgentScope

Paper • 2407.17789 • Published Jul 25 • 30

LAMBDA: A Large Model Based Data Agent

Paper • 2407.17535 • Published Jul 24 • 34

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Paper • 2407.16154 • Published Jul 23 • 20

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24 • 17

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 38

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23 • 67

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

Paper • 2407.16198 • Published Jul 23 • 13

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Paper • 2407.16224 • Published Jul 23 • 23

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Paper • 2407.16655 • Published Jul 23 • 28

KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23 • 41

CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

Paper • 2407.13301 • Published Jul 18 • 55

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Paper • 2407.15711 • Published Jul 22 • 9

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

Paper • 2407.15187 • Published Jul 21 • 10

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Paper • 2407.15754 • Published Jul 22 • 19

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation

Paper • 2407.14931 • Published Jul 20 • 20

Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19 • 35

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Paper • 2407.15017 • Published Jul 22 • 33

The Vision of Autonomic Computing: Can LLMs Make It a Reality?

Paper • 2407.14402 • Published Jul 19 • 13

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17 • 18

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Paper • 2407.14057 • Published Jul 19 • 41

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

Internal Consistency and Self-Feedback in Large Language Models: A Survey

Paper • 2407.14507 • Published Jul 19 • 44

upvoted 10 papers 2 months ago

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Paper • 2407.12883 • Published Jul 16 • 8

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Paper • 2406.13897 • Published May 30 • 11

Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation

Paper • 2407.13481 • Published Jul 18 • 9

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

Paper • 2406.07057 • Published Jun 11 • 15

Scaling Granite Code Models to 128K Context

Paper • 2407.13739 • Published Jul 18 • 18

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 15

Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Paper • 2407.13759 • Published Jul 18 • 17

Shape of Motion: 4D Reconstruction from a Single Video

Paper • 2407.13764 • Published Jul 18 • 19

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

Paper • 2407.12854 • Published Jul 9 • 29

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18 • 52
