Ivy Zhang

Ivy1997

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

new activity 3 days ago

BAAI/Infinity-MM: ChartQA, DocVQA, InfoVQA, etc. are noticeably lower than the reported results

liked a model 7 days ago

Qwen/Qwen2.5-VL-3B-Instruct

Ivy1997's activity

upvoted a collection 8 days ago

Qwen2-VL

Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 200

upvoted 3 papers 10 days ago

EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion

Paper • 2501.13452 • Published 12 days ago • 7

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published 11 days ago • 21

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published 11 days ago • 22

upvoted a collection 27 days ago

AIMv2

A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 71

upvoted 9 papers about 1 month ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 13

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Paper • 2411.14794 • Published Nov 22, 2024 • 13

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 59

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 23

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Paper • 2411.14199 • Published Nov 21, 2024 • 30

Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published Nov 20, 2024 • 41

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Paper • 2411.14405 • Published Nov 21, 2024 • 58

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 73

upvoted a paper about 2 months ago

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Paper • 2410.21220 • Published Oct 28, 2024 • 10