Rui Zhao

ruizhaocv

https://ruizhaocv.github.io/

AI & ML interests

Multimodal and GenAI

ruizhaocv's activity

upvoted a paper about 8 hours ago

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published 1 day ago • 56

upvoted a paper 1 day ago

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

Paper • 2501.16411 • Published 8 days ago • 17

upvoted a paper 8 days ago

Chain-of-Retrieval Augmented Generation

Paper • 2501.14342 • Published 11 days ago • 43

upvoted a paper 11 days ago

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Paper • 2501.13926 • Published 12 days ago • 33

upvoted 2 papers 13 days ago

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Paper • 2501.10893 • Published 17 days ago • 23

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Paper • 2501.11733 • Published 15 days ago • 27

upvoted 3 papers 14 days ago

GameFactory: Creating New Games with Generative Interactive Videos

Paper • 2501.08325 • Published 21 days ago • 61

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published 19 days ago • 24

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

Paper • 2501.03847 • Published 28 days ago • 23

upvoted a paper about 1 month ago

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

Paper • 2412.19645 • Published Dec 27, 2024 • 13

upvoted 10 papers about 2 months ago

BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 33

Wonderland: Navigating 3D Scenes from a Single Image

Paper • 2412.12091 • Published Dec 16, 2024 • 16

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Paper • 2412.11815 • Published Dec 16, 2024 • 26

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Paper • 2412.09283 • Published Dec 12, 2024 • 19

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 90

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Paper • 2412.09622 • Published Dec 12, 2024 • 8

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published Dec 12, 2024 • 21

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 44

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 19