Diffusion Adversarial Post-Training for One-Step Video Generation • arXiv:2501.08316 • Published Jan 2025
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • arXiv:2412.18597 • Published Dec 24, 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? • arXiv:2412.02611 • Published Dec 3, 2024
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation • arXiv:2410.13861 • Published Oct 17, 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree • arXiv:2410.16268 • Published Oct 21, 2024
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant • arXiv:2410.13360 • Published Oct 17, 2024
OneLLM: One Framework to Align All Modalities with Language • arXiv:2312.03700 • Published Dec 6, 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models • arXiv:2311.07575 • Published Nov 13, 2023