Hao Yu

Jerydeak

AI & ML interests

NLP

Jerydeak's activity

upvoted 3 papers 5 days ago

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

Paper • 2412.20800 • Published 13 days ago • 9

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published 10 days ago • 34

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Paper • 2412.21059 • Published 13 days ago • 18

upvoted 8 papers 23 days ago

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Paper • 2412.08503 • Published Dec 11, 2024 • 8

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 19

Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 32

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Paper • 2412.08580 • Published Dec 11, 2024 • 45

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Paper • 2412.07774 • Published Dec 10, 2024 • 26

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45

Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 26

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 71

upvoted a paper 25 days ago

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Paper • 2412.09428 • Published about 1 month ago • 7

upvoted 6 papers about 1 month ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 12

CompCap: Improving Multimodal Large Language Models with Composite Captions

Paper • 2412.05243 • Published Dec 6, 2024 • 18

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Paper • 2412.05237 • Published Dec 6, 2024 • 47

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

Paper • 2412.04062 • Published Dec 5, 2024 • 7

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published Dec 2, 2024 • 12

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published Dec 5, 2024 • 17

upvoted a paper 2 months ago

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published Oct 31, 2024 • 48

upvoted a paper 8 months ago

2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published May 28, 2024 • 23