MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published 26 days ago • 36
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation Paper • 2406.07686 • Published Jun 11 • 14