WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published 24 days ago • 29 • 3
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4 • 9 • 1
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published Jun 4 • 8 • 2
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4 • 15 • 3
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4 • 29 • 2
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5 • 7 • 1
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM Paper • 2406.02884 • Published Jun 5 • 14 • 2
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published Jun 5 • 18 • 1
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Paper • 2406.01014 • Published Jun 3 • 30 • 2
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4 • 36 • 1