LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 113
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15, 2024 • 33
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting Paper • 2402.06149 • Published Feb 9, 2024 • 18