Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 19 days ago • 33
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024 • 26
Open World Object Detection in the Era of Foundation Models Paper • 2312.05745 • Published Dec 10, 2023 • 1
PROB: Probabilistic Objectness for Open World Object Detection Paper • 2212.01424 • Published Dec 2, 2022
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15, 2024 • 33