TACO Models Collection This collection contains the best-performing TACO models based on LLaMA-3/Qwen2 and SigLIP/CLIP. • 3 items • Updated 23 days ago • 8
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 5 days ago • 66
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published 6 days ago • 19
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 9 days ago • 34
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published about 1 month ago • 137
YuLan-Mini Collection A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. • 5 items • Updated 14 days ago • 10
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published 20 days ago • 24
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 59
LLM-jp-3 Pre-trained Models Collection Pre-trained models in the LLM-jp-3 model series • 4 items • Updated 20 days ago • 5
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 69
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated about 1 month ago • 20
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 36