VideoLLaMA 3 🔥 Multimodal foundation models for image and video understanding, by DAMO Alibaba.
Model: DAMO-NLP-SG/videollama3-678cdda9281a0e32fe79af15
Paper: VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (2501.13106)
✨ 2B/7B
✨ Apache2.0
UI-TARS 🔥 A series of native GUI agent models (2B/7B/72B) released by ByteDance, combining perception, reasoning, grounding, and memory into one system.
Model: https://huggingface.co/bytedance-research
Paper: UI-TARS: Pioneering Automated GUI Interaction with Native Agents (2501.12326)
Hub 📊
Running 13 🔥 China AI policy research 🤗
Running 11 🏃 Watermark Demo: Demo of watermarking with gradio
Running on CPU Upgrade 14 📈🚀 Llm Race To The Top: View Chatbot Arena ELO of top models increasing
Running on CPU Upgrade 145 🔬 Open LLM Progress Tracker
Fun Spaces ✨
Running 234 🏢 3D Arena
Running on A10G 443 ✏️ LEDITS
Running on A10G 4.72k 🎵 MusicGen
Running on Zero 4.93k 👁 IllusionDiffusion: Generate stunning high quality illusion artwork