VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 12 days ago • 40
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 11 days ago • 91
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 31
Open LLM Leaderboard: Track, rank and evaluate open LLMs and chatbots • Running on CPU Upgrade • 12.2k
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56
Post If you're trying to run MoE Mixtral-8x7b under DeepSpeed w/ HF Transformers, it's likely to hang on the first forward. The solution is here: https://github.com/microsoft/DeepSpeed/pull/4966?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en-US#issuecomment-1989671378 and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix.
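One way to check the version requirement from the post before launching a run is sketched below. This is only an illustration: the helper names (`release_tuple`, `meets_minimum`, `installed_version`) are hypothetical, the comparison looks only at the leading numeric release components (so pre-release or local suffixes are ignored, which is a simplification), and the fix itself still has to be applied as described in the linked DeepSpeed comment.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_DEEPSPEED = "0.13.0"  # minimum version named in the post


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.13.1+cu118' -> (0, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_DEEPSPEED) -> bool:
    """True if `installed` is at least `minimum` (numeric release parts only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.13' and '0.13.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str = "deepspeed") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("deepspeed is not installed")
    elif meets_minimum(found):
        print(f"deepspeed {found} satisfies >= {MIN_DEEPSPEED}")
    else:
        print(f"deepspeed {found} is older than {MIN_DEEPSPEED}; upgrade before running")
```

Running the script in the training environment prints which of the three cases applies; it does not import DeepSpeed itself, so it is safe to run on a machine without a GPU.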
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15, 2024 • 171
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11, 2024 • 15
SeaLLMs -- Large Language Models for Southeast Asia Paper • 2312.00738 • Published Dec 1, 2023 • 23
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37