DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published 12 days ago • 7
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 11 days ago • 131
Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1 • 69
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14 • 53
Article DEMO: French Spoken Language Understanding with the new speech resources from NAVER LABS Europe By mzboito • Aug 28 • 9
Article Deep Learning over the Internet: Training Language Models Collaboratively Jul 15, 2021 • 4
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 124