Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 5 days ago • 294
ViDoRe Benchmark Collection Benchmark for document retrieval using visual features, introduced in the ColPali paper. Datasets are using the QA format. • 10 items • Updated 23 days ago • 13
view article Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1, 2024 • 72
view article Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 70