---
datasets:
- FoundationVision/groma_instruct
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---

This repository contains the model described in the paper [Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models](https://huggingface.co/papers/2404.13013).