---
datasets:
- FoundationVision/groma_instruct
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---
This repository contains the model from the paper [Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models](https://huggingface.co/papers/2404.13013).