---
license: apache-2.0
---
# Model Card for GiT
<!-- Provide a quick summary of what the model is/does. -->
[GiT: Towards Generalist Vision Transformer through Universal Language Interface](https://arxiv.org/abs/2222.33333)
This repository includes GiT checkpoints, logs, and the pre-trained files used.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
In this project, we introduce GiT (Generalist Vision Transformer). GiT has the following characteristics:
- **Minimalist architecture design similar to LLM**: GiT consists solely of a single transformer, with no additional vision encoder or adapter.
- **Covering all types of visual understanding tasks**: GiT addresses a spectrum of visual tasks, including object-level tasks (e.g., object detection), pixel-level tasks (e.g., semantic segmentation), and vision-language tasks (e.g., image captioning).
- **Achieving task synergy through a unified language interface**: As with LLMs, GiT exhibits a task synergy effect in multi-task training.
- **SOTA performance on zero-shot and few-shot benchmarks**: GiT scales well with model size and data, demonstrating remarkable generalizability across diverse scenarios after being trained on 27 datasets.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585493b53c37507639fe3ba/0-qINMmUF8ugjb2jdsHLa.png)
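To make the idea of a unified language interface more concrete, the sketch below shows one way that outputs from different tasks could be flattened into a single token sequence. It is an illustration only and is not taken from the GiT codebase: the function names `box_to_tokens` and `caption_to_tokens`, the `<bin_k>` token format, and the choice of 100 coordinate bins are all assumptions made for this example. Please see the repository for the actual formulation.

```python
from typing import List, Tuple

# Assumption: 100 coordinate bins, picked only for illustration.
NUM_BINS = 100


def box_to_tokens(
    label: str,
    box: Tuple[float, float, float, float],
    width: int,
    height: int,
    num_bins: int = NUM_BINS,
) -> List[str]:
    """Serialize one box as four coordinate-bin tokens followed by the class word.

    Hypothetical format; not GiT's actual output vocabulary.
    """
    x0, y0, x1, y1 = box

    def quantize(value: float, extent: int) -> int:
        # Map a pixel coordinate to an integer bin in [0, num_bins - 1].
        bin_index = int(value * num_bins / extent)
        return min(max(bin_index, 0), num_bins - 1)

    bins = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    return [f"<bin_{b}>" for b in bins] + [label]


def caption_to_tokens(caption: str) -> List[str]:
    """Serialize a caption as lowercase whitespace-separated words (toy tokenizer)."""
    return caption.lower().split()


# Both an object-level output and a vision-language output become plain token lists.
detection_tokens = box_to_tokens("dog", (32.0, 48.0, 320.0, 240.0), width=640, height=480)
caption_tokens = caption_to_tokens("A dog runs on grass")
print(detection_tokens)
print(caption_tokens)
```

Because both outputs end up as lists of tokens, a single sequence model could in principle be trained on them with the same next-token objective, which is the intuition behind handling several tasks with one transformer.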
- **Developed by:** Haiyang Wang ( [email protected] ), Hao Tan
- **License:** Apache License 2.0
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/Haiyang-W/GiT
- **Paper:** https://arxiv.org/abs/2222.33333
## Uses
Please refer to the [GitHub repository](https://github.com/Haiyang-W/GiT) for more details about usage.