---
license: apache-2.0
---

# Model Card for GiT

**GiT: Towards Generalist Vision Transformer through Universal Language Interface**

This repository provides GiT checkpoints, training logs, and the pre-trained files used by the project.

## Model Details

### Model Description

In this project, we introduce GiT (Generalist Vision Transformer). GiT has the following characteristics:

- 😮 **Minimalist architecture design similar to LLM**: GiT consists solely of a single transformer, without any additional vision encoder or adapter (see the conceptual sketch at the end of this section).
- 🚀 **Covering all types of visual understanding tasks**: GiT addresses a spectrum of visual tasks, including object-level tasks (e.g., object detection), pixel-level tasks (e.g., semantic segmentation), and vision-language tasks (e.g., image captioning).
- 🤗 **Achieving task synergy by unified language interface**: Similar to LLMs, GiT exhibits a task-synergy effect in multi-task training.
- 🔥 **SOTA performance on zero-shot and few-shot benchmarks**: GiT scales well with model size and data, demonstrating remarkable generalizability across diverse scenarios after joint training on 27 datasets.


**License:** Apache License 2.0
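The "single transformer, language interface" idea can be summarized in a few lines of code. The snippet below is a minimal conceptual sketch, not the released GiT implementation: it only illustrates one transformer stack consuming image patch tokens and text tokens together and reading predictions out through a shared vocabulary. All module names, sizes, and the encoder-style attention are illustrative assumptions.

```python
# Minimal conceptual sketch (not the released GiT code): a single transformer
# processes image patches and text tokens in one sequence, and every task is
# read out as text through one vocabulary head. Sizes are illustrative.
import torch
import torch.nn as nn


class SingleTransformerLM(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, patch=16, img_size=224,
                 num_layers=6, num_heads=8):
        super().__init__()
        num_patches = (img_size // patch) ** 2
        # Image patches and text tokens are embedded to the same width ...
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.token_embed = nn.Embedding(vocab_size, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, num_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # ... and outputs are produced over a shared text vocabulary.
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, images, text_tokens):
        # images: (B, 3, H, W); text_tokens: (B, T) integer token ids
        patches = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, dim)
        patches = patches + self.pos_embed
        words = self.token_embed(text_tokens)                          # (B, T, dim)
        x = torch.cat([patches, words], dim=1)                         # one sequence
        x = self.encoder(x)                                            # one transformer
        return self.lm_head(x[:, patches.size(1):])                    # (B, T, vocab)


model = SingleTransformerLM()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 30522, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 30522])
```

The point of the sketch is the absence of a separate vision backbone or adapter: patches and words share one embedding width and one transformer stack.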

### Model Sources

## Uses

Please refer here for more details on usage.
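As a starting point, the checkpoints, logs, and pre-trained files hosted here can be fetched with the `huggingface_hub` client. This is a minimal sketch; the repo id `kanashi6/GiT` and the local directory are assumptions and may need adjusting.

```python
# Minimal sketch: download all files from this repository.
# The repo id below is an assumption based on where this model card is hosted;
# adjust it if the repository lives under a different namespace.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="kanashi6/GiT",          # assumed repository id
    local_dir="./GiT_checkpoints",   # any local directory works
)
print("Repository files downloaded to:", local_path)
```

Loading the downloaded weights into the GiT codebase itself follows the usage instructions linked above.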