---
license: apache-2.0
---
|
# Model Card for GiT
|
|
|
|
[GiT: Towards Generalist Vision Transformer through Universal Language Interface](https://arxiv.org/abs/2403.09394)
|
|
|
This repository includes GiT checkpoints, training logs, and the pre-trained files used for training.
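As a minimal sketch of fetching these files programmatically, the snippet below uses the `huggingface_hub` library; the `repo_id` and `filename` values are illustrative placeholders, so substitute the actual names from this repository's file listing.

```python
# Minimal sketch: download a GiT checkpoint from the Hugging Face Hub.
# NOTE: repo_id and filename are hypothetical placeholders; check the
# "Files and versions" tab of this repository for the real names.
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="username/GiT",   # placeholder repo id
    filename="git_base.pth",  # placeholder checkpoint file
)
print(checkpoint_path)  # local path to the cached file
```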
|
|
|
## Model Details
|
|
|
### Model Description
|
|
|
|
We introduce GiT (Generalist Vision Transformer), which has the following characteristics:
|
|
|
- 😮 **Minimalist architecture design similar to LLMs**: GiT consists solely of a single transformer, without an additional vision encoder or adapter.

- 🚀 **Covering all types of visual understanding tasks**: GiT addresses a spectrum of visual tasks, including object-level tasks (e.g., object detection), pixel-level tasks (e.g., semantic segmentation), and vision-language tasks (e.g., image captioning).

- 🤗 **Achieving task synergy via a unified language interface**: Similar to LLMs, GiT exhibits a task-synergy effect in multi-task training; the interface idea is sketched below.

- 🔥 **Strong performance on zero-shot and few-shot benchmarks**: GiT scales well with model size and data, demonstrating remarkable generalizability across diverse scenarios after joint training on 27 datasets.
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585493b53c37507639fe3ba/glLj40VWCFaa0BVi4-_9d.png)
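To make the unified language interface concrete, the toy sketch below shows how outputs of different tasks can be serialized into one token stream for a single transformer. This is an illustrative simplification, not GiT's actual tokenization scheme; the `serialize_*` helpers are hypothetical, and the paper describes the real vocabulary and output templates.

```python
# Toy illustration only: serialize heterogeneous task outputs into plain token
# sequences so one autoregressive transformer can produce all of them.
# GiT's real discretization and vocabulary differ; see the paper for details.

def serialize_detection(boxes):
    """Render boxes as text, e.g. '<12> <30> <96> <120> dog'."""
    return " ".join(
        f"<{x1}> <{y1}> <{x2}> <{y2}> {label}" for x1, y1, x2, y2, label in boxes
    )

def serialize_caption(caption):
    """Vision-language outputs are already natural language."""
    return caption

# Both tasks reduce to token sequences for the same language decoder:
print(serialize_detection([(12, 30, 96, 120, "dog")]))
print(serialize_caption("a dog running on grass"))
```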
|
|
|
|
|
|
|
- **Developed by:** Haiyang Wang ([email protected]), Hao Tang ([email protected])
|
- **License:** Apache License 2.0
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** https://github.com/Haiyang-W/GiT

- **Paper:** https://arxiv.org/abs/2403.09394
|
|
|
## Uses

Please refer to the [GitHub repository](https://github.com/Haiyang-W/GiT) for more details about usage.
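Before wiring a checkpoint into that codebase, a quick local sanity check can be done with plain PyTorch. This is a sketch under two assumptions: `checkpoint_path` comes from the download snippet above, and the checkpoint stores its weights under a `state_dict` key (common for MMDetection-style training pipelines, but verify against the actual file).

```python
# Sketch: inspect a downloaded GiT checkpoint with plain PyTorch.
# Assumes `checkpoint_path` from the earlier download snippet; falls back to
# treating the file as a bare state dict if there is no "state_dict" key.
import torch

ckpt = torch.load(checkpoint_path, map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```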