InternImage
Collection: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (18 items)
This repository provides PyTorch-format pretrained weights for different variants of the InternImage model. The weights are intended for use in downstream tasks and fine-tuning on specific datasets.
InternImage is a large-scale vision foundation model developed by researchers from Shanghai AI Laboratory, Tsinghua University, and other institutions. Unlike Transformer-based models, InternImage uses DCNv3 (deformable convolution v3) as its core operator. DCNv3 gives the model the large, dynamic effective receptive fields that downstream tasks such as object detection and segmentation require, and it also performs adaptive spatial aggregation conditioned on the input.
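To make the DCNv3 description more concrete, here is a minimal pure-Python sketch of the aggregation the InternImage paper writes as y(p0) = Σ_{g=1..G} Σ_{k=1..K} w_g · m_gk · x_g(p0 + p_k + Δp_gk). Here p_k are the regular grid points of the kernel, Δp_gk are per-group offsets, and m_gk are modulation scalars normalized with softmax over the K sampling points. This is not the InternImage implementation. It computes a single output value, treats each group as one channel so that the projection w_g reduces to a scalar, and takes the offsets and mask logits as given inputs instead of predicting them from the input feature with a learned layer. The helper names (`bilinear_sample`, `softmax`, `dcnv3_point`) are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2D map (list of rows) at fractional (y, x).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dcnv3_point(group_maps, p0, offsets, mask_logits, group_weights, kernel_size=3):
    """Aggregate one output value at location p0 = (row, col).

    Simplified sketch (assumptions, not the official operator):
      group_maps[g]     : 2D feature map of group g (one channel per group)
      offsets[g][k]     : offset (dy, dx) for sampling point k of group g
      mask_logits[g][k] : unnormalized modulation score, softmaxed over the K points
      group_weights[g]  : scalar stand-in for the projection weight w_g
    """
    r = kernel_size // 2
    # Regular grid p_k of a kernel_size x kernel_size convolution, row-major.
    grid = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    y0, x0 = p0
    out = 0.0
    for g, feat in enumerate(group_maps):
        masks = softmax(mask_logits[g])  # m_gk, sums to 1 over the K points
        for k, (dy, dx) in enumerate(grid):
            oy, ox = offsets[g][k]
            # Sample at the deformed location p0 + p_k + delta_p_gk.
            sample = bilinear_sample(feat, y0 + dy + oy, x0 + dx + ox)
            out += group_weights[g] * masks[k] * sample
    return out
```

As a usage example, take a 5x5 map whose value equals its column index, one group with weight 1.0, and uniform mask logits. With zero offsets at p0 = (2, 2), the result is the plain 3x3 mean (about 2.0). Shifting every offset by (0, 0.5) moves the result to about 2.5, which shows how the offsets relocate the sampling points. Normalizing m_gk with softmax keeps each group's modulation weights summing to 1 across its K points; the paper motivates this choice as more stable in training than the element-wise sigmoid used in DCNv2.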
If this work is helpful for your research, please consider citing the following BibTeX entry.
@inproceedings{wang2023internimage,
  title={{InternImage}: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions},
  author={Wang, Wenhai and Dai, Jifeng and Chen, Zhe and Huang, Zhenhang and Li, Zhiqi and Zhu, Xizhou and Hu, Xiaowei and Lu, Tong and Lu, Lewei and Li, Hongsheng and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14408--14419},
year={2023}
}