LianghuiZhu committed
Commit 32054e8 · verified · 1 Parent(s): 2f67c2e

Update README.md

Files changed (1):
  1. README.md +56 -3
README.md CHANGED
@@ -1,3 +1,56 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+
+ <br>
+
+ # Vim Model Card
+
+ ## Model Details
+
+ Vision Mamba (Vim) is a generic backbone trained on the ImageNet-1K dataset for vision tasks.
+
+ - **Developed by:** [HUST](https://english.hust.edu.cn/), [Horizon Robotics](https://en.horizon.cc/), [BAAI](https://www.baai.ac.cn/english.html)
+ - **Model type:** A generic vision backbone based on the bidirectional state space model (SSM) architecture.
+ - **License:** Non-commercial license
+
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/hustvl/Vim
+ - **Paper:** https://arxiv.org/abs/2401.09417
+
+ ## Uses
+
+ The primary use of Vim is research on vision tasks, e.g., classification, segmentation, detection, and instance segmentation, with an SSM-based backbone.
+ The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.
+
+ ## How to Get Started with the Model
+
+ - You can replace the backbone used for vision tasks with the proposed Vim: https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py
+ - Then you can load this checkpoint and start training (see the loading sketch after this diff).
+
+ ## Training Details
+
+ Vim is pretrained on ImageNet-1K with classification supervision.
+ The training data consists of around 1.3M images from the [ImageNet-1K dataset](https://www.image-net.org/challenges/LSVRC/2012/).
+ See more details in this [paper](https://arxiv.org/abs/2401.09417).
+
+ ## Evaluation
+
+ Vim-base is evaluated on the ImageNet-1K validation set and achieves 81.9% top-1 accuracy. See more details in this [paper](https://arxiv.org/abs/2401.09417).
+
+ ## Additional Information
+
+ ### Citation Information
+
+ ```bibtex
+ @article{vim,
+   title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model},
+   author={Lianghui Zhu and Bencheng Liao and Qian Zhang and Xinlong Wang and Wenyu Liu and Xinggang Wang},
+   journal={arXiv preprint arXiv:2401.09417},
+   year={2024}
+ }
+ ```
+
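
As a supplement to the "How to Get Started with the Model" bullets in the README above, here is a minimal, non-authoritative sketch of loading this checkpoint into a Vim backbone for inference. It assumes the vim/ directory of the hustvl/Vim repository is on PYTHONPATH so that the timm registrations in models_mamba.py can be imported; the model name "vim_base_patch16_224" and the checkpoint path are placeholders not confirmed by this card, so substitute the exact name registered in models_mamba.py and the file you downloaded.

```python
# Minimal sketch, not an official example from the Vim repository.
# Assumptions: the hustvl/Vim repo's vim/ directory is on PYTHONPATH,
# and the model name / checkpoint path below are placeholders.
import torch
from timm import create_model

import models_mamba  # noqa: F401  (side effect: registers the Vim models with timm)

# Placeholder name; use the Vim-base variant actually registered in models_mamba.py.
model = create_model("vim_base_patch16_224", pretrained=False, num_classes=1000)

checkpoint = torch.load("path/to/vim_base_checkpoint.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # weights may be nested under "model"
model.load_state_dict(state_dict, strict=False)
model.eval()

# Sanity check with a dummy 224x224 RGB input.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000]) for the ImageNet-1K classes
```

From there, the backbone can be dropped into a downstream training or fine-tuning pipeline, as the getting-started bullets describe.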