Commit a93dfe6 (verified) by shiertier
1 parent: fe89d0e

Upload model

Files changed (5)
  1. README.md +42 -0
  2. config.json +53 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +18 -0
  5. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ license: apache-2.0
+ tags:
+ - vision
+ ---
+
+ # ViTMatte model
+
+ ViTMatte model trained on Composition-1k. It was introduced in the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Yao et al. and first released in [this repository](https://github.com/hustvl/ViTMatte).
+
+ Disclaimer: The team releasing ViTMatte did not write a model card for this model so this model card has been written by the Hugging Face team.
+
+ ## Model description
+
+ ViTMatte is a simple approach to image matting, the task of accurately estimating the foreground object in an image. The model consists of a Vision Transformer (ViT) with a lightweight head on top.
+
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vitmatte_architecture.png"
+ alt="drawing" width="600"/>
+
+ <small> ViTMatte high-level overview. Taken from the <a href="https://arxiv.org/abs/2305.15272">original paper.</a> </small>
+
+ ## Intended uses & limitations
+
+ You can use the raw model for image matting. See the [model hub](https://huggingface.co/models?search=vitmatte) to look for other
+ fine-tuned versions that may interest you.
+
+ ### How to use
+
+ We refer to the [docs](https://huggingface.co/docs/transformers/main/en/model_doc/vitmatte#transformers.VitMatteForImageMatting.forward.example).
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @misc{yao2023vitmatte,
+       title={ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers},
+       author={Jingfeng Yao and Xinggang Wang and Shusheng Yang and Baoyuan Wang},
+       year={2023},
+       eprint={2305.15272},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,53 @@
+ {
+   "_commit_hash": null,
+   "architectures": [
+     "VitMatteForImageMatting"
+   ],
+   "backbone_config": {
+     "hidden_size": 384,
+     "image_size": 512,
+     "model_type": "vitdet",
+     "num_attention_heads": 6,
+     "num_channels": 4,
+     "out_features": [
+       "stage12"
+     ],
+     "out_indices": [
+       12
+     ],
+     "residual_block_indices": [
+       2,
+       5,
+       8,
+       11
+     ],
+     "use_relative_position_embeddings": true,
+     "window_block_indices": [
+       0,
+       1,
+       3,
+       4,
+       6,
+       7,
+       9,
+       10
+     ],
+     "window_size": 14
+   },
+   "convstream_hidden_sizes": [
+     48,
+     96,
+     192
+   ],
+   "fusion_hidden_sizes": [
+     256,
+     128,
+     64,
+     32
+   ],
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "model_type": "vitmatte",
+   "torch_dtype": "float32",
+   "transformers_version": null
+ }
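The backbone settings in config.json imply a few derived quantities that are easy to sanity-check. The sketch below is illustrative only and is not part of the commit: it copies a subset of `backbone_config` and assumes that the number of transformer blocks equals the last output index (`stage12`), since the file does not list a layer count explicitly. The helper name `summarize_backbone` is hypothetical.

```python
import json

# Values copied from the backbone_config in config.json above.
backbone_config = json.loads("""
{
  "hidden_size": 384,
  "num_attention_heads": 6,
  "out_indices": [12],
  "residual_block_indices": [2, 5, 8, 11],
  "window_block_indices": [0, 1, 3, 4, 6, 7, 9, 10],
  "window_size": 14
}
""")


def summarize_backbone(cfg):
    """Derive a few quantities implied by the backbone settings (hypothetical helper)."""
    # Assumption: the block count equals the last output index (stage12 -> 12 blocks).
    num_blocks = max(cfg["out_indices"])
    window = set(cfg["window_block_indices"])
    residual = set(cfg["residual_block_indices"])
    return {
        "num_blocks": num_blocks,
        # Per-head width of the attention layers.
        "head_dim": cfg["hidden_size"] // cfg["num_attention_heads"],
        # True when every block index appears in exactly one of the two lists.
        "lists_partition_blocks": (
            window.isdisjoint(residual)
            and (window | residual) == set(range(num_blocks))
        ),
    }


summary = summarize_backbone(backbone_config)
print(summary)
```

Under that assumption, the two index lists split the blocks cleanly: the blocks named in `window_block_indices` are the windowed-attention ones, and every third block (2, 5, 8, 11) falls in `residual_block_indices` instead.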
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bda9289db1bb6762d978b42d1c62ae3f34daf7497171a347a1d09657efd788cb
+ size 103294572
preprocessor_config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "do_normalize": true,
+   "do_pad": true,
+   "do_rescale": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "VitMatteImageProcessor",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "rescale_factor": 0.00392156862745098,
+   "size_divisibility": 32
+ }
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ec6aed44bc8d8ab7f4d0ff46da3520a534cf5a97a8262404ff6efa9ae33b1e5
+ size 103349013
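Both weight files are stored as Git LFS pointers, so the diff shows only a spec version, a SHA-256 object id, and a byte size rather than the weights themselves. A minimal sketch of reading such a pointer follows, using the model.safetensors pointer text from this commit; the `parse_lfs_pointer` helper is hypothetical and handles only the three keys shown here.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from the model.safetensors diff above.
SAFETENSORS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bda9289db1bb6762d978b42d1c62ae3f34daf7497171a347a1d09657efd788cb
size 103294572
"""

pointer = parse_lfs_pointer(SAFETENSORS_POINTER)
size_mib = pointer["size_bytes"] / (1024 * 1024)
print(f"{pointer['hash_algorithm']} {pointer['digest'][:12]}... {size_mib:.1f} MiB")
```

The same helper applies to the pytorch_model.bin pointer, whose recorded size (103349013 bytes) is slightly larger than that of the safetensors file.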