ams89 committed on
Commit
9a2e6ba
·
verified ·
1 Parent(s): 25b184b

A beautiful beach ```

  1. README.md +1 -69
README.md CHANGED
@@ -1,71 +1,3 @@
  ---
- license: apache-2.0
- tags:
- - vision
- datasets:
- - imagenet-1k
- ---
-
- # Vision Transformer (base-sized model) pre-trained with MAE
-
- Vision Transformer (ViT) model pre-trained using the MAE method. It was introduced in the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick and first released in [this repository](https://github.com/facebookresearch/mae).
-
- Disclaimer: The team releasing MAE did not write a model card for this model so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- The Vision Transformer (ViT) is a transformer encoder model (BERT-like). Images are presented to the model as a sequence of fixed-size patches.
-
- During pre-training, one randomly masks out a high portion (75%) of the image patches. First, the encoder is used to encode the visual patches. Next, a learnable (shared) mask token is added at the positions of the masked patches. The decoder takes the encoded visual patches and mask tokens as input and reconstructs raw pixel values for the masked positions.
-
- By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder.
-
- ## Intended uses & limitations
-
- You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=facebook/vit-mae) to look for
- fine-tuned versions on a task that interests you.
-
- ### How to use
-
- Here is how to use this model:
-
- ```python
- from transformers import AutoImageProcessor, ViTMAEForPreTraining
- from PIL import Image
- import requests
-
- url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
- image = Image.open(requests.get(url, stream=True).raw)
-
- processor = AutoImageProcessor.from_pretrained('facebook/vit-mae-base')
- model = ViTMAEForPreTraining.from_pretrained('facebook/vit-mae-base')
-
- inputs = processor(images=image, return_tensors="pt")
- outputs = model(**inputs)
- loss = outputs.loss
- mask = outputs.mask
- ids_restore = outputs.ids_restore
- ```
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @article{DBLP:journals/corr/abs-2111-06377,
- author = {Kaiming He and
- Xinlei Chen and
- Saining Xie and
- Yanghao Li and
- Piotr Doll{\'{a}}r and
- Ross B. Girshick},
- title = {Masked Autoencoders Are Scalable Vision Learners},
- journal = {CoRR},
- volume = {abs/2111.06377},
- year = {2021},
- url = {https://arxiv.org/abs/2111.06377},
- eprinttype = {arXiv},
- eprint = {2111.06377},
- timestamp = {Tue, 16 Nov 2021 12:12:31 +0100},
- biburl = {https://dblp.org/rec/journals/corr/abs-2111-06377.bib},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
  ```
 
  ---
+ A beautiful beach
  ```