Commit b92a792 by liangfeng
1 parent: 0839f49

clean up
- CODE_OF_CONDUCT.md +0 -80
- CONTRIBUTING.md +0 -32
- GETTING_STARTED.md +0 -99
- INSTALL.md +0 -33
- LICENSE +0 -399
- README.md +1 -57
- app.py +21 -9
- configs/ovseg_swinB_vitL_bs32_120k.yaml +0 -100
- datasets/DATASETS.md +0 -122
- datasets/prepare_ade20k_full_sem_seg.py +0 -1011
- datasets/prepare_ade20k_sem_seg.py +0 -35
- datasets/prepare_coco_stuff_sem_seg.py +0 -219
- datasets/prepare_pascal_context.py +0 -69
- datasets/prepare_voc_sem_seg.py +0 -71
- open_vocab_seg/.DS_Store +0 -0
- open_vocab_seg/modeling/.DS_Store +0 -0
- open_vocab_seg/modeling/clip_adapter/__init__.py +2 -0
- open_vocab_seg/modeling/clip_adapter/clip/__init__.py +1 -0
- open_vocab_seg/modeling/clip_adapter/clip/bpe_simple_vocab_16e6.txt.gz +3 -0
- open_vocab_seg/modeling/clip_adapter/clip/clip.py +285 -0
- open_vocab_seg/modeling/clip_adapter/clip/model.py +613 -0
- open_vocab_seg/modeling/clip_adapter/clip/simple_tokenizer.py +150 -0
- open_vocab_seg/modeling/clip_adapter/text_template.py +3 -2
- open_vocab_seg/modeling/clip_adapter/utils.py +3 -3
- configs/ovseg_swinB_vitL_demo.yaml → ovseg_swinB_vitL_demo.yaml +1 -1
- requirements.txt +8 -2
- resources/demo_samples/sample_01.jpeg +3 -0
- resources/demo_samples/sample_02.jpeg +3 -0
- tools/convert-pretrained-clip-model-to-d2.py +0 -69
- tools/convert-pretrained-swin-model-to-d2.py +0 -30
- tools/convert-torchvision-to-d2.py +0 -54
- tools/ovseg_replace_clip.py +0 -30
- tools/search_thr_ensemble_w.sh +0 -11
- tools/web_demo.py +0 -76
- train_net.py +0 -309
CODE_OF_CONDUCT.md
DELETED
@@ -1,80 +0,0 @@
-# Code of Conduct
-
-## Our Pledge
-
-In the interest of fostering an open and welcoming environment, we as
-contributors and maintainers pledge to make participation in our project and
-our community a harassment-free experience for everyone, regardless of age, body
-size, disability, ethnicity, sex characteristics, gender identity and expression,
-level of experience, education, socio-economic status, nationality, personal
-appearance, race, religion, or sexual identity and orientation.
-
-## Our Standards
-
-Examples of behavior that contributes to creating a positive environment
-include:
-
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
-
-Examples of unacceptable behavior by participants include:
-
-* The use of sexualized language or imagery and unwelcome sexual attention or
-advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic
-address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-professional setting
-
-## Our Responsibilities
-
-Project maintainers are responsible for clarifying the standards of acceptable
-behavior and are expected to take appropriate and fair corrective action in
-response to any instances of unacceptable behavior.
-
-Project maintainers have the right and responsibility to remove, edit, or
-reject comments, commits, code, wiki edits, issues, and other contributions
-that are not aligned to this Code of Conduct, or to ban temporarily or
-permanently any contributor for other behaviors that they deem inappropriate,
-threatening, offensive, or harmful.
-
-## Scope
-
-This Code of Conduct applies within all project spaces, and it also applies when
-an individual is representing the project or its community in public spaces.
-Examples of representing a project or community include using an official
-project e-mail address, posting via an official social media account, or acting
-as an appointed representative at an online or offline event. Representation of
-a project may be further defined and clarified by project maintainers.
-
-This Code of Conduct also applies outside the project spaces when there is a
-reasonable belief that an individual's behavior may have a negative impact on
-the project or its community.
-
-## Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting the project team at <[email protected]>. All
-complaints will be reviewed and investigated and will result in a response that
-is deemed necessary and appropriate to the circumstances. The project team is
-obligated to maintain confidentiality with regard to the reporter of an incident.
-Further details of specific enforcement policies may be posted separately.
-
-Project maintainers who do not follow or enforce the Code of Conduct in good
-faith may face temporary or permanent repercussions as determined by other
-members of the project's leadership.
-
-## Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
-
-[homepage]: https://www.contributor-covenant.org
-
-For answers to common questions about this code of conduct, see
-https://www.contributor-covenant.org/faq
CONTRIBUTING.md
DELETED
@@ -1,32 +0,0 @@
-# Contributing to OVSeg
-We want to make contributing to this project as easy and transparent as
-possible.
-
-## Pull Requests
-We actively welcome your pull requests.
-
-1. Fork the repo and create your branch from `main`.
-2. If you've added code that should be tested, add tests.
-3. If you've changed APIs, update the documentation.
-4. Ensure the test suite passes.
-5. Make sure your code lints.
-6. If you haven't already, complete the Contributor License Agreement ("CLA").
-
-## Contributor License Agreement ("CLA")
-In order to accept your pull request, we need you to submit a CLA. You only need
-to do this once to work on any of Meta's open source projects.
-
-Complete your CLA here: <https://code.facebook.com/cla>
-
-## Issues
-We use GitHub issues to track public bugs. Please ensure your description is
-clear and has sufficient instructions to be able to reproduce the issue.
-
-Meta has a [bounty program](https://www.facebook.com/whitehat/) for the safe
-disclosure of security bugs. In those cases, please go through the process
-outlined on that page and do not file a public issue.
-
-
-## License
-By contributing to OVSeg, you agree that your contributions will be licensed
-under the LICENSE file in the root directory of this source tree.
GETTING_STARTED.md
DELETED
@@ -1,99 +0,0 @@
-## Getting started with OVSeg
-
-
-### Try demo
-
-We release our largest model (Swin-Base + CLIP-ViT-L/14) [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) (md5: <tt>526080</tt>).
-
-- Test on sample image
-```bash
-python demo.py --config-file configs/ovseg_swinB_vitL_demo.yaml --class-names 'Oculus' 'Ukulele' --input ./resources/demo_samples/sample_03.jpeg --output ./pred --opts MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth
-```
-
-### Evaluation with pre-trained weights
-
-We release our largest model (Swin-Base + CLIP-ViT-L/14) [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) (md5: <tt>526080</tt>).
-
-- Test on ADE20K-150 and ADE-847
-```bash
-python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
-```
-
-- Test on PascalContext-59 and PascalContext-459
-```bash
-python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT 0.6 DATASETS.TEST \(\"pascal_context_59_sem_seg_val\",\"pascal_context_459_sem_seg_val\",\)
-```
-
-- Test on PascalVOC-20
-```bash
-python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT 0.45 DATASETS.TEST \(\"pascalvoc20_sem_seg_val\",\)
-```
-
-#### Performance benchmark
-
-| method | backbone | training dataset | A-847 | PC-459 | A-150 | PC-59 | PAS-20 |
-|------------------------------------|----------|------------------|:-----:|:------:|:-----:|:-----:|:------:|
-| Open-vocabulary generalist models. | | | | | | | |
-| SPNet | R-101 | PASCAL-15 | - | - | - | 24.3 | 18.3 |
-| ZS3Net | R-101 | PASCAL-15 | - | - | - | 19.4 | 38.3 |
-| LSeg | R-101 | PASCAL-15 | - | - | - | - | 47.4 |
-| LSeg+ | R-101 | COCO Panoptic | 2.5 | 5.2 | 13.0 | 36.0 | 59.0 |
-| SimBaseline | R-101c | COCO-Stuff-156 | - | - | 15.3 | - | 74.5 |
-| ZegFormer | R-50 | COCO-Stuff-156 | - | - | 16.4 | - | 80.7 |
-| OpenSeg | R-101 | COCO Panoptic | 4.0 | 6.5 | 15.3 | 36.9 | 60.0 |
-| OVSeg (Ours) | R-101c | COCO-Stuff-171 | 7.1 | 11.0 | 24.8 | 53.3 | 92.6 |
-| LSeg+ | Eff-B7 | COCO Panoptic | 3.8 | 7.8 | 18.0 | 46.5 | - |
-| OpenSeg | Eff-B7 | COCO Panoptic | 6.3 | 9.0 | 21.1 | 42.1 | - |
-| OVSeg (Ours) | Swin-B | COCO-Stuff-171 | 9.0 | 12.4 | 29.6 | 55.7 | 94.5 |
-| Supervised specialist models. | | | | | | | |
-| FCN | FCN-8s | Same as test | - | - | 29.4 | 37.8 | - |
-| Deeplab | R-101 | Same as test | - | - | - | 45.7 | 77.7 |
-| SelfTrain | Eff-L2 | Same as test | - | - | - | - | 90.0 |
-
-#### Ablation study
-
-- Mask prompt tuning can bring significant improvement without changing CLIP weights (Table 3 in [paper](https://arxiv.org/pdf/2210.04150.pdf))
-
-Download the checkpoint with mpt only [ovseg_swinbase_vitL14_mpt_only.pt](https://drive.google.com/file/d/1LJGWFjHw76OGDNy9r9KQIaACfIm9KMhQ/view?usp=sharing) (md5: <tt>2dd495</tt>).
-
-```bash
-python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_mpt_only.pt DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
-```
-
-- Mask prompt tuning can improve over fully finetuned model (Table 3 in [paper](https://arxiv.org/pdf/2210.04150.pdf))
-
-With the same [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) checkpoint, set `MASK_PROMPT_FWD` as `False`
-
-```bash
-python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.MASK_PROMPT_FWD False MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
-```
-
-- The effects of class prediction ensemble (Table 6 in [paper](https://arxiv.org/pdf/2210.04150.pdf))
-
-With the same [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) checkpoint, set `CLIP_ENSEMBLE` as `False`.
-
-```bash
-python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE False MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
-```
-
-### Training Segmentation model
-
-Our model is trained on COCO-Stuff
-
-- Training baseline w/ original CLIP
-```
-python train_net.py --num-gpu 8 --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.MASK_PROMPT_FWD False
-```
-
-To reproduce our final results, you may want to use the our mask-adapted CLIP
-
-- Training ovseg w/ mask-adapted CLIP
-```
-python train_net.py --num-gpu 8 --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.CLIP_MODEL_NAME #PATH_TO_MASKADAPTED_CLIP
-```
-
-CAUTION: The final results is sensitive to the ensemble (appendix A.5 in [paper](https://arxiv.org/pdf/2210.04150.pdf)). Thus, you may want to use the ```tools/search_thr_ensemble_w.sh``` to find the best ensemble hyper-parameters.
-
-### Fine-tuning CLIP with collected mask-category pairs
-
-We are still working on this part, stay tuned!
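
The deleted GETTING_STARTED.md lists each checkpoint with a six-character md5 value (for example `526080`). A minimal sketch of how a downloaded file could be checked against such a value follows. It assumes the listed value is the leading part of the file's MD5 hex digest, which the source does not state explicitly; the helper names `md5_hexdigest` and `matches_md5_prefix` are hypothetical and not part of the OVSeg repository.

```python
import hashlib


def md5_hexdigest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_prefix(path, expected_prefix):
    """True if the file's MD5 hex digest starts with expected_prefix (case-insensitive).

    Assumption: the short md5 shown in the docs is a prefix of the full digest.
    """
    return md5_hexdigest(path).startswith(expected_prefix.lower())
```

Under that assumption, `matches_md5_prefix("ovseg_swinbase_vitL14_ft_mpt.pth", "526080")` would be expected to return `True` for an intact download. A six-character prefix only guards against accidental corruption, not deliberate tampering.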
INSTALL.md
DELETED
@@ -1,33 +0,0 @@
-## Installation
-
-### Requirements
-- Linux with Python ≥ 3.6
-- PyTorch ≥ 1.8 and [torchvision](https://github.com/pytorch/vision/) that matches the PyTorch installation.
-Install them together at [pytorch.org](https://pytorch.org) to make sure of this. Note, please check
-PyTorch version matches that is required by Detectron2.
-- Detectron2: follow [Detectron2 installation instructions](https://detectron2.readthedocs.io/tutorials/install.html).
-
-### Usage
-
-Install required packages.
-
-```bash
-conda create --name ovseg python=3.8
-conda activate ovseg
-conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
-pip install -r requirements.txt
-```
-
-You need to download `detectron2==0.6` following [instructions](https://detectron2.readthedocs.io/en/latest/tutorials/install.html)
-
-```bash
-python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
-```
-
-
-FurtherMore, install the modified clip package.
-
-```bash
-cd third_party/CLIP
-python -m pip install -Ue .
-```
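
The deleted INSTALL.md states minimum versions (Python ≥ 3.6, PyTorch ≥ 1.8). As an illustration only, the sketch below shows one way such minimums might be checked before running anything else. The function names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, the parser handles only simple dotted versions with an optional local suffix such as `+cu113`, and the Linux and Detectron2 requirements are not checked.

```python
import re
import sys


def version_tuple(version):
    """Parse the leading numeric components of a version string.

    A local build suffix such as '+cu113' is ignored, so '1.10.1+cu113' -> (1, 10, 1).
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, padding the shorter tuple with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(torch_version):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    if not meets_minimum(torch_version, "1.8"):
        problems.append("PyTorch >= 1.8 is required, found " + torch_version)
    return problems
```

With PyTorch installed, this could be called as `check_environment(str(torch.__version__))` after `import torch`. Passing the version in as a string keeps the helper importable on machines where PyTorch is not yet installed.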
LICENSE
DELETED
@@ -1,399 +0,0 @@
-Attribution-NonCommercial 4.0 International
-
-=======================================================================
-
-Creative Commons Corporation ("Creative Commons") is not a law firm and
-does not provide legal services or legal advice. Distribution of
-Creative Commons public licenses does not create a lawyer-client or
-other relationship. Creative Commons makes its licenses and related
-information available on an "as-is" basis. Creative Commons gives no
-warranties regarding its licenses, any material licensed under their
-terms and conditions, or any related information. Creative Commons
-disclaims all liability for damages resulting from their use to the
-fullest extent possible.
-
-Using Creative Commons Public Licenses
-
-Creative Commons public licenses provide a standard set of terms and
-conditions that creators and other rights holders may use to share
-original works of authorship and other material subject to copyright
-and certain other rights specified in the public license below. The
-following considerations are for informational purposes only, are not
-exhaustive, and do not form part of our licenses.
-
-Considerations for licensors: Our public licenses are
-intended for use by those authorized to give the public
-permission to use material in ways otherwise restricted by
-copyright and certain other rights. Our licenses are
-irrevocable. Licensors should read and understand the terms
-and conditions of the license they choose before applying it.
-Licensors should also secure all rights necessary before
-applying our licenses so that the public can reuse the
-material as expected. Licensors should clearly mark any
-material not subject to the license. This includes other CC-
-licensed material, or material used under an exception or
-limitation to copyright. More considerations for licensors:
-wiki.creativecommons.org/Considerations_for_licensors
-
-Considerations for the public: By using one of our public
-licenses, a licensor grants the public permission to use the
-licensed material under specified terms and conditions. If
-the licensor's permission is not necessary for any reason--for
-example, because of any applicable exception or limitation to
-copyright--then that use is not regulated by the license. Our
-licenses grant only permissions under copyright and certain
-other rights that a licensor has authority to grant. Use of
-the licensed material may still be restricted for other
-reasons, including because others have copyright or other
-rights in the material. A licensor may make special requests,
-such as asking that all changes be marked or described.
-Although not required by our licenses, you are encouraged to
-respect those requests where reasonable. More_considerations
-for the public:
-wiki.creativecommons.org/Considerations_for_licensees
-
-=======================================================================
-
-Creative Commons Attribution-NonCommercial 4.0 International Public
-License
-
-By exercising the Licensed Rights (defined below), You accept and agree
-to be bound by the terms and conditions of this Creative Commons
-Attribution-NonCommercial 4.0 International Public License ("Public
-License"). To the extent this Public License may be interpreted as a
-contract, You are granted the Licensed Rights in consideration of Your
-acceptance of these terms and conditions, and the Licensor grants You
-such rights in consideration of benefits the Licensor receives from
-making the Licensed Material available under these terms and
-conditions.
-
-Section 1 -- Definitions.
-
-a. Adapted Material means material subject to Copyright and Similar
-Rights that is derived from or based upon the Licensed Material
-and in which the Licensed Material is translated, altered,
-arranged, transformed, or otherwise modified in a manner requiring
-permission under the Copyright and Similar Rights held by the
-Licensor. For purposes of this Public License, where the Licensed
-Material is a musical work, performance, or sound recording,
-Adapted Material is always produced where the Licensed Material is
-synched in timed relation with a moving image.
-
-b. Adapter's License means the license You apply to Your Copyright
-and Similar Rights in Your contributions to Adapted Material in
-accordance with the terms and conditions of this Public License.
-
-c. Copyright and Similar Rights means copyright and/or similar rights
-closely related to copyright including, without limitation,
-performance, broadcast, sound recording, and Sui Generis Database
-Rights, without regard to how the rights are labeled or
-categorized. For purposes of this Public License, the rights
-specified in Section 2(b)(1)-(2) are not Copyright and Similar
-Rights.
-d. Effective Technological Measures means those measures that, in the
-absence of proper authority, may not be circumvented under laws
-fulfilling obligations under Article 11 of the WIPO Copyright
-Treaty adopted on December 20, 1996, and/or similar international
-agreements.
-
-e. Exceptions and Limitations means fair use, fair dealing, and/or
-any other exception or limitation to Copyright and Similar Rights
-that applies to Your use of the Licensed Material.
-
-f. Licensed Material means the artistic or literary work, database,
-or other material to which the Licensor applied this Public
-License.
-
-g. Licensed Rights means the rights granted to You subject to the
-terms and conditions of this Public License, which are limited to
-all Copyright and Similar Rights that apply to Your use of the
-Licensed Material and that the Licensor has authority to license.
-
-h. Licensor means the individual(s) or entity(ies) granting rights
-under this Public License.
-
-i. NonCommercial means not primarily intended for or directed towards
-commercial advantage or monetary compensation. For purposes of
-this Public License, the exchange of the Licensed Material for
-other material subject to Copyright and Similar Rights by digital
-file-sharing or similar means is NonCommercial provided there is
-no payment of monetary compensation in connection with the
-exchange.
-
-j. Share means to provide material to the public by any means or
-process that requires permission under the Licensed Rights, such
-as reproduction, public display, public performance, distribution,
-dissemination, communication, or importation, and to make material
-available to the public including in ways that members of the
-public may access the material from a place and at a time
-individually chosen by them.
-
-k. Sui Generis Database Rights means rights other than copyright
-resulting from Directive 96/9/EC of the European Parliament and of
-the Council of 11 March 1996 on the legal protection of databases,
-as amended and/or succeeded, as well as other essentially
-equivalent rights anywhere in the world.
-
-l. You means the individual or entity exercising the Licensed Rights
-under this Public License. Your has a corresponding meaning.
-
-Section 2 -- Scope.
-
-a. License grant.
-
-1. Subject to the terms and conditions of this Public License,
-the Licensor hereby grants You a worldwide, royalty-free,
-non-sublicensable, non-exclusive, irrevocable license to
-exercise the Licensed Rights in the Licensed Material to:
-
-a. reproduce and Share the Licensed Material, in whole or
-in part, for NonCommercial purposes only; and
-
-b. produce, reproduce, and Share Adapted Material for
-NonCommercial purposes only.
-
-2. Exceptions and Limitations. For the avoidance of doubt, where
-Exceptions and Limitations apply to Your use, this Public
-License does not apply, and You do not need to comply with
-its terms and conditions.
-
-3. Term. The term of this Public License is specified in Section
-6(a).
-
-4. Media and formats; technical modifications allowed. The
-Licensor authorizes You to exercise the Licensed Rights in
-all media and formats whether now known or hereafter created,
-and to make technical modifications necessary to do so. The
-Licensor waives and/or agrees not to assert any right or
-authority to forbid You from making technical modifications
-necessary to exercise the Licensed Rights, including
-technical modifications necessary to circumvent Effective
-Technological Measures. For purposes of this Public License,
-simply making modifications authorized by this Section 2(a)
-(4) never produces Adapted Material.
-
-5. Downstream recipients.
-
-a. Offer from the Licensor -- Licensed Material. Every
-recipient of the Licensed Material automatically
-receives an offer from the Licensor to exercise the
-Licensed Rights under the terms and conditions of this
-Public License.
-
-b. No downstream restrictions. You may not offer or impose
-any additional or different terms or conditions on, or
-apply any Effective Technological Measures to, the
-Licensed Material if doing so restricts exercise of the
-Licensed Rights by any recipient of the Licensed
-Material.
-
-6. No endorsement. Nothing in this Public License constitutes or
-may be construed as permission to assert or imply that You
-are, or that Your use of the Licensed Material is, connected
-with, or sponsored, endorsed, or granted official status by,
-the Licensor or others designated to receive attribution as
-provided in Section 3(a)(1)(A)(i).
-
-b. Other rights.
-
-1. Moral rights, such as the right of integrity, are not
-licensed under this Public License, nor are publicity,
-privacy, and/or other similar personality rights; however, to
-the extent possible, the Licensor waives and/or agrees not to
-assert any such rights held by the Licensor to the limited
-extent necessary to allow You to exercise the Licensed
-Rights, but not otherwise.
-
-2. Patent and trademark rights are not licensed under this
-Public License.
-
-3. To the extent possible, the Licensor waives any right to
-collect royalties from You for the exercise of the Licensed
-Rights, whether directly or through a collecting society
-under any voluntary or waivable statutory or compulsory
-licensing scheme. In all other cases the Licensor expressly
-reserves any right to collect such royalties, including when
-the Licensed Material is used other than for NonCommercial
-purposes.
-
-Section 3 -- License Conditions.
-
-Your exercise of the Licensed Rights is expressly made subject to the
-following conditions.
-
-a. Attribution.
-
-1. If You Share the Licensed Material (including in modified
-form), You must:
-
-a. retain the following if it is supplied by the Licensor
-with the Licensed Material:
-
-i. identification of the creator(s) of the Licensed
-Material and any others designated to receive
-attribution, in any reasonable manner requested by
-the Licensor (including by pseudonym if
-designated);
-
-ii. a copyright notice;
-
-iii. a notice that refers to this Public License;
-
-iv. a notice that refers to the disclaimer of
-warranties;
-
-v. a URI or hyperlink to the Licensed Material to the
-extent reasonably practicable;
-
-b. indicate if You modified the Licensed Material and
-retain an indication of any previous modifications; and
-
-c. indicate the Licensed Material is licensed under this
-Public License, and include the text of, or the URI or
-hyperlink to, this Public License.
-
-2. You may satisfy the conditions in Section 3(a)(1) in any
-reasonable manner based on the medium, means, and context in
-which You Share the Licensed Material. For example, it may be
-reasonable to satisfy the conditions by providing a URI or
-hyperlink to a resource that includes the required
-information.
-
-3. If requested by the Licensor, You must remove any of the
-information required by Section 3(a)(1)(A) to the extent
-reasonably practicable.
-
-4. If You Share Adapted Material You produce, the Adapter's
-License You apply must not prevent recipients of the Adapted
-Material from complying with this Public License.
-
-Section 4 -- Sui Generis Database Rights.
-
-Where the Licensed Rights include Sui Generis Database Rights that
-apply to Your use of the Licensed Material:
-
-a. for the avoidance of doubt, Section 2(a)(1) grants You the right
-to extract, reuse, reproduce, and Share all or a substantial
-portion of the contents of the database for NonCommercial purposes
-only;
-
-b. if You include all or a substantial portion of the database
-contents in a database in which You have Sui Generis Database
-Rights, then the database in which You have Sui Generis Database
-Rights (but not its individual contents) is Adapted Material; and
-
-c. You must comply with the conditions in Section 3(a) if You Share
-all or a substantial portion of the contents of the database.
-
-For the avoidance of doubt, this Section 4 supplements and does not
-replace Your obligations under this Public License where the Licensed
-Rights include other Copyright and Similar Rights.
-
-Section 5 -- Disclaimer of Warranties and Limitation of Liability.
-
-a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
-EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
-AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
-ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
-IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
-WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
-PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
-ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
-KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
-ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
-
-b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
-TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
-NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
-INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
-COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
-USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
-ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
-DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
-IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
-
-c. The disclaimer of warranties and limitation of liability provided
-above shall be interpreted in a manner that, to the extent
-possible, most closely approximates an absolute disclaimer and
-waiver of all liability.
-
-Section 6 -- Term and Termination.
-
-a. This Public License applies for the term of the Copyright and
-Similar Rights licensed here. However, if You fail to comply with
-this Public License, then Your rights under this Public License
-terminate automatically.
-
-b. Where Your right to use the Licensed Material has terminated under
-Section 6(a), it reinstates:
-
-1. automatically as of the date the violation is cured, provided
-it is cured within 30 days of Your discovery of the
-violation; or
-
-2. upon express reinstatement by the Licensor.
-
-For the avoidance of doubt, this Section 6(b) does not affect any
|
337 |
-
right the Licensor may have to seek remedies for Your violations
|
338 |
-
of this Public License.
|
339 |
-
|
340 |
-
c. For the avoidance of doubt, the Licensor may also offer the
|
341 |
-
Licensed Material under separate terms or conditions or stop
|
342 |
-
distributing the Licensed Material at any time; however, doing so
|
343 |
-
will not terminate this Public License.
|
344 |
-
|
345 |
-
d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
|
346 |
-
License.
|
347 |
-
|
348 |
-
Section 7 -- Other Terms and Conditions.
|
349 |
-
|
350 |
-
a. The Licensor shall not be bound by any additional or different
|
351 |
-
terms or conditions communicated by You unless expressly agreed.
|
352 |
-
|
353 |
-
b. Any arrangements, understandings, or agreements regarding the
|
354 |
-
Licensed Material not stated herein are separate from and
|
355 |
-
independent of the terms and conditions of this Public License.
|
356 |
-
|
357 |
-
Section 8 -- Interpretation.
|
358 |
-
|
359 |
-
a. For the avoidance of doubt, this Public License does not, and
|
360 |
-
shall not be interpreted to, reduce, limit, restrict, or impose
|
361 |
-
conditions on any use of the Licensed Material that could lawfully
|
362 |
-
be made without permission under this Public License.
|
363 |
-
|
364 |
-
b. To the extent possible, if any provision of this Public License is
|
365 |
-
deemed unenforceable, it shall be automatically reformed to the
|
366 |
-
minimum extent necessary to make it enforceable. If the provision
|
367 |
-
cannot be reformed, it shall be severed from this Public License
|
368 |
-
without affecting the enforceability of the remaining terms and
|
369 |
-
conditions.
|
370 |
-
|
371 |
-
c. No term or condition of this Public License will be waived and no
|
372 |
-
failure to comply consented to unless expressly agreed to by the
|
373 |
-
Licensor.
|
374 |
-
|
375 |
-
d. Nothing in this Public License constitutes or may be interpreted
|
376 |
-
as a limitation upon, or waiver of, any privileges and immunities
|
377 |
-
that apply to the Licensor or You, including from the legal
|
378 |
-
processes of any jurisdiction or authority.
|
379 |
-
|
380 |
-
=======================================================================
|
381 |
-
|
382 |
-
Creative Commons is not a party to its public
|
383 |
-
licenses. Notwithstanding, Creative Commons may elect to apply one of
|
384 |
-
its public licenses to material it publishes and in those instances
|
385 |
-
will be considered the “Licensor.” The text of the Creative Commons
|
386 |
-
public licenses is dedicated to the public domain under the CC0 Public
|
387 |
-
Domain Dedication. Except for the limited purpose of indicating that
|
388 |
-
material is shared under a Creative Commons public license or as
|
389 |
-
otherwise permitted by the Creative Commons policies published at
|
390 |
-
creativecommons.org/policies, Creative Commons does not authorize the
|
391 |
-
use of the trademark "Creative Commons" or any other trademark or logo
|
392 |
-
of Creative Commons without its prior written consent including,
|
393 |
-
without limitation, in connection with any unauthorized modifications
|
394 |
-
to any of its public licenses or any other arrangements,
|
395 |
-
understandings, or agreements concerning use of licensed material. For
|
396 |
-
the avoidance of doubt, this paragraph does not form part of the
|
397 |
-
public licenses.
|
398 |
-
|
399 |
-
Creative Commons may be contacted at creativecommons.org.
|
README.md
CHANGED
@@ -10,60 +10,4 @@ pinned: false
|
|
10 |
license: cc-by-nc-4.0
|
11 |
---
|
12 |
|
13 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
14 |
-
|
15 |
-
# [OVSeg] Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
|
16 |
-
|
17 |
-
<img src="resources/pytorch-logo-dark.png" width="10%">
|
18 |
-
|
19 |
-
This is the official PyTorch implementation of our paper: <br>
|
20 |
-
**Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP**<br>
|
21 |
-
[Feng Liang](https://jeff-liangf.github.io/), [Bichen Wu](https://www.linkedin.com/in/bichenwu), [Xiaoliang Dai](https://sites.google.com/view/xiaoliangdai/), [Kunpeng Li](https://kunpengli1994.github.io/), [Yinan Zhao](https://yinan-zhao.github.io/), [Hang Zhang](https://hangzhang.org/), [Peizhao Zhang](https://www.linkedin.com/in/peizhao-zhang-14846042/), [Peter Vajda](https://sites.google.com/site/vajdap), [Diana Marculescu](https://www.ece.utexas.edu/people/faculty/diana-marculescu)
|
22 |
-
|
23 |
-
[[arXiv](https://arxiv.org/abs/2210.04150)] [[Project](https://jeff-liangf.github.io/projects/ovseg/)]
|
24 |
-
|
25 |
-
<p align="center">
|
26 |
-
<img src="resources/ovseg.gif" width="100%">
|
27 |
-
</p>
|
28 |
-
|
29 |
-
|
30 |
-
## Installation
|
31 |
-
|
32 |
-
Please see [installation guide](./INSTALL.md).
|
33 |
-
|
34 |
-
## Data Preparation
|
35 |
-
|
36 |
-
Please see [datasets preparation](./datasets/DATASETS.md).
|
37 |
-
|
38 |
-
## Getting started
|
39 |
-
|
40 |
-
Please see [getting started instruction](./GETTING_STARTED.md).
|
41 |
-
|
42 |
-
## LICENSE
|
43 |
-
|
44 |
-
Shield: [![CC BY-NC 4.0][cc-by-nc-shield]][cc-by-nc]
|
45 |
-
|
46 |
-
The majority of OVSeg is licensed under a
|
47 |
-
[Creative Commons Attribution-NonCommercial 4.0 International License](LICENSE).
|
48 |
-
|
49 |
-
[![CC BY-NC 4.0][cc-by-nc-image]][cc-by-nc]
|
50 |
-
|
51 |
-
[cc-by-nc]: http://creativecommons.org/licenses/by-nc/4.0/
|
52 |
-
[cc-by-nc-image]: https://licensebuttons.net/l/by-nc/4.0/88x31.png
|
53 |
-
[cc-by-nc-shield]: https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg
|
54 |
-
|
55 |
-
However portions of the project are under separate license terms: CLIP and ZSSEG are licensed under the [MIT license](https://github.com/openai/CLIP/blob/main/LICENSE); MaskFormer is licensed under the [CC-BY-NC](https://github.com/facebookresearch/MaskFormer/blob/main/LICENSE); openclip is licensed under the license at [its repo](https://github.com/mlfoundations/open_clip/blob/main/LICENSE).
|
56 |
-
|
57 |
-
|
58 |
-
## Citing OVSeg :pray:
|
59 |
-
|
60 |
-
If you use OVSeg in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.
|
61 |
-
|
62 |
-
```BibTeX
|
63 |
-
@article{liang2022open,
|
64 |
-
title={Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP},
|
65 |
-
author={Liang, Feng and Wu, Bichen and Dai, Xiaoliang and Li, Kunpeng and Zhao, Yinan and Zhang, Hang and Zhang, Peizhao and Vajda, Peter and Marculescu, Diana},
|
66 |
-
journal={arXiv preprint arXiv:2210.04150},
|
67 |
-
year={2022}
|
68 |
-
}
|
69 |
-
```
|
|
|
10 |
license: cc-by-nc-4.0
|
11 |
---
|
12 |
|
13 |
+
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
app.py
CHANGED
@@ -6,6 +6,13 @@ import multiprocessing as mp
|
|
6 |
import numpy as np
|
7 |
from PIL import Image
|
8 |
|
9 |
from detectron2.config import get_cfg
|
10 |
|
11 |
from detectron2.projects.deeplab import add_deeplab_config
|
@@ -15,6 +22,12 @@ from open_vocab_seg.utils import VisualizationDemo
|
|
15 |
|
16 |
import gradio as gr
|
17 |
|
18 |
def setup_cfg(config_file):
|
19 |
# load config from file and command-line arguments
|
20 |
cfg = get_cfg()
|
@@ -27,7 +40,7 @@ def setup_cfg(config_file):
|
|
27 |
|
28 |
def inference(class_names, input_img):
|
29 |
mp.set_start_method("spawn", force=True)
|
30 |
-
config_file = './
|
31 |
cfg = setup_cfg(config_file)
|
32 |
|
33 |
demo = VisualizationDemo(cfg)
|
@@ -38,19 +51,18 @@ def inference(class_names, input_img):
|
|
38 |
|
39 |
return Image.fromarray(np.uint8(visualized_output.get_image())).convert('RGB')
|
40 |
|
41 |
-
# demo = gr.Interface(fn=greet, inputs="text", outputs="text")
|
42 |
-
# demo.launch()
|
43 |
-
|
44 |
|
45 |
-
examples = [['Oculus, Ukulele', './resources/demo_samples/sample_03.jpeg'],
|
|
|
|
|
46 |
output_labels = ['segmentation map']
|
47 |
|
48 |
title = 'OVSeg'
|
49 |
|
50 |
description = """
|
51 |
-
Gradio Demo for Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP \n
|
52 |
-
You may click on one of the examples or upload your own image. \n
|
53 |
-
|
54 |
"""
|
55 |
|
56 |
article = """
|
@@ -59,7 +71,7 @@ article = """
|
|
59 |
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
|
60 |
</a>
|
61 |
|
|
62 |
-
<a href='https://github.com' target='_blank'>Github Repo</a></p>
|
63 |
"""
|
64 |
|
65 |
gr.Interface(
|
|
|
6 |
import numpy as np
|
7 |
from PIL import Image
|
8 |
|
9 |
+
|
10 |
+
try:
|
11 |
+
import detectron2
|
12 |
+
except:
|
13 |
+
import os
|
14 |
+
os.system('pip install git+https://github.com/facebookresearch/detectron2.git')
|
15 |
+
|
16 |
from detectron2.config import get_cfg
|
17 |
|
18 |
from detectron2.projects.deeplab import add_deeplab_config
|
|
|
22 |
|
23 |
import gradio as gr
|
24 |
|
25 |
+
import gdown
|
26 |
+
|
27 |
+
ckpt_url = 'https://drive.google.com/uc?id=1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy'
|
28 |
+
output = './ovseg_swinbase_vitL14_ft_mpt.pth'
|
29 |
+
gdown.download(ckpt_url, output, quiet=False)
|
30 |
+
|
31 |
def setup_cfg(config_file):
|
32 |
# load config from file and command-line arguments
|
33 |
cfg = get_cfg()
|
|
|
40 |
|
41 |
def inference(class_names, input_img):
|
42 |
mp.set_start_method("spawn", force=True)
|
43 |
+
config_file = './ovseg_swinB_vitL_demo.yaml'
|
44 |
cfg = setup_cfg(config_file)
|
45 |
|
46 |
demo = VisualizationDemo(cfg)
|
|
|
51 |
|
52 |
return Image.fromarray(np.uint8(visualized_output.get_image())).convert('RGB')
|
53 |
|
54 |
|
55 |
+
examples = [['Oculus, Ukulele', './resources/demo_samples/sample_03.jpeg'],
|
56 |
+
['Saturn V, toys, blossom', './resources/demo_samples/sample_01.jpeg'],
|
57 |
+
['Golden gate, yacht', './resources/demo_samples/sample_02.jpeg'],]
|
58 |
output_labels = ['segmentation map']
|
59 |
|
60 |
title = 'OVSeg'
|
61 |
|
62 |
description = """
|
63 |
+
Gradio Demo for Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. \n
|
64 |
+
OVSeg performs open-vocabulary segmentation; you may input more classes (separated by commas). You may click on one of the examples or upload your own image. \n
|
65 |
+
It might take some time to process. Cheers!
|
66 |
"""
|
67 |
|
68 |
article = """
|
|
|
71 |
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
|
72 |
</a>
|
73 |
|
|
74 |
+
<a href='https://github.com/facebookresearch/ov-seg' target='_blank'>Github Repo</a></p>
|
75 |
"""
|
76 |
|
77 |
gr.Interface(
|
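Note on the `app.py` change above: the added lines call `gdown.download` unconditionally at module import, so the checkpoint is requested on every start of the Space. A minimal sketch of a download-if-missing guard follows. The helper name `ensure_checkpoint` and the injectable `download` callable are hypothetical and not part of this commit; only the `gdown.download(ckpt_url, output, quiet=False)` call itself appears in the diff.

```python
import os


def ensure_checkpoint(url, path, download):
    """Fetch `url` into `path` only if `path` does not exist yet.

    `download` is any callable taking (url, path); it is injected so the
    guard can be exercised without network access (hypothetical design,
    not taken from the commit).
    """
    if not os.path.exists(path):
        download(url, path)
    return path


# Possible wiring in app.py (assumes gdown is installed):
#   import gdown
#   ensure_checkpoint(ckpt_url, output,
#                     lambda u, p: gdown.download(u, p, quiet=False))
```

Passing the downloader in as an argument keeps the guard independent of `gdown`, which may make it easier to swap the hosting location later.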
configs/ovseg_swinB_vitL_bs32_120k.yaml
DELETED
@@ -1,100 +0,0 @@
|
|
1 |
-
MODEL:
|
2 |
-
META_ARCHITECTURE: "OVSeg"
|
3 |
-
BACKBONE:
|
4 |
-
FREEZE_AT: 0
|
5 |
-
NAME: "D2SwinTransformer"
|
6 |
-
SWIN:
|
7 |
-
EMBED_DIM: 128
|
8 |
-
DEPTHS: [2, 2, 18, 2]
|
9 |
-
NUM_HEADS: [4, 8, 16, 32]
|
10 |
-
WINDOW_SIZE: 12
|
11 |
-
APE: False
|
12 |
-
DROP_PATH_RATE: 0.3
|
13 |
-
PATCH_NORM: True
|
14 |
-
PRETRAIN_IMG_SIZE: 384
|
15 |
-
WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
|
16 |
-
PIXEL_MEAN: [123.675, 116.280, 103.530]
|
17 |
-
PIXEL_STD: [58.395, 57.120, 57.375]
|
18 |
-
SEM_SEG_HEAD:
|
19 |
-
NAME: "OpenVocabMaskFormerHead"
|
20 |
-
IN_FEATURES: ["res2", "res3", "res4", "res5"]
|
21 |
-
IGNORE_VALUE: 255
|
22 |
-
NUM_CLASSES: 171 # number of categories in training set
|
23 |
-
EMBEDDING_DIM: 768
|
24 |
-
EMBED_LAYERS: 2
|
25 |
-
COMMON_STRIDE: 4 # not used, hard-coded
|
26 |
-
LOSS_WEIGHT: 1.0
|
27 |
-
CONVS_DIM: 256
|
28 |
-
MASK_DIM: 256
|
29 |
-
NORM: "GN"
|
30 |
-
MASK_FORMER:
|
31 |
-
TRANSFORMER_IN_FEATURE: "res5"
|
32 |
-
DEEP_SUPERVISION: True
|
33 |
-
NO_OBJECT_WEIGHT: 0.1
|
34 |
-
DICE_WEIGHT: 1.0
|
35 |
-
MASK_WEIGHT: 20.0
|
36 |
-
HIDDEN_DIM: 256
|
37 |
-
NUM_OBJECT_QUERIES: 100
|
38 |
-
NHEADS: 8
|
39 |
-
DROPOUT: 0.1
|
40 |
-
DIM_FEEDFORWARD: 2048
|
41 |
-
ENC_LAYERS: 0
|
42 |
-
DEC_LAYERS: 6
|
43 |
-
PRE_NORM: False
|
44 |
-
CLIP_ADAPTER:
|
45 |
-
TEXT_TEMPLATES: "vild"
|
46 |
-
CLIP_MODEL_NAME: "ViT-L/14"
|
47 |
-
MASK_FILL: "mean"
|
48 |
-
MASK_EXPAND_RATIO: 1.0
|
49 |
-
MASK_THR: 0.4 # choose the foreground objects
|
50 |
-
MASK_MATTING: False # use soft background, default not used
|
51 |
-
MASK_PROMPT_DEPTH: 3
|
52 |
-
MASK_PROMPT_FWD: True # use mask prompt during forward
|
53 |
-
REGION_RESIZED: True # resize to the input of clip, e.g., 224
|
54 |
-
CLIP_ENSEMBLE: True # use ensemble of two classification branches
|
55 |
-
CLIP_ENSEMBLE_WEIGHT: 0.7
|
56 |
-
DATASETS:
|
57 |
-
TRAIN: ("coco_2017_train_stuff_sem_seg",)
|
58 |
-
TEST: ("ade20k_sem_seg_val",)
|
59 |
-
SOLVER:
|
60 |
-
IMS_PER_BATCH: 32
|
61 |
-
BASE_LR: 0.00006
|
62 |
-
MAX_ITER: 120000
|
63 |
-
WARMUP_FACTOR: 1e-6
|
64 |
-
WARMUP_ITERS: 1500
|
65 |
-
LR_SCHEDULER_NAME: "WarmupPolyLR"
|
66 |
-
WEIGHT_DECAY: 0.01
|
67 |
-
WEIGHT_DECAY_NORM: 0.0
|
68 |
-
WEIGHT_DECAY_EMBED: 0.0
|
69 |
-
BACKBONE_MULTIPLIER: 1.0
|
70 |
-
TEST_IMS_PER_BATCH: 1
|
71 |
-
CLIP_GRADIENTS:
|
72 |
-
ENABLED: True
|
73 |
-
CLIP_TYPE: "full_model"
|
74 |
-
CLIP_VALUE: 0.01
|
75 |
-
NORM_TYPE: 2.0
|
76 |
-
INPUT:
|
77 |
-
MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
|
78 |
-
MIN_SIZE_TRAIN_SAMPLING: "choice"
|
79 |
-
MIN_SIZE_TEST: 640
|
80 |
-
MAX_SIZE_TRAIN: 2560
|
81 |
-
MAX_SIZE_TEST: 2560
|
82 |
-
CROP:
|
83 |
-
ENABLED: True
|
84 |
-
TYPE: "absolute"
|
85 |
-
SIZE: (640, 640)
|
86 |
-
SINGLE_CATEGORY_MAX_AREA: 1.0
|
87 |
-
COLOR_AUG_SSD: True
|
88 |
-
SIZE_DIVISIBILITY: 640 # used in dataset mapper
|
89 |
-
FORMAT: "RGB"
|
90 |
-
TEST:
|
91 |
-
EVAL_PERIOD: 5000
|
92 |
-
AUG:
|
93 |
-
ENABLED: False
|
94 |
-
MIN_SIZES: [256, 384, 512, 640, 768, 896]
|
95 |
-
MAX_SIZE: 3584
|
96 |
-
FLIP: True
|
97 |
-
DATALOADER:
|
98 |
-
FILTER_EMPTY_ANNOTATIONS: True
|
99 |
-
NUM_WORKERS: 4
|
100 |
-
VERSION: 2
|
datasets/DATASETS.md
DELETED
@@ -1,122 +0,0 @@
|
|
1 |
-
## Prepare Datasets for OVSeg
|
2 |
-
|
3 |
-
This doc is a modification/extension of [MaskFormer](https://github.com/facebookresearch/MaskFormer/blob/main/datasets/README.md) following [Detectron2 format](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html).
|
4 |
-
|
5 |
-
A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
|
6 |
-
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
|
7 |
-
This document explains how to set up the builtin datasets so they can be used by the above APIs.
|
8 |
-
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
|
9 |
-
and how to add new datasets to them.
|
10 |
-
|
11 |
-
OVSeg has builtin support for a few datasets.
|
12 |
-
The datasets are assumed to exist in a directory specified by the environment variable
|
13 |
-
`DETECTRON2_DATASETS`.
|
14 |
-
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
|
15 |
-
```
|
16 |
-
$DETECTRON2_DATASETS/
|
17 |
-
coco/ # COCOStuff-171
|
18 |
-
ADEChallengeData2016/ # ADE20K-150
|
19 |
-
ADE20K_2021_17_01/ # ADE20K-847
|
20 |
-
VOCdevkit/
|
21 |
-
VOC2012/ # PASCALVOC-20
|
22 |
-
VOC2010/ # PASCALContext-59, PASCALContext-459
|
23 |
-
```
|
24 |
-
|
25 |
-
You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
|
26 |
-
If left unset, the default is `./datasets` relative to your current working directory.
|
27 |
-
|
28 |
-
Unless otherwise noted, our model is trained on COCOStuff-171 and evaluated on ADE20K-150, ADE20K-847, PASCALVOC-20, PASCALContext-59 and PASCALContext-459.
|
29 |
-
|
30 |
-
| dataset | split | # images | # categories |
|
31 |
-
|:--------------:|:---------:|:--------:|:------------:|
|
32 |
-
| COCO Stuff | train2017 | 118K | 171 |
|
33 |
-
| ADE20K | val | 2K | 150/847 |
|
34 |
-
| Pascal VOC | val | 1.5K | 20 |
|
35 |
-
| Pascal Context | val | 5K | 59/459 |
|
36 |
-
|
37 |
-
|
38 |
-
### Expected dataset structure for [COCO Stuff](https://github.com/nightrome/cocostuff):
|
39 |
-
```
|
40 |
-
coco/
|
41 |
-
train2017/ # http://images.cocodataset.org/zips/train2017.zip
|
42 |
-
annotations/ # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
|
43 |
-
stuffthingmaps/
|
44 |
-
stuffthingmaps_trainval2017.zip # http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
|
45 |
-
train2017/
|
46 |
-
# below are generated
|
47 |
-
stuffthingmaps_detectron2/
|
48 |
-
train2017/
|
49 |
-
```
|
50 |
-
|
51 |
-
The directory `stuffthingmaps_detectron2` is generated by running `python datasets/prepare_coco_stuff_sem_seg.py`.
|
52 |
-
|
53 |
-
|
54 |
-
|
55 |
-
### Expected dataset structure for [ADE20k Scene Parsing (ADE20K-150)](http://sceneparsing.csail.mit.edu/):
|
56 |
-
```
|
57 |
-
ADEChallengeData2016/
|
58 |
-
annotations/
|
59 |
-
images/
|
60 |
-
objectInfo150.txt
|
61 |
-
# below are generated
|
62 |
-
annotations_detectron2/
|
63 |
-
```
|
64 |
-
The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.
|
65 |
-
|
66 |
-
|
67 |
-
### Expected dataset structure for [ADE20k-Full (ADE20K-847)](https://github.com/CSAILVision/ADE20K#download):
|
68 |
-
```
|
69 |
-
ADE20K_2021_17_01/
|
70 |
-
images/
|
71 |
-
index_ade20k.pkl
|
72 |
-
objects.txt
|
73 |
-
# below are generated
|
74 |
-
images_detectron2/
|
75 |
-
annotations_detectron2/
|
76 |
-
```
|
77 |
-
The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_ade20k_full_sem_seg.py`.
|
78 |
-
|
79 |
-
### Expected dataset structure for [Pascal VOC 2012 (PASCALVOC-20)](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit):
|
80 |
-
```
|
81 |
-
VOCdevkit/VOC2012/
|
82 |
-
Annotations/
|
83 |
-
ImageSets/
|
84 |
-
JPEGImages/
|
85 |
-
SegmentationClass/
|
86 |
-
SegmentationObject/
|
87 |
-
SegmentationClassAug/ # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
|
88 |
-
# below are generated
|
89 |
-
images_detectron2/
|
90 |
-
annotations_detectron2/
|
91 |
-
```
|
92 |
-
|
93 |
-
It starts with a tar file `VOCtrainval_11-May-2012.tar`.
|
94 |
-
|
95 |
-
We use SBD-augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).
|
96 |
-
|
97 |
-
The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_voc_sem_seg.py`.
|
98 |
-
|
99 |
-
|
100 |
-
### Expected dataset structure for [Pascal Context](https://www.cs.stanford.edu/~roozbeh/pascal-context/):
|
101 |
-
|
102 |
-
```
|
103 |
-
VOCdevkit/VOC2010/
|
104 |
-
Annotations/
|
105 |
-
ImageSets/
|
106 |
-
JPEGImages/
|
107 |
-
SegmentationClass/
|
108 |
-
SegmentationObject/
|
109 |
-
# below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
|
110 |
-
trainval/
|
111 |
-
labels.txt
|
112 |
-
59_labels.txt # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
|
113 |
-
pascalcontext_val.txt # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
|
114 |
-
# below are generated
|
115 |
-
annotations_detectron2/
|
116 |
-
pc459_val
|
117 |
-
pc59_val
|
118 |
-
```
|
119 |
-
It starts with a tar file `VOCtrainval_03-May-2010.tar`. You may want to download the 5K validation set [here](https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing).
|
120 |
-
|
121 |
-
The directory `annotations_detectron2` is generated by running `python datasets/prepare_pascal_context.py`.
|
122 |
-
|
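Note on the deleted `DATASETS.md` above: it describes a dataset root taken from the `DETECTRON2_DATASETS` environment variable, defaulting to `./datasets`, with a fixed set of sub-directories. A minimal sketch of that lookup, assuming only the layout listed in the document, could look like this. The function names `dataset_root` and `missing_datasets` are hypothetical helpers for illustration, not part of OVSeg or Detectron2.

```python
import os
from pathlib import Path

# Sub-directories named in the directory tree of the deleted DATASETS.md.
EXPECTED_SUBDIRS = [
    "coco",                  # COCOStuff-171
    "ADEChallengeData2016",  # ADE20K-150
    "ADE20K_2021_17_01",     # ADE20K-847
    "VOCdevkit/VOC2012",     # PASCALVOC-20
    "VOCdevkit/VOC2010",     # PASCALContext-59 / PASCALContext-459
]


def dataset_root(env=None):
    """Return $DETECTRON2_DATASETS, or ./datasets when the variable is unset."""
    env = os.environ if env is None else env
    return Path(env.get("DETECTRON2_DATASETS", "datasets"))


def missing_datasets(root):
    """List the expected sub-directories that are absent under `root`."""
    return [d for d in EXPECTED_SUBDIRS if not (Path(root) / d).is_dir()]


if __name__ == "__main__":
    root = dataset_root()
    print(f"dataset root: {root}")
    for name in missing_datasets(root):
        print(f"missing: {name}")
```

Running the script before the `prepare_*.py` steps gives a quick view of which raw datasets still need to be downloaded.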
datasets/prepare_ade20k_full_sem_seg.py
DELETED
@@ -1,1011 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import os
|
5 |
-
import pickle as pkl
|
6 |
-
from pathlib import Path
|
7 |
-
|
8 |
-
import cv2
|
9 |
-
import numpy as np
|
10 |
-
import tqdm
|
11 |
-
from PIL import Image
|
12 |
-
|
13 |
-
ADE20K_SEM_SEG_FULL_CATEGORIES = [
|
14 |
-
{"name": "wall", "id": 2978, "trainId": 0},
|
15 |
-
{"name": "building, edifice", "id": 312, "trainId": 1},
|
16 |
-
{"name": "sky", "id": 2420, "trainId": 2},
|
17 |
-
{"name": "tree", "id": 2855, "trainId": 3},
|
18 |
-
{"name": "road, route", "id": 2131, "trainId": 4},
|
19 |
-
{"name": "floor, flooring", "id": 976, "trainId": 5},
|
20 |
-
{"name": "ceiling", "id": 447, "trainId": 6},
|
21 |
-
{"name": "bed", "id": 165, "trainId": 7},
|
22 |
-
{"name": "sidewalk, pavement", "id": 2377, "trainId": 8},
|
23 |
-
{"name": "earth, ground", "id": 838, "trainId": 9},
|
24 |
-
{"name": "cabinet", "id": 350, "trainId": 10},
|
25 |
-
{"name": "person, individual, someone, somebody, mortal, soul", "id": 1831, "trainId": 11},
|
26 |
-
{"name": "grass", "id": 1125, "trainId": 12},
|
27 |
-
{"name": "windowpane, window", "id": 3055, "trainId": 13},
|
28 |
-
{"name": "car, auto, automobile, machine, motorcar", "id": 401, "trainId": 14},
|
29 |
-
{"name": "mountain, mount", "id": 1610, "trainId": 15},
|
30 |
-
{"name": "plant, flora, plant life", "id": 1910, "trainId": 16},
|
31 |
-
{"name": "table", "id": 2684, "trainId": 17},
|
32 |
-
{"name": "chair", "id": 471, "trainId": 18},
|
33 |
-
{"name": "curtain, drape, drapery, mantle, pall", "id": 687, "trainId": 19},
|
34 |
-
{"name": "door", "id": 774, "trainId": 20},
|
35 |
-
{"name": "sofa, couch, lounge", "id": 2473, "trainId": 21},
|
36 |
-
{"name": "sea", "id": 2264, "trainId": 22},
|
37 |
-
{"name": "painting, picture", "id": 1735, "trainId": 23},
|
38 |
-
{"name": "water", "id": 2994, "trainId": 24},
|
39 |
-
{"name": "mirror", "id": 1564, "trainId": 25},
|
40 |
-
{"name": "house", "id": 1276, "trainId": 26},
|
41 |
-
{"name": "rug, carpet, carpeting", "id": 2178, "trainId": 27},
|
42 |
-
{"name": "shelf", "id": 2329, "trainId": 28},
|
43 |
-
{"name": "armchair", "id": 57, "trainId": 29},
|
44 |
-
{"name": "fence, fencing", "id": 907, "trainId": 30},
|
45 |
-
{"name": "field", "id": 913, "trainId": 31},
|
46 |
-
{"name": "lamp", "id": 1395, "trainId": 32},
|
47 |
-
{"name": "rock, stone", "id": 2138, "trainId": 33},
|
48 |
-
{"name": "seat", "id": 2272, "trainId": 34},
|
49 |
-
{"name": "river", "id": 2128, "trainId": 35},
|
50 |
-
{"name": "desk", "id": 724, "trainId": 36},
|
51 |
-
{"name": "bathtub, bathing tub, bath, tub", "id": 155, "trainId": 37},
|
52 |
-
{"name": "railing, rail", "id": 2053, "trainId": 38},
|
53 |
-
{"name": "signboard, sign", "id": 2380, "trainId": 39},
|
54 |
-
{"name": "cushion", "id": 689, "trainId": 40},
|
55 |
-
{"name": "path", "id": 1788, "trainId": 41},
|
56 |
-
{"name": "work surface", "id": 3087, "trainId": 42},
|
57 |
-
{"name": "stairs, steps", "id": 2530, "trainId": 43},
|
58 |
-
{"name": "column, pillar", "id": 581, "trainId": 44},
|
59 |
-
{"name": "sink", "id": 2388, "trainId": 45},
|
60 |
-
{"name": "wardrobe, closet, press", "id": 2985, "trainId": 46},
|
61 |
-
{"name": "snow", "id": 2454, "trainId": 47},
|
62 |
-
{"name": "refrigerator, icebox", "id": 2096, "trainId": 48},
|
63 |
-
{"name": "base, pedestal, stand", "id": 137, "trainId": 49},
|
64 |
-
{"name": "bridge, span", "id": 294, "trainId": 50},
|
65 |
-
{"name": "blind, screen", "id": 212, "trainId": 51},
|
66 |
-
{"name": "runway", "id": 2185, "trainId": 52},
|
67 |
-
{"name": "cliff, drop, drop-off", "id": 524, "trainId": 53},
|
68 |
-
{"name": "sand", "id": 2212, "trainId": 54},
|
69 |
-
{"name": "fireplace, hearth, open fireplace", "id": 943, "trainId": 55},
|
70 |
-
{"name": "pillow", "id": 1869, "trainId": 56},
|
71 |
-
{"name": "screen door, screen", "id": 2251, "trainId": 57},
|
72 |
-
{"name": "toilet, can, commode, crapper, pot, potty, stool, throne", "id": 2793, "trainId": 58},
|
73 |
-
{"name": "skyscraper", "id": 2423, "trainId": 59},
|
74 |
-
{"name": "grandstand, covered stand", "id": 1121, "trainId": 60},
|
75 |
-
{"name": "box", "id": 266, "trainId": 61},
|
76 |
-
{"name": "pool table, billiard table, snooker table", "id": 1948, "trainId": 62},
|
77 |
-
{"name": "palm, palm tree", "id": 1744, "trainId": 63},
|
78 |
-
{"name": "double door", "id": 783, "trainId": 64},
|
79 |
-
{"name": "coffee table, cocktail table", "id": 571, "trainId": 65},
|
80 |
-
{"name": "counter", "id": 627, "trainId": 66},
|
81 |
-
{"name": "countertop", "id": 629, "trainId": 67},
|
82 |
-
{"name": "chest of drawers, chest, bureau, dresser", "id": 491, "trainId": 68},
|
83 |
-
{"name": "kitchen island", "id": 1374, "trainId": 69},
|
84 |
-
{"name": "boat", "id": 223, "trainId": 70},
|
85 |
-
{"name": "waterfall, falls", "id": 3016, "trainId": 71},
|
86 |
-
{
|
87 |
-
"name": "stove, kitchen stove, range, kitchen range, cooking stove",
|
88 |
-
"id": 2598,
|
89 |
-
"trainId": 72,
|
90 |
-
},
|
91 |
-
{"name": "flower", "id": 978, "trainId": 73},
|
92 |
-
{"name": "bookcase", "id": 239, "trainId": 74},
|
93 |
-
{"name": "controls", "id": 608, "trainId": 75},
|
94 |
-
{"name": "book", "id": 236, "trainId": 76},
|
95 |
-
{"name": "stairway, staircase", "id": 2531, "trainId": 77},
|
96 |
-
{"name": "streetlight, street lamp", "id": 2616, "trainId": 78},
|
97 |
-
{
|
98 |
-
"name": "computer, computing machine, computing device, data processor, electronic computer, information processing system",
|
99 |
-
"id": 591,
|
100 |
-
"trainId": 79,
|
101 |
-
},
|
102 |
-
{
|
103 |
-
"name": "bus, autobus, coach, charabanc, double-decker, jitney, motorbus, motorcoach, omnibus, passenger vehicle",
|
104 |
-
"id": 327,
|
105 |
-
"trainId": 80,
|
106 |
-
},
|
107 |
-
{"name": "swivel chair", "id": 2679, "trainId": 81},
|
108 |
-
{"name": "light, light source", "id": 1451, "trainId": 82},
|
109 |
-
{"name": "bench", "id": 181, "trainId": 83},
|
110 |
-
{"name": "case, display case, showcase, vitrine", "id": 420, "trainId": 84},
|
111 |
-
{"name": "towel", "id": 2821, "trainId": 85},
|
112 |
-
{"name": "fountain", "id": 1023, "trainId": 86},
|
113 |
-
{"name": "embankment", "id": 855, "trainId": 87},
|
114 |
-
{
|
115 |
-
"name": "television receiver, television, television set, tv, tv set, idiot box, boob tube, telly, goggle box",
|
116 |
-
"id": 2733,
|
117 |
-
"trainId": 88,
|
118 |
-
},
|
119 |
-
{"name": "van", "id": 2928, "trainId": 89},
|
120 |
-
{"name": "hill", "id": 1240, "trainId": 90},
|
121 |
-
{"name": "awning, sunshade, sunblind", "id": 77, "trainId": 91},
|
122 |
-
{"name": "poster, posting, placard, notice, bill, card", "id": 1969, "trainId": 92},
|
123 |
-
{"name": "truck, motortruck", "id": 2880, "trainId": 93},
|
124 |
-
{"name": "airplane, aeroplane, plane", "id": 14, "trainId": 94},
|
125 |
-
{"name": "pole", "id": 1936, "trainId": 95},
|
126 |
-
{"name": "tower", "id": 2828, "trainId": 96},
|
127 |
-
{"name": "court", "id": 631, "trainId": 97},
|
128 |
-
{"name": "ball", "id": 103, "trainId": 98},
|
129 |
-
{
|
130 |
-
"name": "aircraft carrier, carrier, flattop, attack aircraft carrier",
|
131 |
-
"id": 3144,
|
132 |
-
"trainId": 99,
|
133 |
-
},
|
134 |
-
{"name": "buffet, counter, sideboard", "id": 308, "trainId": 100},
|
135 |
-
{"name": "hovel, hut, hutch, shack, shanty", "id": 1282, "trainId": 101},
|
136 |
-
{"name": "apparel, wearing apparel, dress, clothes", "id": 38, "trainId": 102},
|
137 |
-
{"name": "minibike, motorbike", "id": 1563, "trainId": 103},
|
138 |
-
{"name": "animal, animate being, beast, brute, creature, fauna", "id": 29, "trainId": 104},
|
139 |
-
{"name": "chandelier, pendant, pendent", "id": 480, "trainId": 105},
|
140 |
-
{"name": "step, stair", "id": 2569, "trainId": 106},
|
141 |
-
{"name": "booth, cubicle, stall, kiosk", "id": 247, "trainId": 107},
|
142 |
-
{"name": "bicycle, bike, wheel, cycle", "id": 187, "trainId": 108},
|
143 |
-
{"name": "doorframe, doorcase", "id": 778, "trainId": 109},
|
144 |
-
{"name": "sconce", "id": 2243, "trainId": 110},
|
145 |
-
{"name": "pond", "id": 1941, "trainId": 111},
|
146 |
-
{"name": "trade name, brand name, brand, marque", "id": 2833, "trainId": 112},
|
147 |
-
{"name": "bannister, banister, balustrade, balusters, handrail", "id": 120, "trainId": 113},
|
148 |
-
{"name": "bag", "id": 95, "trainId": 114},
|
149 |
-
{"name": "traffic light, traffic signal, stoplight", "id": 2836, "trainId": 115},
|
150 |
-
{"name": "gazebo", "id": 1087, "trainId": 116},
|
151 |
-
{"name": "escalator, moving staircase, moving stairway", "id": 868, "trainId": 117},
|
152 |
-
{"name": "land, ground, soil", "id": 1401, "trainId": 118},
|
153 |
-
{"name": "board, plank", "id": 220, "trainId": 119},
|
154 |
-
{"name": "arcade machine", "id": 47, "trainId": 120},
|
155 |
-
{"name": "eiderdown, duvet, continental quilt", "id": 843, "trainId": 121},
|
156 |
-
{"name": "bar", "id": 123, "trainId": 122},
|
157 |
-
{"name": "stall, stand, sales booth", "id": 2537, "trainId": 123},
|
158 |
-
{"name": "playground", "id": 1927, "trainId": 124},
|
159 |
-
{"name": "ship", "id": 2337, "trainId": 125},
|
160 |
-
{"name": "ottoman, pouf, pouffe, puff, hassock", "id": 1702, "trainId": 126},
|
161 |
-
{
|
162 |
-
"name": "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin",
|
163 |
-
"id": 64,
|
164 |
-
"trainId": 127,
|
165 |
-
},
|
166 |
-
{"name": "bottle", "id": 249, "trainId": 128},
|
167 |
-
{"name": "cradle", "id": 642, "trainId": 129},
|
168 |
-
{"name": "pot, flowerpot", "id": 1981, "trainId": 130},
|
169 |
-
{
|
170 |
-
"name": "conveyer belt, conveyor belt, conveyer, conveyor, transporter",
|
171 |
-
"id": 609,
|
172 |
-
"trainId": 131,
|
173 |
-
},
|
174 |
-
{"name": "train, railroad train", "id": 2840, "trainId": 132},
|
175 |
-
{"name": "stool", "id": 2586, "trainId": 133},
|
176 |
-
{"name": "lake", "id": 1393, "trainId": 134},
|
177 |
-
{"name": "tank, storage tank", "id": 2704, "trainId": 135},
|
178 |
-
{"name": "ice, water ice", "id": 1304, "trainId": 136},
|
179 |
-
{"name": "basket, handbasket", "id": 146, "trainId": 137},
|
180 |
-
{"name": "manhole", "id": 1494, "trainId": 138},
|
181 |
-
{"name": "tent, collapsible shelter", "id": 2739, "trainId": 139},
|
182 |
-
{"name": "canopy", "id": 389, "trainId": 140},
|
183 |
-
{"name": "microwave, microwave oven", "id": 1551, "trainId": 141},
|
184 |
-
{"name": "barrel, cask", "id": 131, "trainId": 142},
|
185 |
-
{"name": "dirt track", "id": 738, "trainId": 143},
|
186 |
-
{"name": "beam", "id": 161, "trainId": 144},
|
187 |
-
{"name": "dishwasher, dish washer, dishwashing machine", "id": 747, "trainId": 145},
|
188 |
-
{"name": "plate", "id": 1919, "trainId": 146},
|
189 |
-
{"name": "screen, crt screen", "id": 3109, "trainId": 147},
|
190 |
-
{"name": "ruins", "id": 2179, "trainId": 148},
|
191 |
-
{"name": "washer, automatic washer, washing machine", "id": 2989, "trainId": 149},
|
192 |
-
{"name": "blanket, cover", "id": 206, "trainId": 150},
|
193 |
-
{"name": "plaything, toy", "id": 1930, "trainId": 151},
|
194 |
-
{"name": "food, solid food", "id": 1002, "trainId": 152},
|
195 |
-
{"name": "screen, silver screen, projection screen", "id": 2254, "trainId": 153},
|
196 |
-
{"name": "oven", "id": 1708, "trainId": 154},
|
197 |
-
{"name": "stage", "id": 2526, "trainId": 155},
|
198 |
-
{"name": "beacon, lighthouse, beacon light, pharos", "id": 160, "trainId": 156},
|
199 |
-
{"name": "umbrella", "id": 2901, "trainId": 157},
|
200 |
-
{"name": "sculpture", "id": 2262, "trainId": 158},
|
201 |
-
{"name": "aqueduct", "id": 44, "trainId": 159},
|
202 |
-
{"name": "container", "id": 597, "trainId": 160},
|
203 |
-
{"name": "scaffolding, staging", "id": 2235, "trainId": 161},
|
204 |
-
{"name": "hood, exhaust hood", "id": 1260, "trainId": 162},
|
205 |
-
{"name": "curb, curbing, kerb", "id": 682, "trainId": 163},
|
206 |
-
{"name": "roller coaster", "id": 2151, "trainId": 164},
|
207 |
-
{"name": "horse, equus caballus", "id": 3107, "trainId": 165},
|
208 |
-
{"name": "catwalk", "id": 432, "trainId": 166},
|
209 |
-
{"name": "glass, drinking glass", "id": 1098, "trainId": 167},
|
210 |
-
{"name": "vase", "id": 2932, "trainId": 168},
|
211 |
-
{"name": "central reservation", "id": 461, "trainId": 169},
|
212 |
-
{"name": "carousel", "id": 410, "trainId": 170},
|
213 |
-
{"name": "radiator", "id": 2046, "trainId": 171},
|
214 |
-
{"name": "closet", "id": 533, "trainId": 172},
|
215 |
-
{"name": "machine", "id": 1481, "trainId": 173},
|
216 |
-
{"name": "pier, wharf, wharfage, dock", "id": 1858, "trainId": 174},
|
217 |
-
{"name": "fan", "id": 894, "trainId": 175},
|
218 |
-
{"name": "inflatable bounce game", "id": 1322, "trainId": 176},
|
219 |
-
{"name": "pitch", "id": 1891, "trainId": 177},
|
220 |
-
{"name": "paper", "id": 1756, "trainId": 178},
|
221 |
-
{"name": "arcade, colonnade", "id": 49, "trainId": 179},
|
222 |
-
{"name": "hot tub", "id": 1272, "trainId": 180},
|
223 |
-
{"name": "helicopter", "id": 1229, "trainId": 181},
|
224 |
-
{"name": "tray", "id": 2850, "trainId": 182},
|
225 |
-
{"name": "partition, divider", "id": 1784, "trainId": 183},
|
226 |
-
{"name": "vineyard", "id": 2962, "trainId": 184},
|
227 |
-
{"name": "bowl", "id": 259, "trainId": 185},
|
228 |
-
{"name": "bullring", "id": 319, "trainId": 186},
|
229 |
-
{"name": "flag", "id": 954, "trainId": 187},
|
230 |
-
{"name": "pot", "id": 1974, "trainId": 188},
|
231 |
-
{"name": "footbridge, overcrossing, pedestrian bridge", "id": 1013, "trainId": 189},
|
232 |
-
{"name": "shower", "id": 2356, "trainId": 190},
|
233 |
-
{"name": "bag, traveling bag, travelling bag, grip, suitcase", "id": 97, "trainId": 191},
|
234 |
-
{"name": "bulletin board, notice board", "id": 318, "trainId": 192},
|
235 |
-
{"name": "confessional booth", "id": 592, "trainId": 193},
|
236 |
-
{"name": "trunk, tree trunk, bole", "id": 2885, "trainId": 194},
|
237 |
-
{"name": "forest", "id": 1017, "trainId": 195},
|
238 |
-
{"name": "elevator door", "id": 851, "trainId": 196},
|
239 |
-
{"name": "laptop, laptop computer", "id": 1407, "trainId": 197},
|
240 |
-
{"name": "instrument panel", "id": 1332, "trainId": 198},
|
241 |
-
{"name": "bucket, pail", "id": 303, "trainId": 199},
|
242 |
-
{"name": "tapestry, tapis", "id": 2714, "trainId": 200},
|
243 |
-
{"name": "platform", "id": 1924, "trainId": 201},
|
244 |
-
{"name": "jacket", "id": 1346, "trainId": 202},
|
245 |
-
{"name": "gate", "id": 1081, "trainId": 203},
|
246 |
-
{"name": "monitor, monitoring device", "id": 1583, "trainId": 204},
|
247 |
-
{
|
248 |
-
"name": "telephone booth, phone booth, call box, telephone box, telephone kiosk",
|
249 |
-
"id": 2727,
|
250 |
-
"trainId": 205,
|
251 |
-
},
|
252 |
-
{"name": "spotlight, spot", "id": 2509, "trainId": 206},
|
253 |
-
{"name": "ring", "id": 2123, "trainId": 207},
|
254 |
-
{"name": "control panel", "id": 602, "trainId": 208},
|
255 |
-
{"name": "blackboard, chalkboard", "id": 202, "trainId": 209},
|
256 |
-
{"name": "air conditioner, air conditioning", "id": 10, "trainId": 210},
|
257 |
-
{"name": "chest", "id": 490, "trainId": 211},
|
258 |
-
{"name": "clock", "id": 530, "trainId": 212},
|
259 |
-
{"name": "sand dune", "id": 2213, "trainId": 213},
|
260 |
-
{"name": "pipe, pipage, piping", "id": 1884, "trainId": 214},
|
261 |
-
{"name": "vault", "id": 2934, "trainId": 215},
|
262 |
-
{"name": "table football", "id": 2687, "trainId": 216},
|
263 |
-
{"name": "cannon", "id": 387, "trainId": 217},
|
264 |
-
{"name": "swimming pool, swimming bath, natatorium", "id": 2668, "trainId": 218},
|
265 |
-
{"name": "fluorescent, fluorescent fixture", "id": 982, "trainId": 219},
|
266 |
-
{"name": "statue", "id": 2547, "trainId": 220},
|
267 |
-
{
|
268 |
-
"name": "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system",
|
269 |
-
"id": 1474,
|
270 |
-
"trainId": 221,
|
271 |
-
},
|
272 |
-
{"name": "exhibitor", "id": 877, "trainId": 222},
|
273 |
-
{"name": "ladder", "id": 1391, "trainId": 223},
|
274 |
-
{"name": "carport", "id": 414, "trainId": 224},
|
275 |
-
{"name": "dam", "id": 698, "trainId": 225},
|
276 |
-
{"name": "pulpit", "id": 2019, "trainId": 226},
|
277 |
-
{"name": "skylight, fanlight", "id": 2422, "trainId": 227},
|
278 |
-
{"name": "water tower", "id": 3010, "trainId": 228},
|
279 |
-
{"name": "grill, grille, grillwork", "id": 1139, "trainId": 229},
|
280 |
-
{"name": "display board", "id": 753, "trainId": 230},
|
281 |
-
{"name": "pane, pane of glass, window glass", "id": 1747, "trainId": 231},
|
282 |
-
{"name": "rubbish, trash, scrap", "id": 2175, "trainId": 232},
|
283 |
-
{"name": "ice rink", "id": 1301, "trainId": 233},
|
284 |
-
{"name": "fruit", "id": 1033, "trainId": 234},
|
285 |
-
{"name": "patio", "id": 1789, "trainId": 235},
|
286 |
-
{"name": "vending machine", "id": 2939, "trainId": 236},
|
287 |
-
{"name": "telephone, phone, telephone set", "id": 2730, "trainId": 237},
|
288 |
-
{"name": "net", "id": 1652, "trainId": 238},
|
289 |
-
{
|
290 |
-
"name": "backpack, back pack, knapsack, packsack, rucksack, haversack",
|
291 |
-
"id": 90,
|
292 |
-
"trainId": 239,
|
293 |
-
},
|
294 |
-
{"name": "jar", "id": 1349, "trainId": 240},
|
295 |
-
{"name": "track", "id": 2830, "trainId": 241},
|
296 |
-
{"name": "magazine", "id": 1485, "trainId": 242},
|
297 |
-
{"name": "shutter", "id": 2370, "trainId": 243},
|
298 |
-
{"name": "roof", "id": 2155, "trainId": 244},
|
299 |
-
{"name": "banner, streamer", "id": 118, "trainId": 245},
|
300 |
-
{"name": "landfill", "id": 1402, "trainId": 246},
|
301 |
-
{"name": "post", "id": 1957, "trainId": 247},
|
302 |
-
{"name": "altarpiece, reredos", "id": 3130, "trainId": 248},
|
303 |
-
{"name": "hat, chapeau, lid", "id": 1197, "trainId": 249},
|
304 |
-
{"name": "arch, archway", "id": 52, "trainId": 250},
|
305 |
-
{"name": "table game", "id": 2688, "trainId": 251},
|
306 |
-
{"name": "bag, handbag, pocketbook, purse", "id": 96, "trainId": 252},
|
307 |
-
{"name": "document, written document, papers", "id": 762, "trainId": 253},
|
308 |
-
{"name": "dome", "id": 772, "trainId": 254},
|
309 |
-
{"name": "pier", "id": 1857, "trainId": 255},
|
310 |
-
{"name": "shanties", "id": 2315, "trainId": 256},
|
311 |
-
{"name": "forecourt", "id": 1016, "trainId": 257},
|
312 |
-
{"name": "crane", "id": 643, "trainId": 258},
|
313 |
-
{"name": "dog, domestic dog, canis familiaris", "id": 3105, "trainId": 259},
|
314 |
-
{"name": "piano, pianoforte, forte-piano", "id": 1849, "trainId": 260},
|
315 |
-
{"name": "drawing", "id": 791, "trainId": 261},
|
316 |
-
{"name": "cabin", "id": 349, "trainId": 262},
|
317 |
-
{
|
318 |
-
"name": "ad, advertisement, advertizement, advertising, advertizing, advert",
|
319 |
-
"id": 6,
|
320 |
-
"trainId": 263,
|
321 |
-
},
|
322 |
-
{"name": "amphitheater, amphitheatre, coliseum", "id": 3114, "trainId": 264},
|
323 |
-
{"name": "monument", "id": 1587, "trainId": 265},
|
324 |
-
{"name": "henhouse", "id": 1233, "trainId": 266},
|
325 |
-
{"name": "cockpit", "id": 559, "trainId": 267},
|
326 |
-
{"name": "heater, warmer", "id": 1223, "trainId": 268},
|
327 |
-
{"name": "windmill, aerogenerator, wind generator", "id": 3049, "trainId": 269},
|
328 |
-
{"name": "pool", "id": 1943, "trainId": 270},
|
329 |
-
{"name": "elevator, lift", "id": 853, "trainId": 271},
|
330 |
-
{"name": "decoration, ornament, ornamentation", "id": 709, "trainId": 272},
|
331 |
-
{"name": "labyrinth", "id": 1390, "trainId": 273},
|
332 |
-
{"name": "text, textual matter", "id": 2748, "trainId": 274},
|
333 |
-
{"name": "printer", "id": 2007, "trainId": 275},
|
334 |
-
{"name": "mezzanine, first balcony", "id": 1546, "trainId": 276},
|
335 |
-
{"name": "mattress", "id": 1513, "trainId": 277},
|
336 |
-
{"name": "straw", "id": 2600, "trainId": 278},
|
337 |
-
{"name": "stalls", "id": 2538, "trainId": 279},
|
338 |
-
{"name": "patio, terrace", "id": 1790, "trainId": 280},
|
339 |
-
{"name": "billboard, hoarding", "id": 194, "trainId": 281},
|
340 |
-
{"name": "bus stop", "id": 326, "trainId": 282},
|
341 |
-
{"name": "trouser, pant", "id": 2877, "trainId": 283},
|
342 |
-
{"name": "console table, console", "id": 594, "trainId": 284},
|
343 |
-
{"name": "rack", "id": 2036, "trainId": 285},
|
344 |
-
{"name": "notebook", "id": 1662, "trainId": 286},
|
345 |
-
{"name": "shrine", "id": 2366, "trainId": 287},
|
346 |
-
{"name": "pantry", "id": 1754, "trainId": 288},
|
347 |
-
{"name": "cart", "id": 418, "trainId": 289},
|
348 |
-
{"name": "steam shovel", "id": 2553, "trainId": 290},
|
349 |
-
{"name": "porch", "id": 1951, "trainId": 291},
|
350 |
-
{"name": "postbox, mailbox, letter box", "id": 1963, "trainId": 292},
|
351 |
-
{"name": "figurine, statuette", "id": 918, "trainId": 293},
|
352 |
-
{"name": "recycling bin", "id": 2086, "trainId": 294},
|
353 |
-
{"name": "folding screen", "id": 997, "trainId": 295},
|
354 |
-
{"name": "telescope", "id": 2731, "trainId": 296},
|
355 |
-
{"name": "deck chair, beach chair", "id": 704, "trainId": 297},
|
356 |
-
{"name": "kennel", "id": 1365, "trainId": 298},
|
357 |
-
{"name": "coffee maker", "id": 569, "trainId": 299},
|
358 |
-
{"name": "altar, communion table, lord's table", "id": 3108, "trainId": 300},
|
359 |
-
{"name": "fish", "id": 948, "trainId": 301},
|
360 |
-
{"name": "easel", "id": 839, "trainId": 302},
|
361 |
-
{"name": "artificial golf green", "id": 63, "trainId": 303},
|
362 |
-
{"name": "iceberg", "id": 1305, "trainId": 304},
|
363 |
-
{"name": "candlestick, candle holder", "id": 378, "trainId": 305},
|
364 |
-
{"name": "shower stall, shower bath", "id": 2362, "trainId": 306},
|
365 |
-
{"name": "television stand", "id": 2734, "trainId": 307},
|
366 |
-
{
|
367 |
-
"name": "wall socket, wall plug, electric outlet, electrical outlet, outlet, electric receptacle",
|
368 |
-
"id": 2982,
|
369 |
-
"trainId": 308,
|
370 |
-
},
|
371 |
-
{"name": "skeleton", "id": 2398, "trainId": 309},
|
372 |
-
{"name": "grand piano, grand", "id": 1119, "trainId": 310},
|
373 |
-
{"name": "candy, confect", "id": 382, "trainId": 311},
|
374 |
-
{"name": "grille door", "id": 1141, "trainId": 312},
|
375 |
-
{"name": "pedestal, plinth, footstall", "id": 1805, "trainId": 313},
|
376 |
-
{"name": "jersey, t-shirt, tee shirt", "id": 3102, "trainId": 314},
|
377 |
-
{"name": "shoe", "id": 2341, "trainId": 315},
|
378 |
-
{"name": "gravestone, headstone, tombstone", "id": 1131, "trainId": 316},
|
379 |
-
{"name": "shanty", "id": 2316, "trainId": 317},
|
380 |
-
{"name": "structure", "id": 2626, "trainId": 318},
|
381 |
-
{"name": "rocking chair, rocker", "id": 3104, "trainId": 319},
|
382 |
-
{"name": "bird", "id": 198, "trainId": 320},
|
383 |
-
{"name": "place mat", "id": 1896, "trainId": 321},
|
384 |
-
{"name": "tomb", "id": 2800, "trainId": 322},
|
385 |
-
{"name": "big top", "id": 190, "trainId": 323},
|
386 |
-
{"name": "gas pump, gasoline pump, petrol pump, island dispenser", "id": 3131, "trainId": 324},
|
387 |
-
{"name": "lockers", "id": 1463, "trainId": 325},
|
388 |
-
{"name": "cage", "id": 357, "trainId": 326},
|
389 |
-
{"name": "finger", "id": 929, "trainId": 327},
|
390 |
-
{"name": "bleachers", "id": 209, "trainId": 328},
|
391 |
-
{"name": "ferris wheel", "id": 912, "trainId": 329},
|
392 |
-
{"name": "hairdresser chair", "id": 1164, "trainId": 330},
|
393 |
-
{"name": "mat", "id": 1509, "trainId": 331},
|
394 |
-
{"name": "stands", "id": 2539, "trainId": 332},
|
395 |
-
{"name": "aquarium, fish tank, marine museum", "id": 3116, "trainId": 333},
|
396 |
-
{"name": "streetcar, tram, tramcar, trolley, trolley car", "id": 2615, "trainId": 334},
|
397 |
-
{"name": "napkin, table napkin, serviette", "id": 1644, "trainId": 335},
|
398 |
-
{"name": "dummy", "id": 818, "trainId": 336},
|
399 |
-
{"name": "booklet, brochure, folder, leaflet, pamphlet", "id": 242, "trainId": 337},
|
400 |
-
{"name": "sand trap", "id": 2217, "trainId": 338},
|
401 |
-
{"name": "shop, store", "id": 2347, "trainId": 339},
|
402 |
-
{"name": "table cloth", "id": 2686, "trainId": 340},
|
403 |
-
{"name": "service station", "id": 2300, "trainId": 341},
|
404 |
-
{"name": "coffin", "id": 572, "trainId": 342},
|
405 |
-
{"name": "drawer", "id": 789, "trainId": 343},
|
406 |
-
{"name": "cages", "id": 358, "trainId": 344},
|
407 |
-
{"name": "slot machine, coin machine", "id": 2443, "trainId": 345},
|
408 |
-
{"name": "balcony", "id": 101, "trainId": 346},
|
409 |
-
{"name": "volleyball court", "id": 2969, "trainId": 347},
|
410 |
-
{"name": "table tennis", "id": 2692, "trainId": 348},
|
411 |
-
{"name": "control table", "id": 606, "trainId": 349},
|
412 |
-
{"name": "shirt", "id": 2339, "trainId": 350},
|
413 |
-
{"name": "merchandise, ware, product", "id": 1533, "trainId": 351},
|
414 |
-
{"name": "railway", "id": 2060, "trainId": 352},
|
415 |
-
{"name": "parterre", "id": 1782, "trainId": 353},
|
416 |
-
{"name": "chimney", "id": 495, "trainId": 354},
|
417 |
-
{"name": "can, tin, tin can", "id": 371, "trainId": 355},
|
418 |
-
{"name": "tanks", "id": 2707, "trainId": 356},
|
419 |
-
{"name": "fabric, cloth, material, textile", "id": 889, "trainId": 357},
|
420 |
-
{"name": "alga, algae", "id": 3156, "trainId": 358},
|
421 |
-
{"name": "system", "id": 2683, "trainId": 359},
|
422 |
-
{"name": "map", "id": 1499, "trainId": 360},
|
423 |
-
{"name": "greenhouse", "id": 1135, "trainId": 361},
|
424 |
-
{"name": "mug", "id": 1619, "trainId": 362},
|
425 |
-
{"name": "barbecue", "id": 125, "trainId": 363},
|
426 |
-
{"name": "trailer", "id": 2838, "trainId": 364},
|
427 |
-
{"name": "toilet tissue, toilet paper, bathroom tissue", "id": 2792, "trainId": 365},
|
428 |
-
{"name": "organ", "id": 1695, "trainId": 366},
|
429 |
-
{"name": "dishrag, dishcloth", "id": 746, "trainId": 367},
|
430 |
-
{"name": "island", "id": 1343, "trainId": 368},
|
431 |
-
{"name": "keyboard", "id": 1370, "trainId": 369},
|
432 |
-
{"name": "trench", "id": 2858, "trainId": 370},
|
433 |
-
{"name": "basket, basketball hoop, hoop", "id": 145, "trainId": 371},
|
434 |
-
{"name": "steering wheel, wheel", "id": 2565, "trainId": 372},
|
435 |
-
{"name": "pitcher, ewer", "id": 1892, "trainId": 373},
|
436 |
-
{"name": "goal", "id": 1103, "trainId": 374},
|
437 |
-
{"name": "bread, breadstuff, staff of life", "id": 286, "trainId": 375},
|
438 |
-
{"name": "beds", "id": 170, "trainId": 376},
|
439 |
-
{"name": "wood", "id": 3073, "trainId": 377},
|
440 |
-
{"name": "file cabinet", "id": 922, "trainId": 378},
|
441 |
-
{"name": "newspaper, paper", "id": 1655, "trainId": 379},
|
442 |
-
{"name": "motorboat", "id": 1602, "trainId": 380},
|
443 |
-
{"name": "rope", "id": 2160, "trainId": 381},
|
444 |
-
{"name": "guitar", "id": 1151, "trainId": 382},
|
445 |
-
{"name": "rubble", "id": 2176, "trainId": 383},
|
446 |
-
{"name": "scarf", "id": 2239, "trainId": 384},
|
447 |
-
{"name": "barrels", "id": 132, "trainId": 385},
|
448 |
-
{"name": "cap", "id": 394, "trainId": 386},
|
449 |
-
{"name": "leaves", "id": 1424, "trainId": 387},
|
450 |
-
{"name": "control tower", "id": 607, "trainId": 388},
|
451 |
-
{"name": "dashboard", "id": 700, "trainId": 389},
|
452 |
-
{"name": "bandstand", "id": 116, "trainId": 390},
|
453 |
-
{"name": "lectern", "id": 1425, "trainId": 391},
|
454 |
-
{"name": "switch, electric switch, electrical switch", "id": 2676, "trainId": 392},
|
455 |
-
{"name": "baseboard, mopboard, skirting board", "id": 141, "trainId": 393},
|
456 |
-
{"name": "shower room", "id": 2360, "trainId": 394},
|
457 |
-
{"name": "smoke", "id": 2449, "trainId": 395},
|
458 |
-
{"name": "faucet, spigot", "id": 897, "trainId": 396},
|
459 |
-
{"name": "bulldozer", "id": 317, "trainId": 397},
|
460 |
-
{"name": "saucepan", "id": 2228, "trainId": 398},
|
461 |
-
{"name": "shops", "id": 2351, "trainId": 399},
|
462 |
-
{"name": "meter", "id": 1543, "trainId": 400},
|
463 |
-
{"name": "crevasse", "id": 656, "trainId": 401},
|
464 |
-
{"name": "gear", "id": 1088, "trainId": 402},
|
465 |
-
{"name": "candelabrum, candelabra", "id": 373, "trainId": 403},
|
466 |
-
{"name": "sofa bed", "id": 2472, "trainId": 404},
|
467 |
-
{"name": "tunnel", "id": 2892, "trainId": 405},
|
468 |
-
{"name": "pallet", "id": 1740, "trainId": 406},
|
469 |
-
{"name": "wire, conducting wire", "id": 3067, "trainId": 407},
|
470 |
-
{"name": "kettle, boiler", "id": 1367, "trainId": 408},
|
471 |
-
{"name": "bidet", "id": 188, "trainId": 409},
|
472 |
-
{
|
473 |
-
"name": "baby buggy, baby carriage, carriage, perambulator, pram, stroller, go-cart, pushchair, pusher",
|
474 |
-
"id": 79,
|
475 |
-
"trainId": 410,
|
476 |
-
},
|
477 |
-
{"name": "music stand", "id": 1633, "trainId": 411},
|
478 |
-
{"name": "pipe, tube", "id": 1885, "trainId": 412},
|
479 |
-
{"name": "cup", "id": 677, "trainId": 413},
|
480 |
-
{"name": "parking meter", "id": 1779, "trainId": 414},
|
481 |
-
{"name": "ice hockey rink", "id": 1297, "trainId": 415},
|
482 |
-
{"name": "shelter", "id": 2334, "trainId": 416},
|
483 |
-
{"name": "weeds", "id": 3027, "trainId": 417},
|
484 |
-
{"name": "temple", "id": 2735, "trainId": 418},
|
485 |
-
{"name": "patty, cake", "id": 1791, "trainId": 419},
|
486 |
-
{"name": "ski slope", "id": 2405, "trainId": 420},
|
487 |
-
{"name": "panel", "id": 1748, "trainId": 421},
|
488 |
-
{"name": "wallet", "id": 2983, "trainId": 422},
|
489 |
-
{"name": "wheel", "id": 3035, "trainId": 423},
|
490 |
-
{"name": "towel rack, towel horse", "id": 2824, "trainId": 424},
|
491 |
-
{"name": "roundabout", "id": 2168, "trainId": 425},
|
492 |
-
{"name": "canister, cannister, tin", "id": 385, "trainId": 426},
|
493 |
-
{"name": "rod", "id": 2148, "trainId": 427},
|
494 |
-
{"name": "soap dispenser", "id": 2465, "trainId": 428},
|
495 |
-
{"name": "bell", "id": 175, "trainId": 429},
|
496 |
-
{"name": "canvas", "id": 390, "trainId": 430},
|
497 |
-
{"name": "box office, ticket office, ticket booth", "id": 268, "trainId": 431},
|
498 |
-
{"name": "teacup", "id": 2722, "trainId": 432},
|
499 |
-
{"name": "trellis", "id": 2857, "trainId": 433},
|
500 |
-
{"name": "workbench", "id": 3088, "trainId": 434},
|
501 |
-
{"name": "valley, vale", "id": 2926, "trainId": 435},
|
502 |
-
{"name": "toaster", "id": 2782, "trainId": 436},
|
503 |
-
{"name": "knife", "id": 1378, "trainId": 437},
|
504 |
-
{"name": "podium", "id": 1934, "trainId": 438},
|
505 |
-
{"name": "ramp", "id": 2072, "trainId": 439},
|
506 |
-
{"name": "tumble dryer", "id": 2889, "trainId": 440},
|
507 |
-
{"name": "fireplug, fire hydrant, plug", "id": 944, "trainId": 441},
|
508 |
-
{"name": "gym shoe, sneaker, tennis shoe", "id": 1158, "trainId": 442},
|
509 |
-
{"name": "lab bench", "id": 1383, "trainId": 443},
|
510 |
-
{"name": "equipment", "id": 867, "trainId": 444},
|
511 |
-
{"name": "rocky formation", "id": 2145, "trainId": 445},
|
512 |
-
{"name": "plastic", "id": 1915, "trainId": 446},
|
513 |
-
{"name": "calendar", "id": 361, "trainId": 447},
|
514 |
-
{"name": "caravan", "id": 402, "trainId": 448},
|
515 |
-
{"name": "check-in-desk", "id": 482, "trainId": 449},
|
516 |
-
{"name": "ticket counter", "id": 2761, "trainId": 450},
|
517 |
-
{"name": "brush", "id": 300, "trainId": 451},
|
518 |
-
{"name": "mill", "id": 1554, "trainId": 452},
|
519 |
-
{"name": "covered bridge", "id": 636, "trainId": 453},
|
520 |
-
{"name": "bowling alley", "id": 260, "trainId": 454},
|
521 |
-
{"name": "hanger", "id": 1186, "trainId": 455},
|
522 |
-
{"name": "excavator", "id": 871, "trainId": 456},
|
523 |
-
{"name": "trestle", "id": 2859, "trainId": 457},
|
524 |
-
{"name": "revolving door", "id": 2103, "trainId": 458},
|
525 |
-
{"name": "blast furnace", "id": 208, "trainId": 459},
|
526 |
-
{"name": "scale, weighing machine", "id": 2236, "trainId": 460},
|
527 |
-
{"name": "projector", "id": 2012, "trainId": 461},
|
528 |
-
{"name": "soap", "id": 2462, "trainId": 462},
|
529 |
-
{"name": "locker", "id": 1462, "trainId": 463},
|
530 |
-
{"name": "tractor", "id": 2832, "trainId": 464},
|
531 |
-
{"name": "stretcher", "id": 2617, "trainId": 465},
|
532 |
-
{"name": "frame", "id": 1024, "trainId": 466},
|
533 |
-
{"name": "grating", "id": 1129, "trainId": 467},
|
534 |
-
{"name": "alembic", "id": 18, "trainId": 468},
|
535 |
-
{"name": "candle, taper, wax light", "id": 376, "trainId": 469},
|
536 |
-
{"name": "barrier", "id": 134, "trainId": 470},
|
537 |
-
{"name": "cardboard", "id": 407, "trainId": 471},
|
538 |
-
{"name": "cave", "id": 434, "trainId": 472},
|
539 |
-
{"name": "puddle", "id": 2017, "trainId": 473},
|
540 |
-
{"name": "tarp", "id": 2717, "trainId": 474},
|
541 |
-
{"name": "price tag", "id": 2005, "trainId": 475},
|
542 |
-
{"name": "watchtower", "id": 2993, "trainId": 476},
|
543 |
-
{"name": "meters", "id": 1545, "trainId": 477},
|
544 |
-
{
|
545 |
-
"name": "light bulb, lightbulb, bulb, incandescent lamp, electric light, electric-light bulb",
|
546 |
-
"id": 1445,
|
547 |
-
"trainId": 478,
|
548 |
-
},
|
549 |
-
{"name": "tracks", "id": 2831, "trainId": 479},
|
550 |
-
{"name": "hair dryer", "id": 1161, "trainId": 480},
|
551 |
-
{"name": "skirt", "id": 2411, "trainId": 481},
|
552 |
-
{"name": "viaduct", "id": 2949, "trainId": 482},
|
553 |
-
{"name": "paper towel", "id": 1769, "trainId": 483},
|
554 |
-
{"name": "coat", "id": 552, "trainId": 484},
|
555 |
-
{"name": "sheet", "id": 2327, "trainId": 485},
|
556 |
-
{"name": "fire extinguisher, extinguisher, asphyxiator", "id": 939, "trainId": 486},
|
557 |
-
{"name": "water wheel", "id": 3013, "trainId": 487},
|
558 |
-
{"name": "pottery, clayware", "id": 1986, "trainId": 488},
|
559 |
-
{"name": "magazine rack", "id": 1486, "trainId": 489},
|
560 |
-
{"name": "teapot", "id": 2723, "trainId": 490},
|
561 |
-
{"name": "microphone, mike", "id": 1549, "trainId": 491},
|
562 |
-
{"name": "support", "id": 2649, "trainId": 492},
|
563 |
-
{"name": "forklift", "id": 1020, "trainId": 493},
|
564 |
-
{"name": "canyon", "id": 392, "trainId": 494},
|
565 |
-
{"name": "cash register, register", "id": 422, "trainId": 495},
|
566 |
-
{"name": "leaf, leafage, foliage", "id": 1419, "trainId": 496},
|
567 |
-
{"name": "remote control, remote", "id": 2099, "trainId": 497},
|
568 |
-
{"name": "soap dish", "id": 2464, "trainId": 498},
|
569 |
-
{"name": "windshield, windscreen", "id": 3058, "trainId": 499},
|
570 |
-
{"name": "cat", "id": 430, "trainId": 500},
|
571 |
-
{"name": "cue, cue stick, pool cue, pool stick", "id": 675, "trainId": 501},
|
572 |
-
{"name": "vent, venthole, vent-hole, blowhole", "id": 2941, "trainId": 502},
|
573 |
-
{"name": "videos", "id": 2955, "trainId": 503},
|
574 |
-
{"name": "shovel", "id": 2355, "trainId": 504},
|
575 |
-
{"name": "eaves", "id": 840, "trainId": 505},
|
576 |
-
{"name": "antenna, aerial, transmitting aerial", "id": 32, "trainId": 506},
|
577 |
-
{"name": "shipyard", "id": 2338, "trainId": 507},
|
578 |
-
{"name": "hen, biddy", "id": 1232, "trainId": 508},
|
579 |
-
{"name": "traffic cone", "id": 2834, "trainId": 509},
|
580 |
-
{"name": "washing machines", "id": 2991, "trainId": 510},
|
581 |
-
{"name": "truck crane", "id": 2879, "trainId": 511},
|
582 |
-
{"name": "cds", "id": 444, "trainId": 512},
|
583 |
-
{"name": "niche", "id": 1657, "trainId": 513},
|
584 |
-
{"name": "scoreboard", "id": 2246, "trainId": 514},
|
585 |
-
{"name": "briefcase", "id": 296, "trainId": 515},
|
586 |
-
{"name": "boot", "id": 245, "trainId": 516},
|
587 |
-
{"name": "sweater, jumper", "id": 2661, "trainId": 517},
|
588 |
-
{"name": "hay", "id": 1202, "trainId": 518},
|
589 |
-
{"name": "pack", "id": 1714, "trainId": 519},
|
590 |
-
{"name": "bottle rack", "id": 251, "trainId": 520},
|
591 |
-
{"name": "glacier", "id": 1095, "trainId": 521},
|
592 |
-
{"name": "pergola", "id": 1828, "trainId": 522},
|
593 |
-
{"name": "building materials", "id": 311, "trainId": 523},
|
594 |
-
{"name": "television camera", "id": 2732, "trainId": 524},
|
595 |
-
{"name": "first floor", "id": 947, "trainId": 525},
|
596 |
-
{"name": "rifle", "id": 2115, "trainId": 526},
|
597 |
-
{"name": "tennis table", "id": 2738, "trainId": 527},
|
598 |
-
{"name": "stadium", "id": 2525, "trainId": 528},
|
599 |
-
{"name": "safety belt", "id": 2194, "trainId": 529},
|
600 |
-
{"name": "cover", "id": 634, "trainId": 530},
|
601 |
-
{"name": "dish rack", "id": 740, "trainId": 531},
|
602 |
-
{"name": "synthesizer", "id": 2682, "trainId": 532},
|
603 |
-
{"name": "pumpkin", "id": 2020, "trainId": 533},
|
604 |
-
{"name": "gutter", "id": 1156, "trainId": 534},
|
605 |
-
{"name": "fruit stand", "id": 1036, "trainId": 535},
|
606 |
-
{"name": "ice floe, floe", "id": 1295, "trainId": 536},
|
607 |
-
{"name": "handle, grip, handgrip, hold", "id": 1181, "trainId": 537},
|
608 |
-
{"name": "wheelchair", "id": 3037, "trainId": 538},
|
609 |
-
{"name": "mousepad, mouse mat", "id": 1614, "trainId": 539},
|
610 |
-
{"name": "diploma", "id": 736, "trainId": 540},
|
611 |
-
{"name": "fairground ride", "id": 893, "trainId": 541},
|
612 |
-
{"name": "radio", "id": 2047, "trainId": 542},
|
613 |
-
{"name": "hotplate", "id": 1274, "trainId": 543},
|
614 |
-
{"name": "junk", "id": 1361, "trainId": 544},
|
615 |
-
{"name": "wheelbarrow", "id": 3036, "trainId": 545},
|
616 |
-
{"name": "stream", "id": 2606, "trainId": 546},
|
617 |
-
{"name": "toll plaza", "id": 2797, "trainId": 547},
|
618 |
-
{"name": "punching bag", "id": 2022, "trainId": 548},
|
619 |
-
{"name": "trough", "id": 2876, "trainId": 549},
|
620 |
-
{"name": "throne", "id": 2758, "trainId": 550},
|
621 |
-
{"name": "chair desk", "id": 472, "trainId": 551},
|
622 |
-
{"name": "weighbridge", "id": 3028, "trainId": 552},
|
623 |
-
{"name": "extractor fan", "id": 882, "trainId": 553},
|
624 |
-
{"name": "hanging clothes", "id": 1189, "trainId": 554},
|
625 |
-
{"name": "dish, dish aerial, dish antenna, saucer", "id": 743, "trainId": 555},
|
626 |
-
{"name": "alarm clock, alarm", "id": 3122, "trainId": 556},
|
627 |
-
{"name": "ski lift", "id": 2401, "trainId": 557},
|
628 |
-
{"name": "chain", "id": 468, "trainId": 558},
|
629 |
-
{"name": "garage", "id": 1061, "trainId": 559},
|
630 |
-
{"name": "mechanical shovel", "id": 1523, "trainId": 560},
|
631 |
-
{"name": "wine rack", "id": 3059, "trainId": 561},
|
632 |
-
{"name": "tramway", "id": 2843, "trainId": 562},
|
633 |
-
{"name": "treadmill", "id": 2853, "trainId": 563},
|
634 |
-
{"name": "menu", "id": 1529, "trainId": 564},
|
635 |
-
{"name": "block", "id": 214, "trainId": 565},
|
636 |
-
{"name": "well", "id": 3032, "trainId": 566},
|
637 |
-
{"name": "witness stand", "id": 3071, "trainId": 567},
|
638 |
-
{"name": "branch", "id": 277, "trainId": 568},
|
639 |
-
{"name": "duck", "id": 813, "trainId": 569},
|
640 |
-
{"name": "casserole", "id": 426, "trainId": 570},
|
641 |
-
{"name": "frying pan", "id": 1039, "trainId": 571},
|
642 |
-
{"name": "desk organizer", "id": 727, "trainId": 572},
|
643 |
-
{"name": "mast", "id": 1508, "trainId": 573},
|
644 |
-
{"name": "spectacles, specs, eyeglasses, glasses", "id": 2490, "trainId": 574},
|
645 |
-
{"name": "service elevator", "id": 2299, "trainId": 575},
|
646 |
-
{"name": "dollhouse", "id": 768, "trainId": 576},
|
647 |
-
{"name": "hammock", "id": 1172, "trainId": 577},
|
648 |
-
{"name": "clothes hanging", "id": 537, "trainId": 578},
|
649 |
-
{"name": "photocopier", "id": 1847, "trainId": 579},
|
650 |
-
{"name": "notepad", "id": 1664, "trainId": 580},
|
651 |
-
{"name": "golf cart", "id": 1110, "trainId": 581},
|
652 |
-
{"name": "footpath", "id": 1014, "trainId": 582},
|
653 |
-
{"name": "cross", "id": 662, "trainId": 583},
|
654 |
-
{"name": "baptismal font", "id": 121, "trainId": 584},
|
655 |
-
{"name": "boiler", "id": 227, "trainId": 585},
|
656 |
-
{"name": "skip", "id": 2410, "trainId": 586},
|
657 |
-
{"name": "rotisserie", "id": 2165, "trainId": 587},
|
658 |
-
{"name": "tables", "id": 2696, "trainId": 588},
|
659 |
-
{"name": "water mill", "id": 3005, "trainId": 589},
|
660 |
-
{"name": "helmet", "id": 1231, "trainId": 590},
|
661 |
-
{"name": "cover curtain", "id": 635, "trainId": 591},
|
662 |
-
{"name": "brick", "id": 292, "trainId": 592},
|
663 |
-
{"name": "table runner", "id": 2690, "trainId": 593},
|
664 |
-
{"name": "ashtray", "id": 65, "trainId": 594},
|
665 |
-
{"name": "street box", "id": 2607, "trainId": 595},
|
666 |
-
{"name": "stick", "id": 2574, "trainId": 596},
|
667 |
-
{"name": "hangers", "id": 1188, "trainId": 597},
|
668 |
-
{"name": "cells", "id": 456, "trainId": 598},
|
669 |
-
{"name": "urinal", "id": 2913, "trainId": 599},
|
670 |
-
{"name": "centerpiece", "id": 459, "trainId": 600},
|
671 |
-
{"name": "portable fridge", "id": 1955, "trainId": 601},
|
672 |
-
{"name": "dvds", "id": 827, "trainId": 602},
|
673 |
-
{"name": "golf club", "id": 1111, "trainId": 603},
|
674 |
-
{"name": "skirting board", "id": 2412, "trainId": 604},
|
675 |
-
{"name": "water cooler", "id": 2997, "trainId": 605},
|
676 |
-
{"name": "clipboard", "id": 528, "trainId": 606},
|
677 |
-
{"name": "camera, photographic camera", "id": 366, "trainId": 607},
|
678 |
-
{"name": "pigeonhole", "id": 1863, "trainId": 608},
|
679 |
-
{"name": "chips", "id": 500, "trainId": 609},
|
680 |
-
{"name": "food processor", "id": 1001, "trainId": 610},
|
681 |
-
{"name": "post box", "id": 1958, "trainId": 611},
|
682 |
-
{"name": "lid", "id": 1441, "trainId": 612},
|
683 |
-
{"name": "drum", "id": 809, "trainId": 613},
|
684 |
-
{"name": "blender", "id": 210, "trainId": 614},
|
685 |
-
{"name": "cave entrance", "id": 435, "trainId": 615},
|
686 |
-
{"name": "dental chair", "id": 718, "trainId": 616},
|
687 |
-
{"name": "obelisk", "id": 1674, "trainId": 617},
|
688 |
-
{"name": "canoe", "id": 388, "trainId": 618},
|
689 |
-
{"name": "mobile", "id": 1572, "trainId": 619},
|
690 |
-
{"name": "monitors", "id": 1584, "trainId": 620},
|
691 |
-
{"name": "pool ball", "id": 1944, "trainId": 621},
|
692 |
-
{"name": "cue rack", "id": 674, "trainId": 622},
|
693 |
-
{"name": "baggage carts", "id": 99, "trainId": 623},
|
694 |
-
{"name": "shore", "id": 2352, "trainId": 624},
|
695 |
-
{"name": "fork", "id": 1019, "trainId": 625},
|
696 |
-
{"name": "paper filer", "id": 1763, "trainId": 626},
|
697 |
-
{"name": "bicycle rack", "id": 185, "trainId": 627},
|
698 |
-
{"name": "coat rack", "id": 554, "trainId": 628},
|
699 |
-
{"name": "garland", "id": 1066, "trainId": 629},
|
700 |
-
{"name": "sports bag", "id": 2508, "trainId": 630},
|
701 |
-
{"name": "fish tank", "id": 951, "trainId": 631},
|
702 |
-
{"name": "towel dispenser", "id": 2822, "trainId": 632},
|
703 |
-
{"name": "carriage", "id": 415, "trainId": 633},
|
704 |
-
{"name": "brochure", "id": 297, "trainId": 634},
|
705 |
-
{"name": "plaque", "id": 1914, "trainId": 635},
|
706 |
-
{"name": "stringer", "id": 2619, "trainId": 636},
|
707 |
-
{"name": "iron", "id": 1338, "trainId": 637},
|
708 |
-
{"name": "spoon", "id": 2505, "trainId": 638},
|
709 |
-
{"name": "flag pole", "id": 955, "trainId": 639},
|
710 |
-
{"name": "toilet brush", "id": 2786, "trainId": 640},
|
711 |
-
{"name": "book stand", "id": 238, "trainId": 641},
|
712 |
-
{"name": "water faucet, water tap, tap, hydrant", "id": 3000, "trainId": 642},
|
713 |
-
{"name": "ticket office", "id": 2763, "trainId": 643},
|
714 |
-
{"name": "broom", "id": 299, "trainId": 644},
|
715 |
-
{"name": "dvd", "id": 822, "trainId": 645},
|
716 |
-
{"name": "ice bucket", "id": 1288, "trainId": 646},
|
717 |
-
{"name": "carapace, shell, cuticle, shield", "id": 3101, "trainId": 647},
|
718 |
-
{"name": "tureen", "id": 2894, "trainId": 648},
|
719 |
-
{"name": "folders", "id": 992, "trainId": 649},
|
720 |
-
{"name": "chess", "id": 489, "trainId": 650},
|
721 |
-
{"name": "root", "id": 2157, "trainId": 651},
|
722 |
-
{"name": "sewing machine", "id": 2309, "trainId": 652},
|
723 |
-
{"name": "model", "id": 1576, "trainId": 653},
|
724 |
-
{"name": "pen", "id": 1810, "trainId": 654},
|
725 |
-
{"name": "violin", "id": 2964, "trainId": 655},
|
726 |
-
{"name": "sweatshirt", "id": 2662, "trainId": 656},
|
727 |
-
{"name": "recycling materials", "id": 2087, "trainId": 657},
|
728 |
-
{"name": "mitten", "id": 1569, "trainId": 658},
|
729 |
-
{"name": "chopping board, cutting board", "id": 503, "trainId": 659},
|
730 |
-
{"name": "mask", "id": 1505, "trainId": 660},
|
731 |
-
{"name": "log", "id": 1468, "trainId": 661},
|
732 |
-
{"name": "mouse, computer mouse", "id": 1613, "trainId": 662},
|
733 |
-
{"name": "grill", "id": 1138, "trainId": 663},
|
734 |
-
{"name": "hole", "id": 1256, "trainId": 664},
|
735 |
-
{"name": "target", "id": 2715, "trainId": 665},
|
736 |
-
{"name": "trash bag", "id": 2846, "trainId": 666},
|
737 |
-
{"name": "chalk", "id": 477, "trainId": 667},
|
738 |
-
{"name": "sticks", "id": 2576, "trainId": 668},
|
739 |
-
{"name": "balloon", "id": 108, "trainId": 669},
|
740 |
-
{"name": "score", "id": 2245, "trainId": 670},
|
741 |
-
{"name": "hair spray", "id": 1162, "trainId": 671},
|
742 |
-
{"name": "roll", "id": 2149, "trainId": 672},
|
743 |
-
{"name": "runner", "id": 2183, "trainId": 673},
|
744 |
-
{"name": "engine", "id": 858, "trainId": 674},
|
745 |
-
{"name": "inflatable glove", "id": 1324, "trainId": 675},
|
746 |
-
{"name": "games", "id": 1055, "trainId": 676},
|
747 |
-
{"name": "pallets", "id": 1741, "trainId": 677},
|
748 |
-
{"name": "baskets", "id": 149, "trainId": 678},
|
749 |
-
{"name": "coop", "id": 615, "trainId": 679},
|
750 |
-
{"name": "dvd player", "id": 825, "trainId": 680},
|
751 |
-
{"name": "rocking horse", "id": 2143, "trainId": 681},
|
752 |
-
{"name": "buckets", "id": 304, "trainId": 682},
|
753 |
-
{"name": "bread rolls", "id": 283, "trainId": 683},
|
754 |
-
{"name": "shawl", "id": 2322, "trainId": 684},
|
755 |
-
{"name": "watering can", "id": 3017, "trainId": 685},
|
756 |
-
{"name": "spotlights", "id": 2510, "trainId": 686},
|
757 |
-
{"name": "post-it", "id": 1960, "trainId": 687},
|
758 |
-
{"name": "bowls", "id": 265, "trainId": 688},
|
759 |
-
{"name": "security camera", "id": 2282, "trainId": 689},
|
760 |
-
{"name": "runner cloth", "id": 2184, "trainId": 690},
|
761 |
-
{"name": "lock", "id": 1461, "trainId": 691},
|
762 |
-
{"name": "alarm, warning device, alarm system", "id": 3113, "trainId": 692},
|
763 |
-
{"name": "side", "id": 2372, "trainId": 693},
|
764 |
-
{"name": "roulette", "id": 2166, "trainId": 694},
|
765 |
-
{"name": "bone", "id": 232, "trainId": 695},
|
766 |
-
{"name": "cutlery", "id": 693, "trainId": 696},
|
767 |
-
{"name": "pool balls", "id": 1945, "trainId": 697},
|
768 |
-
{"name": "wheels", "id": 3039, "trainId": 698},
|
769 |
-
{"name": "spice rack", "id": 2494, "trainId": 699},
|
770 |
-
{"name": "plant pots", "id": 1908, "trainId": 700},
|
771 |
-
{"name": "towel ring", "id": 2827, "trainId": 701},
|
772 |
-
{"name": "bread box", "id": 280, "trainId": 702},
|
773 |
-
{"name": "video", "id": 2950, "trainId": 703},
|
774 |
-
{"name": "funfair", "id": 1044, "trainId": 704},
|
775 |
-
{"name": "breads", "id": 288, "trainId": 705},
|
776 |
-
{"name": "tripod", "id": 2863, "trainId": 706},
|
777 |
-
{"name": "ironing board", "id": 1342, "trainId": 707},
|
778 |
-
{"name": "skimmer", "id": 2409, "trainId": 708},
|
779 |
-
{"name": "hollow", "id": 1258, "trainId": 709},
|
780 |
-
{"name": "scratching post", "id": 2249, "trainId": 710},
|
781 |
-
{"name": "tricycle", "id": 2862, "trainId": 711},
|
782 |
-
{"name": "file box", "id": 920, "trainId": 712},
|
783 |
-
{"name": "mountain pass", "id": 1607, "trainId": 713},
|
784 |
-
{"name": "tombstones", "id": 2802, "trainId": 714},
|
785 |
-
{"name": "cooker", "id": 610, "trainId": 715},
|
786 |
-
{"name": "card game, cards", "id": 3129, "trainId": 716},
|
787 |
-
{"name": "golf bag", "id": 1108, "trainId": 717},
|
788 |
-
{"name": "towel paper", "id": 2823, "trainId": 718},
|
789 |
-
{"name": "chaise lounge", "id": 476, "trainId": 719},
|
790 |
-
{"name": "sun", "id": 2641, "trainId": 720},
|
791 |
-
{"name": "toilet paper holder", "id": 2788, "trainId": 721},
|
792 |
-
{"name": "rake", "id": 2070, "trainId": 722},
|
793 |
-
{"name": "key", "id": 1368, "trainId": 723},
|
794 |
-
{"name": "umbrella stand", "id": 2903, "trainId": 724},
|
795 |
-
{"name": "dartboard", "id": 699, "trainId": 725},
|
796 |
-
{"name": "transformer", "id": 2844, "trainId": 726},
|
797 |
-
{"name": "fireplace utensils", "id": 942, "trainId": 727},
|
798 |
-
{"name": "sweatshirts", "id": 2663, "trainId": 728},
|
799 |
-
{
|
800 |
-
"name": "cellular telephone, cellular phone, cellphone, cell, mobile phone",
|
801 |
-
"id": 457,
|
802 |
-
"trainId": 729,
|
803 |
-
},
|
804 |
-
{"name": "tallboy", "id": 2701, "trainId": 730},
|
805 |
-
{"name": "stapler", "id": 2540, "trainId": 731},
|
806 |
-
{"name": "sauna", "id": 2231, "trainId": 732},
|
807 |
-
{"name": "test tube", "id": 2746, "trainId": 733},
|
808 |
-
{"name": "palette", "id": 1738, "trainId": 734},
|
809 |
-
{"name": "shopping carts", "id": 2350, "trainId": 735},
|
810 |
-
{"name": "tools", "id": 2808, "trainId": 736},
|
811 |
-
{"name": "push button, push, button", "id": 2025, "trainId": 737},
|
812 |
-
{"name": "star", "id": 2541, "trainId": 738},
|
813 |
-
{"name": "roof rack", "id": 2156, "trainId": 739},
|
814 |
-
{"name": "barbed wire", "id": 126, "trainId": 740},
|
815 |
-
{"name": "spray", "id": 2512, "trainId": 741},
|
816 |
-
{"name": "ear", "id": 831, "trainId": 742},
|
817 |
-
{"name": "sponge", "id": 2503, "trainId": 743},
|
818 |
-
{"name": "racket", "id": 2039, "trainId": 744},
|
819 |
-
{"name": "tins", "id": 2774, "trainId": 745},
|
820 |
-
{"name": "eyeglasses", "id": 886, "trainId": 746},
|
821 |
-
{"name": "file", "id": 919, "trainId": 747},
|
822 |
-
{"name": "scarfs", "id": 2240, "trainId": 748},
|
823 |
-
{"name": "sugar bowl", "id": 2636, "trainId": 749},
|
824 |
-
{"name": "flip flop", "id": 963, "trainId": 750},
|
825 |
-
{"name": "headstones", "id": 1218, "trainId": 751},
|
826 |
-
{"name": "laptop bag", "id": 1406, "trainId": 752},
|
827 |
-
{"name": "leash", "id": 1420, "trainId": 753},
|
828 |
-
{"name": "climbing frame", "id": 526, "trainId": 754},
|
829 |
-
{"name": "suit hanger", "id": 2639, "trainId": 755},
|
830 |
-
{"name": "floor spotlight", "id": 975, "trainId": 756},
|
831 |
-
{"name": "plate rack", "id": 1921, "trainId": 757},
|
832 |
-
{"name": "sewer", "id": 2305, "trainId": 758},
|
833 |
-
{"name": "hard drive", "id": 1193, "trainId": 759},
|
834 |
-
{"name": "sprinkler", "id": 2517, "trainId": 760},
|
835 |
-
{"name": "tools box", "id": 2809, "trainId": 761},
|
836 |
-
{"name": "necklace", "id": 1647, "trainId": 762},
|
837 |
-
{"name": "bulbs", "id": 314, "trainId": 763},
|
838 |
-
{"name": "steel industry", "id": 2560, "trainId": 764},
|
839 |
-
{"name": "club", "id": 545, "trainId": 765},
|
840 |
-
{"name": "jack", "id": 1345, "trainId": 766},
|
841 |
-
{"name": "door bars", "id": 775, "trainId": 767},
|
842 |
-
{
|
843 |
-
"name": "control panel, instrument panel, control board, board, panel",
|
844 |
-
"id": 603,
|
845 |
-
"trainId": 768,
|
846 |
-
},
|
847 |
-
{"name": "hairbrush", "id": 1163, "trainId": 769},
|
848 |
-
{"name": "napkin holder", "id": 1641, "trainId": 770},
|
849 |
-
{"name": "office", "id": 1678, "trainId": 771},
|
850 |
-
{"name": "smoke detector", "id": 2450, "trainId": 772},
|
851 |
-
{"name": "utensils", "id": 2915, "trainId": 773},
|
852 |
-
{"name": "apron", "id": 42, "trainId": 774},
|
853 |
-
{"name": "scissors", "id": 2242, "trainId": 775},
|
854 |
-
{"name": "terminal", "id": 2741, "trainId": 776},
|
855 |
-
{"name": "grinder", "id": 1143, "trainId": 777},
|
856 |
-
{"name": "entry phone", "id": 862, "trainId": 778},
|
857 |
-
{"name": "newspaper stand", "id": 1654, "trainId": 779},
|
858 |
-
{"name": "pepper shaker", "id": 1826, "trainId": 780},
|
859 |
-
{"name": "onions", "id": 1689, "trainId": 781},
|
860 |
-
{
|
861 |
-
"name": "central processing unit, cpu, c p u , central processor, processor, mainframe",
|
862 |
-
"id": 3124,
|
863 |
-
"trainId": 782,
|
864 |
-
},
|
865 |
-
{"name": "tape", "id": 2710, "trainId": 783},
|
866 |
-
{"name": "bat", "id": 152, "trainId": 784},
|
867 |
-
{"name": "coaster", "id": 549, "trainId": 785},
|
868 |
-
{"name": "calculator", "id": 360, "trainId": 786},
|
869 |
-
{"name": "potatoes", "id": 1982, "trainId": 787},
|
870 |
-
{"name": "luggage rack", "id": 1478, "trainId": 788},
|
871 |
-
{"name": "salt", "id": 2203, "trainId": 789},
|
872 |
-
{"name": "street number", "id": 2612, "trainId": 790},
|
873 |
-
{"name": "viewpoint", "id": 2956, "trainId": 791},
|
874 |
-
{"name": "sword", "id": 2681, "trainId": 792},
|
875 |
-
{"name": "cd", "id": 437, "trainId": 793},
|
876 |
-
{"name": "rowing machine", "id": 2171, "trainId": 794},
|
877 |
-
{"name": "plug", "id": 1933, "trainId": 795},
|
878 |
-
{"name": "andiron, firedog, dog, dog-iron", "id": 3110, "trainId": 796},
|
879 |
-
{"name": "pepper", "id": 1824, "trainId": 797},
|
880 |
-
{"name": "tongs", "id": 2803, "trainId": 798},
|
881 |
-
{"name": "bonfire", "id": 234, "trainId": 799},
|
882 |
-
{"name": "dog dish", "id": 764, "trainId": 800},
|
883 |
-
{"name": "belt", "id": 177, "trainId": 801},
|
884 |
-
{"name": "dumbbells", "id": 817, "trainId": 802},
|
885 |
-
{"name": "videocassette recorder, vcr", "id": 3145, "trainId": 803},
|
886 |
-
{"name": "hook", "id": 1262, "trainId": 804},
|
887 |
-
{"name": "envelopes", "id": 864, "trainId": 805},
|
888 |
-
{"name": "shower faucet", "id": 2359, "trainId": 806},
|
889 |
-
{"name": "watch", "id": 2992, "trainId": 807},
|
890 |
-
{"name": "padlock", "id": 1725, "trainId": 808},
|
891 |
-
{"name": "swimming pool ladder", "id": 2667, "trainId": 809},
|
892 |
-
{"name": "spanners", "id": 2484, "trainId": 810},
|
893 |
-
{"name": "gravy boat", "id": 1133, "trainId": 811},
|
894 |
-
{"name": "notice board", "id": 1667, "trainId": 812},
|
895 |
-
{"name": "trash bags", "id": 2847, "trainId": 813},
|
896 |
-
{"name": "fire alarm", "id": 932, "trainId": 814},
|
897 |
-
{"name": "ladle", "id": 1392, "trainId": 815},
|
898 |
-
{"name": "stethoscope", "id": 2573, "trainId": 816},
|
899 |
-
{"name": "rocket", "id": 2140, "trainId": 817},
|
900 |
-
{"name": "funnel", "id": 1046, "trainId": 818},
|
901 |
-
{"name": "bowling pins", "id": 264, "trainId": 819},
|
902 |
-
{"name": "valve", "id": 2927, "trainId": 820},
|
903 |
-
{"name": "thermometer", "id": 2752, "trainId": 821},
|
904 |
-
{"name": "cups", "id": 679, "trainId": 822},
|
905 |
-
{"name": "spice jar", "id": 2493, "trainId": 823},
|
906 |
-
{"name": "night light", "id": 1658, "trainId": 824},
|
907 |
-
{"name": "soaps", "id": 2466, "trainId": 825},
|
908 |
-
{"name": "games table", "id": 1057, "trainId": 826},
|
909 |
-
{"name": "slotted spoon", "id": 2444, "trainId": 827},
|
910 |
-
{"name": "reel", "id": 2093, "trainId": 828},
|
911 |
-
{"name": "scourer", "id": 2248, "trainId": 829},
|
912 |
-
{"name": "sleeping robe", "id": 2432, "trainId": 830},
|
913 |
-
{"name": "desk mat", "id": 726, "trainId": 831},
|
914 |
-
{"name": "dumbbell", "id": 816, "trainId": 832},
|
915 |
-
{"name": "hammer", "id": 1171, "trainId": 833},
|
916 |
-
{"name": "tie", "id": 2766, "trainId": 834},
|
917 |
-
{"name": "typewriter", "id": 2900, "trainId": 835},
|
918 |
-
{"name": "shaker", "id": 2313, "trainId": 836},
|
919 |
-
{"name": "cheese dish", "id": 488, "trainId": 837},
|
920 |
-
{"name": "sea star", "id": 2265, "trainId": 838},
|
921 |
-
{"name": "racquet", "id": 2043, "trainId": 839},
|
922 |
-
{"name": "butane gas cylinder", "id": 332, "trainId": 840},
|
923 |
-
{"name": "paper weight", "id": 1771, "trainId": 841},
|
924 |
-
{"name": "shaving brush", "id": 2320, "trainId": 842},
|
925 |
-
{"name": "sunglasses", "id": 2646, "trainId": 843},
|
926 |
-
{"name": "gear shift", "id": 1089, "trainId": 844},
|
927 |
-
{"name": "towel rail", "id": 2826, "trainId": 845},
|
928 |
-
{"name": "adding machine, totalizer, totaliser", "id": 3148, "trainId": 846},
|
929 |
-
]
|
930 |
-
|
931 |
-
|
932 |
-
def loadAde20K(file):
|
933 |
-
fileseg = file.replace(".jpg", "_seg.png")
|
934 |
-
with Image.open(fileseg) as io:
|
935 |
-
seg = np.array(io)
|
936 |
-
|
937 |
-
R = seg[:, :, 0]
|
938 |
-
G = seg[:, :, 1]
|
939 |
-
ObjectClassMasks = (R / 10).astype(np.int32) * 256 + (G.astype(np.int32))
|
940 |
-
|
941 |
-
return {"img_name": file, "segm_name": fileseg, "class_mask": ObjectClassMasks}
|
942 |
-
|
943 |
-
|
944 |
-
if __name__ == "__main__":
|
945 |
-
dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
|
946 |
-
index_file = dataset_dir / "ADE20K_2021_17_01" / "index_ade20k.pkl"
|
947 |
-
print('Caution: we only generate the validation set!')
|
948 |
-
with open(index_file, "rb") as f:
|
949 |
-
index_ade20k = pkl.load(f)
|
950 |
-
|
951 |
-
id_map = {}
|
952 |
-
for cat in ADE20K_SEM_SEG_FULL_CATEGORIES:
|
953 |
-
id_map[cat["id"]] = cat["trainId"]
|
954 |
-
|
955 |
-
# make output dir
|
956 |
-
for name in ["training", "validation"]:
|
957 |
-
image_dir = dataset_dir / "ADE20K_2021_17_01" / "images_detectron2" / name
|
958 |
-
image_dir.mkdir(parents=True, exist_ok=True)
|
959 |
-
annotation_dir = dataset_dir / "ADE20K_2021_17_01" / "annotations_detectron2" / name
|
960 |
-
annotation_dir.mkdir(parents=True, exist_ok=True)
|
961 |
-
|
962 |
-
# process image and gt
|
963 |
-
for i, (folder_name, file_name) in tqdm.tqdm(
|
964 |
-
enumerate(zip(index_ade20k["folder"], index_ade20k["filename"])),
|
965 |
-
total=len(index_ade20k["filename"]),
|
966 |
-
):
|
967 |
-
split = "validation" if file_name.split("_")[1] == "val" else "training"
|
968 |
-
if split == 'training':
|
969 |
-
# FIXME: If you want to generate training set, delete this condition
|
970 |
-
continue
|
971 |
-
info = loadAde20K(str(dataset_dir / folder_name / file_name))
|
972 |
-
|
973 |
-
# resize image and label
|
974 |
-
img = np.asarray(Image.open(info["img_name"]))
|
975 |
-
lab = np.asarray(info["class_mask"])
|
976 |
-
|
977 |
-
h, w = img.shape[0], img.shape[1]
|
978 |
-
max_size = 512
|
979 |
-
resize = True
|
980 |
-
if w >= h > max_size:
|
981 |
-
h_new, w_new = max_size, round(w / float(h) * max_size)
|
982 |
-
elif h >= w > max_size:
|
983 |
-
h_new, w_new = round(h / float(w) * max_size), max_size
|
984 |
-
else:
|
985 |
-
resize = False
|
986 |
-
|
987 |
-
if resize:
|
988 |
-
img = cv2.resize(img, (w_new, h_new), interpolation=cv2.INTER_LINEAR)
|
989 |
-
lab = cv2.resize(lab, (w_new, h_new), interpolation=cv2.INTER_NEAREST)
|
990 |
-
|
991 |
-
assert img.dtype == np.uint8
|
992 |
-
assert lab.dtype == np.int32
|
993 |
-
|
994 |
-
# apply label conversion and save into uint16 images
|
995 |
-
output = np.zeros_like(lab, dtype=np.uint16) + 65535
|
996 |
-
for obj_id in np.unique(lab):
|
997 |
-
if obj_id in id_map:
|
998 |
-
output[lab == obj_id] = id_map[obj_id]
|
999 |
-
|
1000 |
-
output_img = dataset_dir / "ADE20K_2021_17_01" / "images_detectron2" / split / file_name
|
1001 |
-
output_lab = (
|
1002 |
-
dataset_dir
|
1003 |
-
/ "ADE20K_2021_17_01"
|
1004 |
-
/ "annotations_detectron2"
|
1005 |
-
/ split
|
1006 |
-
/ file_name.replace(".jpg", ".tif")
|
1007 |
-
)
|
1008 |
-
Image.fromarray(img).save(output_img)
|
1009 |
-
|
1010 |
-
assert output.dtype == np.uint16
|
1011 |
-
Image.fromarray(output).save(output_lab)
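Note on the label conversion in the deleted script above: the remapping step can be sketched in isolation as below. This is a minimal illustration, not the script itself. The helper name `remap_labels`, the constant `IGNORE_LABEL`, and the toy arrays are assumptions introduced here; the two id/trainId pairs are copied from the category list.

```python
import numpy as np

IGNORE_LABEL = 65535  # same fill value the script uses for unmapped pixels


def remap_labels(lab, id_map):
    """Map raw ADE20K object ids to train ids (hypothetical helper).

    Pixels whose id is missing from id_map keep IGNORE_LABEL.
    """
    output = np.full(lab.shape, IGNORE_LABEL, dtype=np.uint16)
    for obj_id in np.unique(lab):
        # np.unique yields NumPy scalars; int() makes the dict lookup explicit
        train_id = id_map.get(int(obj_id))
        if train_id is not None:
            output[lab == obj_id] = train_id
    return output


# Toy 2x2 label map: ids 2022 ("punching bag") and 2876 ("trough") are mapped,
# id 0 is not in the map and therefore stays at the ignore value.
toy_lab = np.array([[2022, 2876], [0, 2022]], dtype=np.int32)
toy_map = {2022: 548, 2876: 549}
remapped = remap_labels(toy_lab, toy_map)
print(remapped)
```

Saving as uint16 matters here because the full category list has more than 256 train ids, which would not fit in an 8-bit mask.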
datasets/prepare_ade20k_sem_seg.py
DELETED
@@ -1,35 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import os
|
5 |
-
from pathlib import Path
|
6 |
-
|
7 |
-
import numpy as np
|
8 |
-
import tqdm
|
9 |
-
from PIL import Image
|
10 |
-
|
11 |
-
|
12 |
-
def convert(input, output, index=None):
|
13 |
-
img = np.asarray(Image.open(input))
|
14 |
-
assert img.dtype == np.uint8
|
15 |
-
img = img - 1 # 0 (ignore) becomes 255. others are shifted by 1
|
16 |
-
if index is not None:
|
17 |
-
mapping = {i: k for k, i in enumerate(index)}
|
18 |
-
img = np.vectorize(lambda x: mapping[x] if x in mapping else 255)(
|
19 |
-
img.astype(float)
|
20 |
-
).astype(np.uint8)
|
21 |
-
Image.fromarray(img).save(output)
|
22 |
-
|
23 |
-
|
24 |
-
if __name__ == "__main__":
|
25 |
-
dataset_dir = (
|
26 |
-
Path(os.getenv("DETECTRON2_DATASETS", "datasets")) / "ADEChallengeData2016"
|
27 |
-
)
|
28 |
-
print('Caution: we only generate the validation set!')
|
29 |
-
for name in ["validation"]:
|
30 |
-
annotation_dir = dataset_dir / "annotations" / name
|
31 |
-
output_dir = dataset_dir / "annotations_detectron2" / name
|
32 |
-
output_dir.mkdir(parents=True, exist_ok=True)
|
33 |
-
for file in tqdm.tqdm(list(annotation_dir.iterdir())):
|
34 |
-
output_file = output_dir / file.name
|
35 |
-
convert(file, output_file)
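Note on `img = img - 1` in the deleted script above: the comment "0 (ignore) becomes 255" relies on unsigned 8-bit wraparound. A minimal sketch of that behaviour follows, assuming a toy array; the helper name `shift_labels` is introduced here for illustration only.

```python
import numpy as np


def shift_labels(img):
    """Shift ADE20K-style labels down by one (hypothetical helper).

    With a uint8 array, 0 - 1 wraps around to 255, so the "ignore"
    label 0 ends up as 255 while every real class id moves down by 1.
    """
    assert img.dtype == np.uint8
    return img - 1


toy = np.array([[0, 1], [150, 2]], dtype=np.uint8)
shifted = shift_labels(toy)
print(shifted)
```

The dtype assertion is what makes the trick safe: with a signed or wider integer type the ignore label would become -1 instead of 255.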
datasets/prepare_coco_stuff_sem_seg.py
DELETED
@@ -1,219 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
# Modified by Feng Liang from
|
4 |
-
# https://github.com/MendelXu/zsseg.baseline/blob/master/datasets/prepare_coco_stuff_164k_sem_seg.py
|
5 |
-
|
6 |
-
import os
|
7 |
-
import os.path as osp
|
8 |
-
from pathlib import Path
|
9 |
-
import tqdm
|
10 |
-
from glob import glob
|
11 |
-
|
12 |
-
import numpy as np
|
13 |
-
from PIL import Image
|
14 |
-
|
15 |
-
|
16 |
-
full_clsID_to_trID = {
|
17 |
-
0: 0,
|
18 |
-
1: 1,
|
19 |
-
2: 2,
|
20 |
-
3: 3,
|
21 |
-
4: 4,
|
22 |
-
5: 5,
|
23 |
-
6: 6,
|
24 |
-
7: 7,
|
25 |
-
8: 8,
|
26 |
-
9: 9,
|
27 |
-
10: 10,
|
28 |
-
12: 11,
|
29 |
-
13: 12,
|
30 |
-
14: 13,
|
31 |
-
15: 14,
|
32 |
-
16: 15,
|
33 |
-
17: 16,
|
34 |
-
18: 17,
|
35 |
-
19: 18,
|
36 |
-
20: 19,
|
37 |
-
21: 20,
|
38 |
-
22: 21,
|
39 |
-
23: 22,
|
40 |
-
24: 23,
|
41 |
-
26: 24,
|
42 |
-
27: 25,
|
43 |
-
30: 26,
|
44 |
-
31: 27,
|
45 |
-
32: 28,
|
46 |
-
33: 29,
|
47 |
-
34: 30,
|
48 |
-
35: 31,
|
49 |
-
36: 32,
|
50 |
-
37: 33,
|
51 |
-
38: 34,
|
52 |
-
39: 35,
|
53 |
-
40: 36,
|
54 |
-
41: 37,
|
55 |
-
42: 38,
|
56 |
-
43: 39,
|
57 |
-
45: 40,
|
58 |
-
46: 41,
|
59 |
-
47: 42,
|
60 |
-
48: 43,
|
61 |
-
49: 44,
|
62 |
-
50: 45,
|
63 |
-
51: 46,
|
64 |
-
52: 47,
|
65 |
-
53: 48,
|
66 |
-
54: 49,
|
67 |
-
55: 50,
|
68 |
-
56: 51,
|
69 |
-
57: 52,
|
70 |
-
58: 53,
|
71 |
-
59: 54,
|
72 |
-
60: 55,
|
73 |
-
61: 56,
|
74 |
-
62: 57,
|
75 |
-
63: 58,
|
76 |
-
64: 59,
|
77 |
-
66: 60,
|
78 |
-
69: 61,
|
79 |
-
71: 62,
|
80 |
-
72: 63,
|
81 |
-
73: 64,
|
82 |
-
74: 65,
|
83 |
-
75: 66,
|
84 |
-
76: 67,
|
85 |
-
77: 68,
|
86 |
-
78: 69,
|
87 |
-
79: 70,
|
88 |
-
80: 71,
|
89 |
-
81: 72,
|
90 |
-
83: 73,
|
91 |
-
84: 74,
|
92 |
-
85: 75,
|
93 |
-
86: 76,
|
94 |
-
87: 77,
|
95 |
-
88: 78,
|
96 |
-
89: 79,
|
97 |
-
91: 80,
|
98 |
-
92: 81,
|
99 |
-
93: 82,
|
100 |
-
94: 83,
|
101 |
-
95: 84,
|
102 |
-
96: 85,
|
103 |
-
97: 86,
|
104 |
-
98: 87,
|
105 |
-
99: 88,
|
106 |
-
100: 89,
|
107 |
-
101: 90,
|
108 |
-
102: 91,
|
109 |
-
103: 92,
|
110 |
-
104: 93,
|
111 |
-
105: 94,
|
112 |
-
106: 95,
|
113 |
-
107: 96,
|
114 |
-
108: 97,
|
115 |
-
109: 98,
|
116 |
-
110: 99,
|
117 |
-
111: 100,
|
118 |
-
112: 101,
|
119 |
-
113: 102,
|
120 |
-
114: 103,
|
121 |
-
115: 104,
|
122 |
-
116: 105,
|
123 |
-
117: 106,
|
124 |
-
118: 107,
|
125 |
-
119: 108,
|
126 |
-
120: 109,
|
127 |
-
121: 110,
|
128 |
-
122: 111,
|
129 |
-
123: 112,
|
130 |
-
124: 113,
|
131 |
-
125: 114,
|
132 |
-
126: 115,
|
133 |
-
127: 116,
|
134 |
-
128: 117,
|
135 |
-
129: 118,
|
136 |
-
130: 119,
|
137 |
-
131: 120,
|
138 |
-
132: 121,
|
139 |
-
133: 122,
|
140 |
-
134: 123,
|
141 |
-
135: 124,
|
142 |
-
136: 125,
|
143 |
-
137: 126,
|
144 |
-
138: 127,
|
145 |
-
139: 128,
|
146 |
-
140: 129,
|
147 |
-
141: 130,
|
148 |
-
142: 131,
|
149 |
-
143: 132,
|
150 |
-
144: 133,
|
151 |
-
145: 134,
|
152 |
-
146: 135,
|
153 |
-
147: 136,
|
154 |
-
148: 137,
|
155 |
-
149: 138,
|
156 |
-
150: 139,
|
157 |
-
151: 140,
|
158 |
-
152: 141,
|
159 |
-
153: 142,
|
160 |
-
154: 143,
|
161 |
-
155: 144,
|
162 |
-
156: 145,
|
163 |
-
157: 146,
|
164 |
-
158: 147,
|
165 |
-
159: 148,
|
166 |
-
160: 149,
|
167 |
-
161: 150,
|
168 |
-
162: 151,
|
169 |
-
163: 152,
|
170 |
-
164: 153,
|
171 |
-
165: 154,
|
172 |
-
166: 155,
|
173 |
-
167: 156,
|
174 |
-
168: 157,
|
175 |
-
169: 158,
|
176 |
-
170: 159,
|
177 |
-
171: 160,
|
178 |
-
172: 161,
|
179 |
-
173: 162,
|
180 |
-
174: 163,
|
181 |
-
175: 164,
|
182 |
-
176: 165,
|
183 |
-
177: 166,
|
184 |
-
178: 167,
|
185 |
-
179: 168,
|
186 |
-
180: 169,
|
187 |
-
181: 170,
|
188 |
-
255: 255,
|
189 |
-
}
|
190 |
-
|
191 |
-
def convert_to_trainID(
|
192 |
-
maskpath, out_mask_dir, is_train, clsID_to_trID=full_clsID_to_trID, suffix=""
|
193 |
-
):
|
194 |
-
mask = np.array(Image.open(maskpath))
|
195 |
-
mask_copy = np.ones_like(mask, dtype=np.uint8) * 255
|
196 |
-
for clsID, trID in clsID_to_trID.items():
|
197 |
-
mask_copy[mask == clsID] = trID
|
198 |
-
seg_filename = (
|
199 |
-
osp.join(out_mask_dir, "train2017" + suffix, osp.basename(maskpath))
|
200 |
-
if is_train
|
201 |
-
else osp.join(out_mask_dir, "val2017" + suffix, osp.basename(maskpath))
|
202 |
-
)
|
203 |
-
if len(np.unique(mask_copy)) == 1 and np.unique(mask_copy)[0] == 255:
|
204 |
-
return
|
205 |
-
Image.fromarray(mask_copy).save(seg_filename, "PNG")
|
206 |
-
|
207 |
-
|
208 |
-
|
209 |
-
if __name__ == "__main__":
|
210 |
-
dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
|
211 |
-
print('Caution: we only generate the training set!')
|
212 |
-
coco_path = dataset_dir / "coco"
|
213 |
-
mask_dir = coco_path / "stuffthingmaps"
|
214 |
-
out_mask_dir = coco_path / "stuffthingmaps_detectron2"
|
215 |
-
for name in ["train2017"]:
|
216 |
-
os.makedirs((out_mask_dir / name), exist_ok=True)
|
217 |
-
train_list = glob(osp.join(mask_dir, "train2017", "*.png"))
|
218 |
-
for file in tqdm.tqdm(train_list):
|
219 |
-
convert_to_trainID(file, out_mask_dir, is_train=True)
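Note on `convert_to_trainID` in the deleted script above: it loops over every class id and writes the matching pixels one class at a time. An equivalent lookup-table formulation is sketched below as an alternative, not as what the script does. The helper `build_lut` and the four-entry subset of `full_clsID_to_trID` are assumptions made for the example.

```python
import numpy as np


def build_lut(cls_to_tr, ignore=255):
    """Build a 256-entry lookup table from class id to train id (hypothetical helper).

    Class ids absent from the mapping resolve to the ignore value.
    """
    lut = np.full(256, ignore, dtype=np.uint8)
    for cls_id, tr_id in cls_to_tr.items():
        lut[cls_id] = tr_id
    return lut


# Small subset of the mapping: id 11 is one of the gaps in the full table.
subset = {0: 0, 10: 10, 12: 11, 255: 255}
lut = build_lut(subset)

toy_mask = np.array([[0, 11], [12, 255]], dtype=np.uint8)
converted = lut[toy_mask]  # integer-array indexing applies the table per pixel
print(converted)
```

A single indexing pass replaces one boolean-mask comparison per class, which tends to matter when converting a large number of masks.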
datasets/prepare_pascal_context.py
DELETED
@@ -1,69 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import tqdm
|
5 |
-
import os
|
6 |
-
import os.path as osp
|
7 |
-
from pathlib import Path
|
8 |
-
|
9 |
-
import numpy as np
|
10 |
-
from PIL import Image
|
11 |
-
import scipy.io
|
12 |
-
|
13 |
-
def convert_pc59(mask_path, new_mask_path, pc59_dict):
|
14 |
-
mat = scipy.io.loadmat(mask_path)
|
15 |
-
mask = mat['LabelMap']
|
16 |
-
|
17 |
-
mask_copy = np.ones_like(mask, dtype=np.uint8) * 255
|
18 |
-
for trID, clsID in pc59_dict.items():
|
19 |
-
mask_copy[mask == clsID] = trID
|
20 |
-
|
21 |
-
min_value = np.amin(mask_copy)
|
22 |
-
assert min_value >= 0, min_value
|
23 |
-
Image.fromarray(mask_copy).save(new_mask_path, "PNG")
|
24 |
-
|
25 |
-
def convert_pc459(mask_path, new_mask_path):
|
26 |
-
mat = scipy.io.loadmat(mask_path)
|
27 |
-
mask = mat['LabelMap']
|
28 |
-
mask = mask - 1
|
29 |
-
min_value = np.amin(mask)
|
30 |
-
assert min_value >= 0, min_value
|
31 |
-
Image.fromarray(mask).save(new_mask_path, "TIFF")
|
32 |
-
|
33 |
-
|
34 |
-
if __name__ == "__main__":
|
35 |
-
dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
|
36 |
-
print('Caution: we only generate the validation set!')
|
37 |
-
pc_path = dataset_dir / "VOCdevkit/VOC2010"
|
38 |
-
|
39 |
-
val_list = open(pc_path / "pascalcontext_val.txt", "r")
|
40 |
-
pc459_labels = open(pc_path / "labels.txt", "r")
|
41 |
-
pc59_labels = open(pc_path / "59_labels.txt", "r")
|
42 |
-
|
43 |
-
pc459_dict = {}
|
44 |
-
for line in pc459_labels.readlines():
|
45 |
-
if ':' in line:
|
46 |
-
idx, name = line.split(':')
|
47 |
-
idx = int(idx.strip())
|
48 |
-
name = name.strip()
|
49 |
-
pc459_dict[name] = idx
|
50 |
-
|
51 |
-
pc59_dict = {}
|
52 |
-
for i, line in enumerate(pc59_labels.readlines()):
|
53 |
-
name = line.split(':')[-1].strip()
|
54 |
-
if name != '':
|
55 |
-
pc59_dict[i] = pc459_dict[name]
|
56 |
-
|
57 |
-
pc459_dir = pc_path / "annotations_detectron2" / "pc459_val"
|
58 |
-
pc459_dir.mkdir(parents=True, exist_ok=True)
|
59 |
-
pc59_dir = pc_path / "annotations_detectron2" / "pc59_val"
|
60 |
-
pc59_dir.mkdir(parents=True, exist_ok=True)
|
61 |
-
|
62 |
-
for line in tqdm.tqdm(val_list.readlines()):
|
63 |
-
fileid = line.strip()
|
64 |
-
ori_mask = f'{pc_path}/trainval/{fileid}.mat'
|
65 |
-
pc459_dst = f'{pc459_dir}/{fileid}.tif'
|
66 |
-
pc59_dst = f'{pc59_dir}/{fileid}.png'
|
67 |
-
if osp.exists(ori_mask):
|
68 |
-
convert_pc459(ori_mask, pc459_dst)
|
69 |
-
convert_pc59(ori_mask, pc59_dst, pc59_dict)
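Reviewer note: the deleted script above derives the 59-class training IDs by looking up each name from `59_labels.txt` in the 459-class `labels.txt` table, then rewriting the mask so that unlisted classes become 255. The following is a minimal pure-Python sketch of that lookup, not the repository's code. The helper names (`parse_labels`, `build_train_id_map`, `remap`) and the sample label lines are illustrative assumptions, and nested lists stand in for the NumPy mask.

```python
from typing import Dict, List


def parse_labels(lines: List[str]) -> Dict[str, int]:
    """Map class name -> original class id from 'id: name' lines."""
    name_to_id = {}
    for line in lines:
        if ":" in line:
            idx, name = line.split(":", 1)
            name_to_id[name.strip()] = int(idx.strip())
    return name_to_id


def build_train_id_map(subset_lines: List[str], name_to_id: Dict[str, int]) -> Dict[int, int]:
    """Map training id (line position in the subset file) -> original class id."""
    train_to_cls = {}
    for i, line in enumerate(subset_lines):
        name = line.split(":")[-1].strip()
        if name != "":  # value comparison, not identity
            train_to_cls[i] = name_to_id[name]
    return train_to_cls


def remap(mask: List[List[int]], train_to_cls: Dict[int, int], ignore: int = 255) -> List[List[int]]:
    """Rewrite original class ids as training ids; anything unlisted becomes `ignore`."""
    cls_to_train = {cls_id: train_id for train_id, cls_id in train_to_cls.items()}
    return [[cls_to_train.get(v, ignore) for v in row] for row in mask]


# Hypothetical label lines, for illustration only.
full_labels = ["1: accordion", "2: aeroplane", "3: air conditioner"]
subset_labels = ["2: aeroplane", "3: air conditioner"]

name_to_id = parse_labels(full_labels)
train_to_cls = build_train_id_map(subset_labels, name_to_id)
remapped = remap([[2, 1], [3, 2]], train_to_cls)
```

Under these assumptions, class 1 is absent from the subset, so it maps to the ignore value, while classes 2 and 3 become training IDs 0 and 1.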
|
datasets/prepare_voc_sem_seg.py
DELETED
@@ -1,71 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
# Modified by Feng Liang from https://github.com/MendelXu/zsseg.baseline/blob/master/datasets/prepare_voc_sem_seg.py
|
4 |
-
|
5 |
-
import os
|
6 |
-
import os.path as osp
|
7 |
-
from pathlib import Path
|
8 |
-
import tqdm
|
9 |
-
|
10 |
-
import numpy as np
|
11 |
-
from PIL import Image
|
12 |
-
|
13 |
-
|
14 |
-
clsID_to_trID = {
|
15 |
-
0: 255,
|
16 |
-
1: 0,
|
17 |
-
2: 1,
|
18 |
-
3: 2,
|
19 |
-
4: 3,
|
20 |
-
5: 4,
|
21 |
-
6: 5,
|
22 |
-
7: 6,
|
23 |
-
8: 7,
|
24 |
-
9: 8,
|
25 |
-
10: 9,
|
26 |
-
11: 10,
|
27 |
-
12: 11,
|
28 |
-
13: 12,
|
29 |
-
14: 13,
|
30 |
-
15: 14,
|
31 |
-
16: 15,
|
32 |
-
17: 16,
|
33 |
-
18: 17,
|
34 |
-
19: 18,
|
35 |
-
20: 19,
|
36 |
-
255: 255,
|
37 |
-
}
|
38 |
-
|
39 |
-
def convert_to_trainID(
|
40 |
-
maskpath, out_mask_dir, is_train, clsID_to_trID=clsID_to_trID, suffix=""
|
41 |
-
):
|
42 |
-
mask = np.array(Image.open(maskpath))
|
43 |
-
mask_copy = np.ones_like(mask, dtype=np.uint8) * 255
|
44 |
-
for clsID, trID in clsID_to_trID.items():
|
45 |
-
mask_copy[mask == clsID] = trID
|
46 |
-
seg_filename = (
|
47 |
-
osp.join(out_mask_dir, "train" + suffix, osp.basename(maskpath))
|
48 |
-
if is_train
|
49 |
-
else osp.join(out_mask_dir, "val" + suffix, osp.basename(maskpath))
|
50 |
-
)
|
51 |
-
if len(np.unique(mask_copy)) == 1 and np.unique(mask_copy)[0] == 255:
|
52 |
-
return
|
53 |
-
Image.fromarray(mask_copy).save(seg_filename, "PNG")
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
if __name__ == "__main__":
|
58 |
-
dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
|
59 |
-
print('Caution: we only generate the validation set!')
|
60 |
-
voc_path = dataset_dir / "VOCdevkit" / "VOC2012"
|
61 |
-
out_mask_dir = voc_path / "annotations_detectron2"
|
62 |
-
out_image_dir = voc_path / "images_detectron2"
|
63 |
-
for name in ["val"]:
|
64 |
-
os.makedirs((out_mask_dir / name), exist_ok=True)
|
65 |
-
os.makedirs((out_image_dir / name), exist_ok=True)
|
66 |
-
val_list = [
|
67 |
-
osp.join(voc_path, "SegmentationClassAug", f + ".png")
|
68 |
-
for f in np.loadtxt(osp.join(voc_path, "ImageSets/Segmentation/val.txt"), dtype=str).tolist()
|
69 |
-
]
|
70 |
-
for file in tqdm.tqdm(val_list):
|
71 |
-
convert_to_trainID(file, out_mask_dir, is_train=False)
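Reviewer note: the `clsID_to_trID` table in the deleted script sends background (0) and the void label (255) to the ignore value 255, and shifts the 20 object classes down by one. A mask that ends up entirely 255 is skipped rather than saved. The sketch below restates that rule in pure Python. The function names are hypothetical and nested lists stand in for the NumPy array, so treat it as an illustration rather than the script's implementation.

```python
from typing import List

IGNORE = 255

# Same mapping as the literal table above: 0 -> 255, 1..20 -> 0..19, 255 -> 255.
CLS_TO_TRAIN = {0: IGNORE, 255: IGNORE}
CLS_TO_TRAIN.update({cls_id: cls_id - 1 for cls_id in range(1, 21)})


def to_train_ids(mask: List[List[int]]) -> List[List[int]]:
    """Remap VOC class ids to training ids; unknown values fall back to IGNORE."""
    return [[CLS_TO_TRAIN.get(v, IGNORE) for v in row] for row in mask]


def is_all_ignored(mask: List[List[int]]) -> bool:
    """True when no pixel carries a usable label, i.e. the mask would be skipped."""
    return all(v == IGNORE for row in mask for v in row)


converted = to_train_ids([[0, 1], [20, 255]])
background_only = to_train_ids([[0, 0], [255, 0]])
```

Here `converted` keeps two labelled pixels, while `background_only` is fully ignored and would not be written to disk under the script's early-return rule.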
|
open_vocab_seg/.DS_Store
CHANGED
Binary files a/open_vocab_seg/.DS_Store and b/open_vocab_seg/.DS_Store differ
|
|
open_vocab_seg/modeling/.DS_Store
CHANGED
Binary files a/open_vocab_seg/modeling/.DS_Store and b/open_vocab_seg/modeling/.DS_Store differ
|
|
open_vocab_seg/modeling/clip_adapter/__init__.py
CHANGED
@@ -21,3 +21,5 @@ def build_text_prompt(cfg):
|
|
21 |
"Prompt learner {} is not supported".format(cfg.TEXT_TEMPLATES)
|
22 |
)
|
23 |
return text_templates
|
|
|
|
|
|
21 |
"Prompt learner {} is not supported".format(cfg.TEXT_TEMPLATES)
|
22 |
)
|
23 |
return text_templates
|
24 |
+
|
25 |
+
from .clip import tokenize
|
open_vocab_seg/modeling/clip_adapter/clip/__init__.py
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
from .clip import *
|
open_vocab_seg/modeling/clip_adapter/clip/bpe_simple_vocab_16e6.txt.gz
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:924691ac288e54409236115652ad4aa250f48203de50a9e4722a6ecd48d6804a
|
3 |
+
size 1356917
|
open_vocab_seg/modeling/clip_adapter/clip/clip.py
ADDED
@@ -0,0 +1,285 @@
|
|
1 |
+
import hashlib
|
2 |
+
import os
|
3 |
+
import urllib.request
|
4 |
+
import warnings
|
5 |
+
from collections import OrderedDict
|
6 |
+
from typing import Union, List
|
7 |
+
|
8 |
+
import torch
|
9 |
+
from PIL import Image
|
10 |
+
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
|
11 |
+
from tqdm import tqdm
|
12 |
+
|
13 |
+
from .model import build_model
|
14 |
+
from .simple_tokenizer import SimpleTokenizer as _Tokenizer
|
15 |
+
|
16 |
+
try:
|
17 |
+
from torchvision.transforms import InterpolationMode
|
18 |
+
|
19 |
+
BICUBIC = InterpolationMode.BICUBIC
|
20 |
+
except ImportError:
|
21 |
+
BICUBIC = Image.BICUBIC
|
22 |
+
|
23 |
+
|
24 |
+
if torch.__version__.split(".") < ["1", "7", "1"]:
|
25 |
+
warnings.warn("PyTorch version 1.7.1 or higher is recommended")
|
26 |
+
|
27 |
+
|
28 |
+
__all__ = ["available_models", "load", "tokenize"]
|
29 |
+
_tokenizer = _Tokenizer()
|
30 |
+
|
31 |
+
_MODELS = {
|
32 |
+
"RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
|
33 |
+
"RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
|
34 |
+
"RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
|
35 |
+
"RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
|
36 |
+
"ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
|
37 |
+
"ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
|
38 |
+
"ViT-L/14": "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt",
|
39 |
+
"ViT-L/14@336px": "https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt",
|
40 |
+
}
|
41 |
+
|
42 |
+
|
43 |
+
def _download(url: str, root: str = os.path.expanduser("~/.cache/clip")):
|
44 |
+
os.makedirs(root, exist_ok=True)
|
45 |
+
filename = os.path.basename(url)
|
46 |
+
|
47 |
+
expected_sha256 = url.split("/")[-2]
|
48 |
+
download_target = os.path.join(root, filename)
|
49 |
+
|
50 |
+
if os.path.exists(download_target) and not os.path.isfile(download_target):
|
51 |
+
raise RuntimeError(f"{download_target} exists and is not a regular file")
|
52 |
+
|
53 |
+
if os.path.isfile(download_target):
|
54 |
+
if (
|
55 |
+
hashlib.sha256(open(download_target, "rb").read()).hexdigest()
|
56 |
+
== expected_sha256
|
57 |
+
):
|
58 |
+
return download_target
|
59 |
+
else:
|
60 |
+
warnings.warn(
|
61 |
+
f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file"
|
62 |
+
)
|
63 |
+
|
64 |
+
with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
|
65 |
+
with tqdm(
|
66 |
+
total=int(source.info().get("Content-Length")),
|
67 |
+
ncols=80,
|
68 |
+
unit="iB",
|
69 |
+
unit_scale=True,
|
70 |
+
) as loop:
|
71 |
+
while True:
|
72 |
+
buffer = source.read(8192)
|
73 |
+
if not buffer:
|
74 |
+
break
|
75 |
+
|
76 |
+
output.write(buffer)
|
77 |
+
loop.update(len(buffer))
|
78 |
+
|
79 |
+
if (
|
80 |
+
hashlib.sha256(open(download_target, "rb").read()).hexdigest()
|
81 |
+
!= expected_sha256
|
82 |
+
):
|
83 |
+
raise RuntimeError(
|
84 |
+
"Model has been downloaded but the SHA256 checksum does not match"
|
85 |
+
)
|
86 |
+
|
87 |
+
return download_target
|
88 |
+
|
89 |
+
|
90 |
+
def _transform(n_px):
|
91 |
+
return Compose(
|
92 |
+
[
|
93 |
+
Resize(n_px, interpolation=BICUBIC),
|
94 |
+
CenterCrop(n_px),
|
95 |
+
lambda image: image.convert("RGB"),
|
96 |
+
ToTensor(),
|
97 |
+
Normalize(
|
98 |
+
(0.48145466, 0.4578275, 0.40821073),
|
99 |
+
(0.26862954, 0.26130258, 0.27577711),
|
100 |
+
),
|
101 |
+
]
|
102 |
+
)
|
103 |
+
|
104 |
+
|
105 |
+
def available_models() -> List[str]:
|
106 |
+
"""Returns the names of available CLIP models"""
|
107 |
+
return list(_MODELS.keys())
|
108 |
+
|
109 |
+
|
110 |
+
def load(
|
111 |
+
name: str,
|
112 |
+
mask_prompt_depth: int = 0,
|
113 |
+
device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu",
|
114 |
+
jit=False,
|
115 |
+
):
|
116 |
+
"""Load a CLIP model
|
117 |
+
|
118 |
+
Parameters
|
119 |
+
----------
|
120 |
+
name : str
|
121 |
+
A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict
|
122 |
+
|
123 |
+
device : Union[str, torch.device]
|
124 |
+
The device to put the loaded model
|
125 |
+
|
126 |
+
jit : bool
|
127 |
+
Whether to load the optimized JIT model or more hackable non-JIT model (default).
|
128 |
+
|
129 |
+
Returns
|
130 |
+
-------
|
131 |
+
model : torch.nn.Module
|
132 |
+
The CLIP model
|
133 |
+
|
134 |
+
preprocess : Callable[[PIL.Image], torch.Tensor]
|
135 |
+
A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
|
136 |
+
"""
|
137 |
+
if name in _MODELS:
|
138 |
+
model_path = _download(_MODELS[name])
|
139 |
+
elif os.path.isfile(name):
|
140 |
+
model_path = name
|
141 |
+
else:
|
142 |
+
raise RuntimeError(
|
143 |
+
f"Model {name} not found; available models = {available_models()}"
|
144 |
+
)
|
145 |
+
|
146 |
+
try:
|
147 |
+
# loading JIT archive
|
148 |
+
model = torch.jit.load(model_path, map_location=device if jit else "cpu").eval()
|
149 |
+
state_dict = None
|
150 |
+
except RuntimeError:
|
151 |
+
# loading saved state dict
|
152 |
+
if jit:
|
153 |
+
warnings.warn(
|
154 |
+
f"File {model_path} is not a JIT archive. Loading as a state dict instead"
|
155 |
+
)
|
156 |
+
jit = False
|
157 |
+
state_dict = torch.load(model_path, map_location="cpu")
|
158 |
+
if 'state_dict' in state_dict:
|
159 |
+
new_state_dict = OrderedDict()
|
160 |
+
for k, v in state_dict['state_dict'].items():
|
161 |
+
if k.startswith('module.'):
|
162 |
+
name = k[7:] # remove `module.`
|
163 |
+
new_state_dict[name] = v
|
164 |
+
state_dict = new_state_dict
|
165 |
+
|
166 |
+
if not jit:
|
167 |
+
model = build_model(state_dict or model.state_dict(), mask_prompt_depth).to(device)
|
168 |
+
if str(device) == "cpu":
|
169 |
+
model.float()
|
170 |
+
return model, _transform(model.visual.input_resolution)
|
171 |
+
|
172 |
+
# patch the device names
|
173 |
+
device_holder = torch.jit.trace(
|
174 |
+
lambda: torch.ones([]).to(torch.device(device)), example_inputs=[]
|
175 |
+
)
|
176 |
+
device_node = [
|
177 |
+
n
|
178 |
+
for n in device_holder.graph.findAllNodes("prim::Constant")
|
179 |
+
if "Device" in repr(n)
|
180 |
+
][-1]
|
181 |
+
|
182 |
+
def patch_device(module):
|
183 |
+
try:
|
184 |
+
graphs = [module.graph] if hasattr(module, "graph") else []
|
185 |
+
except RuntimeError:
|
186 |
+
graphs = []
|
187 |
+
|
188 |
+
if hasattr(module, "forward1"):
|
189 |
+
graphs.append(module.forward1.graph)
|
190 |
+
|
191 |
+
for graph in graphs:
|
192 |
+
for node in graph.findAllNodes("prim::Constant"):
|
193 |
+
if "value" in node.attributeNames() and str(node["value"]).startswith(
|
194 |
+
"cuda"
|
195 |
+
):
|
196 |
+
node.copyAttributes(device_node)
|
197 |
+
|
198 |
+
model.apply(patch_device)
|
199 |
+
patch_device(model.encode_image)
|
200 |
+
patch_device(model.encode_text)
|
201 |
+
|
202 |
+
# patch dtype to float32 on CPU
|
203 |
+
if str(device) == "cpu":
|
204 |
+
float_holder = torch.jit.trace(
|
205 |
+
lambda: torch.ones([]).float(), example_inputs=[]
|
206 |
+
)
|
207 |
+
float_input = list(float_holder.graph.findNode("aten::to").inputs())[1]
|
208 |
+
float_node = float_input.node()
|
209 |
+
|
210 |
+
def patch_float(module):
|
211 |
+
try:
|
212 |
+
graphs = [module.graph] if hasattr(module, "graph") else []
|
213 |
+
except RuntimeError:
|
214 |
+
graphs = []
|
215 |
+
|
216 |
+
if hasattr(module, "forward1"):
|
217 |
+
graphs.append(module.forward1.graph)
|
218 |
+
|
219 |
+
for graph in graphs:
|
220 |
+
for node in graph.findAllNodes("aten::to"):
|
221 |
+
inputs = list(node.inputs())
|
222 |
+
for i in [
|
223 |
+
1,
|
224 |
+
2,
|
225 |
+
]: # dtype can be the second or third argument to aten::to()
|
226 |
+
if inputs[i].node()["value"] == 5:
|
227 |
+
inputs[i].node().copyAttributes(float_node)
|
228 |
+
|
229 |
+
model.apply(patch_float)
|
230 |
+
patch_float(model.encode_image)
|
231 |
+
patch_float(model.encode_text)
|
232 |
+
|
233 |
+
model.float()
|
234 |
+
|
235 |
+
return model, _transform(model.input_resolution.item())
|
236 |
+
|
237 |
+
|
238 |
+
def tokenize(
|
239 |
+
texts: Union[str, List[str]],
|
240 |
+
context_length: int = 77,
|
241 |
+
truncate: bool = False,
|
242 |
+
return_length: bool = False,
|
243 |
+
) -> torch.LongTensor:
|
244 |
+
"""
|
245 |
+
Returns the tokenized representation of given input string(s)
|
246 |
+
|
247 |
+
Parameters
|
248 |
+
----------
|
249 |
+
texts : Union[str, List[str]]
|
250 |
+
An input string or a list of input strings to tokenize
|
251 |
+
|
252 |
+
context_length : int
|
253 |
+
The context length to use; all CLIP models use 77 as the context length
|
254 |
+
|
255 |
+
truncate: bool
|
256 |
+
Whether to truncate the text in case its encoding is longer than the context length
|
257 |
+
|
258 |
+
Returns
|
259 |
+
-------
|
260 |
+
A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length]
|
261 |
+
"""
|
262 |
+
if isinstance(texts, str):
|
263 |
+
texts = [texts]
|
264 |
+
|
265 |
+
sot_token = _tokenizer.encoder["<|startoftext|>"]
|
266 |
+
eot_token = _tokenizer.encoder["<|endoftext|>"]
|
267 |
+
all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
|
268 |
+
result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
|
269 |
+
length = []
|
270 |
+
for i, tokens in enumerate(all_tokens):
|
271 |
+
if len(tokens) > context_length:
|
272 |
+
if truncate:
|
273 |
+
tokens = tokens[:context_length]
|
274 |
+
tokens[-1] = eot_token
|
275 |
+
length.append(context_length)
|
276 |
+
else:
|
277 |
+
raise RuntimeError(
|
278 |
+
f"Input {texts[i]} is too long for context length {context_length}"
|
279 |
+
)
|
280 |
+
else:
|
281 |
+
length.append(len(tokens))
|
282 |
+
result[i, : len(tokens)] = torch.tensor(tokens)
|
283 |
+
if return_length:
|
284 |
+
return result, length
|
285 |
+
return result
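Reviewer note: the `tokenize` function added above wraps each encoded text in start and end tokens, then either pads it with zeros up to `context_length` or, when `truncate` is set, cuts it and forces the last position to the end token. The sketch below isolates that padding and truncation rule without PyTorch or the BPE tokenizer. The function name `pad_or_truncate` and the token ids 1 and 2 are placeholders chosen for illustration, not the real vocabulary ids.

```python
from typing import Iterable, List, Tuple


def pad_or_truncate(
    token_ids: Iterable[int],
    context_length: int = 77,
    truncate: bool = False,
    sot: int = 1,  # placeholder start-of-text id (assumption)
    eot: int = 2,  # placeholder end-of-text id (assumption)
) -> Tuple[List[int], int]:
    """Return (row of exactly context_length ids, number of real tokens)."""
    tokens = [sot] + list(token_ids) + [eot]
    if len(tokens) > context_length:
        if not truncate:
            raise RuntimeError(
                f"Input is too long for context length {context_length}"
            )
        # Keep the first context_length ids and make sure the row still ends in EOT.
        tokens = tokens[:context_length]
        tokens[-1] = eot
    length = len(tokens)
    # Zero-pad on the right, mirroring the zero-initialised result tensor.
    return tokens + [0] * (context_length - length), length


short_row, short_len = pad_or_truncate([5, 6, 7], context_length=6)
long_row, long_len = pad_or_truncate(range(10, 20), context_length=4, truncate=True)
```

With `truncate` left at its default, an over-long input raises instead of being cut silently, which matches the behaviour of the function in the diff.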
|
open_vocab_seg/modeling/clip_adapter/clip/model.py
ADDED
@@ -0,0 +1,613 @@
|
|
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
+
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
+
# Modified by Feng Liang from https://github.com/openai/CLIP/blob/main/clip/model.py
|
4 |
+
|
5 |
+
from collections import OrderedDict
|
6 |
+
from typing import Tuple, Union
|
7 |
+
|
8 |
+
import numpy as np
|
9 |
+
import torch
|
10 |
+
import torch.nn.functional as F
|
11 |
+
from torch import nn
|
12 |
+
|
13 |
+
|
14 |
+
class Bottleneck(nn.Module):
|
15 |
+
expansion = 4
|
16 |
+
|
17 |
+
def __init__(self, inplanes, planes, stride=1):
|
18 |
+
super().__init__()
|
19 |
+
|
20 |
+
# all conv layers have stride 1. an avgpool is performed after the second convolution when stride > 1
|
21 |
+
self.conv1 = nn.Conv2d(inplanes, planes, 1, bias=False)
|
22 |
+
self.bn1 = nn.BatchNorm2d(planes)
|
23 |
+
|
24 |
+
self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
|
25 |
+
self.bn2 = nn.BatchNorm2d(planes)
|
26 |
+
|
27 |
+
self.avgpool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
|
28 |
+
|
29 |
+
self.conv3 = nn.Conv2d(planes, planes * self.expansion, 1, bias=False)
|
30 |
+
self.bn3 = nn.BatchNorm2d(planes * self.expansion)
|
31 |
+
|
32 |
+
self.relu = nn.ReLU(inplace=True)
|
33 |
+
self.downsample = None
|
34 |
+
self.stride = stride
|
35 |
+
|
36 |
+
if stride > 1 or inplanes != planes * Bottleneck.expansion:
|
37 |
+
# downsampling layer is prepended with an avgpool, and the subsequent convolution has stride 1
|
38 |
+
self.downsample = nn.Sequential(
|
39 |
+
OrderedDict(
|
40 |
+
[
|
41 |
+
("-1", nn.AvgPool2d(stride)),
|
42 |
+
(
|
43 |
+
"0",
|
44 |
+
nn.Conv2d(
|
45 |
+
inplanes,
|
46 |
+
planes * self.expansion,
|
47 |
+
1,
|
48 |
+
stride=1,
|
49 |
+
bias=False,
|
50 |
+
),
|
51 |
+
),
|
52 |
+
("1", nn.BatchNorm2d(planes * self.expansion)),
|
53 |
+
]
|
54 |
+
)
|
55 |
+
)
|
56 |
+
|
57 |
+
def forward(self, x: torch.Tensor):
|
58 |
+
identity = x
|
59 |
+
|
60 |
+
out = self.relu(self.bn1(self.conv1(x)))
|
61 |
+
out = self.relu(self.bn2(self.conv2(out)))
|
62 |
+
out = self.avgpool(out)
|
63 |
+
out = self.bn3(self.conv3(out))
|
64 |
+
|
65 |
+
if self.downsample is not None:
|
66 |
+
identity = self.downsample(x)
|
67 |
+
|
68 |
+
out += identity
|
69 |
+
out = self.relu(out)
|
70 |
+
return out
|
71 |
+
|
72 |
+
|
73 |
+
class AttentionPool2d(nn.Module):
|
74 |
+
def __init__(
|
75 |
+
self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None
|
76 |
+
):
|
77 |
+
super().__init__()
|
78 |
+
self.positional_embedding = nn.Parameter(
|
79 |
+
torch.randn(spacial_dim ** 2 + 1, embed_dim) / embed_dim ** 0.5
|
80 |
+
)
|
81 |
+
self.k_proj = nn.Linear(embed_dim, embed_dim)
|
82 |
+
self.q_proj = nn.Linear(embed_dim, embed_dim)
|
83 |
+
self.v_proj = nn.Linear(embed_dim, embed_dim)
|
84 |
+
self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
|
85 |
+
self.num_heads = num_heads
|
86 |
+
self.grid_size = spacial_dim
|
87 |
+
|
88 |
+
def forward(self, x, mask=None, return_cls=True):
|
89 |
+
b, c, gh, gw = x.shape
|
90 |
+
# mask out irrelevant features
|
91 |
+
if mask is not None:
|
92 |
+
mask = F.interpolate(mask[:, None, ...], size=(gh, gw)).squeeze(
|
93 |
+
1
|
94 |
+
) # [N,H,W] -> [N,grid,grid]
|
95 |
+
mask = (mask > 0.5).reshape(mask.shape[0], -1)
|
96 |
+
mask = torch.cat([mask, mask.new_ones(mask.shape[0], 1)], dim=1)
|
97 |
+
if x.size()[0] == 1:
|
98 |
+
x = x.expand(mask.shape[0], c, gh, gw)
|
99 |
+
|
100 |
+
x = x.reshape(x.shape[0], c, gh * gw).permute(2, 0, 1) # NCHW -> (HW)NC
|
101 |
+
|
102 |
+
x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (HW+1)NC
|
103 |
+
positional_embedding = self.positional_embedding
|
104 |
+
if not (self.positional_embedding.shape[0] == x.shape[0]):
|
105 |
+
cls_pos = positional_embedding[0:1, :]
|
106 |
+
per_pos_embedding = (
|
107 |
+
F.interpolate(
|
108 |
+
positional_embedding[1:, :]
|
109 |
+
.permute(1, 0)
|
110 |
+
.view(1, -1, self.grid_size, self.grid_size),
|
111 |
+
size=(gh, gw),
|
112 |
+
mode="bicubic",
|
113 |
+
)
|
114 |
+
.reshape(-1, gh * gw)
|
115 |
+
.permute(1, 0)
|
116 |
+
)
|
117 |
+
positional_embedding = torch.cat([cls_pos, per_pos_embedding])
|
118 |
+
|
119 |
+
x = x + positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
|
120 |
+
x, _ = F.multi_head_attention_forward(
|
121 |
+
query=x,
|
122 |
+
key=x,
|
123 |
+
value=x,
|
124 |
+
embed_dim_to_check=x.shape[-1],
|
125 |
+
num_heads=self.num_heads,
|
126 |
+
q_proj_weight=self.q_proj.weight,
|
127 |
+
k_proj_weight=self.k_proj.weight,
|
128 |
+
v_proj_weight=self.v_proj.weight,
|
129 |
+
in_proj_weight=None,
|
130 |
+
in_proj_bias=torch.cat(
|
131 |
+
[self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]
|
132 |
+
),
|
133 |
+
bias_k=None,
|
134 |
+
bias_v=None,
|
135 |
+
add_zero_attn=False,
|
136 |
+
dropout_p=0,
|
137 |
+
out_proj_weight=self.c_proj.weight,
|
138 |
+
out_proj_bias=self.c_proj.bias,
|
139 |
+
use_separate_proj_weight=True,
|
140 |
+
training=self.training,
|
141 |
+
need_weights=False,
|
142 |
+
key_padding_mask=mask,
|
143 |
+
)
|
144 |
+
|
145 |
+
if return_cls:
|
146 |
+
return x[0]
|
147 |
+
else:
|
148 |
+
return x
|
149 |
+
|
150 |
+
|
151 |
+
class ModifiedResNet(nn.Module):
|
152 |
+
"""
|
153 |
+
A ResNet class that is similar to torchvision's but contains the following changes:
|
154 |
+
- There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
|
155 |
+
- Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1
|
156 |
+
- The final pooling layer is a QKV attention instead of an average pool
|
157 |
+
"""
|
158 |
+
|
159 |
+
def __init__(self, layers, output_dim, heads, input_resolution=224, width=64):
|
160 |
+
super().__init__()
|
161 |
+
self.output_dim = output_dim
|
162 |
+
self.input_resolution = input_resolution
|
163 |
+
|
164 |
+
# the 3-layer stem
|
165 |
+
self.conv1 = nn.Conv2d(
|
166 |
+
3, width // 2, kernel_size=3, stride=2, padding=1, bias=False
|
167 |
+
)
|
168 |
+
self.bn1 = nn.BatchNorm2d(width // 2)
|
169 |
+
self.conv2 = nn.Conv2d(
|
170 |
+
width // 2, width // 2, kernel_size=3, padding=1, bias=False
|
171 |
+
)
|
172 |
+
self.bn2 = nn.BatchNorm2d(width // 2)
|
173 |
+
self.conv3 = nn.Conv2d(width // 2, width, kernel_size=3, padding=1, bias=False)
|
174 |
+
self.bn3 = nn.BatchNorm2d(width)
|
175 |
+
self.avgpool = nn.AvgPool2d(2)
|
176 |
+
self.relu = nn.ReLU(inplace=True)
|
177 |
+
|
178 |
+
# residual layers
|
179 |
+
self._inplanes = width # this is a *mutable* variable used during construction
|
180 |
+
self.layer1 = self._make_layer(width, layers[0])
|
181 |
+
self.layer2 = self._make_layer(width * 2, layers[1], stride=2)
|
182 |
+
self.layer3 = self._make_layer(width * 4, layers[2], stride=2)
|
183 |
+
self.layer4 = self._make_layer(width * 8, layers[3], stride=2)
|
184 |
+
|
185 |
+
embed_dim = width * 32 # the ResNet feature dimension
|
186 |
+
self.attnpool = AttentionPool2d(
|
187 |
+
input_resolution // 32, embed_dim, heads, output_dim
|
188 |
+
)
|
189 |
+
|
190 |
+
def _make_layer(self, planes, blocks, stride=1):
|
191 |
+
layers = [Bottleneck(self._inplanes, planes, stride)]
|
192 |
+
|
193 |
+
self._inplanes = planes * Bottleneck.expansion
|
194 |
+
for _ in range(1, blocks):
|
195 |
+
layers.append(Bottleneck(self._inplanes, planes))
|
196 |
+
|
197 |
+
return nn.Sequential(*layers)
|
198 |
+
|
199 |
+
def forward(self, x, mask: torch.Tensor = None, return_cls=True):
|
200 |
+
def stem(x):
|
201 |
+
for conv, bn in [
|
202 |
+
(self.conv1, self.bn1),
|
203 |
+
(self.conv2, self.bn2),
|
204 |
+
(self.conv3, self.bn3),
|
205 |
+
]:
|
206 |
+
x = self.relu(bn(conv(x)))
|
207 |
+
x = self.avgpool(x)
|
208 |
+
return x
|
209 |
+
|
210 |
+
x = x.type(self.conv1.weight.dtype)
|
211 |
+
x = stem(x) # 1/4,1/4
|
212 |
+
x = self.layer1(x)
|
213 |
+
x = self.layer2(x) # 1/8,1/8
|
214 |
+
x = self.layer3(x) # 1/16,1/16
|
215 |
+
x = self.layer4(x) # 1/32,1/32
|
216 |
+
b, c, gh, gw = x.shape
|
217 |
+
x = self.attnpool(x, mask, return_cls)
|
218 |
+
if not return_cls:
|
219 |
+
return x[1:].permute(1, 0, 2).reshape(b, gh, gw, x.shape[-1]) # N,L,C
|
220 |
+
return x
|
221 |
+
|
222 |
+
|
223 |
+
class LayerNorm(nn.LayerNorm):
|
224 |
+
"""Subclass torch's LayerNorm to handle fp16."""
|
225 |
+
|
226 |
+
def forward(self, x: torch.Tensor):
|
227 |
+
orig_type = x.dtype
|
228 |
+
ret = super().forward(x.type(torch.float32))
|
229 |
+
return ret.type(orig_type)
|
230 |
+
|
231 |
+
|
232 |
+
class QuickGELU(nn.Module):
|
233 |
+
def forward(self, x: torch.Tensor):
|
234 |
+
return x * torch.sigmoid(1.702 * x)
|
235 |
+
|
236 |
+
|
237 |
+
class ResidualAttentionBlock(nn.Module):
|
238 |
+
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):
|
239 |
+
super().__init__()
|
240 |
+
|
241 |
+
self.attn = nn.MultiheadAttention(d_model, n_head)
|
242 |
+
self.ln_1 = LayerNorm(d_model)
|
243 |
+
self.mlp = nn.Sequential(
|
244 |
+
OrderedDict(
|
245 |
+
[
|
246 |
+
("c_fc", nn.Linear(d_model, d_model * 4)),
|
247 |
+
("gelu", QuickGELU()),
|
248 |
+
("c_proj", nn.Linear(d_model * 4, d_model)),
|
249 |
+
]
|
250 |
+
)
|
251 |
+
)
|
252 |
+
self.ln_2 = LayerNorm(d_model)
|
253 |
+
self.attn_mask = attn_mask
|
254 |
+
|
255 |
+
def attention(self, x: torch.Tensor, **kwargs):
|
256 |
+
self.attn_mask = (
|
257 |
+
self.attn_mask.to(dtype=x.dtype, device=x.device)
|
258 |
+
if self.attn_mask is not None
|
259 |
+
else None
|
260 |
+
)
|
261 |
+
return self.attn(
|
262 |
+
x, x, x, need_weights=False, attn_mask=self.attn_mask, **kwargs
|
263 |
+
)[0]
|
264 |
+
|
265 |
+
def forward(self, x: torch.Tensor, **kwargs):
|
266 |
+
x = x + self.attention(self.ln_1(x), **kwargs)
|
267 |
+
x = x + self.mlp(self.ln_2(x))
|
268 |
+
return x
|
269 |
+
|
270 |
+
|
271 |
+
class Transformer(nn.Module):
|
272 |
+
def __init__(
|
273 |
+
self, width: int, layers: int, heads: int, attn_mask: torch.Tensor = None
|
274 |
+
):
|
275 |
+
super().__init__()
|
276 |
+
self.width = width
|
277 |
+
self.layers = layers
|
278 |
+
self.resblocks = nn.Sequential(
|
279 |
+
*[ResidualAttentionBlock(width, heads, attn_mask) for _ in range(layers)]
|
280 |
+
)
|
281 |
+
|
282 |
+
def forward(self, x: torch.Tensor, **kwargs):
|
283 |
+
for block in self.resblocks:
|
284 |
+
x = block(x, **kwargs)
|
285 |
+
return x
|
286 |
+
|
287 |
+
|
288 |
+
class VisionTransformer(nn.Module):
|
289 |
+
def __init__(
|
290 |
+
self,
|
291 |
+
input_resolution: int,
|
292 |
+
patch_size: int,
|
293 |
+
mask_prompt_depth: int,
|
294 |
+
width: int,
|
295 |
+
layers: int,
|
296 |
+
heads: int,
|
297 |
+
output_dim: int,
|
298 |
+
):
|
299 |
+
super().__init__()
|
300 |
+
self.input_resolution = input_resolution
|
301 |
+
self.output_dim = output_dim
|
302 |
+
self.conv1 = nn.Conv2d(
|
303 |
+
in_channels=3,
|
304 |
+
out_channels=width,
|
305 |
+
kernel_size=patch_size,
|
306 |
+
stride=patch_size,
|
307 |
+
bias=False,
|
308 |
+
)
|
309 |
+
|
310 |
+
scale = width ** -0.5
|
311 |
+
self.class_embedding = nn.Parameter(scale * torch.randn(width))
|
312 |
+
self.positional_embedding = nn.Parameter(
|
313 |
+
scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width)
|
314 |
+
)
|
315 |
+
self.grid_size = input_resolution // patch_size
|
316 |
+
self.ln_pre = LayerNorm(width)
|
317 |
+
|
318 |
+
self.transformer = Transformer(width, layers, heads)
|
319 |
+
|
320 |
+
self.ln_post = LayerNorm(width)
|
321 |
+
self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
|
322 |
+
|
323 |
+
self.mask_pool = nn.AvgPool2d(patch_size, stride=patch_size)
|
324 |
+
self.mask_prompt_depth = mask_prompt_depth
|
325 |
+
self.mask_embedding = nn.Parameter(torch.zeros(self.mask_prompt_depth, self.grid_size * self.grid_size, width))
|
326 |
+
|
327 |
+
def forward(self, x: torch.Tensor, m: torch.Tensor = None):
|
328 |
+
x = self.conv1(x) # shape = [*, width, grid, grid]
|
329 |
+
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
|
330 |
+
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
|
331 |
+
if m is not None:
|
332 |
+
m = self.mask_pool(m.to(torch.float).squeeze()).reshape(m.shape[0], -1).unsqueeze(-1)
|
333 |
+
m = torch.ceil(m)
|
334 |
+
if self.mask_embedding.shape[1] == 1:
|
335 |
+
mask_embedding = self.mask_embedding.to(x.dtype).repeat(1, x.shape[1], 1)
|
336 |
+
else:
|
337 |
+
mask_embedding = self.mask_embedding.to(x.dtype)
|
338 |
+
x = x * m + mask_embedding[0].unsqueeze(0) * (1 - m)
|
339 |
+
|
340 |
+
x = torch.cat([self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1) # shape = [*, grid ** 2 + 1, width]
|
341 |
+
x = x + self.positional_embedding.to(x.dtype)
|
342 |
+
x = self.ln_pre(x)
|
343 |
+
|
344 |
+
x = x.permute(1, 0, 2) # NLD -> LND
|
345 |
+
if m is not None:
|
346 |
+
for i, blk in enumerate(self.transformer.resblocks):
|
347 |
+
d = i + 1
|
348 |
+
x = blk(x)
|
349 |
+
if d < self.mask_prompt_depth:
|
350 |
+
masked_x = x[1:, :, :] * m.permute(1, 0, 2) + \
|
351 |
+
mask_embedding[d].unsqueeze(0).permute(1, 0, 2) * (1 - m.permute(1, 0, 2))
|
352 |
+
x = torch.cat([x[:1, :, :], masked_x], dim=0)
|
353 |
+
else:
|
354 |
+
x = self.transformer(x)
|
355 |
+
x = x.permute(1, 0, 2) # LND -> NLD
|
356 |
+
|
357 |
+
x = self.ln_post(x[:, 0, :])
|
358 |
+
|
359 |
+
if self.proj is not None:
|
360 |
+
x = x @ self.proj
|
361 |
+
|
362 |
+
return x
|
363 |
+
|
364 |
+
|
365 |
+
|
366 |
+
class CLIP(nn.Module):
|
367 |
+
def __init__(
|
368 |
+
self,
|
369 |
+
embed_dim: int,
|
370 |
+
# vision
|
371 |
+
image_resolution: int,
|
372 |
+
vision_layers: Union[Tuple[int, int, int, int], int],
|
373 |
+
vision_width: int,
|
374 |
+
vision_patch_size: int,
|
375 |
+
mask_prompt_depth: int,
|
376 |
+
# text
|
377 |
+
context_length: int,
|
378 |
+
vocab_size: int,
|
379 |
+
transformer_width: int,
|
380 |
+
transformer_heads: int,
|
381 |
+
transformer_layers: int,
|
382 |
+
):
|
383 |
+
super().__init__()
|
384 |
+
|
385 |
+
self.context_length = context_length
|
386 |
+
|
387 |
+
if isinstance(vision_layers, (tuple, list)):
|
388 |
+
vision_heads = vision_width * 32 // 64
|
389 |
+
self.visual = ModifiedResNet(
|
390 |
+
layers=vision_layers,
|
391 |
+
output_dim=embed_dim,
|
392 |
+
heads=vision_heads,
|
393 |
+
input_resolution=image_resolution,
|
394 |
+
width=vision_width,
|
395 |
+
)
|
396 |
+
else:
|
397 |
+
vision_heads = vision_width // 64
|
398 |
+
self.visual = VisionTransformer(
|
399 |
+
input_resolution=image_resolution,
|
400 |
+
patch_size=vision_patch_size,
|
401 |
+
mask_prompt_depth=mask_prompt_depth,
|
402 |
+
width=vision_width,
|
403 |
+
layers=vision_layers,
|
404 |
+
heads=vision_heads,
|
405 |
+
output_dim=embed_dim,
|
406 |
+
)
|
407 |
+
|
408 |
+
self.transformer = Transformer(
|
409 |
+
width=transformer_width,
|
410 |
+
layers=transformer_layers,
|
411 |
+
heads=transformer_heads,
|
412 |
+
attn_mask=self.build_attention_mask(),
|
413 |
+
)
|
414 |
+
|
415 |
+
self.vocab_size = vocab_size
|
416 |
+
self.token_embedding = nn.Embedding(vocab_size, transformer_width)
|
417 |
+
self.positional_embedding = nn.Parameter(
|
418 |
+
torch.empty(self.context_length, transformer_width)
|
419 |
+
)
|
420 |
+
self.ln_final = LayerNorm(transformer_width)
|
421 |
+
|
422 |
+
self.text_projection = nn.Parameter(torch.empty(transformer_width, embed_dim))
|
423 |
+
self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
|
424 |
+
|
425 |
+
self.initialize_parameters()
|
426 |
+
|
427 |
+
def initialize_parameters(self):
|
428 |
+
nn.init.normal_(self.token_embedding.weight, std=0.02)
|
429 |
+
nn.init.normal_(self.positional_embedding, std=0.01)
|
430 |
+
|
431 |
+
if isinstance(self.visual, ModifiedResNet):
|
432 |
+
if self.visual.attnpool is not None:
|
433 |
+
std = self.visual.attnpool.c_proj.in_features ** -0.5
|
434 |
+
nn.init.normal_(self.visual.attnpool.q_proj.weight, std=std)
|
435 |
+
nn.init.normal_(self.visual.attnpool.k_proj.weight, std=std)
|
436 |
+
nn.init.normal_(self.visual.attnpool.v_proj.weight, std=std)
|
437 |
+
nn.init.normal_(self.visual.attnpool.c_proj.weight, std=std)
|
438 |
+
|
439 |
+
for resnet_block in [
|
440 |
+
self.visual.layer1,
|
441 |
+
self.visual.layer2,
|
442 |
+
self.visual.layer3,
|
443 |
+
self.visual.layer4,
|
444 |
+
]:
|
445 |
+
for name, param in resnet_block.named_parameters():
|
446 |
+
if name.endswith("bn3.weight"):
|
447 |
+
nn.init.zeros_(param)
|
448 |
+
|
449 |
+
proj_std = (self.transformer.width ** -0.5) * (
|
450 |
+
(2 * self.transformer.layers) ** -0.5
|
451 |
+
)
|
452 |
+
attn_std = self.transformer.width ** -0.5
|
453 |
+
fc_std = (2 * self.transformer.width) ** -0.5
|
454 |
+
for block in self.transformer.resblocks:
|
455 |
+
nn.init.normal_(block.attn.in_proj_weight, std=attn_std)
|
456 |
+
nn.init.normal_(block.attn.out_proj.weight, std=proj_std)
|
457 |
+
nn.init.normal_(block.mlp.c_fc.weight, std=fc_std)
|
458 |
+
nn.init.normal_(block.mlp.c_proj.weight, std=proj_std)
|
459 |
+
|
460 |
+
if self.text_projection is not None:
|
461 |
+
nn.init.normal_(self.text_projection, std=self.transformer.width ** -0.5)
|
462 |
+
|
463 |
+
def build_attention_mask(self):
|
464 |
+
# lazily create causal attention mask, with full attention between the vision tokens
|
465 |
+
# pytorch uses additive attention mask; fill with -inf
|
466 |
+
mask = torch.empty(self.context_length, self.context_length)
|
467 |
+
mask.fill_(float("-inf"))
|
468 |
+
mask.triu_(1) # keep -inf strictly above the diagonal; zero the diagonal and everything below it
|
469 |
+
return mask
|
470 |
+
|
471 |
+
@property
|
472 |
+
def dtype(self):
|
473 |
+
return self.visual.conv1.weight.dtype
|
474 |
+
|
475 |
+
def encode_image(self, image, **kwargs):
|
476 |
+
return self.visual(image.type(self.dtype), **kwargs)
|
477 |
+
|
478 |
+
def encode_text(self, text):
|
479 |
+
x = self.token_embedding(text).type(self.dtype) # [batch_size, n_ctx, d_model]
|
480 |
+
|
481 |
+
x = x + self.positional_embedding.type(self.dtype)
|
482 |
+
x = x.permute(1, 0, 2) # NLD -> LND
|
483 |
+
x = self.transformer(x)
|
484 |
+
x = x.permute(1, 0, 2) # LND -> NLD
|
485 |
+
x = self.ln_final(x).type(self.dtype)
|
486 |
+
|
487 |
+
# x.shape = [batch_size, n_ctx, transformer.width]
|
488 |
+
# take features from the eot embedding (eot_token is the highest number in each sequence)
|
489 |
+
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
|
490 |
+
|
491 |
+
return x
|
492 |
+
|
493 |
+
def forward(self, image, text):
|
494 |
+
image_features = self.encode_image(image)
|
495 |
+
text_features = self.encode_text(text)
|
496 |
+
|
497 |
+
# normalized features
|
498 |
+
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
|
499 |
+
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
|
500 |
+
|
501 |
+
# cosine similarity as logits
|
502 |
+
logit_scale = self.logit_scale.exp()
|
503 |
+
logits_per_image = logit_scale * image_features @ text_features.t()
|
504 |
+
logits_per_text = logit_scale * text_features @ image_features.t()
|
505 |
+
|
506 |
+
# shape = [global_batch_size, global_batch_size]
|
507 |
+
return logits_per_image, logits_per_text
|
508 |
+
|
509 |
+
|
510 |
+
def convert_weights(model: nn.Module):
|
511 |
+
"""Convert applicable model parameters to fp16"""
|
512 |
+
|
513 |
+
def _convert_weights_to_fp16(l):
|
514 |
+
if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
|
515 |
+
l.weight.data = l.weight.data.half()
|
516 |
+
if l.bias is not None:
|
517 |
+
l.bias.data = l.bias.data.half()
|
518 |
+
|
519 |
+
if isinstance(l, nn.MultiheadAttention):
|
520 |
+
for attr in [
|
521 |
+
*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]],
|
522 |
+
"in_proj_bias",
|
523 |
+
"bias_k",
|
524 |
+
"bias_v",
|
525 |
+
]:
|
526 |
+
tensor = getattr(l, attr)
|
527 |
+
if tensor is not None:
|
528 |
+
tensor.data = tensor.data.half()
|
529 |
+
|
530 |
+
for name in ["text_projection", "proj"]:
|
531 |
+
if hasattr(l, name):
|
532 |
+
attr = getattr(l, name)
|
533 |
+
if attr is not None:
|
534 |
+
attr.data = attr.data.half()
|
535 |
+
|
536 |
+
model.apply(_convert_weights_to_fp16)
|
537 |
+
|
538 |
+
|
539 |
+
def build_model(state_dict: dict, mask_prompt_depth: int = 0):
|
540 |
+
vit = "visual.proj" in state_dict
|
541 |
+
|
542 |
+
if vit:
|
543 |
+
vision_width = state_dict["visual.conv1.weight"].shape[0]
|
544 |
+
vision_layers = len(
|
545 |
+
[
|
546 |
+
k
|
547 |
+
for k in state_dict.keys()
|
548 |
+
if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")
|
549 |
+
]
|
550 |
+
)
|
551 |
+
vision_patch_size = state_dict["visual.conv1.weight"].shape[-1]
|
552 |
+
grid_size = round(
|
553 |
+
(state_dict["visual.positional_embedding"].shape[0] - 1) ** 0.5
|
554 |
+
)
|
555 |
+
image_resolution = vision_patch_size * grid_size
|
556 |
+
else:
|
557 |
+
assert mask_prompt_depth == 0, 'ResNets do not support mask prompt tuning'
|
558 |
+
counts: list = [
|
559 |
+
len(
|
560 |
+
set(
|
561 |
+
k.split(".")[2]
|
562 |
+
for k in state_dict
|
563 |
+
if k.startswith(f"visual.layer{b}")
|
564 |
+
)
|
565 |
+
)
|
566 |
+
for b in [1, 2, 3, 4]
|
567 |
+
]
|
568 |
+
vision_layers = tuple(counts)
|
569 |
+
vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
|
570 |
+
output_width = round(
|
571 |
+
(state_dict["visual.attnpool.positional_embedding"].shape[0] - 1) ** 0.5
|
572 |
+
)
|
573 |
+
vision_patch_size = None
|
574 |
+
assert (
|
575 |
+
output_width ** 2 + 1
|
576 |
+
== state_dict["visual.attnpool.positional_embedding"].shape[0]
|
577 |
+
)
|
578 |
+
image_resolution = output_width * 32
|
579 |
+
|
580 |
+
embed_dim = state_dict["text_projection"].shape[1]
|
581 |
+
context_length = state_dict["positional_embedding"].shape[0]
|
582 |
+
vocab_size = state_dict["token_embedding.weight"].shape[0]
|
583 |
+
transformer_width = state_dict["ln_final.weight"].shape[0]
|
584 |
+
transformer_heads = transformer_width // 64
|
585 |
+
transformer_layers = len(
|
586 |
+
set(
|
587 |
+
k.split(".")[2]
|
588 |
+
for k in state_dict
|
589 |
+
if k.startswith("transformer.resblocks")
|
590 |
+
)
|
591 |
+
)
|
592 |
+
|
593 |
+
model = CLIP(
|
594 |
+
embed_dim,
|
595 |
+
image_resolution,
|
596 |
+
vision_layers,
|
597 |
+
vision_width,
|
598 |
+
vision_patch_size,
|
599 |
+
mask_prompt_depth,
|
600 |
+
context_length,
|
601 |
+
vocab_size,
|
602 |
+
transformer_width,
|
603 |
+
transformer_heads,
|
604 |
+
transformer_layers,
|
605 |
+
)
|
606 |
+
|
607 |
+
for key in ["input_resolution", "context_length", "vocab_size"]:
|
608 |
+
if key in state_dict:
|
609 |
+
del state_dict[key]
|
610 |
+
|
611 |
+
convert_weights(model)
|
612 |
+
model.load_state_dict(state_dict, strict=False)
|
613 |
+
return model.eval()
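As a rough illustration of how build_model recovers the vision hyperparameters from tensor shapes alone, the sketch below repeats the same arithmetic on plain shape tuples. The helper name infer_vit_geometry and the example shapes are assumptions made for this illustration (they only resemble a ViT-L/14 checkpoint); they are not part of the committed file.

```python
def infer_vit_geometry(shapes):
    """Recover ViT hyperparameters from a mapping of parameter name -> shape tuple."""
    vision_width = shapes["visual.conv1.weight"][0]
    vision_patch_size = shapes["visual.conv1.weight"][-1]
    # The positional embedding has one row per patch plus one row for the class token.
    num_positions = shapes["visual.positional_embedding"][0]
    grid_size = round((num_positions - 1) ** 0.5)
    # One attention in-projection weight exists per transformer block.
    vision_layers = sum(
        1
        for k in shapes
        if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")
    )
    return {
        "vision_width": vision_width,
        "vision_patch_size": vision_patch_size,
        "grid_size": grid_size,
        "image_resolution": vision_patch_size * grid_size,
        "vision_layers": vision_layers,
        "vision_heads": vision_width // 64,
    }


# Hypothetical shapes, chosen only to resemble a ViT-L/14 checkpoint.
example_shapes = {
    "visual.conv1.weight": (1024, 3, 14, 14),
    "visual.positional_embedding": (257, 1024),
}
for i in range(24):
    name = f"visual.transformer.resblocks.{i}.attn.in_proj_weight"
    example_shapes[name] = (3072, 1024)

geometry = infer_vit_geometry(example_shapes)
print(geometry)
```

With these assumed shapes the grid is 16 x 16 patches, so the inferred input resolution is 14 * 16 = 224.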
|
open_vocab_seg/modeling/clip_adapter/clip/simple_tokenizer.py
ADDED
@@ -0,0 +1,150 @@
|
1 |
+
import gzip
|
2 |
+
import html
|
3 |
+
import os
|
4 |
+
from functools import lru_cache
|
5 |
+
|
6 |
+
import ftfy
|
7 |
+
import regex as re
|
8 |
+
|
9 |
+
|
10 |
+
@lru_cache()
|
11 |
+
def default_bpe():
|
12 |
+
return os.path.join(
|
13 |
+
os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz"
|
14 |
+
)
|
15 |
+
|
16 |
+
|
17 |
+
@lru_cache()
|
18 |
+
def bytes_to_unicode():
|
19 |
+
"""
|
20 |
+
Returns a mapping from utf-8 bytes to corresponding unicode strings.
|
21 |
+
The reversible bpe codes work on unicode strings.
|
22 |
+
This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
|
23 |
+
When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
|
24 |
+
This is a significant percentage of your normal, say, 32K bpe vocab.
|
25 |
+
To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
|
26 |
+
And avoids mapping to whitespace/control characters the bpe code barfs on.
|
27 |
+
"""
|
28 |
+
bs = (
|
29 |
+
list(range(ord("!"), ord("~") + 1))
|
30 |
+
+ list(range(ord("¡"), ord("¬") + 1))
|
31 |
+
+ list(range(ord("®"), ord("ÿ") + 1))
|
32 |
+
)
|
33 |
+
cs = bs[:]
|
34 |
+
n = 0
|
35 |
+
for b in range(2 ** 8):
|
36 |
+
if b not in bs:
|
37 |
+
bs.append(b)
|
38 |
+
cs.append(2 ** 8 + n)
|
39 |
+
n += 1
|
40 |
+
cs = [chr(n) for n in cs]
|
41 |
+
return dict(zip(bs, cs))
|
42 |
+
|
43 |
+
|
44 |
+
def get_pairs(word):
|
45 |
+
"""Return set of symbol pairs in a word.
|
46 |
+
Word is represented as tuple of symbols (symbols being variable-length strings).
|
47 |
+
"""
|
48 |
+
pairs = set()
|
49 |
+
prev_char = word[0]
|
50 |
+
for char in word[1:]:
|
51 |
+
pairs.add((prev_char, char))
|
52 |
+
prev_char = char
|
53 |
+
return pairs
|
54 |
+
|
55 |
+
|
56 |
+
def basic_clean(text):
|
57 |
+
text = ftfy.fix_text(text)
|
58 |
+
text = html.unescape(html.unescape(text))
|
59 |
+
return text.strip()
|
60 |
+
|
61 |
+
|
62 |
+
def whitespace_clean(text):
|
63 |
+
text = re.sub(r"\s+", " ", text)
|
64 |
+
text = text.strip()
|
65 |
+
return text
|
66 |
+
|
67 |
+
|
68 |
+
class SimpleTokenizer(object):
|
69 |
+
def __init__(self, bpe_path: str = default_bpe()):
|
70 |
+
self.byte_encoder = bytes_to_unicode()
|
71 |
+
self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
|
72 |
+
merges = gzip.open(bpe_path).read().decode("utf-8").split("\n")
|
73 |
+
merges = merges[1 : 49152 - 256 - 2 + 1]
|
74 |
+
merges = [tuple(merge.split()) for merge in merges]
|
75 |
+
vocab = list(bytes_to_unicode().values())
|
76 |
+
vocab = vocab + [v + "</w>" for v in vocab]
|
77 |
+
for merge in merges:
|
78 |
+
vocab.append("".join(merge))
|
79 |
+
vocab.extend(["<|startoftext|>", "<|endoftext|>"])
|
80 |
+
self.encoder = dict(zip(vocab, range(len(vocab))))
|
81 |
+
self.decoder = {v: k for k, v in self.encoder.items()}
|
82 |
+
self.bpe_ranks = dict(zip(merges, range(len(merges))))
|
83 |
+
self.cache = {
|
84 |
+
"<|startoftext|>": "<|startoftext|>",
|
85 |
+
"<|endoftext|>": "<|endoftext|>",
|
86 |
+
}
|
87 |
+
self.pat = re.compile(
|
88 |
+
r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""",
|
89 |
+
re.IGNORECASE,
|
90 |
+
)
|
91 |
+
|
92 |
+
def bpe(self, token):
|
93 |
+
if token in self.cache:
|
94 |
+
return self.cache[token]
|
95 |
+
word = tuple(token[:-1]) + (token[-1] + "</w>",)
|
96 |
+
pairs = get_pairs(word)
|
97 |
+
|
98 |
+
if not pairs:
|
99 |
+
return token + "</w>"
|
100 |
+
|
101 |
+
while True:
|
102 |
+
bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
|
103 |
+
if bigram not in self.bpe_ranks:
|
104 |
+
break
|
105 |
+
first, second = bigram
|
106 |
+
new_word = []
|
107 |
+
i = 0
|
108 |
+
while i < len(word):
|
109 |
+
try:
|
110 |
+
j = word.index(first, i)
|
111 |
+
new_word.extend(word[i:j])
|
112 |
+
i = j
|
113 |
+
except ValueError:
|
114 |
+
new_word.extend(word[i:])
|
115 |
+
break
|
116 |
+
|
117 |
+
if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
|
118 |
+
new_word.append(first + second)
|
119 |
+
i += 2
|
120 |
+
else:
|
121 |
+
new_word.append(word[i])
|
122 |
+
i += 1
|
123 |
+
new_word = tuple(new_word)
|
124 |
+
word = new_word
|
125 |
+
if len(word) == 1:
|
126 |
+
break
|
127 |
+
else:
|
128 |
+
pairs = get_pairs(word)
|
129 |
+
word = " ".join(word)
|
130 |
+
self.cache[token] = word
|
131 |
+
return word
|
132 |
+
|
133 |
+
def encode(self, text):
|
134 |
+
bpe_tokens = []
|
135 |
+
text = whitespace_clean(basic_clean(text)).lower()
|
136 |
+
for token in re.findall(self.pat, text):
|
137 |
+
token = "".join(self.byte_encoder[b] for b in token.encode("utf-8"))
|
138 |
+
bpe_tokens.extend(
|
139 |
+
self.encoder[bpe_token] for bpe_token in self.bpe(token).split(" ")
|
140 |
+
)
|
141 |
+
return bpe_tokens
|
142 |
+
|
143 |
+
def decode(self, tokens):
|
144 |
+
text = "".join([self.decoder[token] for token in tokens])
|
145 |
+
text = (
|
146 |
+
bytearray([self.byte_decoder[c] for c in text])
|
147 |
+
.decode("utf-8", errors="replace")
|
148 |
+
.replace("</w>", " ")
|
149 |
+
)
|
150 |
+
return text
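The merge loop in SimpleTokenizer.bpe can be hard to follow in diff form, so here is a minimal sketch of the same greedy idea on a toy vocabulary. The function toy_bpe and the merge table toy_ranks are hypothetical and made up for this illustration; the real tokenizer reads its merges from bpe_simple_vocab_16e6.txt.gz and also caches results.

```python
def get_pairs(word):
    """Return the set of adjacent symbol pairs in a tuple of symbols."""
    return {(a, b) for a, b in zip(word, word[1:])}


def toy_bpe(token, bpe_ranks):
    """Greedily apply the lowest-ranked known merge until none applies."""
    # The last character carries the end-of-word marker, as in the tokenizer above.
    word = tuple(token[:-1]) + (token[-1] + "</w>",)
    while len(word) > 1:
        pairs = get_pairs(word)
        bigram = min(pairs, key=lambda p: bpe_ranks.get(p, float("inf")))
        if bigram not in bpe_ranks:
            break  # no remaining pair is a known merge
        first, second = bigram
        merged, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and word[i] == first and word[i + 1] == second:
                merged.append(first + second)
                i += 2
            else:
                merged.append(word[i])
                i += 1
        word = tuple(merged)
    return list(word)


# Hypothetical merge table (not the real CLIP merges); lower rank merges first.
toy_ranks = {("l", "o"): 0, ("lo", "w</w>"): 1, ("e", "r</w>"): 2}

print(toy_bpe("low", toy_ranks))
print(toy_bpe("lower", toy_ranks))
```

In this toy setup "low" collapses to a single symbol, while "lower" stops at three symbols because the pair ("lo", "w") has no merge rule.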
|
open_vocab_seg/modeling/clip_adapter/text_template.py
CHANGED
@@ -6,7 +6,8 @@
|
|
6 |
|
7 |
from typing import List
|
8 |
|
9 |
-
import clip
|
|
|
10 |
import torch
|
11 |
from torch import nn
|
12 |
|
@@ -130,7 +131,7 @@ class PredefinedPromptExtractor(PromptExtractor):
|
|
130 |
def forward(self, noun_list: List[str], clip_model: nn.Module):
|
131 |
text_features_bucket = []
|
132 |
for template in self.templates:
|
133 |
-
noun_tokens = [
|
134 |
text_inputs = torch.cat(noun_tokens).to(
|
135 |
clip_model.text_projection.data.device
|
136 |
)
|
|
|
6 |
|
7 |
from typing import List
|
8 |
|
9 |
+
# import clip
|
10 |
+
from .clip import tokenize
|
11 |
import torch
|
12 |
from torch import nn
|
13 |
|
|
|
131 |
def forward(self, noun_list: List[str], clip_model: nn.Module):
|
132 |
text_features_bucket = []
|
133 |
for template in self.templates:
|
134 |
+
noun_tokens = [tokenize(template.format(noun)) for noun in noun_list]
|
135 |
text_inputs = torch.cat(noun_tokens).to(
|
136 |
clip_model.text_projection.data.device
|
137 |
)
|
open_vocab_seg/modeling/clip_adapter/utils.py
CHANGED
@@ -4,7 +4,7 @@
|
|
4 |
from typing import Tuple
|
5 |
import numpy as np
|
6 |
import torch
|
7 |
-
import
|
8 |
from detectron2.utils.comm import get_local_rank, synchronize
|
9 |
|
10 |
|
@@ -70,10 +70,10 @@ def build_clip_model(model: str, mask_prompt_depth: int = 0, frozen: bool = True
|
|
70 |
rank = get_local_rank()
|
71 |
if rank == 0:
|
72 |
# download on rank 0 only
|
73 |
-
model, _ =
|
74 |
synchronize()
|
75 |
if rank != 0:
|
76 |
-
model, _ =
|
77 |
synchronize()
|
78 |
if frozen:
|
79 |
for param in model.parameters():
|
|
|
4 |
from typing import Tuple
|
5 |
import numpy as np
|
6 |
import torch
|
7 |
+
from .clip import load as clip_load
|
8 |
from detectron2.utils.comm import get_local_rank, synchronize
|
9 |
|
10 |
|
|
|
70 |
rank = get_local_rank()
|
71 |
if rank == 0:
|
72 |
# download on rank 0 only
|
73 |
+
model, _ = clip_load(model, mask_prompt_depth=mask_prompt_depth, device="cpu")
|
74 |
synchronize()
|
75 |
if rank != 0:
|
76 |
+
model, _ = clip_load(model, mask_prompt_depth=mask_prompt_depth, device="cpu")
|
77 |
synchronize()
|
78 |
if frozen:
|
79 |
for param in model.parameters():
|
configs/ovseg_swinB_vitL_demo.yaml → ovseg_swinB_vitL_demo.yaml
RENAMED
@@ -12,7 +12,7 @@ MODEL:
|
|
12 |
DROP_PATH_RATE: 0.3
|
13 |
PATCH_NORM: True
|
14 |
PRETRAIN_IMG_SIZE: 384
|
15 |
-
WEIGHTS: "
|
16 |
PIXEL_MEAN: [123.675, 116.280, 103.530]
|
17 |
PIXEL_STD: [58.395, 57.120, 57.375]
|
18 |
SEM_SEG_HEAD:
|
|
|
12 |
DROP_PATH_RATE: 0.3
|
13 |
PATCH_NORM: True
|
14 |
PRETRAIN_IMG_SIZE: 384
|
15 |
+
WEIGHTS: "./ovseg_swinbase_vitL14_ft_mpt.pth"
|
16 |
PIXEL_MEAN: [123.675, 116.280, 103.530]
|
17 |
PIXEL_STD: [58.395, 57.120, 57.375]
|
18 |
SEM_SEG_HEAD:
|
requirements.txt
CHANGED
@@ -7,8 +7,14 @@ wandb
|
|
7 |
fire
|
8 |
opencv-python
|
9 |
pandas
|
10 |
-
|
11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
12 |
|
13 |
# Detectron
|
14 |
--find-links https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
|
|
|
7 |
fire
|
8 |
opencv-python
|
9 |
pandas
|
10 |
+
ftfy
|
11 |
+
regex
|
12 |
+
tqdm
|
13 |
+
gdown
|
14 |
+
# Torch
|
15 |
+
--find-links https://download.pytorch.org/whl/cu113/torch_stable.html
|
16 |
+
torch==1.10.1+cu113
|
17 |
+
torchvision==0.11.2+cu113
|
18 |
|
19 |
# Detectron
|
20 |
--find-links https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
|
resources/demo_samples/sample_01.jpeg
ADDED
Git LFS Details
|
resources/demo_samples/sample_02.jpeg
ADDED
Git LFS Details
|
tools/convert-pretrained-clip-model-to-d2.py
DELETED
@@ -1,69 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import pickle as pkl
|
5 |
-
import sys
|
6 |
-
|
7 |
-
import torch
|
8 |
-
|
9 |
-
"""
|
10 |
-
Usage:
|
11 |
-
# download pretrained swin model:
|
12 |
-
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
|
13 |
-
# run the conversion
|
14 |
-
./convert-pretrained-model-to-d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224.pkl
|
15 |
-
# Then, use swin_tiny_patch4_window7_224.pkl with the following changes in config:
|
16 |
-
MODEL:
|
17 |
-
WEIGHTS: "/path/to/swin_tiny_patch4_window7_224.pkl"
|
18 |
-
INPUT:
|
19 |
-
FORMAT: "RGB"
|
20 |
-
"""
|
21 |
-
|
22 |
-
|
23 |
-
def transform(path):
|
24 |
-
model = torch.load(path, map_location="cpu")
|
25 |
-
print(f"loading {path}......")
|
26 |
-
state_dict = model["model"]
|
27 |
-
state_dict = {
|
28 |
-
k.replace("visual_model.", ""): v
|
29 |
-
for k, v in state_dict.items()
|
30 |
-
if k.startswith("visual_model")
|
31 |
-
}
|
32 |
-
source_keys = [k for k in state_dict.keys() if "relative_coords" in k]
|
33 |
-
for k in source_keys:
|
34 |
-
state_dict[
|
35 |
-
k.replace("relative_coords", "relative_position_index")
|
36 |
-
] = state_dict[k]
|
37 |
-
del state_dict[k]
|
38 |
-
|
39 |
-
source_keys = [k for k in state_dict.keys() if "atten_mask_matrix" in k]
|
40 |
-
for k in source_keys:
|
41 |
-
state_dict[k.replace("atten_mask_matrix", "attn_mask")] = state_dict[k]
|
42 |
-
del state_dict[k]
|
43 |
-
|
44 |
-
source_keys = [k for k in state_dict.keys() if "rel_pos_embed_table" in k]
|
45 |
-
for k in source_keys:
|
46 |
-
state_dict[
|
47 |
-
k.replace("rel_pos_embed_table", "relative_position_bias_table")
|
48 |
-
] = state_dict[k]
|
49 |
-
del state_dict[k]
|
50 |
-
|
51 |
-
source_keys = [k for k in state_dict.keys() if "channel_reduction" in k]
|
52 |
-
for k in source_keys:
|
53 |
-
state_dict[k.replace("channel_reduction", "reduction")] = state_dict[k]
|
54 |
-
del state_dict[k]
|
55 |
-
return {
|
56 |
-
k if k.startswith("backbone.") else "backbone." + k: v
|
57 |
-
for k, v in state_dict.items()
|
58 |
-
}
|
59 |
-
|
60 |
-
|
61 |
-
if __name__ == "__main__":
|
62 |
-
input = sys.argv[1]
|
63 |
-
res = {
|
64 |
-
"model": transform(input),
|
65 |
-
"__author__": "third_party",
|
66 |
-
"matching_heuristics": True,
|
67 |
-
}
|
68 |
-
with open(sys.argv[2], "wb") as f:
|
69 |
-
pkl.dump(res, f)
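The renaming done by transform in the deleted script above can be summarized as a small table of substring substitutions followed by a prefix change. The sketch below is a simplified restatement on plain dictionaries, not the removed script itself: the names RENAMES and rename_keys are assumptions for illustration, and it strips the source prefix only at the start of each key.

```python
# Substring substitutions taken from the deleted script above.
RENAMES = [
    ("relative_coords", "relative_position_index"),
    ("atten_mask_matrix", "attn_mask"),
    ("rel_pos_embed_table", "relative_position_bias_table"),
    ("channel_reduction", "reduction"),
]


def rename_keys(state_dict, source_prefix="visual_model.", target_prefix="backbone."):
    """Keep only keys under source_prefix, rename them, and add target_prefix."""
    out = {}
    for key, value in state_dict.items():
        if not key.startswith(source_prefix):
            continue  # entries outside the visual model are dropped
        key = key[len(source_prefix):]
        for old, new in RENAMES:
            key = key.replace(old, new)
        if not key.startswith(target_prefix):
            key = target_prefix + key
        out[key] = value
    return out


# Hypothetical checkpoint keys, with integers standing in for tensors.
example = {
    "visual_model.layers.0.blocks.0.attn.rel_pos_embed_table": 1,
    "visual_model.layers.0.downsample.channel_reduction.weight": 2,
    "text_model.token_embedding.weight": 3,
}
converted = rename_keys(example)
print(sorted(converted))
```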
|
tools/convert-pretrained-swin-model-to-d2.py
DELETED
@@ -1,30 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import pickle as pkl
|
5 |
-
import sys
|
6 |
-
|
7 |
-
import torch
|
8 |
-
|
9 |
-
"""
|
10 |
-
Usage:
|
11 |
-
# download pretrained swin model:
|
12 |
-
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
|
13 |
-
# run the conversion
|
14 |
-
./convert-pretrained-model-to-d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224.pkl
|
15 |
-
# Then, use swin_tiny_patch4_window7_224.pkl with the following changes in config:
|
16 |
-
MODEL:
|
17 |
-
WEIGHTS: "/path/to/swin_tiny_patch4_window7_224.pkl"
|
18 |
-
INPUT:
|
19 |
-
FORMAT: "RGB"
|
20 |
-
"""
|
21 |
-
|
22 |
-
if __name__ == "__main__":
|
23 |
-
input = sys.argv[1]
|
24 |
-
|
25 |
-
obj = torch.load(input, map_location="cpu")["model"]
|
26 |
-
|
27 |
-
res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}
|
28 |
-
|
29 |
-
with open(sys.argv[2], "wb") as f:
|
30 |
-
pkl.dump(res, f)
|
tools/convert-torchvision-to-d2.py
DELETED
@@ -1,54 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import pickle as pkl
|
5 |
-
import sys
|
6 |
-
|
7 |
-
import torch
|
8 |
-
|
9 |
-
"""
|
10 |
-
Usage:
|
11 |
-
# download one of the ResNet{18,34,50,101,152} models from torchvision:
|
12 |
-
wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O r50.pth
|
13 |
-
# run the conversion
|
14 |
-
./convert-torchvision-to-d2.py r50.pth r50.pkl
|
15 |
-
# Then, use r50.pkl with the following changes in config:
|
16 |
-
MODEL:
|
17 |
-
WEIGHTS: "/path/to/r50.pkl"
|
18 |
-
PIXEL_MEAN: [123.675, 116.280, 103.530]
|
19 |
-
PIXEL_STD: [58.395, 57.120, 57.375]
|
20 |
-
RESNETS:
|
21 |
-
DEPTH: 50
|
22 |
-
STRIDE_IN_1X1: False
|
23 |
-
INPUT:
|
24 |
-
FORMAT: "RGB"
|
25 |
-
These models typically produce slightly worse results than the
|
26 |
-
pre-trained ResNets we use in official configs, which are the
|
27 |
-
original ResNet models released by MSRA.
|
28 |
-
"""
|
29 |
-
|
30 |
-
if __name__ == "__main__":
|
31 |
-
input = sys.argv[1]
|
32 |
-
|
33 |
-
obj = torch.load(input, map_location="cpu")
|
34 |
-
|
35 |
-
newmodel = {}
|
36 |
-
for k in list(obj.keys()):
|
37 |
-
old_k = k
|
38 |
-
if "layer" not in k:
|
39 |
-
k = "stem." + k
|
40 |
-
for t in [1, 2, 3, 4]:
|
41 |
-
k = k.replace("layer{}".format(t), "res{}".format(t + 1))
|
42 |
-
for t in [1, 2, 3]:
|
43 |
-
k = k.replace("bn{}".format(t), "conv{}.norm".format(t))
|
44 |
-
k = k.replace("downsample.0", "shortcut")
|
45 |
-
k = k.replace("downsample.1", "shortcut.norm")
|
46 |
-
print(old_k, "->", k)
|
47 |
-
newmodel[k] = obj.pop(old_k).detach().numpy()
|
48 |
-
|
49 |
-
res = {"model": newmodel, "__author__": "torchvision", "matching_heuristics": True}
|
50 |
-
|
51 |
-
with open(sys.argv[2], "wb") as f:
|
52 |
-
pkl.dump(res, f)
|
53 |
-
if obj:
|
54 |
-
print("Unconverted keys:", obj.keys())
|
tools/ovseg_replace_clip.py
DELETED
@@ -1,30 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import torch
|
5 |
-
from collections import OrderedDict
|
6 |
-
|
7 |
-
|
8 |
-
# PATH to new clip model
|
9 |
-
clip_ckpt = torch.load('xx/open_clip/src/logs/2022_xx/checkpoints/epoch_x.pt')
|
10 |
-
|
11 |
-
new_model = OrderedDict()
|
12 |
-
state_dict = clip_ckpt['state_dict']
|
13 |
-
|
14 |
-
for k, v in state_dict.items():
|
15 |
-
new_key = k.replace('module.','')
|
16 |
-
new_model[new_key] = v
|
17 |
-
|
18 |
-
# PATH to trained ovseg model
|
19 |
-
ovseg_model = torch.load('xx/ovseg/output/model_final.pth', 'cpu')
|
20 |
-
|
21 |
-
for k, v in new_model.items():
|
22 |
-
new_k = 'clip_adapter.clip_model.' + k
|
23 |
-
if new_k in ovseg_model['model'].keys():
|
24 |
-
ovseg_model['model'][new_k] = v
|
25 |
-
else:
|
26 |
-
print(f'{new_k} does not exist in ckpt')
|
27 |
-
|
28 |
-
# ovseg_model['model']['clip_adapter.clip_model.visual.mask_embedding'] = new_model['visual.mask_embedding']
|
29 |
-
|
30 |
-
torch.save(ovseg_model, 'xx/ovseg/output/ovseg_ft_mpt.pth')
|
tools/search_thr_ensemble_w.sh
DELETED
@@ -1,11 +0,0 @@
|
|
1 |
-
for MASK_THR in 0.35 0.4 0.45
|
2 |
-
do
|
3 |
-
for ENSEMBLE_WEIGHT in 0.6 0.65 0.7 0.75 0.8
|
4 |
-
do
|
5 |
-
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml \
|
6 |
-
MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\"\) \
|
7 |
-
MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT $ENSEMBLE_WEIGHT MODEL.CLIP_ADAPTER.MASK_THR $MASK_THR
|
8 |
-
done
|
9 |
-
done
|
10 |
-
|
11 |
-
|
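The deleted shell script above sweeps two hyperparameters in nested loops. As a hedged sketch, the same grid can be enumerated in Python; the config keys and values are copied from the script, while build_overrides is a hypothetical helper introduced here that only builds the override list and does not launch train_net.py.

```python
from itertools import product

# Values copied from the nested loops in the shell script above.
mask_thresholds = [0.35, 0.4, 0.45]
ensemble_weights = [0.6, 0.65, 0.7, 0.75, 0.8]


def build_overrides(mask_thr, ensemble_weight):
    """Return the key/value config overrides for one point of the grid."""
    return [
        "MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT", str(ensemble_weight),
        "MODEL.CLIP_ADAPTER.MASK_THR", str(mask_thr),
    ]


# The outer loop is over mask thresholds and the inner loop over ensemble
# weights, matching the loop order of the script.
sweep = [build_overrides(t, w) for t, w in product(mask_thresholds, ensemble_weights)]
print(len(sweep))
print(sweep[0])
```

The sweep covers 3 x 5 = 15 evaluation runs in total.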
tools/web_demo.py
DELETED
@@ -1,76 +0,0 @@
|
|
1 |
-
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
-
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
-
|
4 |
-
import multiprocessing as mp
|
5 |
-
|
6 |
-
import numpy as np
|
7 |
-
from PIL import Image
|
8 |
-
|
9 |
-
from detectron2.config import get_cfg
|
10 |
-
|
11 |
-
from detectron2.projects.deeplab import add_deeplab_config
|
12 |
-
from detectron2.data.detection_utils import read_image
|
13 |
-
from open_vocab_seg import add_ovseg_config
|
14 |
-
from open_vocab_seg.utils import VisualizationDemo
|
15 |
-
|
16 |
-
import gradio as gr
|
17 |
-
|
18 |
-
def setup_cfg(config_file):
|
19 |
-
# load config from file and command-line arguments
|
20 |
-
cfg = get_cfg()
|
21 |
-
add_deeplab_config(cfg)
|
22 |
-
add_ovseg_config(cfg)
|
23 |
-
cfg.merge_from_file(config_file)
|
24 |
-
cfg.freeze()
|
25 |
-
return cfg
|
26 |
-
|
27 |
-
|
28 |
-
def inference(class_names, input_img):
|
29 |
-
mp.set_start_method("spawn", force=True)
|
30 |
-
config_file = './configs/ovseg_swinB_vitL_demo.yaml'
|
31 |
-
cfg = setup_cfg(config_file)
|
32 |
-
|
33 |
-
demo = VisualizationDemo(cfg)
|
34 |
-
|
35 |
-
class_names = class_names.split(',')
|
36 |
-
img = read_image(input_img, format="BGR")
|
37 |
-
_, visualized_output = demo.run_on_image(img, class_names)
|
38 |
-
|
39 |
-
return Image.fromarray(np.uint8(visualized_output.get_image())).convert('RGB')
|
40 |
-
|
41 |
-
# demo = gr.Interface(fn=greet, inputs="text", outputs="text")
|
42 |
-
# demo.launch()
|
43 |
-
|
44 |
-
|
45 |
-
examples = [['Oculus, Ukulele', './resources/demo_samples/sample_03.jpeg'],]
|
46 |
-
output_labels = ['segmentation map']
|
47 |
-
|
48 |
-
title = 'OVSeg'
|
49 |
-
|
50 |
-
description = """
|
51 |
-
Gradio Demo for Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP \n
|
52 |
-
You may click on one of the examples or upload your own image. \n
|
53 |
-
OVSeg can perform open-vocabulary segmentation; you may input more classes (separated by commas).
|
54 |
-
"""
|
55 |
-
|
56 |
-
article = """
|
57 |
-
<p style='text-align: center'>
|
58 |
-
<a href='https://arxiv.org/abs/2210.04150' target='_blank'>
|
59 |
-
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
|
60 |
-
</a>
|
61 |
-
|
|
62 |
-
<a href='https://github.com' target='_blank'>Github Repo</a></p>
|
63 |
-
"""
|
64 |
-
|
65 |
-
gr.Interface(
|
66 |
-
inference,
|
67 |
-
inputs=[
|
68 |
-
gr.inputs.Textbox(
|
69 |
-
lines=1, placeholder=None, default='', label='class names'),
|
70 |
-
gr.inputs.Image(type='filepath')
|
71 |
-
],
|
72 |
-
outputs=gr.outputs.Image(label='segmentation map'),
|
73 |
-
title=title,
|
74 |
-
description=description,
|
75 |
-
article=article,
|
76 |
-
examples=examples).launch(enable_queue=True)
|
train_net.py
DELETED
@@ -1,309 +0,0 @@
-# Copyright (c) Facebook, Inc. and its affiliates.
-# Copyright (c) Meta Platforms, Inc. All Rights Reserved
-# Modified by Feng Liang from https://github.com/MendelXu/zsseg.baseline/blob/master/train_net.py
-
-"""
-OVSeg Training Script.
-
-This script is a simplified version of the training script in detectron2/tools.
-"""
-import copy
-import itertools
-import logging
-import os
-from collections import OrderedDict
-from typing import Any, Dict, List, Set
-
-import detectron2.utils.comm as comm
-import torch
-from detectron2.checkpoint import DetectionCheckpointer
-from detectron2.config import get_cfg
-from detectron2.data import MetadataCatalog
-from detectron2.engine import (
-    DefaultTrainer,
-    default_argument_parser,
-    default_setup,
-    launch,
-)
-from detectron2.evaluation import (
-    DatasetEvaluator,
-    CityscapesSemSegEvaluator,
-    COCOEvaluator,
-    DatasetEvaluators,
-    verify_results,
-)
-from detectron2.projects.deeplab import add_deeplab_config, build_lr_scheduler
-from detectron2.solver.build import maybe_add_gradient_clipping
-from detectron2.utils.logger import setup_logger
-from detectron2.utils.events import CommonMetricPrinter, JSONWriter
-
-# OVSeg
-from open_vocab_seg import SemanticSegmentorWithTTA, add_ovseg_config
-from open_vocab_seg.data import (
-    MaskFormerSemanticDatasetMapper,
-)
-
-from open_vocab_seg.data import (
-    build_detection_test_loader,
-    build_detection_train_loader,
-)
-from open_vocab_seg.evaluation import (
-    GeneralizedSemSegEvaluator,
-)
-from open_vocab_seg.utils.events import WandbWriter, setup_wandb
-from open_vocab_seg.utils.post_process_utils import dense_crf_post_process
-
-
-class Trainer(DefaultTrainer):
-    """
-    Extension of the Trainer class adapted to DETR.
-    """
-
-    @classmethod
-    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
-        """
-        Create evaluator(s) for a given dataset.
-        This uses the special metadata "evaluator_type" associated with each
-        builtin dataset. For your own dataset, you can simply create an
-        evaluator manually in your script and do not have to worry about the
-        hacky if-else logic here.
-        """
-        if output_folder is None:
-            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
-        evaluator_list = []
-        evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
-        if evaluator_type in ["sem_seg"]:
-            evaluator = GeneralizedSemSegEvaluator
-            evaluator_list.append(
-                evaluator(
-                    dataset_name,
-                    distributed=True,
-                    output_dir=output_folder,
-                    post_process_func=dense_crf_post_process
-                    if cfg.TEST.DENSE_CRF
-                    else None,
-                )
-            )
-
-        if len(evaluator_list) == 0:
-            raise NotImplementedError(
-                "no Evaluator for the dataset {} with the type {}".format(
-                    dataset_name, evaluator_type
-                )
-            )
-        elif len(evaluator_list) == 1:
-            return evaluator_list[0]
-        return DatasetEvaluators(evaluator_list)
-
-    @classmethod
-    def build_train_loader(cls, cfg):
-        dataset = None
-        # Semantic segmentation dataset mapper
-        if cfg.INPUT.DATASET_MAPPER_NAME == "mask_former_semantic":
-            mapper = MaskFormerSemanticDatasetMapper(cfg, True)
-        else:
-            raise NotImplementedError
-        return build_detection_train_loader(cfg, mapper=mapper, dataset=dataset)
-
-    @classmethod
-    def build_test_loader(cls, cfg, dataset_name):
-        """
-        Returns:
-            iterable
-        It now calls :func:`detectron2.data.build_detection_test_loader`.
-        Overwrite it if you'd like a different data loader.
-        """
-        return build_detection_test_loader(cfg, dataset_name, mapper=None)
-
-    def build_writers(self):
-        """
-        Build a list of writers to be used. By default it contains
-        writers that write metrics to the screen,
-        a json file, and a tensorboard event file respectively.
-        If you'd like a different list of writers, you can overwrite it in
-        your trainer.
-
-        Returns:
-            list[EventWriter]: a list of :class:`EventWriter` objects.
-
-        It is now implemented by:
-        ::
-            return [
-                CommonMetricPrinter(self.max_iter),
-                JSONWriter(os.path.join(self.cfg.OUTPUT_DIR, "metrics.json")),
-                TensorboardXWriter(self.cfg.OUTPUT_DIR),
-            ]
-
-        """
-        # Here the default print/log frequency of each writer is used.
-        return [
-            # It may not always print what you want to see, since it prints "common" metrics only.
-            CommonMetricPrinter(self.max_iter),
-            JSONWriter(os.path.join(self.cfg.OUTPUT_DIR, "metrics.json")),
-            WandbWriter(),
-        ]
-
-    @classmethod
-    def build_lr_scheduler(cls, cfg, optimizer):
-        """
-        It now calls :func:`detectron2.solver.build_lr_scheduler`.
-        Overwrite it if you'd like a different scheduler.
-        """
-        return build_lr_scheduler(cfg, optimizer)
-
-    @classmethod
-    def build_optimizer(cls, cfg, model):
-        weight_decay_norm = cfg.SOLVER.WEIGHT_DECAY_NORM
-        weight_decay_embed = cfg.SOLVER.WEIGHT_DECAY_EMBED
-
-        defaults = {}
-        defaults["lr"] = cfg.SOLVER.BASE_LR
-        defaults["weight_decay"] = cfg.SOLVER.WEIGHT_DECAY
-
-        norm_module_types = (
-            torch.nn.BatchNorm1d,
-            torch.nn.BatchNorm2d,
-            torch.nn.BatchNorm3d,
-            torch.nn.SyncBatchNorm,
-            # NaiveSyncBatchNorm inherits from BatchNorm2d
-            torch.nn.GroupNorm,
-            torch.nn.InstanceNorm1d,
-            torch.nn.InstanceNorm2d,
-            torch.nn.InstanceNorm3d,
-            torch.nn.LayerNorm,
-            torch.nn.LocalResponseNorm,
-        )
-
-        params: List[Dict[str, Any]] = []
-        memo: Set[torch.nn.parameter.Parameter] = set()
-        for module_name, module in model.named_modules():
-            for module_param_name, value in module.named_parameters(recurse=False):
-                if not value.requires_grad:
-                    continue
-                # Avoid duplicating parameters
-                if value in memo:
-                    continue
-                memo.add(value)
-
-                hyperparams = copy.copy(defaults)
-                if "backbone" in module_name:
-                    hyperparams["lr"] = (
-                        hyperparams["lr"] * cfg.SOLVER.BACKBONE_MULTIPLIER
-                    )
-                if (
-                    "relative_position_bias_table" in module_param_name
-                    or "absolute_pos_embed" in module_param_name
-                ):
-                    print(module_param_name)
-                    hyperparams["weight_decay"] = 0.0
-                if isinstance(module, norm_module_types):
-                    hyperparams["weight_decay"] = weight_decay_norm
-                if isinstance(module, torch.nn.Embedding):
-                    hyperparams["weight_decay"] = weight_decay_embed
-                params.append({"params": [value], **hyperparams})
-
-        def maybe_add_full_model_gradient_clipping(optim):
-            # detectron2 doesn't have full model gradient clipping now
-            clip_norm_val = cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE
-            enable = (
-                cfg.SOLVER.CLIP_GRADIENTS.ENABLED
-                and cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE == "full_model"
-                and clip_norm_val > 0.0
-            )
-
-            class FullModelGradientClippingOptimizer(optim):
-                def step(self, closure=None):
-                    all_params = itertools.chain(
-                        *[x["params"] for x in self.param_groups]
-                    )
-                    torch.nn.utils.clip_grad_norm_(all_params, clip_norm_val)
-                    super().step(closure=closure)
-
-            return FullModelGradientClippingOptimizer if enable else optim
-
-        optimizer_type = cfg.SOLVER.OPTIMIZER
-        if optimizer_type == "SGD":
-            optimizer = maybe_add_full_model_gradient_clipping(torch.optim.SGD)(
-                params, cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM
-            )
-        elif optimizer_type == "ADAMW":
-            optimizer = maybe_add_full_model_gradient_clipping(torch.optim.AdamW)(
-                params, cfg.SOLVER.BASE_LR
-            )
-        else:
-            raise NotImplementedError(f"no optimizer type {optimizer_type}")
-        if not cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE == "full_model":
-            optimizer = maybe_add_gradient_clipping(cfg, optimizer)
-        return optimizer
-
-    @classmethod
-    def test_with_TTA(cls, cfg, model):
-        logger = logging.getLogger("detectron2.trainer")
-        # In the end of training, run an evaluation with TTA.
-        logger.info("Running inference with test-time augmentation ...")
-        model = SemanticSegmentorWithTTA(cfg, model)
-        evaluators = [
-            cls.build_evaluator(
-                cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA")
-            )
-            for name in cfg.DATASETS.TEST
-        ]
-        res = cls.test(cfg, model, evaluators)
-        res = OrderedDict({k + "_TTA": v for k, v in res.items()})
-        return res
-
-
-def setup(args):
-    """
-    Create configs and perform basic setups.
-    """
-    cfg = get_cfg()
-    # for poly lr schedule
-    add_deeplab_config(cfg)
-    add_ovseg_config(cfg)
-    cfg.merge_from_file(args.config_file)
-    cfg.merge_from_list(args.opts)
-    cfg.freeze()
-    default_setup(cfg, args)
-    # Setup logger for "ovseg" module
-    if not args.eval_only:
-        setup_wandb(cfg, args)
-    setup_logger(
-        output=cfg.OUTPUT_DIR, distributed_rank=comm.get_rank(), name="ovseg"
-    )
-    return cfg
-
-
-def main(args):
-    cfg = setup(args)
-
-    if args.eval_only:
-        model = Trainer.build_model(cfg)
-        DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
-            cfg.MODEL.WEIGHTS, resume=args.resume
-        )
-
-        if cfg.TEST.AUG.ENABLED:
-            res = Trainer.test_with_TTA(cfg, model)
-        else:
-            res = Trainer.test(cfg, model)
-        if comm.is_main_process():
-            verify_results(cfg, res)
-        return res
-
-    trainer = Trainer(cfg)
-    trainer.resume_or_load(resume=args.resume)
-    return trainer.train()
-
-
-if __name__ == "__main__":
-    args = default_argument_parser().parse_args()
-    print("Command Line Args:", args)
-    launch(
-        main,
-        args.num_gpus,
-        num_machines=args.num_machines,
-        machine_rank=args.machine_rank,
-        dist_url=args.dist_url,
-        args=(args,),
-    )
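The per-parameter rule in the removed `build_optimizer` may be easier to follow in isolation. The sketch below restates it in plain Python, without torch, under stated assumptions: `select_hyperparams` and its arguments are hypothetical names chosen for illustration, and the two boolean flags stand in for the `isinstance` checks against the normalization and embedding module types. The checks run in the same order as in the deleted code, so a later match overrides an earlier one.

```python
import copy
from typing import Any, Dict


def select_hyperparams(
    defaults: Dict[str, Any],
    module_name: str,
    param_name: str,
    is_norm: bool,
    is_embedding: bool,
    backbone_multiplier: float,
    weight_decay_norm: float,
    weight_decay_embed: float,
) -> Dict[str, Any]:
    """Return the optimizer settings for one parameter (illustrative sketch)."""
    hyperparams = copy.copy(defaults)
    # Parameters under a module whose name contains "backbone" get a scaled learning rate.
    if "backbone" in module_name:
        hyperparams["lr"] = hyperparams["lr"] * backbone_multiplier
    # Position tables and embeddings are excluded from weight decay.
    if (
        "relative_position_bias_table" in param_name
        or "absolute_pos_embed" in param_name
    ):
        hyperparams["weight_decay"] = 0.0
    # Normalization and embedding modules use their own decay values;
    # these checks come last, so they win when several rules match.
    if is_norm:
        hyperparams["weight_decay"] = weight_decay_norm
    if is_embedding:
        hyperparams["weight_decay"] = weight_decay_embed
    return hyperparams


if __name__ == "__main__":
    base = {"lr": 0.5, "weight_decay": 0.25}
    print(select_hyperparams(base, "backbone.layers.0", "weight", False, False, 0.5, 0.0, 0.0))
```

The numeric values in the usage line are arbitrary and chosen only to keep the arithmetic exact; they are not the settings used by OVSeg.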