Commit 7866064 by raylim (parent 447aa03): initial fork
README.md ADDED
---
tags:
- zero-shot-image-classification
- clip
- vision
- language
- histopathology
- histology
- medical
library_tag: open_clip
license: mit
widget:
- src: >-
    https://quilt1m.github.io/img/BREST092.jpg
  candidate_labels: adipose tissue, debris tissue, lymphocytes tissue, mucus tissue, smooth muscle tissue, normal colon mucosa tissue, cancer-associated stroma tissue, colorectal adenocarcinoma epithelium tissue
  example_title: Tissue phenotyping
- src: >-
    https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/squamous_cell_carcinoma_histopathology.jpeg
  candidate_labels: adenocarcinoma histopathology, squamous cell carcinoma histopathology
  example_title: squamous cell carcinoma histopathology
- src: >-
    https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/adenocarcinoma_histopathology.jpg
  candidate_labels: adenocarcinoma histopathology, squamous cell carcinoma histopathology
  example_title: adenocarcinoma histopathology
pipeline_tag: zero-shot-image-classification
---

## QuiltNet-B-16-PMB Description

[QuiltNet-B-16/PMB](https://github.com/wisdomikezogwo/quilt1m/) is a vision-language foundation model with a ViT-B/16 image tower and a PubMedBERT text tower, trained on the [Quilt-1M](https://quilt1m.github.io/) dataset curated from representative histopathology videos.
It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering.
QuiltNet establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches:

![](barchart_zeroshot.png)
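
As a quick orientation, here is a minimal zero-shot classification sketch using the `open_clip` library. The Hub identifier `hf-hub:wisdomik/QuiltNet-B-16-PMB` and the local image path are illustrative assumptions; adjust them to wherever the checkpoint and your data actually live.

```python
# Minimal zero-shot classification sketch with open_clip.
# The Hub repo id and image path below are assumptions, not confirmed by this card.
import torch
import open_clip
from PIL import Image

repo = "hf-hub:wisdomik/QuiltNet-B-16-PMB"  # assumed Hub repo id
model, preprocess = open_clip.create_model_from_pretrained(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

labels = ["adenocarcinoma histopathology", "squamous cell carcinoma histopathology"]
image = preprocess(Image.open("example_patch.jpg")).unsqueeze(0)  # hypothetical local image
text = tokenizer([f"a histopathology image of {label}" for label in labels])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize, then compare with cosine similarity scaled into probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```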

# Citation
```bibtex
@misc{ikezogwo2023quilt1m,
      title={Quilt-1M: One Million Image-Text Pairs for Histopathology},
      author={Wisdom Oluchi Ikezogwo and Mehmet Saygin Seyfioglu and Fatemeh Ghezloo and Dylan Stefan Chan Geva and Fatwir Sheikh Mohammed and Pavan Kumar Anand and Ranjay Krishna and Linda Shapiro},
      year={2023},
      eprint={2306.11207},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

# Uses

As per the original [OpenAI CLIP model card](https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/model-card.md), this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models.

The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.

## Direct Use

Zero-shot image classification, image and text retrieval, among others.
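
For instance, cross-modal (text-to-image) retrieval can be sketched as below. This is a minimal illustration under the same assumed Hub identifier as above; the image file names are hypothetical placeholders.

```python
# Minimal text-to-image retrieval sketch (assumed Hub repo id, hypothetical file names).
import torch
import open_clip
from PIL import Image

repo = "hf-hub:wisdomik/QuiltNet-B-16-PMB"  # assumption
model, preprocess = open_clip.create_model_from_pretrained(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

paths = ["patch_001.jpg", "patch_002.jpg", "patch_003.jpg"]  # hypothetical gallery
images = torch.stack([preprocess(Image.open(p)) for p in paths])
query = tokenizer(["lymphocytes tissue"])

with torch.no_grad():
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(query)
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ txt_emb.T).squeeze(1)  # cosine similarity per gallery image

# Rank the gallery by similarity to the text query.
for idx in scores.argsort(descending=True).tolist():
    print(f"{paths[idx]}: {scores[idx]:.3f}")
```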

## Downstream Use

Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others.
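
As one example of downstream use, a linear probe can be fit on frozen image features. The sketch below uses scikit-learn and random placeholder tensors so it runs end to end; replace the placeholder data with real preprocessed patches and labels.

```python
# Linear-probe sketch on frozen QuiltNet image features (illustrative only).
import torch
import open_clip
from sklearn.linear_model import LogisticRegression

model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:wisdomik/QuiltNet-B-16-PMB"  # assumed Hub repo id
)
model.eval()

def embed(images: torch.Tensor) -> torch.Tensor:
    """Encode a batch of preprocessed images into L2-normalized features."""
    with torch.no_grad():
        feats = model.encode_image(images)
        return feats / feats.norm(dim=-1, keepdim=True)

# Placeholder data so the sketch runs; substitute tensors built with `preprocess`
# and real integer class labels for an actual probe.
train_images = torch.randn(32, 3, 224, 224)
train_labels = torch.randint(0, 2, (32,)).numpy()
test_images = torch.randn(8, 3, 224, 224)
test_labels = torch.randint(0, 2, (8,)).numpy()

train_feats = embed(train_images).cpu().numpy()
test_feats = embed(test_images).cpu().numpy()

clf = LogisticRegression(max_iter=1000)
clf.fit(train_feats, train_labels)
print("linear-probe accuracy:", clf.score(test_feats, test_labels))
```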

### Intended Use

The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models.

#### Primary intended uses

The primary intended users of these models are AI researchers.

We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision histopathology models.

### Out-of-Scope Use Cases

**Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases, such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy.

Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English-language use cases.

Beyond the notice above, the Quilt-1M dataset used to train these models carries additional considerations; see below.

## Training Data

This model was trained with [QUILT-1M](https://quilt1m.github.io/), an image-text dataset for histopathology.
Curated from educational videos on YouTube, QUILT-1M is the largest dataset for vision-language modeling in histopathology.

**IMPORTANT NOTE:** The motivation behind the dataset's creation is to democratize research and experimentation around large-scale multi-modal model training and the handling of uncurated, large-scale histopathology datasets crawled from the publicly available internet. Our recommendation is therefore to use the dataset for research purposes only.

# Evaluation

Evaluation was done with the code in the [CLIP Benchmark suite](https://github.com/LAION-AI/CLIP_benchmark); results across a range of histology tasks and datasets can be found in the paper.

# Disclaimer

It is important to note that the results obtained from this model are not intended to constitute medical advice or replace consultation with a qualified medical professional. Use of this model is solely at your own risk and should be consistent with applicable laws, regulations, and ethical considerations. We do not warrant or guarantee the accuracy, completeness, suitability, or usefulness of this model for any particular purpose, and we hereby disclaim any liability arising from any reliance placed on this model or any results obtained from its use.

# Privacy

In accordance with YouTube's privacy policy, we redistribute only video IDs.
It is strictly prohibited to redistribute any content apart from the video IDs.
Any distribution must adhere to the laws and regulations applicable in your jurisdiction, including export control laws and embargoes.
open_clip_config.json ADDED
{
    "model_cfg": {
        "embed_dim": 512,
        "vision_cfg": {
            "timm_model_name": "vit_base_patch16_224",
            "timm_model_pretrained": false,
            "timm_pool": "",
            "timm_proj": "linear",
            "image_size": 224
        },
        "text_cfg": {
            "hf_model_name": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
            "hf_tokenizer_name": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
            "hf_proj_type": "mlp",
            "hf_pooler_type": "cls_last_hidden_state_pooler",
            "context_length": 256
        }
    },
    "preprocess_cfg": {
        "mean": [
            0.48145466,
            0.4578275,
            0.40821073
        ],
        "std": [
            0.26862954,
            0.26130258,
            0.27577711
        ]
    }
}
open_clip_pytorch_model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:642a4702d0fc9fd0ab0388d0340cb19ff8d234fd79155679d88f78e4ac0880e1
size 783763297
special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "special_tokens_map_file": null,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff