calpt committed d447458 · 1 parent: 330fe7c

Update README.md

Files changed (1)
  1. README.md +30 -1
README.md CHANGED
@@ -2,4 +2,33 @@
  license: mit
  ---
 
- [CLIP ViT-B/32 xlm roberta base - LAION-5B](https://huggingface.co/laion/CLIP-ViT-B-32-xlm-roberta-base-laion5B-s13B-b90k) model converted to HuggingFace Transformers via https://gist.github.com/calpt/8e3555bd11f1916b5169c8125117e5ee.
+ # CLIP ViT-B/32 xlm roberta base - LAION-5B
+
+ [CLIP ViT-B/32 xlm roberta base - LAION-5B](https://huggingface.co/laion/CLIP-ViT-B-32-xlm-roberta-base-laion5B-s13B-b90k) model converted from OpenCLIP to HuggingFace Transformers.
+
+ See https://gist.github.com/calpt/8e3555bd11f1916b5169c8125117e5ee for conversion script and more info.
+
+ ## Usage
+
+ Model uses custom code. Make sure to pass `trust_remote_code=True` when loading the model.
+
+ Example:
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModel, AutoFeatureExtractor, AutoTokenizer
+
+ model = AutoModel.from_pretrained("calpt/CLIP-ViT-B-32-xlm-roberta-base-laion5B-s13B-b90k", trust_remote_code=True)
+
+ processor = AutoFeatureExtractor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
+ tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
+
+ image_input = processor(Image.open("CLIP.png"), return_tensors="pt")
+ text_input = tokenizer(["a diagram", "a dog", "a cat"], return_tensors="pt", padding=True)
+
+ with torch.no_grad():
+     outputs = model(**image_input, **text_input)
+     text_probs = (100.0 * outputs.logits_per_image.softmax(dim=-1))
+
+ print("Label probs:", text_probs)
+ ```
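The last step of the README example turns `logits_per_image` into percentages. It applies a softmax over the three text labels for each image, then multiplies by 100. In the standard CLIP formulation, these logits are cosine similarities between image and text embeddings, scaled by a learned temperature (the logit scale). The stdlib-only sketch below reproduces that arithmetic on made-up 3-dimensional embeddings. It assumes the converted model follows the standard CLIP scoring and uses an assumed logit scale of 100. It does not call the converted model, and the helper names (`cosine_similarity`, `softmax`, `label_probs`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def label_probs(image_emb, text_embs, logit_scale=100.0):
    """Percentage score per text label for one image (hypothetical helper).

    logit_scale is an assumed stand-in for the model's learned temperature.
    """
    # One logit per label: scaled cosine similarity between image and text
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    # Softmax across labels, then scale to percentages (mirrors 100.0 * softmax)
    return [100.0 * p for p in softmax(logits)]


# Made-up toy embeddings for one image and the three README labels
image_emb = [0.9, 0.1, 0.0]
text_embs = [
    [1.0, 0.0, 0.0],  # "a diagram"
    [0.0, 1.0, 0.0],  # "a dog"
    [0.0, 0.0, 1.0],  # "a cat"
]
probs = label_probs(image_emb, text_embs)
print("Label probs:", [round(p, 2) for p in probs])
```

With a large logit scale, small differences in cosine similarity produce a very peaked distribution. Passing a smaller `logit_scale` to `label_probs` flattens the distribution.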