Update README.md
---
tags:
- clip
- llm-jp-clip
- japanese-clip
library_name: open_clip
pipeline_tag: zero-shot-image-classification
license:
- apache-2.0
datasets:
- laion/relaion2B-en-research-safe
language:
- ja
---

# Model Card for llm-jp-clip-vit-base-patch16

# Model Details

A CLIP ViT-B/16 model trained with [OpenCLIP](https://github.com/mlfoundations/open_clip) on a Japanese translation of the English subset of ReLAION-5B ([relaion2B-en-research-safe](https://huggingface.co/datasets/laion/relaion2B-en-research-safe)), translated by [gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it).

The total number of parameters of this model is 248M.
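
As a quick sanity check, this count can be reproduced by loading the checkpoint and summing its parameter tensors. A minimal sketch (it assumes `open_clip_torch` and `torch` are already installed; see Installation below):

```python
import open_clip

# Load the released checkpoint from the Hub and count its parameters.
model, _ = open_clip.create_model_from_pretrained('hf-hub:llm-jp/llm-jp-clip-vit-base-patch16')
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected to land around 248M
```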

# How to Use

## Installation

```bash
$ pip install open_clip_torch
```

## Zero-shot Image Classification

```python
import open_clip
import torch
from PIL import Image
import requests

model, preprocess = open_clip.create_model_from_pretrained('hf-hub:llm-jp/llm-jp-clip-vit-base-patch16')
tokenizer = open_clip.get_tokenizer('hf-hub:llm-jp/llm-jp-clip-vit-base-patch16')

# Example image from COCO; the candidate labels are "cat", "dog", "bird".
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image = preprocess(image).unsqueeze(0)
text = tokenizer(["猫", "犬", "鳥"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
# Label probs: tensor([[9.9425e-01, 5.2273e-03, 5.2600e-04]])
```

References:
- [Using OpenCLIP at Hugging Face](https://huggingface.co/docs/hub/en/open_clip), Hugging Face Docs
- OpenCLIP [repository](https://github.com/mlfoundations/open_clip)
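
The same encoders also work for text-to-image retrieval. The sketch below is an illustrative variant of the example above, not an official recipe; the `images/*.jpg` glob and the query string are placeholders for your own data:

```python
import glob

import torch
from PIL import Image
import open_clip

model, preprocess = open_clip.create_model_from_pretrained('hf-hub:llm-jp/llm-jp-clip-vit-base-patch16')
tokenizer = open_clip.get_tokenizer('hf-hub:llm-jp/llm-jp-clip-vit-base-patch16')

paths = sorted(glob.glob("images/*.jpg"))  # placeholder: any local images
images = torch.stack([preprocess(Image.open(p)) for p in paths])
query = tokenizer(["眠っている猫"])  # "a sleeping cat"

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(query)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

# Rank the images by cosine similarity to the query text.
scores = (image_features @ text_features.T).squeeze(1)
for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```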

# Training Details

## Model Architecture

- Text Encoder: RoBERTa base with llm-jp-tokenizer
- Image Encoder: ViT-B/16
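
For a closer look at how these two towers are wired together, printing the loaded module tree is sufficient (a small inspection sketch, assuming the same hub id as in the usage example above):

```python
import open_clip

model, _ = open_clip.create_model_from_pretrained('hf-hub:llm-jp/llm-jp-clip-vit-base-patch16')
print(model)  # module tree: the ViT-B/16 image tower and the RoBERTa-based text tower
```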

## Training Data

We used a Japanese-translated version of the relaion2B-en-research-safe dataset.
The translation was performed with gemma-2-9b-it.
Because the image download success rate was about 70%, the resulting dataset contains 1.45 billion samples, and we trained on it for 9 epochs (13 billion samples seen in total).
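
The 13-billion figure follows directly from those two numbers (a back-of-envelope check using the rounded values quoted above):

```python
samples_per_epoch = 1.45e9  # pairs whose images downloaded successfully (~70% of the source)
epochs = 9
print(f"{samples_per_epoch * epochs:.2e}")  # 1.30e+10, i.e. roughly 13 billion samples seen
```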

# Evaluation

Evaluation Code: https://github.com/llm-jp/clip-eval

TODO:

# License

[The Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

Please also see the Gemma Terms of Use (https://ai.google.dev/gemma/terms), as the training data was translated by [gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it).

> 3.3 Generated Output
>
> Google claims no rights in Outputs you generate using Gemma. You and your users are solely responsible for Outputs and their subsequent uses.

# Citation

BibTeX:
```
TODO:
```