hanxiao committed on
Commit
6607434
1 Parent(s): e7cdc21

Update README.md

Files changed (1)
  1. README.md +35 -32
README.md CHANGED
@@ -15,9 +15,6 @@ license: apache-2.0
  # jina-clip-v1
  Jina CLIP: your CLIP model is also your text retriever!

- ## Quick Start
-
- The easiest way to start using `jina-clip-v1` is to use the Jina AI [Embedding API](https://jina.ai/embeddings/).

  ## Intended Usage & Model Info

@@ -30,11 +27,11 @@ Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://

  ## Data & Parameters

- `jina-clip-v1` [technical report]() coming soon.

  ## Usage

- You can use Jina CLIP directly from the transformers package.

  ```python
  !pip install transformers einops timm pillow
@@ -55,31 +52,6 @@ print(cos_sim(text_embeddings[0], text_embeddings[1])) # text embedding similari
  print(cos_sim(text_embeddings[0], image_embeddings[0])) # text-image cross-modal similarity
  ```

- **Notice: our empirical study shows that text-text cosine similarity is normally larger than text-image cosine similarity!**
- If you want to merge the two scores, we recommend two ways:
-
- 1. Weighted average of text-text similarity and text-image similarity:
-
- ```python
- # pseudo code
- alpha = 0.6
- beta = 0.4
-
- combined_scores = alpha * sim(query, document) + beta * sim(text, image)
- ```
-
- 2. Apply z-score normalization before merging scores:
-
- ```python
- # pseudo code
- query_document_mean = np.mean(cos_sim_query_documents)
- query_document_std = np.std(cos_sim_query_documents)
- text_image_mean = np.mean(cos_sim_text_images)
- text_image_std = np.std(cos_sim_text_images)
-
- query_document_sim_normalized = (cos_sim_query_documents - query_document_mean) / query_document_std
- text_image_sim_normalized = (cos_sim_text_images - text_image_mean) / text_image_std
- ```

  ## Performance

@@ -119,7 +91,38 @@ Join our [Discord community](https://discord.jina.ai) and chat with other commun

  If you find `jina-clip-v1` useful in your research, please cite the following paper:

- ```console
- TBD
  ```

  # jina-clip-v1
  Jina CLIP: your CLIP model is also your text retriever!


  ## Intended Usage & Model Info

 
  ## Data & Parameters

+ [Check out our paper](https://arxiv.org/abs/2405.20204)

  ## Usage

+ You can use Jina CLIP directly via the transformers package.

  ```python
  !pip install transformers einops timm pillow

  print(cos_sim(text_embeddings[0], image_embeddings[0])) # text-image cross-modal similarity
  ```


  ## Performance

 
  If you find `jina-clip-v1` useful in your research, please cite the following paper:

+ ```bibtex
+ @misc{2405.20204,
+ Author = {Andreas Koukounas and Georgios Mastrapas and Michael Günther and Bo Wang and Scott Martens and Isabelle Mohr and Saba Sturua and Mohammad Kalim Akram and Joan Fontanals Martínez and Saahil Ognawala and Susana Guzman and Maximilian Werk and Nan Wang and Han Xiao},
+ Title = {Jina CLIP: Your CLIP Model Is Also Your Text Retriever},
+ Year = {2024},
+ Eprint = {arXiv:2405.20204},
+ }
  ```

+
+ **Notice: our empirical study shows that text-text cosine similarity is normally larger than text-image cosine similarity!**
+ If you want to merge the two scores, we recommend two ways:
+
+ 1. Weighted average of text-text similarity and text-image similarity:
+
+ ```python
+ # pseudo code
+ alpha = 0.6
+ beta = 0.4
+
+ combined_scores = alpha * sim(query, document) + beta * sim(text, image)
+ ```
+
+ 2. Apply z-score normalization before merging scores:
+
+ ```python
+ # pseudo code
+ query_document_mean = np.mean(cos_sim_query_documents)
+ query_document_std = np.std(cos_sim_query_documents)
+ text_image_mean = np.mean(cos_sim_text_images)
+ text_image_std = np.std(cos_sim_text_images)
+
+ query_document_sim_normalized = (cos_sim_query_documents - query_document_mean) / query_document_std
+ text_image_sim_normalized = (cos_sim_text_images - text_image_mean) / text_image_std
+ ```
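
The two score-merging strategies in the README text moved by this commit are given only as pseudo code. Below is a minimal runnable sketch of both, assuming the cosine similarities have already been computed and are available as plain arrays. The function names (`weighted_merge`, `zscore`, `zscore_merge`), the `eps` guard, the sample scores, and the equal-weight sum after z-scoring are all assumptions made for illustration. The README only specifies `alpha = 0.6`, `beta = 0.4` for the weighted average and does not say how to combine the normalized scores.

```python
import numpy as np


def weighted_merge(text_text_sim, text_image_sim, alpha=0.6, beta=0.4):
    """Strategy 1: weighted average of the two similarity arrays."""
    text_text_sim = np.asarray(text_text_sim, dtype=float)
    text_image_sim = np.asarray(text_image_sim, dtype=float)
    return alpha * text_text_sim + beta * text_image_sim


def zscore(scores, eps=1e-12):
    """Shift scores to zero mean and scale to unit standard deviation.

    eps is a hypothetical guard against division by zero when all
    scores are identical; it is not part of the README pseudo code.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / (scores.std() + eps)


def zscore_merge(text_text_sim, text_image_sim, alpha=0.5, beta=0.5):
    """Strategy 2: z-score each score list, then combine.

    The equal-weight combination is an assumption for this sketch.
    """
    return alpha * zscore(text_text_sim) + beta * zscore(text_image_sim)


# Made-up similarity scores for three candidates (illustrative only).
cos_sim_query_documents = [0.82, 0.75, 0.60]  # text-text similarities
cos_sim_text_images = [0.31, 0.22, 0.35]      # text-image similarities

combined_weighted = weighted_merge(cos_sim_query_documents, cos_sim_text_images)
combined_zscore = zscore_merge(cos_sim_query_documents, cos_sim_text_images)

print(combined_weighted)
print(combined_zscore)
```

Z-scoring puts both score lists on a common scale before they are combined, which is one way to compensate for text-text similarities being systematically larger than text-image ones. The weighted average keeps the raw scales and relies on `alpha` and `beta` to balance them.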