If you just want to check out how to use the model, please check out the [Usage](#usage) section below.

Welcome to JaColBERT version 2, the second release of JaColBERT, a Japanese-only document retrieval model based on [ColBERT](https://github.com/stanford-futuredata/ColBERT).

JaColBERTv2 is a model that offers very strong out-of-domain generalisation: having been trained on only a single dataset (MMarco), it reaches state-of-the-art performance.

JaColBERTv2 was initialised from JaColBERTv1 and trained using knowledge distillation, with 31 negative examples per positive example. It was trained for 250k steps with a batch size of 32.
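The distillation setup described above (one positive plus 31 negatives per query) can be sketched as follows. This is a minimal illustration, not JaColBERTv2's actual training code; the tensor shapes and the use of a KL-divergence loss over softmax-normalised scores are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor) -> torch.Tensor:
    """KL-divergence distillation over one positive + 31 negatives.

    Both tensors have shape (batch, 32): column 0 is the positive
    passage, columns 1-31 are the negatives.
    """
    student_log_probs = F.log_softmax(student_scores, dim=-1)
    teacher_probs = F.softmax(teacher_scores, dim=-1)
    # KL(teacher || student), averaged over the batch
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy example: a batch of 32 queries, each with 32 candidate passages
student = torch.randn(32, 32, requires_grad=True)
teacher = torch.randn(32, 32)
loss = distillation_loss(student, teacher)
loss.backward()  # gradients flow back to the student scores
```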
We present the first results on two datasets: JQaRa, a passage retrieval task, and JSQuAD.

JaColBERTv2 reaches state-of-the-art results on both datasets, outperforming models with 5x more parameters.
|                     |     | JQaRa     |           |           |           |     | JSQuAD    |           |           |
| ------------------- | --- | --------- | --------- | --------- | --------- | --- | --------- | --------- | --------- |
|                     |     | NDCG@10   | MRR@10    | NDCG@100  | MRR@100   |     | R@1       | R@5       | R@10      |
| JaColBERTv2         |     | **0.585** | **0.836** | **0.753** | **0.838** |     | **0.918** | **0.975** | **0.982** |
| JaColBERT           |     | 0.549     | 0.811     | 0.730     | 0.814     |     | 0.906     | 0.968     | 0.978     |
| bge-m3+all          |     | 0.576     | 0.818     | 0.745     | 0.820     |     | N/A       | N/A       | N/A       |
| bge-m3+dense        |     | 0.539     | 0.785     | 0.721     | 0.788     |     | 0.850     | 0.959     | 0.976     |
| m-e5-large          |     | 0.554     | 0.799     | 0.731     | 0.801     |     | 0.865     | 0.966     | 0.977     |
| m-e5-base           |     | 0.471     | 0.727     | 0.673     | 0.731     |     | *0.838*   | *0.955*   | 0.973     |
| m-e5-small          |     | 0.492     | 0.729     | 0.689     | 0.733     |     | *0.840*   | *0.954*   | 0.973     |
| GLuCoSE             |     | 0.308     | 0.518     | 0.564     | 0.527     |     | 0.645     | 0.846     | 0.897     |
| sup-simcse-ja-base  |     | 0.324     | 0.541     | 0.572     | 0.550     |     | 0.632     | 0.849     | 0.897     |
| sup-simcse-ja-large |     | 0.356     | 0.575     | 0.596     | 0.583     |     | 0.603     | 0.833     | 0.889     |
| fio-base-v0.1       |     | 0.372     | 0.616     | 0.608     | 0.622     |     | 0.700     | 0.879     | 0.924     |
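For reference, the metrics reported in the table (NDCG@k, MRR@k, R@k) follow the standard binary-relevance definitions, which can be sketched as below. This is a toy illustration, not the evaluation code used for these benchmarks.

```python
import math

def mrr_at_k(ranking: list[int], relevant: set[int], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for i, doc_id in enumerate(ranking[:k], start=1):
        if doc_id in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranking: list[int], relevant: set[int], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc_id in enumerate(ranking[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0

def recall_at_k(ranking: list[int], relevant: set[int], k: int) -> float:
    """Fraction of the relevant documents retrieved in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)

# Toy example: one query whose relevant documents are {3, 7}
ranking = [5, 3, 9, 7, 1]
print(mrr_at_k(ranking, {3, 7}, 10))     # first hit at rank 2 -> 0.5
print(recall_at_k(ranking, {3, 7}, 10))  # both relevant docs found -> 1.0
```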
# Usage