Dang Phuong Nam committed on
Commit 938bb7b
1 parent: d2e9205

Update README.md

Files changed (1)
  1. README.md +32 -4
README.md CHANGED
@@ -47,7 +47,7 @@ Get relevance scores (higher scores indicate more relevance):
  ```python
  from FlagEmbedding import FlagReranker

- reranker = FlagReranker('namdp/bge-reranker-vietnamese',
  use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

  score = reranker.compute_score(['tỉnh nào có diện tích lớn nhất việt nam', 'nghệ an có diện tích lớn nhất việt nam'])
@@ -89,8 +89,8 @@ Get relevance scores (higher scores indicate more relevance):
  import torch
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

- tokenizer = AutoTokenizer.from_pretrained('namdp/bge-reranker-vietnamese')
- model = AutoModelForSequenceClassification.from_pretrained('namdp/bge-reranker-vietnamese')
  model.eval()

  pairs = [
@@ -115,4 +115,32 @@ Train data should be a json file, where each line is a dict like this:

  `query` is the query, and `pos` is a list of positive texts, `neg` is a list of negative texts, `prompt` indicates the
  relationship between query and texts. If you have no negative texts for a query, you can random sample some from the
- entire corpus as the negatives.
 
  ```python
  from FlagEmbedding import FlagReranker

+ reranker = FlagReranker('namdp/ViRanker',
  use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

  score = reranker.compute_score(['tỉnh nào có diện tích lớn nhất việt nam', 'nghệ an có diện tích lớn nhất việt nam'])
 
  import torch
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

+ tokenizer = AutoTokenizer.from_pretrained('namdp/ViRanker')
+ model = AutoModelForSequenceClassification.from_pretrained('namdp/ViRanker')
  model.eval()

  pairs = [
 

  `query` is the query, and `pos` is a list of positive texts, `neg` is a list of negative texts, `prompt` indicates the
  relationship between query and texts. If you have no negative texts for a query, you can random sample some from the
+ entire corpus as the negatives.
+
+ ## Performance
+
+ In the following table, we provide various pre-trained Cross-Encoders together with their performance on
+ the [MS MMarco Passage Reranking - Vi - Dev](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
+
+ | Model-Name | NDCG@3 | MRR@3 | NDCG@5 | MRR@5 | NDCG@10 | MRR@10 | Docs / Sec |
+ |-----------------------------------------------------------------------------------------------------------------------------------------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
+ | [namdp/ViRanker](https://huggingface.co/namdp/ViRanker) | **0.6685** | **0.6564** | 0.6842 | **0.6811** | 0.7278 | **0.6985** | 2.02 |
+ | [itdainb/PhoRanker](https://huggingface.co/itdainb/PhoRanker) | 0.6625 | 0.6458 | **0.7147** | 0.6731 | **0.7422** | 0.6830 | **15** |
+ | [kien-vu-uet/finetuned-phobert-passage-rerank-best-eval](https://huggingface.co/kien-vu-uet/finetuned-phobert-passage-rerank-best-eval) | 0.0963 | 0.0883 | 0.1396 | 0.1131 | 0.1681 | 0.1246 | **15** |
+ | [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | 0.6087 | 0.5841 | 0.6513 | 0.6062 | 0.6872 | 0.62091 | 3.51 |
+ | [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | 0.6088 | 0.5908 | 0.6446 | 0.6108 | 0.6785 | 0.6249 | 1.29 |
+
+ ## Citation
+
+ Please cite as
+
+ ```Plaintext
+ @misc{ViRanker,
+ title={ViRanker: A Cross-encoder Model for Vietnamese Text Ranking},
+ author={Nam Dang Phuong},
+ year={2024},
+ publisher={Huggingface},
+ journal={huggingface repository},
+ howpublished={\url{https://huggingface.co/namdp/ViRanker}},
+ }
+ ```
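
Note on the training-data text in the last hunk: the README says that a query with no negative texts can be given negatives sampled at random from the whole corpus. A minimal sketch of that step follows, assuming one JSON object per line with the `query`, `pos`, and `neg` keys named in the README. The helper `fill_negatives`, its `num_neg` and `seed` parameters, and the toy corpus are hypothetical and not part of FlagEmbedding. The `prompt` key is left out because its expected value is not shown in this diff.

```python
import json
import random


def fill_negatives(record, corpus, num_neg=2, seed=0):
    """Return a copy of `record` whose empty `neg` list is filled from `corpus`.

    Hypothetical helper for illustration; not a FlagEmbedding function.
    """
    if record.get("neg"):
        return dict(record)  # negatives already present, leave them as they are
    # Never sample one of the record's own positives as a negative.
    excluded = set(record["pos"])
    candidates = [text for text in corpus if text not in excluded]
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    sampled = rng.sample(candidates, min(num_neg, len(candidates)))
    return {**record, "neg": sampled}


# Toy corpus (illustrative only).
corpus = [
    "nghệ an có diện tích lớn nhất việt nam",
    "hà nội là thủ đô của việt nam",
    "sông mê kông chảy qua nhiều quốc gia",
    "phở là một món ăn việt nam",
]
record = {
    "query": "tỉnh nào có diện tích lớn nhất việt nam",
    "pos": ["nghệ an có diện tích lớn nhất việt nam"],
    "neg": [],
}
filled = fill_negatives(record, corpus, num_neg=2)
# One JSON object per line, matching the format the README describes.
print(json.dumps(filled, ensure_ascii=False))
```

Random negatives are usually easy for a reranker to reject, so this is a fallback for queries that have no curated negatives rather than a replacement for them.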
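
Note on the Performance table added in this commit: the sketch below shows how MRR@k and NDCG@k are commonly computed for a single query from binary relevance labels. It is an illustration under assumptions, not the evaluation script behind the reported numbers, which this diff does not include. The function names `mrr_at_k` and `ndcg_at_k` and the toy ranking are hypothetical, and the ideal ranking is taken over the candidate list only.

```python
import math


def mrr_at_k(ranked_relevance, k):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(ranked_relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_relevance, k):
    """NDCG@k using the common rel / log2(rank + 1) discounted gain."""

    def dcg(rels):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))

    # Ideal ordering: the same labels sorted from most to least relevant.
    ideal = dcg(sorted(ranked_relevance, reverse=True)[:k])
    return dcg(ranked_relevance[:k]) / ideal if ideal > 0 else 0.0


# Relevance labels of the passages in the order a reranker returned them (toy data).
ranking = [0, 1, 0, 0, 1]
print(mrr_at_k(ranking, 3))
print(ndcg_at_k(ranking, 3))
```

Dataset-level scores are typically the mean of these per-query values over all queries.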