---

language:
  - ko
  - en
  - zh
license: mit
pipeline_tag: feature-extraction
tags:
  - transformers
  - sentence-transformers
  - text-embeddings-inference
---




# upskyy/ko-reranker

**ko-reranker** is a model obtained by fine-tuning [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) on [Korean data](https://huggingface.co/datasets/upskyy/ko-wiki-reranking).

## Usage
## Using FlagEmbedding
```
pip install -U FlagEmbedding
```

Get relevance scores (higher scores indicate more relevance):

```python
from FlagEmbedding import FlagReranker

# Setting use_fp16=True speeds up computation with a slight performance degradation.
reranker = FlagReranker('upskyy/ko-reranker', use_fp16=True)

# Score a single [query, passage] pair.
score = reranker.compute_score(['query', 'passage'])
print(score)  # -1.861328125

# Pass normalize=True to map the score into the 0-1 range by applying a sigmoid function.
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score)  # 0.13454832326359276

# Score a batch of [query, passage] pairs.
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)  # [-7.37109375, 8.5390625]

# normalize=True applies the same sigmoid mapping to every score in the batch.
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores)  # [0.0006287840192903181, 0.9998043646624727]
```
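
To make the effect of `normalize=True` concrete, the following is a minimal sketch that applies a plain logistic sigmoid to the raw scores printed in the example. It assumes the normalization is exactly the standard sigmoid, as the comment in the example states; the `sigmoid` helper is written here for illustration and is not part of FlagEmbedding.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Raw scores copied from the example output in this section.
raw_scores = [-1.861328125, -7.37109375, 8.5390625]

# Assumption: normalize=True is equivalent to applying sigmoid to each raw score.
normalized = [sigmoid(s) for s in raw_scores]

for raw, norm in zip(raw_scores, normalized):
    print(f"{raw:>14} -> {norm:.6f}")
```

Because the sigmoid is monotonic, normalizing changes the scale of the scores but not the order in which passages are ranked.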

## Using Sentence-Transformers

```
pip install -U sentence-transformers
```

Compute similarity scores between two sets of sentences from their embeddings (higher scores indicate greater similarity):

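```python
from sentence_transformers import SentenceTransformer

# "An economic expert is forecasting an interest rate cut." / "An investor buys stocks on the stock market."
sentences_1 = ["경제 전문가가 금리 인하에 대한 예측을 하고 있다.", "주식 시장에서 한 투자자가 주식을 매수한다."]
# "An investor buys Bitcoin." / "A new digital asset is listed on a financial exchange."
sentences_2 = ["한 투자자가 비트코인을 매수한다.", "금융 거래소에서 새로운 디지털 자산이 상장된다."]

model = SentenceTransformer('upskyy/ko-reranker')

# With normalize_embeddings=True every embedding has unit length,
# so the matrix product below is a cosine-similarity matrix.
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T

print(similarity)
```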
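
The `embeddings_1 @ embeddings_2.T` step works as a similarity measure because `normalize_embeddings=True` scales each embedding to unit length, which turns the dot product into cosine similarity. The sketch below illustrates this with toy 2-D vectors in pure Python; the vectors are made up and the helper names `l2_normalize` and `similarity_matrix` are hypothetical, not part of sentence-transformers.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (what normalize_embeddings=True does per embedding)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_matrix(rows_a, rows_b):
    """Dot product of every row in rows_a with every row in rows_b (equivalent to a @ b.T)."""
    return [[sum(x * y for x, y in zip(u, w)) for w in rows_b] for u in rows_a]


# Toy vectors standing in for real embeddings (assumption: not model output).
emb_1 = [l2_normalize(v) for v in [[3.0, 4.0], [1.0, 0.0]]]
emb_2 = [l2_normalize(v) for v in [[6.0, 8.0], [0.0, 2.0]]]

sim = similarity_matrix(emb_1, emb_2)
for row in sim:
    print([round(value, 3) for value in row])
```

Entry `sim[i][j]` compares sentence `i` of the first list with sentence `j` of the second, so the result has one row per sentence in the first list.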

## Using Hugging Face Transformers

Get relevance scores (higher scores indicate more relevance):


```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('upskyy/ko-reranker')
model = AutoModelForSequenceClassification.from_pretrained('upskyy/ko-reranker')
model.eval()  # disable dropout for inference

# Each pair is [query, passage].
pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # The model emits one logit per pair; flatten to a 1-D tensor of scores.
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
    print(scores)
```
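
Once scores are available, reranking amounts to sorting the passages by score in descending order. The following is a minimal sketch of that step; `rerank` and its `top_k` parameter are hypothetical names introduced for illustration, and the scores are the values printed in the FlagEmbedding example rather than fresh model output.

```python
def rerank(passages, scores, top_k=None):
    """Return (passage, score) tuples sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


passages = [
    'hi',
    'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear '
    'or simply panda, is a bear species endemic to China.',
]
# Assumption: scores taken from the FlagEmbedding example output in this card.
scores = [-7.37109375, 8.5390625]

for passage, score in rerank(passages, scores):
    print(f"{score:>12.4f}  {passage[:40]}")
```

With the transformers example, the tensor would first need converting to plain floats, for instance with `scores.tolist()`.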



## Citation

```bibtex
@misc{bge_embedding,
      title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
      author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
      year={2023},
      eprint={2309.07597},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.

## Reference

- [Dongjin-kr/ko-reranker](https://huggingface.co/Dongjin-kr/ko-reranker)
- [reranker-kr](https://github.com/aws-samples/aws-ai-ml-workshop-kr/tree/master/genai/aws-gen-ai-kr/30_fine_tune/reranker-kr)
- [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)