license: mit

isy-thl/bge-reranker-base-course-skill-tuned

Overview

This model is a fine-tuned version of BAAI/bge-reranker-base, trained on a German dataset in which course learning outcomes serve as queries and are paired with positive and negative skill labels. It is trained to calculate relevance scores for pairs of learning outcomes and ESCO skills in German.

Using FlagEmbedding

pip install -U FlagEmbedding

Get relevance scores (higher scores indicate more relevance):

from FlagEmbedding import FlagReranker
reranker = FlagReranker('isy-thl/bge-reranker-base-course-skill-tuned', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

scores = reranker.compute_score([['Einführung in die Arbeitsweise von WordPress', 'WordPress'], ['Einführung in die Arbeitsweise von WordPress', 'Software für Content-Management-Systeme nutzen'], ['Einführung in die Arbeitsweise von WordPress', 'Website-Sichtbarkeit erhöhen']])
print(scores)

The resulting scores can be normalized to values between 0 and 1 using a sigmoid function:

import math
normalized_scores = [1 / (1 + math.exp(-score)) for score in scores]
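As a small illustration of the normalization step, the sketch below applies a sigmoid to a list of raw scores and sorts the candidate skills by the result. The raw score values are made-up placeholders, not outputs of the model, and the helper names `sigmoid` and `rank_by_score` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw relevance score (logit) to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_by_score(candidates, raw_scores):
    """Return (candidate, normalized score) pairs, most relevant first."""
    normalized = [sigmoid(s) for s in raw_scores]
    return sorted(zip(candidates, normalized), key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only; real scores would come from
# reranker.compute_score as in the snippet above.
skills = [
    'WordPress',
    'Software für Content-Management-Systeme nutzen',
    'Website-Sichtbarkeit erhöhen',
]
example_raw_scores = [4.2, 1.3, -2.7]

ranked = rank_by_score(skills, example_raw_scores)
for skill, score in ranked:
    print(f'{score:.3f}  {skill}')
```

Because the sigmoid is monotonic, normalizing does not change the order of the candidates; it only makes the scores easier to compare against a fixed threshold.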

Performance

To evaluate the model, all ESCO (x=13895) and GRETA (x=23) skills were embedded using the model under assessment and stored in a vector database. For each query in the evaluation dataset, the top 30 most relevant candidates were retrieved based on cosine similarity. Metrics such as accuracy, precision, recall, NDCG, MRR, and MAP were then calculated. For reranker evaluation, the reranker was used to re-rank the top 30 candidates chosen by the fine-tuned bi-encoder model. The evaluation results were split for the ESCO and GRETA use cases:
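The ranking metrics named above can be sketched for a single query as follows. This is a minimal illustration and not the evaluation code used for this model: it assumes binary relevance labels and a ranked list without duplicates, and the function names and the toy skill IDs are hypothetical.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_ids[:k], start=1)
        if item in relevant_ids
    )
    # The ideal ranking places every relevant item at the top of the list.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: four retrieved skill IDs, two of which are labeled relevant.
ranked = ['s3', 's1', 's7', 's2']
relevant = {'s1', 's2'}

print('RR       :', reciprocal_rank(ranked, relevant))
print('Recall@3 :', recall_at_k(ranked, relevant, 3))
print('NDCG@4   :', round(ndcg_at_k(ranked, relevant, 4), 4))
```

MRR is then the mean of `reciprocal_rank` over all queries in the evaluation set, and the other metrics are averaged across queries in the same way.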

ESCO Use Case

GRETA Use Case

The results demonstrate that fine-tuning significantly enhanced the performance of the model, often more than doubling the performance of the non-fine-tuned base model. Notably, fine-tuning with training data from both use cases outperformed fine-tuning with training data from only the target skill taxonomy. This suggests that the models learn more than just the specific skills in the training data and are able to generalize. Further research could evaluate the model's performance on an unseen skill taxonomy, where we expect the fine-tuned model to outperform the base model as well.

The fine-tuned bi-encoder model (isy-thl/multilingual-e5-base-course-skill-tuned) shows exceptional performance on the target task, providing significant improvements over the base model. To maximize retrieval success, we recommend complementing the bi-encoder with the reranker (isy-thl/bge-reranker-base-course-skill-tuned), especially in scenarios where the additional computational cost is justified by the need for higher accuracy and precision.
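The two-stage setup recommended here (retrieve candidates by cosine similarity, then re-rank them with a pairwise scorer) could look roughly like the sketch below. It is an assumption-laden toy: the two-dimensional vectors stand in for bi-encoder embeddings, `toy_score_pairs` is a hypothetical word-overlap stand-in for the reranker, and none of the helper names come from the original project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, skill_vecs, k=30):
    """Stage 1: return the k skill names most similar to the query vector."""
    ranked = sorted(
        skill_vecs,
        key=lambda name: cosine_similarity(query_vec, skill_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


def rerank(query, candidates, score_pairs):
    """Stage 2: score each [query, candidate] pair and sort by descending score."""
    pairs = [[query, candidate] for candidate in candidates]
    scores = score_pairs(pairs)
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


def toy_score_pairs(pairs):
    """Hypothetical stand-in for a reranker: counts words shared by query and candidate."""
    scores = []
    for query, candidate in pairs:
        shared = set(query.lower().split()) & set(candidate.lower().split())
        scores.append(float(len(shared)))
    return scores


# Made-up 2-d vectors in place of real bi-encoder embeddings.
toy_skill_vecs = {
    'Website-Sichtbarkeit erhöhen': [0.9, 0.2],
    'WordPress': [0.8, 0.3],
    'Software für Content-Management-Systeme nutzen': [0.7, 0.6],
    'Brot backen': [-0.5, 0.9],
}
query = 'Einführung in die Arbeitsweise von WordPress'
query_vec = [1.0, 0.2]

candidates = retrieve_top_k(query_vec, toy_skill_vecs, k=3)
reranked = rerank(query, candidates, toy_score_pairs)
for skill, score in reranked:
    print(score, skill)
```

In a real pipeline, `score_pairs` would presumably be replaced by `reranker.compute_score` from the FlagEmbedding snippet, which takes the same list-of-pairs input. Keeping the scorer as a parameter leaves the retrieval stage independent of the reranker, so the second stage can be skipped when its cost is not justified.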

Acknowledgments

Special thanks to the contributors from the Institut für Interaktive Systeme, Kursportal Schleswig-Holstein, Weiterbildung Hessen eV, MyEduLife, and Trainspot for their invaluable support and contributions to the dataset and the fine-tuning process.

Funding: This project was funded by the Federal Ministry of Education and Research.