Code for training / finetuning the sparse encoders

by oneryalcin - opened Jul 22

Jul 22

First of all, many thanks for both the v1 and v2 models. We are using v1 and are happy with the retrieval quality in general. I'll be evaluating the v2 models as soon as I can. My question is whether you have any plans to release documentation on pretraining or fine-tuning the sparse encoder models. I'd like to adapt the model to our domains (to help with out-of-vocabulary words) and increase recall.

I'd appreciate it if you could share a GitHub repo or any blog post that explains how to fine-tune or pretrain these models. Many thanks again.

zhichao-geng

opensearch-project org Jul 26

Thanks for your interest in our project! We're condensing our training techniques and plan to release a paper covering these details. After that, we'll release the code on GitHub. The specific repo that will host the code has not been decided yet.

oneryalcin

Jul 26

Many thanks, I'll be looking forward to reading the paper and testing the code :)

freethenation

Aug 2

This is great work! Looking forward to the paper! Any ideas when you might release it?

zhichao-geng

opensearch-project org Aug 5

@freethenation We have finished the draft version, and now we're working on improving the structure and writing. After we finalize the paper, it still needs to go through some internal review before the paper and code can be publicly released. I guess we still need a few months to finish all of this.

macavaney

3 days ago

Any updates on this?

zhichao-geng

opensearch-project org 2 days ago

Any updates on this?

The paper and code are currently under internal review before they can be made public.

macavaney

2 days ago

Thanks!
