New SOTA! Requesting a refresh of the results
Hi,
@Muennighoff
Thanks for the great work!
I submitted a new Chinese text embedding model, "Baichuan-text-embedding". Can you help restart this Space?
Thanks!
Done! Sorry it took me a bit cuz I think you manually modified some of the result files - e.g. STS had the name STS22_zh etc. I fixed them 🙂
Also congrats on the great performance! Would love to know how you did it :)
Sorry for the confusion, and thanks for your reply. The renaming happened because I initially used the BGE C_MTEB evaluation (https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB), which only covers 31 datasets and is missing 4. I added 4 task classes to the code for the missing datasets, and since I only needed the zh subsets, I changed the class names to differ from the originals. Before submitting, I re-checked my results with https://github.com/embeddings-benchmark/mteb/blob/main/scripts/run_mteb_chinese.py, and everything matches except those 4 filenames.
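For reference, an added task class along those lines would look roughly like the sketch below. This is a minimal illustration assuming the dict-style `description` API that mteb/C_MTEB tasks used at the time; the class name, dataset repo, and score range are illustrative assumptions, not the actual submitted code.

```python
from mteb.abstasks import AbsTaskSTS


class STS22Zh(AbsTaskSTS):
    """Illustrative zh-only STS task registered under a custom name.

    Note: the leaderboard expects the canonical task name ("STS22"),
    which is why the renamed result files had to be fixed afterwards.
    """

    @property
    def description(self):
        return {
            "name": "STS22_zh",  # custom name that diverged from the official "STS22"
            "hf_hub_name": "mteb/sts22-crosslingual-sts",  # assumed dataset repo
            "description": "SemEval-2022 Task 8 semantic relatedness, Chinese subset only.",
            "type": "STS",
            "category": "s2s",
            "eval_splits": ["test"],
            "eval_langs": ["zh"],  # evaluate only the Chinese subset
            "main_score": "cosine_spearman",
            "min_score": 1,  # assumed label range used for normalization
            "max_score": 4,
        }
```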
Thanks! Details are coming soon :)
Looking forward to it! Let me know if I can help :)
Some info in this link: https://mp.weixin.qq.com/s/Hy78rtJuJTehAJIC-HK2Rg
Key points:
- much more high-quality data than existing models
- an improved contrastive loss: a) to work around the batch-size limit, b) for the clustering and classification tasks (which can conflict with the other tasks); a generic baseline is sketched below
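The exact loss changes have not been published yet, so purely as a point of reference, here is a minimal sketch of the standard in-batch-negatives contrastive (InfoNCE) objective that text-embedding models typically start from. The Baichuan-specific modifications for the batch-size limit and the clustering/classification conflict are not shown, since they have not been described; all names below are illustrative.

```python
import torch
import torch.nn.functional as F


def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              passage_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Standard InfoNCE with in-batch negatives.

    query_emb, passage_emb: (batch, dim) embeddings of paired texts.
    Row i of each tensor is a positive pair; every other row in the
    batch serves as a negative for it.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)

    # Cosine-similarity logits between every query and every passage in the batch.
    logits = q @ p.T / temperature  # shape: (batch, batch)

    # The diagonal entries are the positive pairs.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```

Because every other example in the batch acts as a negative, the quality of this baseline is strongly tied to batch size, which is presumably what the "batch-size limit" improvement addresses; how the clustering/classification conflict is handled is not stated in the post.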
Looks cool, thanks! How does the new loss function work?