New SOTA! Apply for refreshing the results

#55
by stanpcf - opened

Hi, @Muennighoff
Thanks for the great work!
I submitted a new Chinese text embedding model, "Baichuan-text-embedding". Can you help restart this space?
Thanks!

stanpcf changed discussion title from Restart the space for new models to New SOTA! Apply for refreshing the results
Massive Text Embedding Benchmark org

Done! Sorry it took me a bit; I think you manually modified some of the result files, e.g. STS had the name STS22_zh etc. I fixed them 👍

Massive Text Embedding Benchmark org

Also congrats on the great performance! Would love to know how you did it :)

Done! Sorry it took me a bit; I think you manually modified some of the result files, e.g. STS had the name STS22_zh etc. I fixed them 👍

Sorry for the confusion, and thanks for your reply. The renaming happened because I initially used the BGE C_MTEB evaluation (https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB), which covers only 31 datasets. It was missing 4 datasets, so I added 4 classes to the code; since I only needed the zh split for those 4 datasets, I changed the class names to differ from the originals. Before submitting this result, I checked my results with https://github.com/embeddings-benchmark/mteb/blob/main/scripts/run_mteb_chinese.py; everything matched except those 4 filenames.
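As a toy illustration of why those 4 result files ended up with different names: subclassing a multilingual task to keep only the zh split also changes the task name used for the result file. This is only a sketch; the real C_MTEB and mteb task classes have a different API, and all class and attribute names below are hypothetical.

```python
class CrosslingualSTSTask:
    """Hypothetical base task covering several language splits."""
    name = "STS22"
    eval_langs = ["en", "de", "zh"]

    def results_filename(self):
        # The result file is named after the task, so renaming the
        # subclass changes the filename that gets uploaded.
        return f"{self.name}.json"


class STS22Zh(CrosslingualSTSTask):
    """zh-only variant: the changed name is why the uploaded file
    was called STS22_zh instead of STS22."""
    name = "STS22_zh"
    eval_langs = ["zh"]
```

Running the zh-only variant would then write `STS22_zh.json`, which no longer matches the filename the leaderboard expects.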

Also congrats on the great performance! Would love to know how you did it :)

Thanks! Details are coming soon :)

Massive Text Embedding Benchmark org

Thanks! Details are coming soon :)

Looking forward to it! Let me know if I can help :)

Looking forward to it! Let me know if I can help :)

Some info is in this link: https://mp.weixin.qq.com/s/Hy78rtJuJTehAJIC-HK2Rg
Key info:

  1. Much more high-quality data than existing models.
  2. An improved contrastive loss: a) to work around the batch-size limit; b) for the clustering and classification tasks (which may conflict with the other tasks).
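The thread doesn't spell out the improved loss, but for background, the standard in-batch contrastive (InfoNCE) loss that such improvements typically build on can be sketched as follows. This is a plain-Python sketch with toy embeddings, not Baichuan's actual implementation:

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, passages, temperature=0.05):
    """In-batch InfoNCE: the positive for queries[i] is passages[i];
    every other passage in the batch serves as a negative.
    The batch-size limit mentioned above comes from this scheme:
    the number of negatives is capped by the batch size."""
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # negative log softmax probability of the positive passage
        log_denom = math.log(sum(math.exp(x) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each query is closest to the passage at its own index,
# so the loss is small; mismatched positives would drive it up.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[0.9, 0.1], [0.1, 0.9]]
loss = info_nce_loss(queries, passages)
```

Common directions for the two improvements named above would be negative caching or cross-device negatives (to lift the batch-size cap) and task-specific loss terms for clustering/classification, but which of these Baichuan used is not stated here.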
Massive Text Embedding Benchmark org

Some info is in this link: https://mp.weixin.qq.com/s/Hy78rtJuJTehAJIC-HK2Rg
Key info:

  1. Much more high-quality data than existing models.
  2. An improved contrastive loss: a) to work around the batch-size limit; b) for the clustering and classification tasks (which may conflict with the other tasks).

Looks cool, thanks! How does the new loss function work?
