Can't reproduce QuoraRetrieval NDCG@10 score

#14
by jcli0606 - opened

Thanks.
I am trying to reproduce the MTEB retrieval results for QuoraRetrieval, but I get an NDCG@10 score of 80.73.
I have confirmed that the query embeddings use the prompt and the document embeddings do not.

I can reproduce the NDCG@10 scores for other datasets, for example SCIDOCS and ArguAna.

Snowflake org
edited Jul 17

QuoraRetrieval is a duplicate question retrieval task, i.e. matching queries to other queries instead of queries to documents. As such, we follow the common practice of using the query prefix for both queries and documents when embedding this dataset (this was not our brilliant idea by any means, it goes back to the E5 paper at least -- see their Appendix B).
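
As a rough sketch of what this means in practice, the snippet below prepares texts for embedding and applies the query prefix to both sides only for symmetric tasks. The helper name `prepare_texts`, the `SYMMETRIC_TASKS` set, and the exact prefix string are assumptions made for illustration, so check the model card for the real prefix before relying on it.

```python
from typing import List, Sequence, Tuple

# Assumed query prefix for illustration; confirm the exact string on the model card.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

# Hypothetical set of tasks treated as symmetric (query-to-query retrieval).
SYMMETRIC_TASKS = {"QuoraRetrieval"}


def prepare_texts(
    task_name: str,
    queries: Sequence[str],
    documents: Sequence[str],
    query_prefix: str = QUERY_PREFIX,
) -> Tuple[List[str], List[str]]:
    """Return (queries, documents) with prefixes applied before embedding."""
    # Queries always receive the query prefix.
    prefixed_queries = [query_prefix + q for q in queries]
    if task_name in SYMMETRIC_TASKS:
        # Duplicate-question retrieval: the "documents" are questions too,
        # so they receive the same query prefix.
        prefixed_documents = [query_prefix + d for d in documents]
    else:
        # Ordinary asymmetric retrieval: documents are embedded without a prefix.
        prefixed_documents = list(documents)
    return prefixed_queries, prefixed_documents
```

The returned lists can then be passed to whatever encoder you already use for the other datasets; only the text preparation differs for QuoraRetrieval.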

I do not believe this was properly documented anywhere, though, even in our tech report. My apologies for the oversight!

Snowflake org

You should see if this symmetrical embedding improves your organization's Stella models' scores on QuoraRetrieval, too, if you haven't yet!

(And good luck with the write-up for that one -- we're looking forward to reading when it's ready!)

Thanks, got it.

jcli0606 changed discussion status to closed
