🔥 New LLM leaderboard on the hub: an LLM Hallucination Leaderboard!
Led by @pminervini, it evaluates the propensity of models to *hallucinate*, either on factuality (= stating false things) or faithfulness (= ignoring user instructions). This is becoming an increasingly important avenue of research, as more and more people rely on LLMs to search for information!
It contains 14 datasets, grouped into 7 concepts, to give a better overall view of when LLMs output wrong content.
hallucinations-leaderboard/leaderboard
Their introductory blog post also contains an in-depth analysis of which LLMs get what wrong, which is super interesting: https://huggingface.co/blog/leaderboards-on-the-hub-hallucinations
Congrats to the team! 🚀