~75% on the challenging GPQA with only 40M parameters 🔥🥳
GREAT ACHIEVEMENT! Or is it?
This new work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", clears up the mystery around many models whose results I personally found suspicious, especially on leaderboards other than the English one, like the Open Arabic LLM Leaderboard (OALL/Open-Arabic-LLM-Leaderboard).
The authors first trained a model directly on the GPQA data, which, unsurprisingly, led to that model scoring 100% on it.
Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.
What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.
This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it’s apparently possible to (intentionally or unintentionally) leak test data through this method.
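To make the mechanism concrete, here is a minimal sketch of a distillation objective in plain Python. It assumes the common Hinton-style formulation (cross-entropy on the true label plus a temperature-scaled KL term toward the teacher); the paper may use a different loss, and the names `distillation_loss`, `alpha`, and `temperature` are illustrative choices of mine, not taken from the authors' code.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Hard-label loss on the legitimate training example."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Assumed Hinton-style objective: alpha * CE + (1 - alpha) * T^2 * KL.

    The teacher's logits are the only channel to the teacher, so whatever
    the teacher learned (including memorized test data) shapes this term.
    """
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Hypothetical logits for one 4-option multiple-choice question.
student = [2.0, 0.5, -1.0, 0.0]
teacher = [-1.0, 3.0, 0.0, 0.5]
loss = distillation_loss(student, teacher, label=0)
```

Note that the soft term never looks at the label: on every legitimate example the student is pulled toward the teacher's full output distribution, which is the path through which a 'cheated' teacher's knowledge could plausibly reach the student.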
Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)