A collection of datasets for the paper "No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding".