AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge Paper • 2412.13670 • Published Dec 18, 2024
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Paper • 2412.05939 • Published Dec 8, 2024
Themis: Towards Flexible and Interpretable NLG Evaluation Paper • 2406.18365 • Published Jun 26, 2024
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation Paper • 2402.11493 • Published Feb 18, 2024