Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge · Paper · 2502.16457 · Published 15 days ago
Break the Breakout: Reinventing LM Defense Against Jailbreak Attacks with Self-Refinement · Paper · 2402.15180 · Published Feb 23, 2024
FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark · Paper · 2409.19014 · Published Sep 24, 2024
GTA: Gated Toxicity Avoidance for LM Performance Preservation · Paper · 2312.06122 · Published Dec 11, 2023