AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge Paper • 2412.13670 • Published 25 days ago • 4
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios Paper • 2412.08972 • Published Dec 12, 2024 • 9
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy Paper • 2410.13218 • Published Oct 17, 2024 • 4
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published Oct 8, 2024 • 14
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM Paper • 2406.12168 • Published Jun 18, 2024 • 7
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published Jun 16, 2024 • 13
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? Paper • 2406.07546 • Published Jun 11, 2024 • 8
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 24
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29, 2024 • 21
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 30
Weak-to-Strong Jailbreaking on Large Language Models Paper • 2401.17256 • Published Jan 30, 2024 • 15
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models Paper • 2305.15393 • Published May 24, 2023 • 1
DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text Paper • 2305.17359 • Published May 27, 2023 • 1