Early-Exit and Instant Confidence Translation Quality Estimation • Paper • 2502.14429 • Published 5 days ago • 2 • 2
How to Select Datapoints for Efficient Human Evaluation of NLG Models? • Paper • 2501.18251 • Published 26 days ago • 2 • 1
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization • Paper • 2410.03190 • Published Oct 4, 2024 • 1
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding • Paper • 2502.14949 • Published 5 days ago • 6 • 2
Evaluating Multimodal Generative AI with Korean Educational Standards • Paper • 2502.15422 • Published 4 days ago • 8 • 3
Learning to Discover Regulatory Elements for Gene Expression Prediction • Paper • 2502.13991 • Published 7 days ago • 1 • 2
mStyleDistance: Multilingual Style Embeddings and their Evaluation • Paper • 2502.15168 • Published 5 days ago • 3 • 2
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? • Paper • 2502.15657 • Published 4 days ago • 4 • 2
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model • Paper • 2502.13449 • Published 7 days ago • 41 • 2
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation • Paper • 2502.13995 • Published 6 days ago • 7 • 2
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers • Paper • 2502.15007 • Published 5 days ago • 132 • 3
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models • Paper • 2502.14302 • Published 6 days ago • 8 • 2
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease • Paper • 2502.15069 • Published 5 days ago • 1 • 2