Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark Paper • 2410.14702 • Published Oct 6, 2024
LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks Paper • 2311.09564 • Published Nov 16, 2023
InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis Paper • 2302.08624 • Published Feb 16, 2023
TarGEN: Targeted Data Generation with Large Language Models Paper • 2310.17876 • Published Oct 27, 2023
"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility Paper • 2210.07471 • Published Oct 14, 2022