ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published Sep 25, 2025 • 63
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties Paper • 2505.20875 • Published May 27, 2025 • 3
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models Paper • 2505.17225 • Published May 22, 2025 • 64