NUBIA: NeUral Based Interchangeability Assessor for Text Generation • Paper 2004.14667 • Published Apr 30, 2020
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter? • Paper 2411.03923 • Published Nov 6, 2024