CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare Paper • 2512.11437 • Published 15 days ago • 3
Polarity-Aware Probing for Quantifying Latent Alignment in Language Models Paper • 2511.21737 • Published Nov 21 • 1
Polarity-Aware Probing Datasets Collection • Datasets for PA-Probing described in "Polarity-Aware Probing for Quantifying Latent Alignment in Language Models" https://www.arxiv.org/pdf/2511.21737 • 2 items • Updated 20 days ago • 1
Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models Paper • 2402.04614 • Published Feb 7, 2024 • 3
SAM: The Sensitivity of Attribution Methods to Hyperparameters Paper • 2003.08754 • Published Mar 4, 2020