Chirag Agarwal

chirag912

AI & ML interests

None yet

Recent Activity

upvoted a paper 12 days ago

CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare

Paper • 2512.11437 • Published 15 days ago • 3

upvoted a paper 15 days ago

Polarity-Aware Probing for Quantifying Latent Alignment in Language Models

Paper • 2511.21737 • Published Nov 21 • 1

liked a dataset 15 days ago

SabrinaSadiekh/not_hate_dataset

Viewer • Updated 7 days ago • 1.25k • 49 • 2

upvoted a collection 15 days ago

Polarity-Aware Probing Datasets

Datasets for PA-Probing described in "Polarity-Aware Probing for Quantifying Latent Alignment in Language Models" https://www.arxiv.org/pdf/2511.21737 • 2 items • Updated 20 days ago • 1

liked a dataset about 1 month ago

SabrinaSadiekh/mixed_hate_dataset

Viewer • Updated 7 days ago • 1.24k • 46 • 2

liked a Space over 1 year ago

PEEB Demo

Select and edit bird images to classify and explain predictions

authored a paper almost 2 years ago

Counterfactual Explanation Policies in RL

Paper • 2307.13192 • Published Jul 25, 2023

upvoted a paper almost 2 years ago

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

Paper • 2402.04614 • Published Feb 7, 2024 • 3

authored a paper over 2 years ago

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Paper • 2003.08754 • Published Mar 4, 2020