- protectai/deberta-v3-base-prompt-injection-v2
  Text Classification • 0.2B • 164k • 82
- openai/gpt-oss-safeguard-20b
  Text Generation • 22B • 21.8k • 175
- meta-llama/Llama-Prompt-Guard-2-86M
  Text Classification • 0.3B • 33.8k • 73
- leolee99/PIGuard
  Text Classification • 0.2B • 2.16k • 4
Lipeng (Tony) He
ttttonyhe
AI & ML interests
Trustworthy Machine Learning
Recent Activity
authored a paper about 3 hours ago
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
submitted a paper about 8 hours ago
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
updated a collection 1 day ago
Red-Teaming Models & Datasets