|
|
---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
model-index:
- name: AI Detector LoRA (DeBERTa-v3-large)
  results:
  - task:
      type: text-classification
      name: AI Text Detection
    dataset:
      name: stealthcode/ai-detection
      type: stealthcode/ai-detection
    metrics:
    - type: auroc
      value: 0.9985
    - type: f1
      value: 0.9812
    - type: accuracy
      value: 0.9814
---
|
|
|
|
|
# AI Detector LoRA (DeBERTa-v3-large) |
|
|
|
|
|
LoRA adapter for binary detection of AI-generated vs. human-written text, trained on ~2.7M English samples
|
|
(`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model. |
|
|
|
|
|
- **Base model:** `microsoft/deberta-v3-large` |
|
|
- **Task:** Binary classification (AI vs Human) |
|
|
- **Head:** Single-logit + `BCEWithLogitsLoss` |
|
|
- **Adapter type:** LoRA (`peft`) |
|
|
- **Hardware:** 8 x RTX 5090, bf16, multi-GPU |
|
|
- **Final decision threshold:** **0.8697** (max-F1 on calibration set) |
|
|
|
|
|
--- |
|
|
|
|
|
## Files in this repo |
|
|
|
|
|
- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)` |
|
|
- `merged_model/` – fully merged model (base + LoRA) for standalone use |
|
|
- `threshold.json` – chosen deployment threshold and validation F1 |
|
|
- `calibration.json` – temperature scaling parameters and calibration metrics |
|
|
- `results.json` – hyperparameters, validation threshold search, test metrics |
|
|
- `training_log_history.csv` – raw Trainer log history |
|
|
- `predictions_calib.csv` – calibration-set probabilities and labels |
|
|
- `predictions_test.csv` – test probabilities and labels |
|
|
- `figures/` – training and evaluation plots |
|
|
- `README.md` – this file |
|
|
|
|
|
--- |
|
|
|
|
|
## Metrics (test set, n=279,241) |
|
|
|
|
|
Using threshold **0.8697**: |
|
|
|
|
|
| Metric                 | Value  |
| ---------------------- | ------ |
| AUROC                  | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1                     | 0.9812 |
| Accuracy               | 0.9814 |
| Precision (AI)         | 0.9902 |
| Recall (AI)            | 0.9724 |
| Precision (Human)      | 0.9728 |
| Recall (Human)         | 0.9904 |
|
|
|
|
|
Confusion matrix (test): |
|
|
|
|
|
- **True Negatives (Human correctly classified)**: 138,276
|
|
- **False Positives (Human → AI)**: 1,345 |
|
|
- **False Negatives (AI → Human)**: 3,859 |
|
|
- **True Positives (AI correctly classified)**: 135,761
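
The numbers above can be re-derived from `predictions_test.csv` with scikit-learn. A minimal sketch, assuming the CSV stores probabilities and labels in columns named `prob` and `label` (the column names are an assumption; adjust them to the actual header):

```python
import pandas as pd
from sklearn.metrics import (
    roc_auc_score, average_precision_score, f1_score,
    accuracy_score, confusion_matrix,
)

THRESHOLD = 0.8697  # deployment threshold from threshold.json

# assumed column names; adjust to match the actual CSV header
df = pd.read_csv("predictions_test.csv")
probs, labels = df["prob"].to_numpy(), df["label"].to_numpy()
preds = (probs >= THRESHOLD).astype(int)

print("AUROC:   ", roc_auc_score(labels, probs))
print("AP:      ", average_precision_score(labels, probs))
print("F1:      ", f1_score(labels, preds))
print("Accuracy:", accuracy_score(labels, preds))
# rows: true label (0 = Human, 1 = AI); columns: predicted label
print(confusion_matrix(labels, preds))
```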
|
|
|
|
|
### Calibration |
|
|
|
|
|
- **Method:** temperature scaling |
|
|
- **Temperature (T):** 1.4437 |
|
|
- **Calibration set:** held-out calibration split (see `predictions_calib.csv`)
|
|
- **Test ECE:** 0.0075 (uncalibrated) → 0.0116 (calibrated)
|
|
- **Test Brier:** 0.0157 (uncalibrated) → 0.0156 (calibrated)
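
The temperature can be re-fit from the calibration predictions. A minimal sketch, assuming `predictions_calib.csv` stores *uncalibrated* probabilities and labels in columns named `prob` and `label` (assumed names), mapping probabilities back to logits and minimizing BCE over the temperature:

```python
import numpy as np
import pandas as pd
import torch

# assumed column names; adjust to match the actual CSV header
df = pd.read_csv("predictions_calib.csv")
probs = np.clip(df["prob"].to_numpy(), 1e-7, 1 - 1e-7)
labels = torch.tensor(df["label"].to_numpy(), dtype=torch.float32)
logits = torch.tensor(np.log(probs / (1 - probs)), dtype=torch.float32)  # invert sigmoid

log_T = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
optimizer = torch.optim.LBFGS([log_T], lr=0.1, max_iter=100)

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits / log_T.exp(), labels
    )
    loss.backward()
    return loss

optimizer.step(closure)
print("fitted temperature:", log_T.exp().item())  # should land near 1.4437
```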
|
|
|
|
|
--- |
|
|
|
|
|
## Plots |
|
|
|
|
|
### Training & validation |
|
|
|
|
|
- Learning curves: |
|
|
|
|
|
 |
|
|
|
|
|
- Eval metrics over time: |
|
|
|
|
|
 |
|
|
|
|
|
### Validation set |
|
|
|
|
|
- ROC: |
|
|
|
|
|
 |
|
|
|
|
|
- Precision–Recall: |
|
|
|
|
|
 |
|
|
|
|
|
- Calibration curve: |
|
|
|
|
|
 |
|
|
|
|
|
- F1 vs threshold: |
|
|
|
|
|
 |
|
|
|
|
|
### Test set |
|
|
|
|
|
- ROC: |
|
|
|
|
|
 |
|
|
|
|
|
- Precision–Recall: |
|
|
|
|
|
 |
|
|
|
|
|
- Calibration curve: |
|
|
|
|
|
 |
|
|
|
|
|
- Confusion matrix: |
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Load base + LoRA adapter |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
|
|
|
|
|
### Inference with threshold |
|
|
|
|
|
```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```
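
The snippet above runs on the CPU. If a GPU is available, a minimal sketch of a batched variant of `predict_proba`; since `predict_label` calls `predict_proba`, it will pick this version up automatically (the batch size is an arbitrary choice, not something shipped with the repo):

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_proba(texts, batch_size=32):
    # batched variant that keeps the model and inputs on the same device
    all_probs = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(
            texts[i : i + batch_size],
            padding=True,
            truncation=True,
            max_length=512,
            return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**enc).logits.squeeze(-1)
            all_probs.append(torch.sigmoid(logits).cpu())
    return torch.cat(all_probs).numpy()
```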
|
|
|
|
|
### Load merged model (no PEFT required) |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```
|
|
|
|
|
### Optional: apply temperature scaling to logits |
|
|
|
|
|
```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```
|
|
|
|
|
--- |
|
|
|
|
|
## Notes |
|
|
|
|
|
- The classifier head is **trainable** together with the LoRA layers (unfrozen after applying PEFT).
|
|
- **LoRA config:** |
|
|
- `r=32`, `alpha=128`, `dropout=0.0` |
|
|
- Target modules: `query_proj`, `key_proj`, `value_proj` |
|
|
- **Training config:** |
|
|
|
|
|
- `bf16=True` |
|
|
- `optim="adamw_torch_fused"` |
|
|
- `lr_scheduler_type="cosine_with_restarts"` |
|
|
- `num_train_epochs=2` |
|
|
- `per_device_train_batch_size=8`, `gradient_accumulation_steps=4` |
|
|
- `max_grad_norm=0.5` |
|
|
|
|
|
- Threshold `0.8697` was chosen as the **max-F1** point on the calibration set. |
|
|
  You can adjust it if you prefer fewer false positives or fewer false negatives; minimal sketches of the training configuration and the threshold search are shown below.
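
For reference, a minimal sketch of a LoRA/training setup matching the configuration listed above (variable names and the head-unfreezing loop are illustrative assumptions; the actual training script is not included in this repo):

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large",
    num_labels=1,  # single logit + BCEWithLogitsLoss
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=32,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["query_proj", "key_proj", "value_proj"],
)
model = get_peft_model(base, lora_config)

# unfreeze the classifier head so it trains together with the LoRA layers
for name, param in model.named_parameters():
    if "classifier" in name or "pooler" in name:
        param.requires_grad = True
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="out",
    bf16=True,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine_with_restarts",
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    max_grad_norm=0.5,
)
```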
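
And a minimal sketch of the max-F1 threshold search, assuming `predictions_calib.csv` exposes uncalibrated probabilities and labels in columns named `prob` and `label` (assumed names):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

# assumed column names; adjust to match the actual CSV header
df = pd.read_csv("predictions_calib.csv")
probs, labels = df["prob"].to_numpy(), df["label"].to_numpy()

precision, recall, thresholds = precision_recall_curve(labels, probs)
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
best = np.argmax(f1[:-1])  # the last precision/recall pair has no threshold
print("best threshold:", thresholds[best], "F1:", f1[best])
```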
|
|
|