---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
model-index:
- name: AI Detector LoRA (DeBERTa-v3-large)
results:
- task:
type: text-classification
name: AI Text Detection
dataset:
name: stealthcode/ai-detection
type: stealthcode/ai-detection
metrics:
- type: auroc
value: 0.9985
- type: f1
value: 0.9812
- type: accuracy
value: 0.9814
---
# AI Detector LoRA (DeBERTa-v3-large)
LoRA adapter for binary detection of AI-generated vs. human-written text, trained on ~2.7M English samples
(`label: 1 = AI, 0 = Human`) with `microsoft/deberta-v3-large` as the base model.
- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs. Human)
- **Head:** Single-logit classification head trained with `BCEWithLogitsLoss`
- **Adapter type:** LoRA (via `peft`)
- **Hardware:** 8 × RTX 5090 (bf16, multi-GPU)
- **Final decision threshold:** **0.8697** (max-F1 point on the calibration set)
---
## Files in this repo
- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file
---
## Metrics (test set, n=279,241)
Using threshold **0.8697**:
| Metric | Value |
| ---------------------- | ------ |
| AUROC | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1 | 0.9812 |
| Accuracy | 0.9814 |
| Precision (AI) | 0.9902 |
| Recall (AI) | 0.9724 |
| Precision (Human) | 0.9728 |
| Recall (Human) | 0.9904 |
Confusion matrix (test):
- **True Negatives (Human → Human):** 138,276
- **False Positives (Human → AI):** 1,345
- **False Negatives (AI → Human):** 3,859
- **True Positives (AI → AI):** 135,761
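The per-class rows in the metrics table follow directly from these counts; a quick arithmetic check (plain Python, using only the numbers above):

```python
# Re-derive the reported per-class metrics from the confusion-matrix counts.
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

precision_ai = tp / (tp + fp)               # 135,761 / 137,106 ≈ 0.9902
recall_ai = tp / (tp + fn)                  # 135,761 / 139,620 ≈ 0.9724
precision_human = tn / (tn + fn)            # 138,276 / 142,135 ≈ 0.9728
recall_human = tn / (tn + fp)               # 138,276 / 139,621 ≈ 0.9904
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 274,037 / 279,241 ≈ 0.9814
```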
### Calibration
- **Method:** temperature scaling
- **Temperature (T):** 1.4437
- **Fit on:** the held-out calibration split (see `predictions_calib.csv`)
- **Test ECE:** 0.0075 → 0.0116 after calibration (note: slightly higher on this split)
- **Test Brier:** 0.0157 → 0.0156 after calibration
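Temperature scaling leaves the ranking metrics (AUROC/AP) unchanged and only rescales confidences via `p = sigmoid(z / T)`. The fitting script is not included in this repo; the sketch below shows the standard way such a `T` is fit, by minimizing BCE on held-out predictions. It assumes `probs` and `labels` arrays, e.g. loaded from `predictions_calib.csv` (whose exact column names are not pinned down here).

```python
import numpy as np
import torch

# Recover logits from stored probabilities (inverse sigmoid), then fit a
# single temperature by minimizing BCE/NLL on the calibration split.
p = np.clip(probs, 1e-6, 1 - 1e-6)  # guard against exact 0/1
logits = torch.tensor(np.log(p / (1 - p)), dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.float32)

log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=100)

def nll():
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits / log_t.exp(), y
    )
    loss.backward()
    return loss

opt.step(nll)
T = log_t.exp().item()  # the released value is 1.4437
```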
---
## Plots
### Training & validation
- Learning curves:
![Learning curves](./figures/fig_learning_curves.png)
- Eval metrics over time:
![Eval metrics](./figures/fig_eval_metrics.png)
### Validation set
- ROC:
![ROC (calib)](./figures/fig_roc_calib.png)
- Precision–Recall:
![PR (calib)](./figures/fig_pr_calib.png)
- Calibration curve:
![Calibration (calib)](./figures/fig_calibration_calib.png)
- F1 vs threshold:
![F1 vs threshold (calib)](./figures/fig_threshold_f1_calib.png)
### Test set
- ROC:
![ROC (test)](./figures/fig_roc_test.png)
- Precision–Recall:
![PR (test)](./figures/fig_pr_test.png)
- Calibration curve:
![Calibration (test)](./figures/fig_calibration_test.png)
- Confusion matrix:
![Confusion matrix (test)](./figures/fig_confusion_test.png)
---
## Usage
### Load base + LoRA adapter
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
# Per the file list, the LoRA weights live under `adapter/` in this repo, so
# point PEFT at that subfolder when loading from the Hub (not needed for a
# local "./adapter" path).
model = PeftModel.from_pretrained(base_model, adapter_id, subfolder="adapter")
model.eval()
```
### Inference with threshold
```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```
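The snippets above run on CPU by default. For larger volumes, a batched GPU variant helps; the sketch below is a minimal example, where the device choice and `batch_size` are deployment decisions, not part of the released config.

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_proba_batched(texts, batch_size=64):
    out = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(
            texts[i : i + batch_size],
            padding=True,
            truncation=True,
            max_length=512,
            return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**enc).logits.squeeze(-1)
        out.append(torch.sigmoid(logits).cpu())
    return torch.cat(out).numpy()
```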
### Load merged model (no PEFT required)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```
### Optional: apply temperature scaling to logits
```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```
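Note that the `0.8697` threshold interacts with calibration: if it was selected on raw (unscaled) probabilities, which the notes below suggest but do not state outright, the equivalent operating point on calibrated outputs is obtained by passing the threshold through the same transformation. A one-liner, reusing `thr` and `T` from the snippets above:

```python
import math

# Map a threshold chosen on raw sigmoid(z) onto calibrated sigmoid(z / T):
# inverse sigmoid, divide by T, re-apply sigmoid. Decisions are unchanged.
thr_cal = 1 / (1 + math.exp(-math.log(thr / (1 - thr)) / T))
```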
---
## Notes
- The classifier head is trained **jointly** with the LoRA layers (it is unfrozen after the PEFT wrapper is applied).
- **LoRA config** (reconstructed as a `peft` sketch below):
- `r=32`, `alpha=128`, `dropout=0.0`
- Target modules: `query_proj`, `key_proj`, `value_proj`
- **Training config:**
- `bf16=True`
- `optim="adamw_torch_fused"`
- `lr_scheduler_type="cosine_with_restarts"`
- `num_train_epochs=2`
- `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
- `max_grad_norm=0.5`
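For reference, the LoRA settings above map onto `peft` roughly as follows. This is a reconstruction from the bullets, not the original training script; `task_type` and `modules_to_save` in particular are assumptions consistent with the jointly trained classifier head noted above.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    task_type="SEQ_CLS",  # assumption: sequence-classification task type
    r=32,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["query_proj", "key_proj", "value_proj"],
    modules_to_save=["classifier", "pooler"],  # one way to keep the head trainable
)
model = get_peft_model(base_model, lora_config)
```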
- Threshold `0.8697` is the **max-F1** point on the calibration set. Raise it to reduce false positives (human text flagged as AI) at the cost of recall, or lower it to reduce false negatives; a sweep sketch follows below.
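To re-derive the operating point, or pick one under a different precision/recall trade-off, sweep candidate thresholds on the calibration predictions. A minimal sketch, again assuming `probs` and `labels` arrays from `predictions_calib.csv`:

```python
import numpy as np
from sklearn.metrics import f1_score

# Evaluate F1 at each candidate threshold and keep the best one.
thresholds = np.linspace(0.01, 0.99, 99)
f1s = [f1_score(labels, (probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]  # the released max-F1 point is 0.8697
```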