---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
model-index:
- name: AI Detector LoRA (DeBERTa-v3-large)
results:
- task:
type: text-classification
name: AI Text Detection
dataset:
name: stealthcode/ai-detection
type: stealthcode/ai-detection
metrics:
- type: auroc
value: 0.9985
- type: f1
value: 0.9812
- type: accuracy
value: 0.9814
---
# AI Detector LoRA (DeBERTa-v3-large)
LoRA adapter for binary detection of AI-generated vs. human-written text, trained on ~2.7M English samples
(`label: 1 = AI, 0 = Human`) with `microsoft/deberta-v3-large` as the base model.
- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs. Human)
- **Head:** Single-logit classification head trained with `BCEWithLogitsLoss`
- **Adapter type:** LoRA (via `peft`)
- **Hardware:** 8 × RTX 5090 (bf16, multi-GPU)
- **Final decision threshold:** **0.8697** (max-F1 point on the calibration set)
---
## Files in this repo
- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file
---
## Metrics (test set, n=279,241)
Using threshold **0.8697**:
| Metric | Value |
| ---------------------- | ------ |
| AUROC | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1 | 0.9812 |
| Accuracy | 0.9814 |
| Precision (AI) | 0.9902 |
| Recall (AI) | 0.9724 |
| Precision (Human) | 0.9728 |
| Recall (Human) | 0.9904 |
Confusion matrix (test):
- **True Negatives (Human → Human):** 138,276
- **False Positives (Human → AI):** 1,345
- **False Negatives (AI → Human):** 3,859
- **True Positives (AI → AI):** 135,761
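The per-class rows in the metrics table follow directly from these counts; a quick arithmetic check (plain Python, using only the numbers above):

```python
# Re-derive the reported per-class metrics from the confusion-matrix counts.
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

precision_ai = tp / (tp + fp)               # 135,761 / 137,106 ≈ 0.9902
recall_ai = tp / (tp + fn)                  # 135,761 / 139,620 ≈ 0.9724
precision_human = tn / (tn + fn)            # 138,276 / 142,135 ≈ 0.9728
recall_human = tn / (tn + fp)               # 138,276 / 139,621 ≈ 0.9904
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 274,037 / 279,241 ≈ 0.9814
```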
### Calibration
- **Method:** temperature scaling
- **Temperature (T):** 1.4437
- **Fit on:** the held-out calibration split (see `predictions_calib.csv`)
- **Test ECE:** 0.0075 → 0.0116 after calibration (note: slightly higher on this split)
- **Test Brier:** 0.0157 → 0.0156 after calibration
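Temperature scaling leaves the ranking metrics (AUROC/AP) unchanged and only rescales confidences via `p = sigmoid(z / T)`. The fitting script is not included in this repo; the sketch below shows the standard way such a `T` is fit, by minimizing BCE on held-out predictions. It assumes `probs` and `labels` arrays, e.g. loaded from `predictions_calib.csv` (whose exact column names are not pinned down here).

```python
import numpy as np
import torch

# Recover logits from stored probabilities (inverse sigmoid), then fit a
# single temperature by minimizing BCE/NLL on the calibration split.
p = np.clip(probs, 1e-6, 1 - 1e-6)  # guard against exact 0/1
logits = torch.tensor(np.log(p / (1 - p)), dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.float32)

log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=100)

def nll():
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits / log_t.exp(), y
    )
    loss.backward()
    return loss

opt.step(nll)
T = log_t.exp().item()  # the released value is 1.4437
```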
---
## Plots
### Training & validation
- Learning curves:
![Learning curves](./figures/fig_learning_curves.png)
- Eval metrics over time:
![Eval metrics](./figures/fig_eval_metrics.png)
### Validation set
- ROC:
![ROC (calib)](./figures/fig_roc_calib.png)
- Precision–Recall:
![PR (calib)](./figures/fig_pr_calib.png)
- Calibration curve:
![Calibration (calib)](./figures/fig_calibration_calib.png)
- F1 vs threshold:
![F1 vs threshold (calib)](./figures/fig_threshold_f1_calib.png)
### Test set
- ROC:
![ROC (test)](./figures/fig_roc_test.png)
- Precision–Recall:
![PR (test)](./figures/fig_pr_test.png)
- Calibration curve:
![Calibration (test)](./figures/fig_calibration_test.png)
- Confusion matrix:
![Confusion matrix (test)](./figures/fig_confusion_test.png)
---
## Usage
### Load base + LoRA adapter
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
# Per the file list, the LoRA weights live under `adapter/` in this repo, so
# point PEFT at that subfolder when loading from the Hub (not needed for a
# local "./adapter" path).
model = PeftModel.from_pretrained(base_model, adapter_id, subfolder="adapter")
model.eval()
```
### Inference with threshold
```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```
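The snippets above run on CPU by default. For larger volumes, a batched GPU variant helps; the sketch below is a minimal example, where the device choice and `batch_size` are deployment decisions, not part of the released config.

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_proba_batched(texts, batch_size=64):
    out = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(
            texts[i : i + batch_size],
            padding=True,
            truncation=True,
            max_length=512,
            return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**enc).logits.squeeze(-1)
        out.append(torch.sigmoid(logits).cpu())
    return torch.cat(out).numpy()
```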
### Load merged model (no PEFT required)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```
### Optional: apply temperature scaling to logits
```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```
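Note that the `0.8697` threshold interacts with calibration: if it was selected on raw (unscaled) probabilities, which the notes below suggest but do not state outright, the equivalent operating point on calibrated outputs is obtained by passing the threshold through the same transformation. A one-liner, reusing `thr` and `T` from the snippets above:

```python
import math

# Map a threshold chosen on raw sigmoid(z) onto calibrated sigmoid(z / T):
# inverse sigmoid, divide by T, re-apply sigmoid. Decisions are unchanged.
thr_cal = 1 / (1 + math.exp(-math.log(thr / (1 - thr)) / T))
```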
---
## Notes
- The classifier head is trained **jointly** with the LoRA layers (it is unfrozen after the PEFT wrapper is applied).
- **LoRA config** (reconstructed as a `peft` sketch below):
- `r=32`, `alpha=128`, `dropout=0.0`
- Target modules: `query_proj`, `key_proj`, `value_proj`
- **Training config:**
- `bf16=True`
- `optim="adamw_torch_fused"`
- `lr_scheduler_type="cosine_with_restarts"`
- `num_train_epochs=2`
- `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
- `max_grad_norm=0.5`
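For reference, the LoRA settings above map onto `peft` roughly as follows. This is a reconstruction from the bullets, not the original training script; `task_type` and `modules_to_save` in particular are assumptions consistent with the jointly trained classifier head noted above.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    task_type="SEQ_CLS",  # assumption: sequence-classification task type
    r=32,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["query_proj", "key_proj", "value_proj"],
    modules_to_save=["classifier", "pooler"],  # one way to keep the head trainable
)
model = get_peft_model(base_model, lora_config)
```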
- Threshold `0.8697` is the **max-F1** point on the calibration set. Raise it to reduce false positives (human text flagged as AI) at the cost of recall, or lower it to reduce false negatives; a sweep sketch follows below.
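To re-derive the operating point, or pick one under a different precision/recall trade-off, sweep candidate thresholds on the calibration predictions. A minimal sketch, again assuming `probs` and `labels` arrays from `predictions_calib.csv`:

```python
import numpy as np
from sklearn.metrics import f1_score

# Evaluate F1 at each candidate threshold and keep the best one.
thresholds = np.linspace(0.01, 0.99, 99)
f1s = [f1_score(labels, (probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]  # the released max-F1 point is 0.8697
```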