Model improved

Changed files:

- README.md (+48 -26)
- adapter/adapter_config.json (+2 -2)
- adapter/adapter_model.safetensors (+1 -1)
- calibration.json (+17 -17)
- figures/fig_calibration_calib.png (+2 -2)
- figures/fig_calibration_comparison_calib.png (+2 -2)
- figures/fig_calibration_comparison_test.png (+2 -2)
- figures/fig_calibration_test.png (+2 -2)
- figures/fig_confusion_test.png (+2 -2)
- figures/fig_eval_metrics.png (+2 -2)
- figures/fig_learning_curves.png (+2 -2)
- figures/fig_pr_calib.png (+2 -2)
- figures/fig_pr_test.png (+2 -2)
- figures/fig_roc_calib.png (+2 -2)
- figures/fig_roc_test.png (+2 -2)
- figures/fig_threshold_f1_calib.png (+2 -2)
- merged_model/model.safetensors (+1 -1)
- predictions_calib.csv (+0 -0)
- predictions_test.csv (+0 -0)
- results.json (+62 -62)
- threshold.json (+4 -4)
- training_log_history.csv (+31 -51)
README.md
CHANGED

````diff
@@ -16,19 +16,35 @@ metrics:
 - f1
 - auroc
 - average_precision
+model-index:
+- name: AI Detector LoRA (DeBERTa-v3-large)
+  results:
+  - task:
+      type: text-classification
+      name: AI Text Detection
+    dataset:
+      name: stealthcode/ai-detection
+      type: stealthcode/ai-detection
+    metrics:
+    - type: auroc
+      value: 0.9985
+    - type: f1
+      value: 0.9812
+    - type: accuracy
+      value: 0.9814
 ---
 
 # AI Detector LoRA (DeBERTa-v3-large)
 
-LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples
+LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.7M English samples
 (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.
 
 - **Base model:** `microsoft/deberta-v3-large`
 - **Task:** Binary classification (AI vs Human)
 - **Head:** Single-logit + `BCEWithLogitsLoss`
 - **Adapter type:** LoRA (`peft`)
-- **Hardware:**
-- **Final decision threshold:** **0.
+- **Hardware:** 8 x RTX 5090, bf16, multi-GPU
+- **Final decision threshold:** **0.8697** (max-F1 on calibration set)
 
 ---
 
@@ -47,34 +63,35 @@ LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples
 
 ---
 
-## Metrics (test set)
+## Metrics (test set, n=279,241)
 
-Using threshold **0.
+Using threshold **0.8697**:
 
 | Metric                 | Value  |
 | ---------------------- | ------ |
-| AUROC                  | 0.
-| Average Precision (AP) | 0.
-| F1                     | 0.
-| Accuracy               | 0.
-| Precision
-| Recall
-
+| AUROC                  | 0.9985 |
+| Average Precision (AP) | 0.9985 |
+| F1                     | 0.9812 |
+| Accuracy               | 0.9814 |
+| Precision (AI)         | 0.9902 |
+| Recall (AI)            | 0.9724 |
+| Precision (Human)      | 0.9728 |
+| Recall (Human)         | 0.9904 |
 
 Confusion matrix (test):
 
-- **True Negatives (Human correctly)**:
-- **False Positives (Human → AI)**:
-- **False Negatives (AI → Human)**: 3,
-- **True Positives (AI correctly)**:
+- **True Negatives (Human correctly)**: 138,276
+- **False Positives (Human → AI)**: 1,345
+- **False Negatives (AI → Human)**: 3,859
+- **True Positives (AI correctly)**: 135,761
 
 ### Calibration
 
 - **Method:** temperature scaling
-- **Temperature (T):** 1.
+- **Temperature (T):** 1.4437
 - **Calibration set:** calibration
-- Test ECE: 0.
-- Test Brier: 0.
+- Test ECE: 0.0075 → 0.0116 (after calibration)
+- Test Brier: 0.0157 → 0.0156 (after calibration)
 
 ---
 
@@ -156,7 +173,7 @@ model.eval()
 ```python
 # load threshold
 with open("threshold.json") as f:
-    thr = json.load(f)["threshold"]  # 0.
+    thr = json.load(f)["threshold"]  # 0.8697
 
 def predict_proba(texts):
     enc = tokenizer(
@@ -194,7 +211,7 @@ model = AutoModelForSequenceClassification.from_pretrained(model_dir)
 model.eval()
 
 with open("threshold.json") as f:
-    thr = json.load(f)["threshold"]  # 0.
+    thr = json.load(f)["threshold"]  # 0.8697
 
 def predict_proba(texts):
     enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -209,7 +226,7 @@ def predict_proba(texts):
 ```python
 import json
 with open("calibration.json") as f:
-    T = json.load(f)["temperature"]  # e.g., 1.
+    T = json.load(f)["temperature"]  # e.g., 1.4437
 
 def predict_proba_calibrated(texts):
     enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -224,12 +241,17 @@ def predict_proba_calibrated(texts):
 ## Notes
 
 - Classifier head is **trainable** together with LoRA layers (unfrozen after applying PEFT).
--
+- **LoRA config:**
+  - `r=32`, `alpha=128`, `dropout=0.0`
+  - Target modules: `query_proj`, `key_proj`, `value_proj`
+- **Training config:**
 
   - `bf16=True`
   - `optim="adamw_torch_fused"`
--
--
+  - `lr_scheduler_type="cosine_with_restarts"`
+  - `num_train_epochs=2`
+  - `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
+  - `max_grad_norm=0.5`
 
-- Threshold `0.
+- Threshold `0.8697` was chosen as the **max-F1** point on the calibration set.
   You can adjust it if you prefer fewer false positives or fewer false negatives.
````
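As a quick sanity check, the headline values in the updated README metrics table can be recomputed from the test-set confusion matrix shown in the same diff (a standalone sketch, not code from the repo):

```python
# Recompute the README's test-set metrics from the confusion-matrix counts.
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

n = tn + fp + fn + tp                # 279,241 test samples, matching n in the heading
accuracy = (tp + tn) / n
precision_ai = tp / (tp + fp)        # Precision (AI)
recall_ai = tp / (tp + fn)           # Recall (AI)
f1_ai = 2 * precision_ai * recall_ai / (precision_ai + recall_ai)

print(round(accuracy, 4), round(precision_ai, 4), round(recall_ai, 4), round(f1_ai, 4))
```

All four rounded values agree with the table (0.9814, 0.9902, 0.9724, 0.9812).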
adapter/adapter_config.json
CHANGED

```diff
@@ -34,9 +34,9 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "key_proj",
     "query_proj",
-    "value_proj",
-    "key_proj"
+    "value_proj"
   ],
   "target_parameters": null,
   "task_type": "SEQ_CLS",
```
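The adapter_config.json change only reorders `target_modules`; the new order happens to match a plain alphabetical sort, which is a common way to keep serialized configs deterministic (the sort-as-intent is an assumption, not stated in the commit):

```python
import json

# Old vs new ordering of the LoRA target modules in adapter_config.json.
old_modules = ["query_proj", "value_proj", "key_proj"]
new_modules = sorted(old_modules)  # alphabetical order, as in the new file

print(new_modules)  # ['key_proj', 'query_proj', 'value_proj']
print(json.dumps({"target_modules": new_modules}, indent=2))
```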
adapter/adapter_model.safetensors
CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:78566bec1ea60ab8451693e143cefe13dea659a75daf9c426bc8269c7dfce4b1
 size 23099012
```
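The safetensors entries above are Git LFS pointer files (three lines: `version`, `oid`, `size`). A minimal stdlib parser for that format, using the pointer shown above (an illustrative helper, not part of the repo):

```python
import re

# The new LFS pointer for adapter/adapter_model.safetensors, verbatim.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:78566bec1ea60ab8451693e143cefe13dea659a75daf9c426bc8269c7dfce4b1
size 23099012
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "<key> <value>"; oid is "<algo>:<hex digest>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    assert algo == "sha256" and re.fullmatch(r"[0-9a-f]{64}", digest)
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}

print(parse_lfs_pointer(POINTER)["size"])  # 23099012
```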
calibration.json
CHANGED

```diff
@@ -1,26 +1,26 @@
 {
-  "temperature": 1.
+  "temperature": 1.4436575174331665,
   "method": "temperature_scaling",
   "calibration_set": "calibration",
   "calibration_metrics": {
-    "temperature": 1.
+    "temperature": 1.4436575174331665,
     "optimization_method": "LBFGS_logspace",
-    "uncalibrated_nll": 0.
-    "calibrated_nll": 0.
-    "uncalibrated_ece": 0.
-    "calibrated_ece": 0.
-    "uncalibrated_brier": 0.
-    "calibrated_brier": 0.
-    "nll_improvement": 0.
-    "ece_improvement": -0.
-    "brier_improvement":
+    "uncalibrated_nll": 0.057230830731130305,
+    "calibrated_nll": 0.05340311260808736,
+    "uncalibrated_ece": 0.007595386161633095,
+    "calibrated_ece": 0.011707928851842823,
+    "uncalibrated_brier": 0.01589206575792085,
+    "calibrated_brier": 0.015775692446082124,
+    "nll_improvement": 0.0038277181230429447,
+    "ece_improvement": -0.004112542690209728,
+    "brier_improvement": 0.00011637331183872793
   },
   "test_metrics": {
-    "ece_before": 0.
-    "ece_after": 0.
-    "ece_improvement": -0.
-    "brier_before": 0.
-    "brier_after": 0.
-    "brier_improvement": -
+    "ece_before": 0.007462335961689493,
+    "ece_after": 0.011600581100766194,
+    "ece_improvement": -0.004138245139076701,
+    "brier_before": 0.015727129447539786,
+    "brier_after": 0.0156334356493489,
+    "brier_improvement": 9.369379819088725e-05
   }
 }
```
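calibration.json stores the fitted temperature; applying it means dividing the logit by T before the sigmoid, which softens overconfident predictions when T > 1. A minimal sketch of the two formulas involved (temperature scaling and the Brier score); real logits would come from the model as in the README snippets:

```python
import math

T = 1.4436575174331665  # "temperature" from calibration.json

def calibrated_prob(logit: float, temperature: float = T) -> float:
    # Temperature scaling: sigmoid(logit / T).
    return 1.0 / (1.0 + math.exp(-logit / temperature))

def brier(probs, labels):
    # Brier score: mean squared error between probability and 0/1 label.
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

# With T > 1, probabilities are pulled toward 0.5:
print(calibrated_prob(4.0), 1.0 / (1.0 + math.exp(-4.0)))
```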
figures/*.png
CHANGED

All twelve figure PNGs listed in the file summary above (calibration, calibration-comparison, confusion, eval-metrics, learning-curves, PR, ROC, and threshold-F1 plots for the calibration and test sets) are binaries tracked with Git LFS; this commit updates each file's LFS pointer.
merged_model/model.safetensors
CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:43dabe52c25c41912eef74ac37416aa2c7793f42fc8193906f6bc7153a877962
 size 1740300340
```
predictions_calib.csv, predictions_test.csv
CHANGED

The diffs for these files are too large to render; see the raw files.
results.json
CHANGED

```diff
@@ -16,105 +16,105 @@
       "key_proj",
       "value_proj"
     ],
-    "learning_rate": 0.
+    "learning_rate": 0.00014057133690327707,
     "lr_scheduler_type": "cosine_with_restarts",
     "max_grad_norm": 0.5,
     "optim": "adamw_torch_fused"
   },
   "threshold_optimization": {
     "max_f1": {
-      "threshold": 0.
+      "threshold": 0.869714617729187,
       "metrics": {
-        "threshold": 0.
-        "auroc": 0.
-        "average_precision": 0.
-        "f1": 0.
-        "accuracy": 0.
-        "precision": 0.
-        "recall": 0.
-        "specificity": 0.
-        "precision_human": 0.
-        "recall_human": 0.
-        "precision_ai": 0.
-        "recall_ai": 0.
+        "threshold": 0.869714617729187,
+        "auroc": 0.9984783120401353,
+        "average_precision": 0.9985350724478098,
+        "f1": 0.9809629649707713,
+        "accuracy": 0.9811363829663419,
+        "precision": 0.9900603673104631,
+        "recall": 0.9720312276178198,
+        "specificity": 0.9902414567983026,
+        "precision_human": 0.9725316756205432,
+        "recall_human": 0.9902414567983026,
+        "precision_ai": 0.9900603673104631,
+        "recall_ai": 0.9720312276178198,
         "confusion_matrix": {
-          "true_negative":
-          "false_positive":
-          "false_negative":
-          "true_positive":
+          "true_negative": 110607,
+          "false_positive": 1090,
+          "false_negative": 3124,
+          "true_positive": 108572
         }
       }
     },
     "precision_at_95recall": {
-      "threshold":
+      "threshold": 3.6534821390432626e-08,
       "metrics": {
-        "threshold":
-        "auroc": 0.
-        "average_precision": 0.
-        "f1": 0.
-        "accuracy": 0.
-        "precision": 0.
+        "threshold": 3.6534821390432626e-08,
+        "auroc": 0.9984783120401353,
+        "average_precision": 0.9985350724478098,
+        "f1": 0.6666646771454748,
+        "accuracy": 0.49999776179199884,
+        "precision": 0.49999776179199884,
         "recall": 1.0,
         "specificity": 0.0,
         "precision_human": 0.0,
         "recall_human": 0.0,
-        "precision_ai": 0.
+        "precision_ai": 0.49999776179199884,
         "recall_ai": 1.0,
         "confusion_matrix": {
           "true_negative": 0,
-          "false_positive":
+          "false_positive": 111697,
           "false_negative": 0,
-          "true_positive":
+          "true_positive": 111696
         }
       }
     }
   },
   "calibration": {
-    "temperature": 1.
+    "temperature": 1.4436575174331665,
     "method": "temperature_scaling",
     "calibration_set": "calibration",
     "calibration_metrics": {
-      "temperature": 1.
+      "temperature": 1.4436575174331665,
       "optimization_method": "LBFGS_logspace",
-      "uncalibrated_nll": 0.
-      "calibrated_nll": 0.
-      "uncalibrated_ece": 0.
-      "calibrated_ece": 0.
-      "uncalibrated_brier": 0.
-      "calibrated_brier": 0.
-      "nll_improvement": 0.
-      "ece_improvement": -0.
-      "brier_improvement":
+      "uncalibrated_nll": 0.057230830731130305,
+      "calibrated_nll": 0.05340311260808736,
+      "uncalibrated_ece": 0.007595386161633095,
+      "calibrated_ece": 0.011707928851842823,
+      "uncalibrated_brier": 0.01589206575792085,
+      "calibrated_brier": 0.015775692446082124,
+      "nll_improvement": 0.0038277181230429447,
+      "ece_improvement": -0.004112542690209728,
+      "brier_improvement": 0.00011637331183872793
     },
     "test_metrics": {
-      "ece_before": 0.
-      "ece_after": 0.
-      "ece_improvement": -0.
-      "brier_before": 0.
-      "brier_after": 0.
-      "brier_improvement": -
+      "ece_before": 0.007462335961689493,
+      "ece_after": 0.011600581100766194,
+      "ece_improvement": -0.004138245139076701,
+      "brier_before": 0.015727129447539786,
+      "brier_after": 0.0156334356493489,
+      "brier_improvement": 9.369379819088725e-05
     }
   },
   "test_metrics": {
-    "threshold": 0.
-    "auroc": 0.
-    "average_precision": 0.
-    "f1": 0.
-    "accuracy": 0.
-    "precision": 0.
-    "recall": 0.
-    "specificity": 0.
-    "precision_human": 0.
-    "recall_human": 0.
-    "precision_ai": 0.
-    "recall_ai": 0.
+    "threshold": 0.869714617729187,
+    "auroc": 0.9984910666612247,
+    "average_precision": 0.9985476887515279,
+    "f1": 0.981194394455165,
+    "accuracy": 0.9813637682145531,
+    "precision": 0.9901900719151605,
+    "recall": 0.972360693310414,
+    "specificity": 0.9903667786364515,
+    "precision_human": 0.9728497555141239,
+    "recall_human": 0.9903667786364515,
+    "precision_ai": 0.9901900719151605,
+    "recall_ai": 0.972360693310414,
     "confusion_matrix": {
-      "true_negative":
-      "false_positive":
-      "false_negative":
-      "true_positive":
+      "true_negative": 138276,
+      "false_positive": 1345,
+      "false_negative": 3859,
+      "true_positive": 135761
     }
   },
-  "timestamp": "
+  "timestamp": "20251124_170935",
   "seed": 42
 }
```
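The `precision_at_95recall` block in results.json records a degenerate operating point: a threshold of ~3.7e-08 labels essentially every sample as AI, so recall is 1.0, specificity 0.0, and precision collapses to the AI base rate (~0.5). The reported F1 follows directly from those two numbers (a standalone sketch):

```python
# Degenerate operating point from results.json: everything predicted as AI.
tn, fp, fn, tp = 0, 111_697, 0, 111_696

precision = tp / (tp + fp)  # the AI base rate, ~0.5
recall = tp / (tp + fn)     # 1.0: no AI sample is missed
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, round(f1, 4))
```

This is why a "precision at 95% recall" target can be vacuous when the score distribution is extremely separable: the constraint is already satisfied at a near-zero threshold.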
threshold.json
CHANGED

```diff
@@ -1,9 +1,9 @@
 {
-  "threshold": 0.
+  "threshold": 0.869714617729187,
   "method": "max_f1",
-  "calibration_f1": 0.
+  "calibration_f1": 0.9809629649707713,
   "alternative_thresholds": {
-    "max_f1": 0.
-    "precision_at_95recall":
+    "max_f1": 0.869714617729187,
+    "precision_at_95recall": 3.6534821390432626e-08
   }
 }
```
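threshold.json records `max_f1` as the selection method. A toy sketch of what such a scan does (the repo's actual selection script is not shown; a real implementation would more likely use vectorized code such as scikit-learn's `precision_recall_curve`):

```python
# Pick the decision threshold that maximizes F1 over calibration scores.
# O(n^2) toy scan for illustration only.
def max_f1_threshold(scores, labels):
    best_thr, best_f1 = 0.5, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1

scores = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
print(max_f1_threshold(scores, labels))  # (0.6, 1.0)
```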
training_log_history.csv
CHANGED

```diff
@@ -1,52 +1,32 @@
 loss,grad_norm,learning_rate,epoch,step,eval_loss,eval_auroc,eval_ap,eval_f1,eval_max_f1,eval_best_threshold,eval_accuracy,eval_precision_human,eval_recall_human,eval_precision_ai,eval_recall_ai,eval_runtime,eval_samples_per_second,eval_steps_per_second,train_runtime,train_samples_per_second,train_steps_per_second,total_flos,train_loss
-0.
-,,,0.
-0.
-,,,0.
-0.
-,,,0.
-0.
-,,,0.
-0.
-,,,0.
-0.
-,,,0.
-0.
-,,,0.
-0.
-,,,
-0.
-,,,
-0.
-,,,
-0.
-,,,
-0.
-,,,
-0.
-,,,1.
-0.
-,,,1.
-0.
-,,,1.
-0
-,,,1.2453491087413404,8000,0.07455883920192719,0.9975485047929438,0.9972585615075422,0.9726402255038238,0.9758848582753115,0.9136765599250793,0.9752655591848888,0.9750665582603982,0.9798072841157577,0.97550810243656,0.969789161571968,252.079,906.01,14.158,,,,,
-0.0521,0.4137882590293884,4.351849838388919e-05,1.3231882929866896,8500,,,,,,,,,,,,,,,,,,,
-,,,1.3231882929866896,8500,0.07783409208059311,0.9975517384131479,0.9972440455972116,0.9723962677736611,0.9749894769810089,0.9111796617507935,0.975077281444572,0.9738866219645087,0.9807043821637684,0.9765353333658013,0.9682921411255662,252.489,904.538,14.135,,,,,
-0.0526,0.3627403974533081,3.4876244727530656e-05,1.4010274772320386,9000,,,,,,,,,,,,,,,,,,,
-,,,1.4010274772320386,9000,0.0745643824338913,0.9975749102093014,0.9972690713005937,0.9723900247831475,0.9756176280729699,0.9046505093574524,0.9750247388193672,0.975305785387727,0.9791024213637493,0.9746829301427421,0.9701078820541053,252.119,905.866,14.156,,,,,
-0.052,0.7762022614479065,2.693721991111627e-05,1.4788666614773878,9500,,,,,,,,,,,,,,,,,,,
-,,,1.4788666614773878,9500,0.07452459633350372,0.9977328074904859,0.9974410894856215,0.9730508384452147,0.975759591492858,0.9124361872673035,0.9755895720403177,0.976881986981624,0.978501686063742,0.9740254712964038,0.9720781541254986,252.2439,905.417,14.149,,,,,
-0.0507,0.3820905387401581,1.9831740005311437e-05,1.5567058457227367,10000,,,,,,,,,,,,,,,,,,,
-,,,1.5567058457227367,10000,0.08428945392370224,0.9975299683849125,0.9972155352602694,0.9700319035460719,0.9748023112122028,0.9334307909011841,0.9728135700086695,0.9755487501803781,0.9746970291636964,0.9695218431614696,0.9705425008933831,252.2555,905.376,14.148,,,,,
-0.0515,0.36162489652633667,1.3676438758331925e-05,1.634545029968086,10500,,,,,,,,,,,,,,,,,,,
-,,,1.634545029968086,10500,0.07442453503608704,0.9977306493713021,0.997440190110606,0.9729125537103704,0.9761234031726127,0.9136765599250793,0.9754888653420087,0.9760021075993326,0.9792385880317509,0.9748654545454546,0.9709674615362327,252.2157,905.519,14.151,,,,,
-0.0512,0.5945746302604675,8.572353097359252e-06,1.7123842142134351,11000,,,,,,,,,,,,,,,,,,,
-,,,1.7123842142134351,11000,0.07524814456701279,0.9977685405428793,0.9974825248399177,0.9727850366057699,0.9761874492694766,0.9207897186279297,0.975357508778997,0.9763919857424856,0.978581784103743,0.9741039521978714,0.9714696877505095,252.2195,905.505,14.15,,,,,
-0.0509,0.9120739698410034,4.603264645836933e-06,1.7902233984587843,11500,,,,,,,,,,,,,,,,,,,
-,,,1.7902233984587843,11500,0.06851697713136673,0.9979216958335029,0.997654853113855,0.9747033543129303,0.9769461620177022,0.9124361872673035,0.9771395794838563,0.9764685264549843,0.9818417743317821,0.9779586201532299,0.9714696877505095,252.8119,903.383,14.117,,,,,
-0.048,0.6627203822135925,1.834324480010042e-06,1.8680625827041333,12000,,,,,,,,,,,,,,,,,,,
-,,,1.8680625827041333,12000,0.07031949609518051,0.9978716433195517,0.9975978286587303,0.9741437319971922,0.9765590576618682,0.9241418242454529,0.9766141532318093,0.9766512444160816,0.980664333143768,0.9765690214120707,0.9717304590540763,252.5244,904.411,14.133,,,,,
-0.0488,0.7994762659072876,3.1098369880601253e-07,1.9459017669494822,12500,,,,,,,,,,,,,,,,,,,
-,,,1.9459017669494822,12500,0.07445533573627472,0.9977876009119543,0.997504157410398,0.972836356080046,0.9761660160257996,0.929440438747406,0.9753881586436997,0.9769268924908395,0.9780771664517369,0.9735279325286289,0.9721457615004974,252.7495,903.606,14.121,,,,,
-,,,2.0,12848,,,,,,,,,,,,,,,15236.3158,215.85,0.843,3.0700924448014336e+18,0.07117979241486355
+0.1959,1.1447575092315674,0.00014045785291760075,0.1273277096928219,1000,,,,,,,,,,,,,,,,,,,
+,,,0.1273277096928219,1000,0.10616814345121384,0.9950148191129459,0.9952899350299439,0.962311290288237,0.9679296498026972,0.8289388418197632,0.961968199398367,0.9705350238550314,0.9528649190660364,0.953707741871949,0.9710714797306976,302.0769,924.4,14.447,,,,,
+0.0793,0.9442410469055176,0.00013808916945651123,0.2546554193856438,2000,,,,,,,,,,,,,,,,,,,
+,,,0.2546554193856438,2000,0.06762869656085968,0.99699523578959,0.9971745448574031,0.9747652814641683,0.9751815105140932,0.6486889719963074,0.9749068901303538,0.9696360146472409,0.9805185503509526,0.9802974220045925,0.969295229909755,301.8486,925.1,14.458,,,,,
+0.066,1.0335582494735718,0.0001327492586182366,0.3819831290784657,3000,,,,,,,,,,,,,,,,,,,
+,,,0.3819831290784657,3000,0.10242880135774612,0.9966748220521451,0.9968401495248937,0.9646262449071978,0.9739805993788537,0.9433475732803345,0.96418134937688,0.9761586387280689,0.9516043546769803,0.9527918285219238,0.9767583440767799,301.8298,925.157,14.458,,,,,
+0.0604,0.49253857135772705,0.00012467212143575104,0.5093108387712876,4000,,,,,,,,,,,,,,,,,,,
+,,,0.5093108387712876,4000,0.07713142782449722,0.9974895074288549,0.9976087876172619,0.9726526500782271,0.9769228267815947,0.896251380443573,0.9725827245380319,0.9750118785365643,0.9700257842715944,0.970178288939245,0.9751396648044692,302.0624,924.445,14.447,,,,,
+0.0562,1.2046138048171997,0.00011421170734780347,0.6366385484641095,5000,,,,,,,,,,,,,,,,,,,
+,,,0.6366385484641095,5000,0.06674948334693909,0.9978946111553613,0.9979933640808432,0.9756314294006572,0.9780440997353064,0.8354835510253906,0.9756446067898582,0.9751307495832469,0.9761853602635725,0.9761595766801224,0.9751038533161438,301.8554,925.079,14.457,,,,,
+0.0536,0.8311429619789124,0.0001018264037275262,0.7639662581569314,6000,,,,,,,,,,,,,,,,,,,
+,,,0.7639662581569314,6000,0.07356549799442291,0.9977181868917908,0.9978107356852736,0.973830491983889,0.9777105038793896,0.8824278712272644,0.9738003151410972,0.9748955476747692,0.9726471852170177,0.9727101227651456,0.9749534450651769,301.9925,924.659,14.451,,,,,
+0.0519,0.4782758951187134,8.80589488206726e-05,0.8912939678497533,7000,,,,,,,,,,,,,,,,,,,
+,,,0.8912939678497533,7000,0.05816827714443207,0.9981127635745546,0.9981834468144306,0.9776137884992947,0.9801179559592376,0.7745833992958069,0.9776643747314139,0.9755153260939315,0.97992407964475,0.9798329364194289,0.9754046698180776,302.2733,923.8,14.437,,,,,
+0.0477,0.4090639054775238,7.351264833162605e-05,1.018589845615152,8000,,,,,,,,,,,,,,,,,,,
+,,,1.018589845615152,8000,0.08029133081436157,0.9978708245246101,0.9979566865663798,0.9732345002905806,0.9780694791225678,0.9196425676345825,0.9731163157140811,0.9773316857797336,0.9687007592035525,0.9689747467217595,0.9775318722246097,302.0748,924.407,14.447,,,,,
+0.0429,0.7672129273414612,5.882493787372914e-05,1.145917555307974,9000,,,,,,,,,,,,,,,,,,,
+,,,1.145917555307974,9000,0.08352840691804886,0.9977365428060092,0.9978319247026995,0.9719582883547271,0.9772799491574283,0.9416541457176208,0.9718127775390345,0.9767606806059158,0.9666236928806761,0.9669665199299633,0.9770018621973929,302.0739,924.41,14.447,,,,,
+0.0405,0.9896750450134277,4.4639449807758265e-05,1.2732452650007957,10000,,,,,,,,,,,,,,,,,,,
+,,,1.2732452650007957,10000,0.061068352311849594,0.9983874067313616,0.9984549393455515,0.9787944744951835,0.9803829377847685,0.8933094143867493,0.9788568972926515,0.97605417182894,0.9818006016330039,0.9816928197812649,0.9759131929522991,302.0004,924.635,14.45,,,,,
+0.0406,0.47202062606811523,3.157780853180043e-05,1.4005729746936177,11000,,,,,,,,,,,,,,,,,,,
+,,,1.4005729746936177,11000,0.0718986839056015,0.9981721218474578,0.9982508526859964,0.9758077493875945,0.9788950827489958,0.9407896995544434,0.9757735281478298,0.9771233614652541,0.9743589743589743,0.9744313109309717,0.9771880819366853,302.0521,924.476,14.448,,,,,
+0.0395,0.7460657358169556,2.0212390185360698e-05,1.5279006843864396,12000,,,,,,,,,,,,,,,,,,,
+,,,1.5279006843864396,12000,0.06611855328083038,0.9983199319059504,0.9983858111318533,0.9774215058543476,0.9800705008735375,0.9111796617507935,0.9774459246526286,0.9764154314546676,0.9785274316000573,0.9784808854562942,0.9763644177051999,302.0129,924.596,14.45,,,,,
+0.0394,0.165859654545784,1.1041240468788348e-05,1.6552283940792614,13000,,,,,,,,,,,,,,,,,,,
+,,,1.6552283940792614,13000,0.05736415088176727,0.998453987731572,0.998514713572729,0.9790977044095155,0.9808756501587189,0.8652240633964539,0.9791505514969202,0.9767398771432236,0.9816788425726973,0.9815857293001425,0.9766222604211431,301.9939,924.655,14.451,,,,,
+0.0386,0.3515833616256714,4.466249708014854e-06,1.7825561037720834,14000,,,,,,,,,,,,,,,,,,,
+,,,1.7825561037720834,14000,0.06121337413787842,0.998390859685223,0.9984525070219366,0.9779097761600413,0.9804425977774784,0.8887588381767273,0.9779186362985246,0.9775355680874818,0.9783197249677696,0.9783023195802392,0.9775175476292794,302.6176,922.749,14.421,,,,,
+0.0374,0.2531239092350006,7.75541558362136e-07,1.9098838134649054,15000,,,,,,,,,,,,,,,,,,,
+,,,1.9098838134649054,15000,0.06407459080219269,0.998362819330903,0.9984245391184183,0.9778709076581417,0.980121108766156,0.9059898257255554,0.977875662512534,0.9776703894616265,0.9780905314424867,0.9780811120664947,0.9776607935825813,302.0272,924.553,14.449,,,,,
+,,,2.0,15708,,,,,,,,,,,,,,,15515.8764,259.158,1.012,3.758608241022468e+18,0.05829760437555704
```
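The training log interleaves two kinds of rows: optimizer steps, where `loss`, `grad_norm`, and `learning_rate` are populated, and evaluation rows, where only `epoch`, `step`, and the `eval_*` columns are filled. A stdlib sketch for splitting them, using a shortened sample of the real header and first two rows:

```python
import csv
import io

# Abbreviated sample of training_log_history.csv (columns and values shortened).
SAMPLE = """\
loss,grad_norm,learning_rate,epoch,step,eval_loss
0.1959,1.1448,0.00014,0.1273,1000,
,,,0.1273,1000,0.1062
"""

train_rows, eval_rows = [], []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    # Training rows have a non-empty "loss" column; eval rows leave it blank.
    (train_rows if row["loss"] else eval_rows).append(row)

print(len(train_rows), len(eval_rows))  # 1 1
```

The same split works on the full file (replace `SAMPLE` with `open("training_log_history.csv")`), e.g. to plot loss and eval_auroc curves separately.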