Commit 8b1a796 by kbourro · Parent(s): 5f029c0

Model improved
README.md CHANGED
@@ -16,19 +16,35 @@ metrics:
   - f1
   - auroc
   - average_precision
  ---

  # AI Detector LoRA (DeBERTa-v3-large)

- LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples
  (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

  - **Base model:** `microsoft/deberta-v3-large`
  - **Task:** Binary classification (AI vs Human)
  - **Head:** Single-logit + `BCEWithLogitsLoss`
  - **Adapter type:** LoRA (`peft`)
- - **Hardware:** H100 SXM, bf16, multi-GPU
- - **Final decision threshold:** **0.9284** (max-F1 on calibration set)

  ---
@@ -47,34 +63,35 @@ LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M Englis

  ---

- ## Metrics (test set)

- Using threshold **0.9284**:

  | Metric                 | Value  |
  | ---------------------- | ------ |
- | AUROC                  | 0.9979 |
- | Average Precision (AP) | 0.9977 |
- | F1                     | 0.9773 |
- | Accuracy               | 0.9797 |
- | Precision              | 0.9909 |
- | Recall                 | 0.9640 |
- | Specificity            | 0.9927 |

  Confusion matrix (test):

- - **True Negatives (Human correctly)**: 123,936
- - **False Positives (Human → AI)**: 912
- - **False Negatives (AI → Human)**: 3,723
- - **True Positives (AI correctly)**: 99,816

  ### Calibration

  - **Method:** temperature scaling
- - **Temperature (T):** 1.2807
  - **Calibration set:** calibration
- - Test ECE: 0.0119 → 0.0159 (after calibration)
- - Test Brier: 0.01812 → 0.01829 (after calibration)

  ---
@@ -156,7 +173,7 @@ model.eval()
  ```python
  # load threshold
  with open("threshold.json") as f:
-     thr = json.load(f)["threshold"]  # 0.9284

  def predict_proba(texts):
      enc = tokenizer(
@@ -194,7 +211,7 @@ model = AutoModelForSequenceClassification.from_pretrained(model_dir)
  model.eval()

  with open("threshold.json") as f:
-     thr = json.load(f)["threshold"]  # 0.9284

  def predict_proba(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -209,7 +226,7 @@ def predict_proba(texts):
  ```python
  import json
  with open("calibration.json") as f:
-     T = json.load(f)["temperature"]  # e.g., 1.2807

  def predict_proba_calibrated(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -224,12 +241,17 @@ def predict_proba_calibrated(texts):
  ## Notes

  - Classifier head is **trainable** together with LoRA layers (unfrozen after applying PEFT).
- - Training used:

    - `bf16=True`
    - `optim="adamw_torch_fused"`
-   - cosine-with-restarts scheduler
-   - LR scaled down from HPO to account for full-dataset (~14k steps).

- - Threshold `0.9284` was chosen as the **max-F1** point on the calibration set.
    You can adjust it if you prefer fewer false positives or fewer false negatives.
   - f1
   - auroc
   - average_precision
+ model-index:
+ - name: AI Detector LoRA (DeBERTa-v3-large)
+   results:
+   - task:
+       type: text-classification
+       name: AI Text Detection
+     dataset:
+       name: stealthcode/ai-detection
+       type: stealthcode/ai-detection
+     metrics:
+     - type: auroc
+       value: 0.9985
+     - type: f1
+       value: 0.9812
+     - type: accuracy
+       value: 0.9814
  ---

  # AI Detector LoRA (DeBERTa-v3-large)

+ LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.7M English samples
  (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

  - **Base model:** `microsoft/deberta-v3-large`
  - **Task:** Binary classification (AI vs Human)
  - **Head:** Single-logit + `BCEWithLogitsLoss`
  - **Adapter type:** LoRA (`peft`)
+ - **Hardware:** 8 x RTX 5090, bf16, multi-GPU
+ - **Final decision threshold:** **0.8697** (max-F1 on calibration set)

  ---

  ---

+ ## Metrics (test set, n=279,241)

+ Using threshold **0.8697**:

  | Metric                 | Value  |
  | ---------------------- | ------ |
+ | AUROC                  | 0.9985 |
+ | Average Precision (AP) | 0.9985 |
+ | F1                     | 0.9812 |
+ | Accuracy               | 0.9814 |
+ | Precision (AI)         | 0.9902 |
+ | Recall (AI)            | 0.9724 |
+ | Precision (Human)      | 0.9728 |
+ | Recall (Human)         | 0.9904 |

  Confusion matrix (test):

+ - **True Negatives (Human correctly)**: 138,276
+ - **False Positives (Human → AI)**: 1,345
+ - **False Negatives (AI → Human)**: 3,859
+ - **True Positives (AI correctly)**: 135,761

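As a sanity check, the headline numbers in the table follow directly from these four confusion-matrix counts; a minimal sketch (counts copied from the list above):

```python
# Recompute the headline test metrics from the confusion-matrix counts above.
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

precision_ai = tp / (tp + fp)       # of texts flagged AI, fraction truly AI
recall_ai = tp / (tp + fn)          # of AI texts, fraction caught
recall_human = tn / (tn + fp)       # specificity
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision_ai * recall_ai / (precision_ai + recall_ai)
```

Rounded to four places, these reproduce the table values (0.9902 / 0.9724 / 0.9904 / 0.9814 / 0.9812).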
  ### Calibration

  - **Method:** temperature scaling
+ - **Temperature (T):** 1.4437
  - **Calibration set:** calibration
+ - Test ECE: 0.0075 → 0.0116 (after calibration)
+ - Test Brier: 0.0157 → 0.0156 (after calibration)

  ---
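Temperature scaling only rescales the logit before the sigmoid, so score ranking (and hence AUROC) is unchanged while confident probabilities are softened. A minimal sketch using the fitted T reported above (the `calibrated_prob` helper is illustrative, not part of the repo):

```python
import math

T = 1.4437  # fitted temperature, from calibration.json

def calibrated_prob(logit: float, temperature: float = T) -> float:
    """Apply temperature scaling to a raw single-logit score."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# With T > 1, confident predictions move toward 0.5.
raw = 1.0 / (1.0 + math.exp(-3.0))  # uncalibrated sigmoid, ~0.953
cal = calibrated_prob(3.0)          # softened, ~0.889
```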

  ```python
  # load threshold
  with open("threshold.json") as f:
+     thr = json.load(f)["threshold"]  # 0.8697

  def predict_proba(texts):
      enc = tokenizer(

  model.eval()

  with open("threshold.json") as f:
+     thr = json.load(f)["threshold"]  # 0.8697

  def predict_proba(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

  ```python
  import json
  with open("calibration.json") as f:
+     T = json.load(f)["temperature"]  # e.g., 1.4437

  def predict_proba_calibrated(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

  ## Notes

  - Classifier head is **trainable** together with LoRA layers (unfrozen after applying PEFT).
+ - **LoRA config:**
+   - `r=32`, `alpha=128`, `dropout=0.0`
+   - Target modules: `query_proj`, `key_proj`, `value_proj`
+ - **Training config:**

    - `bf16=True`
    - `optim="adamw_torch_fused"`
+   - `lr_scheduler_type="cosine_with_restarts"`
+   - `num_train_epochs=2`
+   - `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
+   - `max_grad_norm=0.5`

+ - Threshold `0.8697` was chosen as the **max-F1** point on the calibration set.
    You can adjust it if you prefer fewer false positives or fewer false negatives.
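The max-F1 selection can be sketched as a sweep over candidate thresholds on the calibration scores. The probabilities and labels below are toy values, purely illustrative (the shipped 0.8697 came from the real calibration set):

```python
# Pick the decision threshold that maximizes F1 on held-out (prob, label) pairs.
def f1_at(threshold, probs, labels):
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy calibration scores (label 1 = AI, 0 = Human).
probs  = [0.05, 0.20, 0.55, 0.70, 0.85, 0.92, 0.97, 0.99]
labels = [0,    0,    0,    1,    0,    1,    1,    1]

# Candidate thresholds are the observed scores themselves.
best_thr = max(probs, key=lambda t: f1_at(t, probs, labels))
```

Raising the threshold above the max-F1 point trades recall for precision (fewer false positives); lowering it does the opposite.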
adapter/adapter_config.json CHANGED
@@ -34,9 +34,9 @@
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
+     "key_proj",
      "query_proj",
-     "value_proj",
-     "key_proj"
+     "value_proj"
    ],
    "target_parameters": null,
    "task_type": "SEQ_CLS",
adapter/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2320198a394f889f1be50d439b5217d996e84da64a4a65671bf653607879cfcb
+ oid sha256:78566bec1ea60ab8451693e143cefe13dea659a75daf9c426bc8269c7dfce4b1
  size 23099012
calibration.json CHANGED
@@ -1,26 +1,26 @@
  {
-   "temperature": 1.2806789875030518,
+   "temperature": 1.4436575174331665,
    "method": "temperature_scaling",
    "calibration_set": "calibration",
    "calibration_metrics": {
-     "temperature": 1.2806789875030518,
+     "temperature": 1.4436575174331665,
      "optimization_method": "LBFGS_logspace",
-     "uncalibrated_nll": 0.06460460661246972,
-     "calibrated_nll": 0.06279846573841724,
-     "uncalibrated_ece": 0.012124871567496009,
-     "calibrated_ece": 0.016240862688628014,
-     "uncalibrated_brier": 0.01822748637167701,
-     "calibrated_brier": 0.018437309998858068,
-     "nll_improvement": 0.001806140874052481,
-     "ece_improvement": -0.004115991121132005,
-     "brier_improvement": -0.00020982362718105843
+     "uncalibrated_nll": 0.057230830731130305,
+     "calibrated_nll": 0.05340311260808736,
+     "uncalibrated_ece": 0.007595386161633095,
+     "calibrated_ece": 0.011707928851842823,
+     "uncalibrated_brier": 0.01589206575792085,
+     "calibrated_brier": 0.015775692446082124,
+     "nll_improvement": 0.0038277181230429447,
+     "ece_improvement": -0.004112542690209728,
+     "brier_improvement": 0.00011637331183872793
    },
    "test_metrics": {
-     "ece_before": 0.011862174308705089,
-     "ece_after": 0.015908939599173937,
-     "ece_improvement": -0.004046765290468848,
-     "brier_before": 0.01812282837704726,
-     "brier_after": 0.018294590049400802,
-     "brier_improvement": -0.0001717616723535438
+     "ece_before": 0.007462335961689493,
+     "ece_after": 0.011600581100766194,
+     "ece_improvement": -0.004138245139076701,
+     "brier_before": 0.015727129447539786,
+     "brier_after": 0.0156334356493489,
+     "brier_improvement": 9.369379819088725e-05
    }
  }
figures/fig_calibration_calib.png CHANGED

Git LFS Details

  • SHA256: fb9b407564ca2d3c6902bd1825774cbd4acce0e67d817d368966c7a92e5e91eb
  • Pointer size: 130 Bytes
  • Size of remote file: 66 kB

Git LFS Details

  • SHA256: 142d361cd8031012d33019ea55a697b913fa8056d17e1d5c284f6c51c301480e
  • Pointer size: 130 Bytes
  • Size of remote file: 62.9 kB
figures/fig_calibration_comparison_calib.png CHANGED

Git LFS Details

  • SHA256: f50100460613deceb44f375740f993cd85fb5caaf5d75cba20c5d4e45c4cf78d
  • Pointer size: 130 Bytes
  • Size of remote file: 98.3 kB

Git LFS Details

  • SHA256: f56b663df531d44a12bb46f9833c1867a2b44efa640857a87497b19fe4ebc6bf
  • Pointer size: 130 Bytes
  • Size of remote file: 99.9 kB
figures/fig_calibration_comparison_test.png CHANGED

Git LFS Details

  • SHA256: 31dfd6bb061a84a373dde4cde5b3831c2db14b8ba4bb36d2d1736eb4ea0e10ce
  • Pointer size: 131 Bytes
  • Size of remote file: 103 kB

Git LFS Details

  • SHA256: dfe36d2d8f69ddd1751e1b0e025096681be3491f3b8e27a0532906b019cb3edb
  • Pointer size: 130 Bytes
  • Size of remote file: 98.7 kB
figures/fig_calibration_test.png CHANGED

Git LFS Details

  • SHA256: 718c8f118f32a1c871ea563db30fc644b9cb11e53f3d83f0c21597e2fde7a144
  • Pointer size: 130 Bytes
  • Size of remote file: 66.6 kB

Git LFS Details

  • SHA256: a63bde212cbc7ab6daee18b2291d13b52d850cee7f7b52977e428c251e8f6293
  • Pointer size: 130 Bytes
  • Size of remote file: 64.4 kB
figures/fig_confusion_test.png CHANGED

Git LFS Details

  • SHA256: d5d34c728453c76fa608776e0cdcdd298a6513aef81b2602f453f5b5cb347dfd
  • Pointer size: 130 Bytes
  • Size of remote file: 46.4 kB

Git LFS Details

  • SHA256: affe9ca50c1602e93dae6c05652324b2c6378034f1b31b0597a94a9a5e67b38f
  • Pointer size: 130 Bytes
  • Size of remote file: 46.3 kB
figures/fig_eval_metrics.png CHANGED

Git LFS Details

  • SHA256: 3dbb5ef9aaebdc60bf5c866601ae15429d185cb012d4c6c52b4712e47c2ac24b
  • Pointer size: 130 Bytes
  • Size of remote file: 92.4 kB

Git LFS Details

  • SHA256: 4665e2821dd49090b311dc12781d147fc89a2c3144738029fad46ba751c06ad5
  • Pointer size: 130 Bytes
  • Size of remote file: 83.2 kB
figures/fig_learning_curves.png CHANGED

Git LFS Details

  • SHA256: a66e8baab1fa6cfaf7f14efc5574a2b4d0d23943d2cfa95a32dd30365e468422
  • Pointer size: 130 Bytes
  • Size of remote file: 58.1 kB

Git LFS Details

  • SHA256: 2dd2ab634527c303b7d967270119ed4d803bb3b9b96255ea65285bf66732c533
  • Pointer size: 130 Bytes
  • Size of remote file: 67.2 kB
figures/fig_pr_calib.png CHANGED

Git LFS Details

  • SHA256: f0ae2af980ed8b9c0a0e4b81e97d195a46055f7950e905f2c2c8b764f1ed258e
  • Pointer size: 130 Bytes
  • Size of remote file: 33.1 kB

Git LFS Details

  • SHA256: 04fc1d13aeb90bfa91576808eb9d45cf2e08b8dbb88631ccb097c48c05783c3c
  • Pointer size: 130 Bytes
  • Size of remote file: 32.9 kB
figures/fig_pr_test.png CHANGED

Git LFS Details

  • SHA256: c31095938ce085ffd8bdebf895dc51ca49fc4bd7fc316c26c29aea864164ac92
  • Pointer size: 130 Bytes
  • Size of remote file: 32.3 kB

Git LFS Details

  • SHA256: a76c8508d43e74c7f91485a243f180cc65c70357b07d8229c18560ab98de31bc
  • Pointer size: 130 Bytes
  • Size of remote file: 32.7 kB
figures/fig_roc_calib.png CHANGED

Git LFS Details

  • SHA256: 2930b3b1b03eb1c3e19e2b661326229f825646be4730ab0b1aaf6364e8ffa933
  • Pointer size: 130 Bytes
  • Size of remote file: 50 kB

Git LFS Details

  • SHA256: 3fdadf4d341190c18d24be63e074421148cec512c25ad97d7114a504d1f8bcda
  • Pointer size: 130 Bytes
  • Size of remote file: 50.1 kB
figures/fig_roc_test.png CHANGED

Git LFS Details

  • SHA256: f500a2984b3782377f03549aaf7dcc955924ef522e308f6f4913b6ee48381927
  • Pointer size: 130 Bytes
  • Size of remote file: 50 kB

Git LFS Details

  • SHA256: a62f79b1a7c283862e8b691cc1e3da63a099bec78fd95096774a2ef6d6519ed8
  • Pointer size: 130 Bytes
  • Size of remote file: 50 kB
figures/fig_threshold_f1_calib.png CHANGED

Git LFS Details

  • SHA256: 75c36a7436e399d9cb3aee204d94fca9a58d8a4b393070c511992e20d9504894
  • Pointer size: 130 Bytes
  • Size of remote file: 43.2 kB

Git LFS Details

  • SHA256: e60d9702c86bf5a9fb3e176f0fae528ab3495b1ad9467d28add8d9287cd9b7cf
  • Pointer size: 130 Bytes
  • Size of remote file: 42.4 kB
merged_model/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f780611b3aca302b2c564ebcb57391dcced23c391e62f96ee4509fa71076185f
+ oid sha256:43dabe52c25c41912eef74ac37416aa2c7793f42fc8193906f6bc7153a877962
  size 1740300340
predictions_calib.csv CHANGED
The diff for this file is too large to render. See raw diff
 
predictions_test.csv CHANGED
The diff for this file is too large to render. See raw diff
 
results.json CHANGED
@@ -16,105 +16,105 @@
      "key_proj",
      "value_proj"
    ],
-   "learning_rate": 0.0001554357238163802,
+   "learning_rate": 0.00014057133690327707,
    "lr_scheduler_type": "cosine_with_restarts",
    "max_grad_norm": 0.5,
    "optim": "adamw_torch_fused"
  },
  "threshold_optimization": {
    "max_f1": {
-     "threshold": 0.9284088015556335,
+     "threshold": 0.869714617729187,
      "metrics": {
-       "threshold": 0.9284088015556335,
-       "auroc": 0.9978600960936826,
-       "average_precision": 0.997597673288253,
-       "f1": 0.9773827668313225,
-       "accuracy": 0.9797765846236365,
-       "precision": 0.9912838341196921,
-       "recall": 0.9638661853653825,
-       "specificity": 0.9929714251386692,
-       "precision_human": 0.9707053998766749,
-       "recall_human": 0.9929714251386692,
-       "precision_ai": 0.9912838341196921,
-       "recall_ai": 0.9638661853653825,
+       "threshold": 0.869714617729187,
+       "auroc": 0.9984783120401353,
+       "average_precision": 0.9985350724478098,
+       "f1": 0.9809629649707713,
+       "accuracy": 0.9811363829663419,
+       "precision": 0.9900603673104631,
+       "recall": 0.9720312276178198,
+       "specificity": 0.9902414567983026,
+       "precision_human": 0.9725316756205432,
+       "recall_human": 0.9902414567983026,
+       "precision_ai": 0.9900603673104631,
+       "recall_ai": 0.9720312276178198,
        "confusion_matrix": {
-         "true_negative": 99176,
-         "false_positive": 702,
-         "false_negative": 2993,
-         "true_positive": 79838
+         "true_negative": 110607,
+         "false_positive": 1090,
+         "false_negative": 3124,
+         "true_positive": 108572
        }
      }
    },
    "precision_at_95recall": {
-     "threshold": 1.9947297005273867e-06,
+     "threshold": 3.6534821390432626e-08,
      "metrics": {
-       "threshold": 1.9947297005273867e-06,
-       "auroc": 0.9978600960936826,
-       "average_precision": 0.997597673288253,
-       "f1": 0.6238683437523537,
-       "accuracy": 0.45334931503100556,
-       "precision": 0.45334931503100556,
+       "threshold": 3.6534821390432626e-08,
+       "auroc": 0.9984783120401353,
+       "average_precision": 0.9985350724478098,
+       "f1": 0.6666646771454748,
+       "accuracy": 0.49999776179199884,
+       "precision": 0.49999776179199884,
        "recall": 1.0,
        "specificity": 0.0,
        "precision_human": 0.0,
        "recall_human": 0.0,
-       "precision_ai": 0.45334931503100556,
+       "precision_ai": 0.49999776179199884,
        "recall_ai": 1.0,
        "confusion_matrix": {
          "true_negative": 0,
-         "false_positive": 99878,
+         "false_positive": 111697,
          "false_negative": 0,
-         "true_positive": 82831
+         "true_positive": 111696
        }
      }
    }
  },
  "calibration": {
-   "temperature": 1.2806789875030518,
+   "temperature": 1.4436575174331665,
    "method": "temperature_scaling",
    "calibration_set": "calibration",
    "calibration_metrics": {
-     "temperature": 1.2806789875030518,
+     "temperature": 1.4436575174331665,
      "optimization_method": "LBFGS_logspace",
-     "uncalibrated_nll": 0.06460460661246972,
-     "calibrated_nll": 0.06279846573841724,
-     "uncalibrated_ece": 0.012124871567496009,
-     "calibrated_ece": 0.016240862688628014,
-     "uncalibrated_brier": 0.01822748637167701,
-     "calibrated_brier": 0.018437309998858068,
-     "nll_improvement": 0.001806140874052481,
-     "ece_improvement": -0.004115991121132005,
-     "brier_improvement": -0.00020982362718105843
+     "uncalibrated_nll": 0.057230830731130305,
+     "calibrated_nll": 0.05340311260808736,
+     "uncalibrated_ece": 0.007595386161633095,
+     "calibrated_ece": 0.011707928851842823,
+     "uncalibrated_brier": 0.01589206575792085,
+     "calibrated_brier": 0.015775692446082124,
+     "nll_improvement": 0.0038277181230429447,
+     "ece_improvement": -0.004112542690209728,
+     "brier_improvement": 0.00011637331183872793
    },
    "test_metrics": {
-     "ece_before": 0.011862174308705089,
-     "ece_after": 0.015908939599173937,
-     "ece_improvement": -0.004046765290468848,
-     "brier_before": 0.01812282837704726,
-     "brier_after": 0.018294590049400802,
-     "brier_improvement": -0.0001717616723535438
+     "ece_before": 0.007462335961689493,
+     "ece_after": 0.011600581100766194,
+     "ece_improvement": -0.004138245139076701,
+     "brier_before": 0.015727129447539786,
+     "brier_after": 0.0156334356493489,
+     "brier_improvement": 9.369379819088725e-05
    }
  },
  "test_metrics": {
-   "threshold": 0.9284088015556335,
-   "auroc": 0.997910815020985,
-   "average_precision": 0.9976513211537581,
-   "f1": 0.9773091101352641,
-   "accuracy": 0.9797054998752118,
-   "precision": 0.9909459137479152,
-   "recall": 0.9640425346970707,
-   "specificity": 0.9926951172625913,
-   "precision_human": 0.9708363687636594,
-   "recall_human": 0.9926951172625913,
-   "precision_ai": 0.9909459137479152,
-   "recall_ai": 0.9640425346970707,
+   "threshold": 0.869714617729187,
+   "auroc": 0.9984910666612247,
+   "average_precision": 0.9985476887515279,
+   "f1": 0.981194394455165,
+   "accuracy": 0.9813637682145531,
+   "precision": 0.9901900719151605,
+   "recall": 0.972360693310414,
+   "specificity": 0.9903667786364515,
+   "precision_human": 0.9728497555141239,
+   "recall_human": 0.9903667786364515,
+   "precision_ai": 0.9901900719151605,
+   "recall_ai": 0.972360693310414,
    "confusion_matrix": {
-     "true_negative": 123936,
-     "false_positive": 912,
-     "false_negative": 3723,
-     "true_positive": 99816
+     "true_negative": 138276,
+     "false_positive": 1345,
+     "false_negative": 3859,
+     "true_positive": 135761
    }
  },
- "timestamp": "20251115_090814",
+ "timestamp": "20251124_170935",
  "seed": 42
  }
threshold.json CHANGED
@@ -1,9 +1,9 @@
  {
-   "threshold": 0.9284088015556335,
+   "threshold": 0.869714617729187,
    "method": "max_f1",
-   "calibration_f1": 0.9773827668313225,
+   "calibration_f1": 0.9809629649707713,
    "alternative_thresholds": {
-     "max_f1": 0.9284088015556335,
-     "precision_at_95recall": 1.9947297005273867e-06
+     "max_f1": 0.869714617729187,
+     "precision_at_95recall": 3.6534821390432626e-08
    }
  }
training_log_history.csv CHANGED
@@ -1,52 +1,32 @@
1
  loss,grad_norm,learning_rate,epoch,step,eval_loss,eval_auroc,eval_ap,eval_f1,eval_max_f1,eval_best_threshold,eval_accuracy,eval_precision_human,eval_recall_human,eval_precision_ai,eval_recall_ai,eval_runtime,eval_samples_per_second,eval_steps_per_second,train_runtime,train_samples_per_second,train_steps_per_second,total_flos,train_loss
2
- 0.276,3.159646987915039,0.00013013829896707,0.07783918424534911,500,,,,,,,,,,,,,,,,,,,
3
- ,,,0.07783918424534911,500,0.12149354815483093,0.9924457584277557,0.9916298325594288,0.9536510818288485,0.9557720332927327,0.6680145263671875,0.9579702783883426,0.9616493887295509,0.9614568231515375,0.9535359777528871,0.9537662136972542,251.8553,906.814,14.171,,,,,
4
- 0.1082,0.4662734270095825,0.00015502115157402368,0.15567836849069822,1000,,,,,,,,,,,,,,,,,,,
5
- ,,,0.15567836849069822,1000,0.11359784007072449,0.9941361028997843,0.9936079408729138,0.9563341131667457,0.964144751321268,0.8459424376487732,0.9600369549797273,0.9707662766667209,0.9556737446634681,0.9475350777398559,0.9652981002327625,252.3501,905.036,14.143,,,,,
6
- 0.0901,0.56740403175354,0.00015336171321936976,0.23351755273604732,1500,,,,,,,,,,,,,,,,,,,
7
- ,,,0.23351755273604732,1500,0.09792134165763855,0.995637180466362,0.9951736481792155,0.9632415978730987,0.9687276503605232,0.8856314420700073,0.9665828903698125,0.971504195528524,0.9672399016396068,0.9607059479089608,0.9657906682506109,252.1455,905.771,14.155,,,,,
8
- 0.0828,0.6910482048988342,0.0001504606098364759,0.31135673698139643,2000,,,,,,,,,,,,,,,,,,,
9
- ,,,0.31135673698139643,2000,0.07930342108011246,0.9965300160027056,0.996129497193991,0.9701504169589276,0.9709891509313444,0.6976089477539062,0.9731682327288013,0.968767028089932,0.9825946959077911,0.9786452037697653,0.9618018331256821,252.2542,905.38,14.148,,,,,
10
- 0.0786,0.7298774719238281,0.00014636546193317465,0.38919592122674557,2500,,,,,,,,,,,,,,,,,,,
11
- ,,,0.38919592122674557,2500,0.10587891191244125,0.9961595063461319,0.99574080839367,0.9633412681237827,0.9702281512250107,0.9585376977920532,0.9666222973387161,0.9727536699467656,0.965998382019592,0.9593414171599334,0.9673746124648683,252.014,906.243,14.162,,,,,
12
- 0.075,0.5279271006584167,0.00014114348980363213,0.46703510547209465,3000,,,,,,,,,,,,,,,,,,,
13
- ,,,0.46703510547209465,3000,0.08700015395879745,0.9963549115948438,0.9959508112193701,0.9664618832348054,0.9711828125530761,0.8791467547416687,0.9695953342148819,0.9720765232989261,0.9723101075716677,0.9666019379957298,0.9663218690541728,252.0033,906.282,14.163,,,,,
14
- 0.0718,0.8794483542442322,0.00013488041013280436,0.5448742897174438,3500,,,,,,,,,,,,,,,,,,,
15
- ,,,0.5448742897174438,3500,0.10059615969657898,0.9964150386194646,0.9960424913733611,0.9614705825931823,0.9711960816065123,0.954647421836853,0.9648227124254551,0.9732758550835028,0.9620655682555448,0.9548853558398507,0.9681472681791402,252.084,905.992,14.158,,,,,
16
- 0.0695,0.40731295943260193,0.00012767902898967842,0.6227134739627929,4000,,,,,,,,,,,,,,,,,,,
17
- ,,,0.6227134739627929,4000,0.10083704441785812,0.9961781307127794,0.995731264522598,0.9619906765054659,0.9692610702277147,0.9149009585380554,0.9653350030212009,0.9728877169710597,0.9634352447395612,0.9564208797922713,0.9676257255720067,252.2327,905.457,14.15,,,,,
18
- 0.0669,0.264863520860672,0.00011965755430477945,0.7005526582081419,4500,,,,,,,,,,,,,,,,,,,
19
- ,,,0.7005526582081419,4500,0.094916433095932,0.9967265965858536,0.9963598631028794,0.9650556403576777,0.972462604745369,0.9161096215248108,0.9682248474074593,0.9731915784051766,0.9685535094956227,0.962298576833695,0.9678285476970031,251.9541,906.459,14.165,,,,,
20
- 0.0672,0.7680786848068237,0.00011094765553198254,0.7783918424534911,5000,,,,,,,,,,,,,,,,,,,
21
- ,,,0.7783918424534911,5000,0.09323982149362564,0.9967850366539563,0.9963962011157831,0.9673139455667273,0.9710460087467818,0.8643104434013367,0.9703790950408518,0.972494317999936,0.9733433722876801,0.9678236488446292,0.9668047788755928,251.9538,906.46,14.165,,,,,
22
- 0.0647,0.4233705997467041,0.0001016923023445425,0.8562310266988402,5500,,,,,,,,,,,,,,,,,,,
23
- ,,,0.8562310266988402,5500,0.09126096963882446,0.9967433808226673,0.9963771581479701,0.9657864214107987,0.9726255234214075,0.9111796617507935,0.9688728731183173,0.9742069565497575,0.9687056957716245,0.9625206246882314,0.9690744550362665,252.0396,906.151,14.16,,,,,
24
- 0.0627,0.25091952085494995,9.204341784232336e-05,0.9340702109441893,6000,,,,,,,,,,,,,,,,,,,
25
- ,,,0.9340702109441893,6000,0.06995870172977448,0.9975342235086746,0.9972417957598282,0.973204830514238,0.9758824625579795,0.8918110132217407,0.9758479066142408,0.9732840473716358,0.9827949410077935,0.9790068315757582,0.9674711944291523,252.1038,905.921,14.157,,,,,
26
- 0.0606,0.3678501546382904,8.215938479193825e-05,1.011831556005293,6500,,,,,,,,,,,,,,,,,,,
27
- ,,,1.011831556005293,6500,0.07501858472824097,0.9975025002706578,0.997203865754845,0.971139887346844,0.9754012996088349,0.8902942538261414,0.9738863152732654,0.9744957013881682,0.9778208527237339,0.9731459660760525,0.9691420624112653,252.3023,905.208,14.146,,,,,
28
- 0.0538,0.6192397475242615,7.220244583391773e-05,1.0896707402506423,7000,,,,,,,,,,,,,,,,,,,
29
- ,,,1.0896707402506423,7000,0.09348879754543304,0.997100493345388,0.9967079581489795,0.9689502265506753,0.9725104874267961,0.9184802770614624,0.9718853169633865,0.97321942331053,0.9754099017197049,0.9702686474655716,0.967635383768435,252.0093,906.26,14.162,,,,,
30
- 0.0536,0.6146565675735474,6.233604033151736e-05,1.1675099244959912,7500,,,,,,,,,,,,,,,,,,,
31
- ,,,1.1675099244959912,7500,0.08755695074796677,0.9973925286764659,0.9970545509180517,0.9708156623418074,0.974796319089823,0.9324532747268677,0.9735754380741376,0.9747314921365554,0.9769878331077239,0.9721743341404359,0.9694607828934025,251.9972,906.304,14.163,,,,,
32
- 0.0563,0.24420885741710663,5.272212157577683e-05,1.2453491087413404,8000,,,,,,,,,,,,,,,,,,,
33
- ,,,1.2453491087413404,8000,0.07455883920192719,0.9975485047929438,0.9972585615075422,0.9726402255038238,0.9758848582753115,0.9136765599250793,0.9752655591848888,0.9750665582603982,0.9798072841157577,0.97550810243656,0.969789161571968,252.079,906.01,14.158,,,,,
34
- 0.0521,0.4137882590293884,4.351849838388919e-05,1.3231882929866896,8500,,,,,,,,,,,,,,,,,,,
35
- ,,,1.3231882929866896,8500,0.07783409208059311,0.9975517384131479,0.9972440455972116,0.9723962677736611,0.9749894769810089,0.9111796617507935,0.975077281444572,0.9738866219645087,0.9807043821637684,0.9765353333658013,0.9682921411255662,252.489,904.538,14.135,,,,,
36
- 0.0526,0.3627403974533081,3.4876244727530656e-05,1.4010274772320386,9000,,,,,,,,,,,,,,,,,,,
37
- ,,,1.4010274772320386,9000,0.0745643824338913,0.9975749102093014,0.9972690713005937,0.9723900247831475,0.9756176280729699,0.9046505093574524,0.9750247388193672,0.975305785387727,0.9791024213637493,0.9746829301427421,0.9701078820541053,252.119,905.866,14.156,,,,,
38
- 0.052,0.7762022614479065,2.693721991111627e-05,1.4788666614773878,9500,,,,,,,,,,,,,,,,,,,
39
- ,,,1.4788666614773878,9500,0.07452459633350372,0.9977328074904859,0.9974410894856215,0.9730508384452147,0.975759591492858,0.9124361872673035,0.9755895720403177,0.976881986981624,0.978501686063742,0.9740254712964038,0.9720781541254986,252.2439,905.417,14.149,,,,,
40
- 0.0507,0.3820905387401581,1.9831740005311437e-05,1.5567058457227367,10000,,,,,,,,,,,,,,,,,,,
41
- ,,,1.5567058457227367,10000,0.08428945392370224,0.9975299683849125,0.9972155352602694,0.9700319035460719,0.9748023112122028,0.9334307909011841,0.9728135700086695,0.9755487501803781,0.9746970291636964,0.9695218431614696,0.9705425008933831,252.2555,905.376,14.148,,,,,
42
- 0.0515,0.36162489652633667,1.3676438758331925e-05,1.634545029968086,10500,,,,,,,,,,,,,,,,,,,
43
- ,,,1.634545029968086,10500,0.07442453503608704,0.9977306493713021,0.997440190110606,0.9729125537103704,0.9761234031726127,0.9136765599250793,0.9754888653420087,0.9760021075993326,0.9792385880317509,0.9748654545454546,0.9709674615362327,252.2157,905.519,14.151,,,,,
44
- 0.0512,0.5945746302604675,8.572353097359252e-06,1.7123842142134351,11000,,,,,,,,,,,,,,,,,,,
45
- ,,,1.7123842142134351,11000,0.07524814456701279,0.9977685405428793,0.9974825248399177,0.9727850366057699,0.9761874492694766,0.9207897186279297,0.975357508778997,0.9763919857424856,0.978581784103743,0.9741039521978714,0.9714696877505095,252.2195,905.505,14.15,,,,,
46
- 0.0509,0.9120739698410034,4.603264645836933e-06,1.7902233984587843,11500,,,,,,,,,,,,,,,,,,,
47
- ,,,1.7902233984587843,11500,0.06851697713136673,0.9979216958335029,0.997654853113855,0.9747033543129303,0.9769461620177022,0.9124361872673035,0.9771395794838563,0.9764685264549843,0.9818417743317821,0.9779586201532299,0.9714696877505095,252.8119,903.383,14.117,,,,,
48
- 0.048,0.6627203822135925,1.834324480010042e-06,1.8680625827041333,12000,,,,,,,,,,,,,,,,,,,
49
- ,,,1.8680625827041333,12000,0.07031949609518051,0.9978716433195517,0.9975978286587303,0.9741437319971922,0.9765590576618682,0.9241418242454529,0.9766141532318093,0.9766512444160816,0.980664333143768,0.9765690214120707,0.9717304590540763,252.5244,904.411,14.133,,,,,
50
- 0.0488,0.7994762659072876,3.1098369880601253e-07,1.9459017669494822,12500,,,,,,,,,,,,,,,,,,,
51
- ,,,1.9459017669494822,12500,0.07445533573627472,0.9977876009119543,0.997504157410398,0.972836356080046,0.9761660160257996,0.929440438747406,0.9753881586436997,0.9769268924908395,0.9780771664517369,0.9735279325286289,0.9721457615004974,252.7495,903.606,14.121,,,,,
52
- ,,,2.0,12848,,,,,,,,,,,,,,,15236.3158,215.85,0.843,3.0700924448014336e+18,0.07117979241486355
 
1
  loss,grad_norm,learning_rate,epoch,step,eval_loss,eval_auroc,eval_ap,eval_f1,eval_max_f1,eval_best_threshold,eval_accuracy,eval_precision_human,eval_recall_human,eval_precision_ai,eval_recall_ai,eval_runtime,eval_samples_per_second,eval_steps_per_second,train_runtime,train_samples_per_second,train_steps_per_second,total_flos,train_loss
2
+ 0.1959,1.1447575092315674,0.00014045785291760075,0.1273277096928219,1000,,,,,,,,,,,,,,,,,,,
3
+ ,,,0.1273277096928219,1000,0.10616814345121384,0.9950148191129459,0.9952899350299439,0.962311290288237,0.9679296498026972,0.8289388418197632,0.961968199398367,0.9705350238550314,0.9528649190660364,0.953707741871949,0.9710714797306976,302.0769,924.4,14.447,,,,,
4
+ 0.0793,0.9442410469055176,0.00013808916945651123,0.2546554193856438,2000,,,,,,,,,,,,,,,,,,,
5
+ ,,,0.2546554193856438,2000,0.06762869656085968,0.99699523578959,0.9971745448574031,0.9747652814641683,0.9751815105140932,0.6486889719963074,0.9749068901303538,0.9696360146472409,0.9805185503509526,0.9802974220045925,0.969295229909755,301.8486,925.1,14.458,,,,,
6
+ 0.066,1.0335582494735718,0.0001327492586182366,0.3819831290784657,3000,,,,,,,,,,,,,,,,,,,
+ ,,,0.3819831290784657,3000,0.10242880135774612,0.9966748220521451,0.9968401495248937,0.9646262449071978,0.9739805993788537,0.9433475732803345,0.96418134937688,0.9761586387280689,0.9516043546769803,0.9527918285219238,0.9767583440767799,301.8298,925.157,14.458,,,,,
+ 0.0604,0.49253857135772705,0.00012467212143575104,0.5093108387712876,4000,,,,,,,,,,,,,,,,,,,
+ ,,,0.5093108387712876,4000,0.07713142782449722,0.9974895074288549,0.9976087876172619,0.9726526500782271,0.9769228267815947,0.896251380443573,0.9725827245380319,0.9750118785365643,0.9700257842715944,0.970178288939245,0.9751396648044692,302.0624,924.445,14.447,,,,,
+ 0.0562,1.2046138048171997,0.00011421170734780347,0.6366385484641095,5000,,,,,,,,,,,,,,,,,,,
+ ,,,0.6366385484641095,5000,0.06674948334693909,0.9978946111553613,0.9979933640808432,0.9756314294006572,0.9780440997353064,0.8354835510253906,0.9756446067898582,0.9751307495832469,0.9761853602635725,0.9761595766801224,0.9751038533161438,301.8554,925.079,14.457,,,,,
+ 0.0536,0.8311429619789124,0.0001018264037275262,0.7639662581569314,6000,,,,,,,,,,,,,,,,,,,
+ ,,,0.7639662581569314,6000,0.07356549799442291,0.9977181868917908,0.9978107356852736,0.973830491983889,0.9777105038793896,0.8824278712272644,0.9738003151410972,0.9748955476747692,0.9726471852170177,0.9727101227651456,0.9749534450651769,301.9925,924.659,14.451,,,,,
+ 0.0519,0.4782758951187134,8.80589488206726e-05,0.8912939678497533,7000,,,,,,,,,,,,,,,,,,,
+ ,,,0.8912939678497533,7000,0.05816827714443207,0.9981127635745546,0.9981834468144306,0.9776137884992947,0.9801179559592376,0.7745833992958069,0.9776643747314139,0.9755153260939315,0.97992407964475,0.9798329364194289,0.9754046698180776,302.2733,923.8,14.437,,,,,
+ 0.0477,0.4090639054775238,7.351264833162605e-05,1.018589845615152,8000,,,,,,,,,,,,,,,,,,,
+ ,,,1.018589845615152,8000,0.08029133081436157,0.9978708245246101,0.9979566865663798,0.9732345002905806,0.9780694791225678,0.9196425676345825,0.9731163157140811,0.9773316857797336,0.9687007592035525,0.9689747467217595,0.9775318722246097,302.0748,924.407,14.447,,,,,
+ 0.0429,0.7672129273414612,5.882493787372914e-05,1.145917555307974,9000,,,,,,,,,,,,,,,,,,,
+ ,,,1.145917555307974,9000,0.08352840691804886,0.9977365428060092,0.9978319247026995,0.9719582883547271,0.9772799491574283,0.9416541457176208,0.9718127775390345,0.9767606806059158,0.9666236928806761,0.9669665199299633,0.9770018621973929,302.0739,924.41,14.447,,,,,
+ 0.0405,0.9896750450134277,4.4639449807758265e-05,1.2732452650007957,10000,,,,,,,,,,,,,,,,,,,
+ ,,,1.2732452650007957,10000,0.061068352311849594,0.9983874067313616,0.9984549393455515,0.9787944744951835,0.9803829377847685,0.8933094143867493,0.9788568972926515,0.97605417182894,0.9818006016330039,0.9816928197812649,0.9759131929522991,302.0004,924.635,14.45,,,,,
+ 0.0406,0.47202062606811523,3.157780853180043e-05,1.4005729746936177,11000,,,,,,,,,,,,,,,,,,,
+ ,,,1.4005729746936177,11000,0.0718986839056015,0.9981721218474578,0.9982508526859964,0.9758077493875945,0.9788950827489958,0.9407896995544434,0.9757735281478298,0.9771233614652541,0.9743589743589743,0.9744313109309717,0.9771880819366853,302.0521,924.476,14.448,,,,,
+ 0.0395,0.7460657358169556,2.0212390185360698e-05,1.5279006843864396,12000,,,,,,,,,,,,,,,,,,,
+ ,,,1.5279006843864396,12000,0.06611855328083038,0.9983199319059504,0.9983858111318533,0.9774215058543476,0.9800705008735375,0.9111796617507935,0.9774459246526286,0.9764154314546676,0.9785274316000573,0.9784808854562942,0.9763644177051999,302.0129,924.596,14.45,,,,,
+ 0.0394,0.165859654545784,1.1041240468788348e-05,1.6552283940792614,13000,,,,,,,,,,,,,,,,,,,
+ ,,,1.6552283940792614,13000,0.05736415088176727,0.998453987731572,0.998514713572729,0.9790977044095155,0.9808756501587189,0.8652240633964539,0.9791505514969202,0.9767398771432236,0.9816788425726973,0.9815857293001425,0.9766222604211431,301.9939,924.655,14.451,,,,,
+ 0.0386,0.3515833616256714,4.466249708014854e-06,1.7825561037720834,14000,,,,,,,,,,,,,,,,,,,
+ ,,,1.7825561037720834,14000,0.06121337413787842,0.998390859685223,0.9984525070219366,0.9779097761600413,0.9804425977774784,0.8887588381767273,0.9779186362985246,0.9775355680874818,0.9783197249677696,0.9783023195802392,0.9775175476292794,302.6176,922.749,14.421,,,,,
+ 0.0374,0.2531239092350006,7.75541558362136e-07,1.9098838134649054,15000,,,,,,,,,,,,,,,,,,,
+ ,,,1.9098838134649054,15000,0.06407459080219269,0.998362819330903,0.9984245391184183,0.9778709076581417,0.980121108766156,0.9059898257255554,0.977875662512534,0.9776703894616265,0.9780905314424867,0.9780811120664947,0.9776607935825813,302.0272,924.553,14.449,,,,,
+ ,,,2.0,15708,,,,,,,,,,,,,,,15515.8764,259.158,1.012,3.758608241022468e+18,0.05829760437555704