Commit 8b1a796 by kbourro · Parent(s): 5f029c0

Model improved
README.md CHANGED
@@ -16,19 +16,35 @@ metrics:
   - f1
   - auroc
   - average_precision
  ---

  # AI Detector LoRA (DeBERTa-v3-large)

- LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples
  (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

  - **Base model:** `microsoft/deberta-v3-large`
  - **Task:** Binary classification (AI vs Human)
  - **Head:** Single-logit + `BCEWithLogitsLoss`
  - **Adapter type:** LoRA (`peft`)
- - **Hardware:** H100 SXM, bf16, multi-GPU
- - **Final decision threshold:** **0.9284** (max-F1 on calibration set)

  ---
@@ -47,34 +63,35 @@ LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M Englis

  ---

- ## Metrics (test set)

- Using threshold **0.9284**:

  | Metric                 | Value  |
  | ---------------------- | ------ |
- | AUROC                  | 0.9979 |
- | Average Precision (AP) | 0.9977 |
- | F1                     | 0.9773 |
- | Accuracy               | 0.9797 |
- | Precision              | 0.9909 |
- | Recall                 | 0.9640 |
- | Specificity            | 0.9927 |

  Confusion matrix (test):

- - **True Negatives (Human correctly)**: 123,936
- - **False Positives (Human → AI)**: 912
- - **False Negatives (AI → Human)**: 3,723
- - **True Positives (AI correctly)**: 99,816

  ### Calibration

  - **Method:** temperature scaling
- - **Temperature (T):** 1.2807
  - **Calibration set:** calibration
- - Test ECE: 0.0119 → 0.0159 (after calibration)
- - Test Brier: 0.01812 → 0.01829 (after calibration)

  ---
@@ -156,7 +173,7 @@ model.eval()
  ```python
  # load threshold
  with open("threshold.json") as f:
-     thr = json.load(f)["threshold"]  # 0.9284

  def predict_proba(texts):
      enc = tokenizer(
@@ -194,7 +211,7 @@ model = AutoModelForSequenceClassification.from_pretrained(model_dir)
  model.eval()

  with open("threshold.json") as f:
-     thr = json.load(f)["threshold"]  # 0.9284

  def predict_proba(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -209,7 +226,7 @@ def predict_proba(texts):
  ```python
  import json
  with open("calibration.json") as f:
-     T = json.load(f)["temperature"]  # e.g., 1.2807

  def predict_proba_calibrated(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -224,12 +241,17 @@ def predict_proba_calibrated(texts):
  ## Notes

  - Classifier head is **trainable** together with LoRA layers (unfrozen after applying PEFT).
- - Training used:

    - `bf16=True`
    - `optim="adamw_torch_fused"`
-   - cosine-with-restarts scheduler
-   - LR scaled down from HPO to account for full-dataset (~14k steps).

- - Threshold `0.9284` was chosen as the **max-F1** point on the calibration set.
    You can adjust it if you prefer fewer false positives or fewer false negatives.
   - f1
   - auroc
   - average_precision
+ model-index:
+ - name: AI Detector LoRA (DeBERTa-v3-large)
+   results:
+   - task:
+       type: text-classification
+       name: AI Text Detection
+     dataset:
+       name: stealthcode/ai-detection
+       type: stealthcode/ai-detection
+     metrics:
+     - type: auroc
+       value: 0.9985
+     - type: f1
+       value: 0.9812
+     - type: accuracy
+       value: 0.9814
  ---

  # AI Detector LoRA (DeBERTa-v3-large)

+ LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.7M English samples
  (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

  - **Base model:** `microsoft/deberta-v3-large`
  - **Task:** Binary classification (AI vs Human)
  - **Head:** Single-logit + `BCEWithLogitsLoss`
  - **Adapter type:** LoRA (`peft`)
+ - **Hardware:** 8 x RTX 5090, bf16, multi-GPU
+ - **Final decision threshold:** **0.8697** (max-F1 on calibration set)

  ---

  ---

+ ## Metrics (test set, n=279,241)

+ Using threshold **0.8697**:

  | Metric                 | Value  |
  | ---------------------- | ------ |
+ | AUROC                  | 0.9985 |
+ | Average Precision (AP) | 0.9985 |
+ | F1                     | 0.9812 |
+ | Accuracy               | 0.9814 |
+ | Precision (AI)         | 0.9902 |
+ | Recall (AI)            | 0.9724 |
+ | Precision (Human)      | 0.9728 |
+ | Recall (Human)         | 0.9904 |

  Confusion matrix (test):

+ - **True Negatives (Human correctly)**: 138,276
+ - **False Positives (Human → AI)**: 1,345
+ - **False Negatives (AI → Human)**: 3,859
+ - **True Positives (AI correctly)**: 135,761

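As a sanity check, the headline numbers in the table follow directly from these four confusion-matrix counts; a minimal sketch (counts copied from the list above):

```python
# Recompute the headline test metrics from the confusion-matrix counts above.
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

precision_ai = tp / (tp + fp)       # of texts flagged AI, fraction truly AI
recall_ai = tp / (tp + fn)          # of AI texts, fraction caught
recall_human = tn / (tn + fp)       # specificity
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision_ai * recall_ai / (precision_ai + recall_ai)
```

Rounded to four places, these reproduce the table values (0.9902 / 0.9724 / 0.9904 / 0.9814 / 0.9812).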
  ### Calibration

  - **Method:** temperature scaling
+ - **Temperature (T):** 1.4437
  - **Calibration set:** calibration
+ - Test ECE: 0.0075 → 0.0116 (after calibration)
+ - Test Brier: 0.0157 → 0.0156 (after calibration)

  ---
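Temperature scaling only rescales the logit before the sigmoid, so score ranking (and hence AUROC) is unchanged while confident probabilities are softened. A minimal sketch using the fitted T reported above (the `calibrated_prob` helper is illustrative, not part of the repo):

```python
import math

T = 1.4437  # fitted temperature, from calibration.json

def calibrated_prob(logit: float, temperature: float = T) -> float:
    """Apply temperature scaling to a raw single-logit score."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# With T > 1, confident predictions move toward 0.5.
raw = 1.0 / (1.0 + math.exp(-3.0))  # uncalibrated sigmoid, ~0.953
cal = calibrated_prob(3.0)          # softened, ~0.889
```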

  ```python
  # load threshold
  with open("threshold.json") as f:
+     thr = json.load(f)["threshold"]  # 0.8697

  def predict_proba(texts):
      enc = tokenizer(

  model.eval()

  with open("threshold.json") as f:
+     thr = json.load(f)["threshold"]  # 0.8697

  def predict_proba(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

  ```python
  import json
  with open("calibration.json") as f:
+     T = json.load(f)["temperature"]  # e.g., 1.4437

  def predict_proba_calibrated(texts):
      enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

  ## Notes

  - Classifier head is **trainable** together with LoRA layers (unfrozen after applying PEFT).
+ - **LoRA config:**
+   - `r=32`, `alpha=128`, `dropout=0.0`
+   - Target modules: `query_proj`, `key_proj`, `value_proj`
+ - **Training config:**

    - `bf16=True`
    - `optim="adamw_torch_fused"`
+   - `lr_scheduler_type="cosine_with_restarts"`
+   - `num_train_epochs=2`
+   - `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
+   - `max_grad_norm=0.5`

+ - Threshold `0.8697` was chosen as the **max-F1** point on the calibration set.
    You can adjust it if you prefer fewer false positives or fewer false negatives.
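The max-F1 selection can be sketched as a sweep over candidate thresholds on the calibration scores. The probabilities and labels below are toy values, purely illustrative (the shipped 0.8697 came from the real calibration set):

```python
# Pick the decision threshold that maximizes F1 on held-out (prob, label) pairs.
def f1_at(threshold, probs, labels):
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy calibration scores (label 1 = AI, 0 = Human).
probs  = [0.05, 0.20, 0.55, 0.70, 0.85, 0.92, 0.97, 0.99]
labels = [0,    0,    0,    1,    0,    1,    1,    1]

# Candidate thresholds are the observed scores themselves.
best_thr = max(probs, key=lambda t: f1_at(t, probs, labels))
```

Raising the threshold above the max-F1 point trades recall for precision (fewer false positives); lowering it does the opposite.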
adapter/adapter_config.json CHANGED
@@ -34,9 +34,9 @@
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
+     "key_proj",
      "query_proj",
-     "value_proj",
-     "key_proj"
+     "value_proj"
    ],
    "target_parameters": null,
    "task_type": "SEQ_CLS",
adapter/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2320198a394f889f1be50d439b5217d996e84da64a4a65671bf653607879cfcb
+ oid sha256:78566bec1ea60ab8451693e143cefe13dea659a75daf9c426bc8269c7dfce4b1
  size 23099012
calibration.json CHANGED
@@ -1,26 +1,26 @@
  {
-   "temperature": 1.2806789875030518,
+   "temperature": 1.4436575174331665,
    "method": "temperature_scaling",
    "calibration_set": "calibration",
    "calibration_metrics": {
-     "temperature": 1.2806789875030518,
+     "temperature": 1.4436575174331665,
      "optimization_method": "LBFGS_logspace",
-     "uncalibrated_nll": 0.06460460661246972,
-     "calibrated_nll": 0.06279846573841724,
-     "uncalibrated_ece": 0.012124871567496009,
-     "calibrated_ece": 0.016240862688628014,
-     "uncalibrated_brier": 0.01822748637167701,
-     "calibrated_brier": 0.018437309998858068,
-     "nll_improvement": 0.001806140874052481,
-     "ece_improvement": -0.004115991121132005,
-     "brier_improvement": -0.00020982362718105843
+     "uncalibrated_nll": 0.057230830731130305,
+     "calibrated_nll": 0.05340311260808736,
+     "uncalibrated_ece": 0.007595386161633095,
+     "calibrated_ece": 0.011707928851842823,
+     "uncalibrated_brier": 0.01589206575792085,
+     "calibrated_brier": 0.015775692446082124,
+     "nll_improvement": 0.0038277181230429447,
+     "ece_improvement": -0.004112542690209728,
+     "brier_improvement": 0.00011637331183872793
    },
    "test_metrics": {
-     "ece_before": 0.011862174308705089,
-     "ece_after": 0.015908939599173937,
-     "ece_improvement": -0.004046765290468848,
-     "brier_before": 0.01812282837704726,
-     "brier_after": 0.018294590049400802,
-     "brier_improvement": -0.0001717616723535438
+     "ece_before": 0.007462335961689493,
+     "ece_after": 0.011600581100766194,
+     "ece_improvement": -0.004138245139076701,
+     "brier_before": 0.015727129447539786,
+     "brier_after": 0.0156334356493489,
+     "brier_improvement": 9.369379819088725e-05
    }
  }
figures/fig_calibration_calib.png CHANGED

Git LFS Details

  • SHA256: fb9b407564ca2d3c6902bd1825774cbd4acce0e67d817d368966c7a92e5e91eb
  • Pointer size: 130 Bytes
  • Size of remote file: 66 kB

Git LFS Details

  • SHA256: 142d361cd8031012d33019ea55a697b913fa8056d17e1d5c284f6c51c301480e
  • Pointer size: 130 Bytes
  • Size of remote file: 62.9 kB
figures/fig_calibration_comparison_calib.png CHANGED

Git LFS Details

  • SHA256: f50100460613deceb44f375740f993cd85fb5caaf5d75cba20c5d4e45c4cf78d
  • Pointer size: 130 Bytes
  • Size of remote file: 98.3 kB

Git LFS Details

  • SHA256: f56b663df531d44a12bb46f9833c1867a2b44efa640857a87497b19fe4ebc6bf
  • Pointer size: 130 Bytes
  • Size of remote file: 99.9 kB
figures/fig_calibration_comparison_test.png CHANGED

Git LFS Details

  • SHA256: 31dfd6bb061a84a373dde4cde5b3831c2db14b8ba4bb36d2d1736eb4ea0e10ce
  • Pointer size: 131 Bytes
  • Size of remote file: 103 kB

Git LFS Details

  • SHA256: dfe36d2d8f69ddd1751e1b0e025096681be3491f3b8e27a0532906b019cb3edb
  • Pointer size: 130 Bytes
  • Size of remote file: 98.7 kB
figures/fig_calibration_test.png CHANGED

Git LFS Details

  • SHA256: 718c8f118f32a1c871ea563db30fc644b9cb11e53f3d83f0c21597e2fde7a144
  • Pointer size: 130 Bytes
  • Size of remote file: 66.6 kB

Git LFS Details

  • SHA256: a63bde212cbc7ab6daee18b2291d13b52d850cee7f7b52977e428c251e8f6293
  • Pointer size: 130 Bytes
  • Size of remote file: 64.4 kB
figures/fig_confusion_test.png CHANGED

Git LFS Details

  • SHA256: d5d34c728453c76fa608776e0cdcdd298a6513aef81b2602f453f5b5cb347dfd
  • Pointer size: 130 Bytes
  • Size of remote file: 46.4 kB

Git LFS Details

  • SHA256: affe9ca50c1602e93dae6c05652324b2c6378034f1b31b0597a94a9a5e67b38f
  • Pointer size: 130 Bytes
  • Size of remote file: 46.3 kB
figures/fig_eval_metrics.png CHANGED

Git LFS Details

  • SHA256: 3dbb5ef9aaebdc60bf5c866601ae15429d185cb012d4c6c52b4712e47c2ac24b
  • Pointer size: 130 Bytes
  • Size of remote file: 92.4 kB

Git LFS Details

  • SHA256: 4665e2821dd49090b311dc12781d147fc89a2c3144738029fad46ba751c06ad5
  • Pointer size: 130 Bytes
  • Size of remote file: 83.2 kB
figures/fig_learning_curves.png CHANGED

Git LFS Details

  • SHA256: a66e8baab1fa6cfaf7f14efc5574a2b4d0d23943d2cfa95a32dd30365e468422
  • Pointer size: 130 Bytes
  • Size of remote file: 58.1 kB

Git LFS Details

  • SHA256: 2dd2ab634527c303b7d967270119ed4d803bb3b9b96255ea65285bf66732c533
  • Pointer size: 130 Bytes
  • Size of remote file: 67.2 kB
figures/fig_pr_calib.png CHANGED

Git LFS Details

  • SHA256: f0ae2af980ed8b9c0a0e4b81e97d195a46055f7950e905f2c2c8b764f1ed258e
  • Pointer size: 130 Bytes
  • Size of remote file: 33.1 kB

Git LFS Details

  • SHA256: 04fc1d13aeb90bfa91576808eb9d45cf2e08b8dbb88631ccb097c48c05783c3c
  • Pointer size: 130 Bytes
  • Size of remote file: 32.9 kB
figures/fig_pr_test.png CHANGED

Git LFS Details

  • SHA256: c31095938ce085ffd8bdebf895dc51ca49fc4bd7fc316c26c29aea864164ac92
  • Pointer size: 130 Bytes
  • Size of remote file: 32.3 kB

Git LFS Details

  • SHA256: a76c8508d43e74c7f91485a243f180cc65c70357b07d8229c18560ab98de31bc
  • Pointer size: 130 Bytes
  • Size of remote file: 32.7 kB
figures/fig_roc_calib.png CHANGED

Git LFS Details

  • SHA256: 2930b3b1b03eb1c3e19e2b661326229f825646be4730ab0b1aaf6364e8ffa933
  • Pointer size: 130 Bytes
  • Size of remote file: 50 kB

Git LFS Details

  • SHA256: 3fdadf4d341190c18d24be63e074421148cec512c25ad97d7114a504d1f8bcda
  • Pointer size: 130 Bytes
  • Size of remote file: 50.1 kB
figures/fig_roc_test.png CHANGED

Git LFS Details

  • SHA256: f500a2984b3782377f03549aaf7dcc955924ef522e308f6f4913b6ee48381927
  • Pointer size: 130 Bytes
  • Size of remote file: 50 kB

Git LFS Details

  • SHA256: a62f79b1a7c283862e8b691cc1e3da63a099bec78fd95096774a2ef6d6519ed8
  • Pointer size: 130 Bytes
  • Size of remote file: 50 kB
figures/fig_threshold_f1_calib.png CHANGED

Git LFS Details

  • SHA256: 75c36a7436e399d9cb3aee204d94fca9a58d8a4b393070c511992e20d9504894
  • Pointer size: 130 Bytes
  • Size of remote file: 43.2 kB

Git LFS Details

  • SHA256: e60d9702c86bf5a9fb3e176f0fae528ab3495b1ad9467d28add8d9287cd9b7cf
  • Pointer size: 130 Bytes
  • Size of remote file: 42.4 kB
merged_model/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f780611b3aca302b2c564ebcb57391dcced23c391e62f96ee4509fa71076185f
+ oid sha256:43dabe52c25c41912eef74ac37416aa2c7793f42fc8193906f6bc7153a877962
  size 1740300340
predictions_calib.csv CHANGED
The diff for this file is too large to render. See raw diff
 
predictions_test.csv CHANGED
The diff for this file is too large to render. See raw diff
 
results.json CHANGED
@@ -16,105 +16,105 @@
      "key_proj",
      "value_proj"
    ],
-   "learning_rate": 0.0001554357238163802,
+   "learning_rate": 0.00014057133690327707,
    "lr_scheduler_type": "cosine_with_restarts",
    "max_grad_norm": 0.5,
    "optim": "adamw_torch_fused"
  },
  "threshold_optimization": {
    "max_f1": {
-     "threshold": 0.9284088015556335,
+     "threshold": 0.869714617729187,
      "metrics": {
-       "threshold": 0.9284088015556335,
-       "auroc": 0.9978600960936826,
-       "average_precision": 0.997597673288253,
-       "f1": 0.9773827668313225,
-       "accuracy": 0.9797765846236365,
-       "precision": 0.9912838341196921,
-       "recall": 0.9638661853653825,
-       "specificity": 0.9929714251386692,
-       "precision_human": 0.9707053998766749,
-       "recall_human": 0.9929714251386692,
-       "precision_ai": 0.9912838341196921,
-       "recall_ai": 0.9638661853653825,
+       "threshold": 0.869714617729187,
+       "auroc": 0.9984783120401353,
+       "average_precision": 0.9985350724478098,
+       "f1": 0.9809629649707713,
+       "accuracy": 0.9811363829663419,
+       "precision": 0.9900603673104631,
+       "recall": 0.9720312276178198,
+       "specificity": 0.9902414567983026,
+       "precision_human": 0.9725316756205432,
+       "recall_human": 0.9902414567983026,
+       "precision_ai": 0.9900603673104631,
+       "recall_ai": 0.9720312276178198,
        "confusion_matrix": {
-         "true_negative": 99176,
-         "false_positive": 702,
-         "false_negative": 2993,
-         "true_positive": 79838
+         "true_negative": 110607,
+         "false_positive": 1090,
+         "false_negative": 3124,
+         "true_positive": 108572
        }
      }
    },
    "precision_at_95recall": {
-     "threshold": 1.9947297005273867e-06,
+     "threshold": 3.6534821390432626e-08,
      "metrics": {
-       "threshold": 1.9947297005273867e-06,
-       "auroc": 0.9978600960936826,
-       "average_precision": 0.997597673288253,
-       "f1": 0.6238683437523537,
-       "accuracy": 0.45334931503100556,
-       "precision": 0.45334931503100556,
+       "threshold": 3.6534821390432626e-08,
+       "auroc": 0.9984783120401353,
+       "average_precision": 0.9985350724478098,
+       "f1": 0.6666646771454748,
+       "accuracy": 0.49999776179199884,
+       "precision": 0.49999776179199884,
        "recall": 1.0,
        "specificity": 0.0,
        "precision_human": 0.0,
        "recall_human": 0.0,
-       "precision_ai": 0.45334931503100556,
+       "precision_ai": 0.49999776179199884,
        "recall_ai": 1.0,
        "confusion_matrix": {
          "true_negative": 0,
-         "false_positive": 99878,
+         "false_positive": 111697,
          "false_negative": 0,
-         "true_positive": 82831
+         "true_positive": 111696
        }
      }
    }
  },
  "calibration": {
-   "temperature": 1.2806789875030518,
+   "temperature": 1.4436575174331665,
    "method": "temperature_scaling",
    "calibration_set": "calibration",
    "calibration_metrics": {
-     "temperature": 1.2806789875030518,
+     "temperature": 1.4436575174331665,
      "optimization_method": "LBFGS_logspace",
-     "uncalibrated_nll": 0.06460460661246972,
-     "calibrated_nll": 0.06279846573841724,
-     "uncalibrated_ece": 0.012124871567496009,
-     "calibrated_ece": 0.016240862688628014,
-     "uncalibrated_brier": 0.01822748637167701,
-     "calibrated_brier": 0.018437309998858068,
-     "nll_improvement": 0.001806140874052481,
-     "ece_improvement": -0.004115991121132005,
-     "brier_improvement": -0.00020982362718105843
+     "uncalibrated_nll": 0.057230830731130305,
+     "calibrated_nll": 0.05340311260808736,
+     "uncalibrated_ece": 0.007595386161633095,
+     "calibrated_ece": 0.011707928851842823,
+     "uncalibrated_brier": 0.01589206575792085,
+     "calibrated_brier": 0.015775692446082124,
+     "nll_improvement": 0.0038277181230429447,
+     "ece_improvement": -0.004112542690209728,
+     "brier_improvement": 0.00011637331183872793
    },
    "test_metrics": {
-     "ece_before": 0.011862174308705089,
-     "ece_after": 0.015908939599173937,
-     "ece_improvement": -0.004046765290468848,
-     "brier_before": 0.01812282837704726,
-     "brier_after": 0.018294590049400802,
-     "brier_improvement": -0.0001717616723535438
+     "ece_before": 0.007462335961689493,
+     "ece_after": 0.011600581100766194,
+     "ece_improvement": -0.004138245139076701,
+     "brier_before": 0.015727129447539786,
+     "brier_after": 0.0156334356493489,
+     "brier_improvement": 9.369379819088725e-05
    }
  },
  "test_metrics": {
-   "threshold": 0.9284088015556335,
-   "auroc": 0.997910815020985,
-   "average_precision": 0.9976513211537581,
-   "f1": 0.9773091101352641,
-   "accuracy": 0.9797054998752118,
-   "precision": 0.9909459137479152,
-   "recall": 0.9640425346970707,
-   "specificity": 0.9926951172625913,
-   "precision_human": 0.9708363687636594,
-   "recall_human": 0.9926951172625913,
-   "precision_ai": 0.9909459137479152,
-   "recall_ai": 0.9640425346970707,
+   "threshold": 0.869714617729187,
+   "auroc": 0.9984910666612247,
+   "average_precision": 0.9985476887515279,
+   "f1": 0.981194394455165,
+   "accuracy": 0.9813637682145531,
+   "precision": 0.9901900719151605,
+   "recall": 0.972360693310414,
+   "specificity": 0.9903667786364515,
+   "precision_human": 0.9728497555141239,
+   "recall_human": 0.9903667786364515,
+   "precision_ai": 0.9901900719151605,
+   "recall_ai": 0.972360693310414,
    "confusion_matrix": {
-     "true_negative": 123936,
-     "false_positive": 912,
-     "false_negative": 3723,
-     "true_positive": 99816
+     "true_negative": 138276,
+     "false_positive": 1345,
+     "false_negative": 3859,
+     "true_positive": 135761
    }
  },
- "timestamp": "20251115_090814",
+ "timestamp": "20251124_170935",
  "seed": 42
  }
threshold.json CHANGED
@@ -1,9 +1,9 @@
  {
-   "threshold": 0.9284088015556335,
+   "threshold": 0.869714617729187,
    "method": "max_f1",
-   "calibration_f1": 0.9773827668313225,
+   "calibration_f1": 0.9809629649707713,
    "alternative_thresholds": {
-     "max_f1": 0.9284088015556335,
-     "precision_at_95recall": 1.9947297005273867e-06
+     "max_f1": 0.869714617729187,
+     "precision_at_95recall": 3.6534821390432626e-08
    }
  }
training_log_history.csv CHANGED
@@ -1,52 +1,32 @@
1
  loss,grad_norm,learning_rate,epoch,step,eval_loss,eval_auroc,eval_ap,eval_f1,eval_max_f1,eval_best_threshold,eval_accuracy,eval_precision_human,eval_recall_human,eval_precision_ai,eval_recall_ai,eval_runtime,eval_samples_per_second,eval_steps_per_second,train_runtime,train_samples_per_second,train_steps_per_second,total_flos,train_loss
2
- 0.276,3.159646987915039,0.00013013829896707,0.07783918424534911,500,,,,,,,,,,,,,,,,,,,
3
- ,,,0.07783918424534911,500,0.12149354815483093,0.9924457584277557,0.9916298325594288,0.9536510818288485,0.9557720332927327,0.6680145263671875,0.9579702783883426,0.9616493887295509,0.9614568231515375,0.9535359777528871,0.9537662136972542,251.8553,906.814,14.171,,,,,
4
- 0.1082,0.4662734270095825,0.00015502115157402368,0.15567836849069822,1000,,,,,,,,,,,,,,,,,,,
5
- ,,,0.15567836849069822,1000,0.11359784007072449,0.9941361028997843,0.9936079408729138,0.9563341131667457,0.964144751321268,0.8459424376487732,0.9600369549797273,0.9707662766667209,0.9556737446634681,0.9475350777398559,0.9652981002327625,252.3501,905.036,14.143,,,,,
6
- 0.0901,0.56740403175354,0.00015336171321936976,0.23351755273604732,1500,,,,,,,,,,,,,,,,,,,
7
- ,,,0.23351755273604732,1500,0.09792134165763855,0.995637180466362,0.9951736481792155,0.9632415978730987,0.9687276503605232,0.8856314420700073,0.9665828903698125,0.971504195528524,0.9672399016396068,0.9607059479089608,0.9657906682506109,252.1455,905.771,14.155,,,,,
8
- 0.0828,0.6910482048988342,0.0001504606098364759,0.31135673698139643,2000,,,,,,,,,,,,,,,,,,,
9
- ,,,0.31135673698139643,2000,0.07930342108011246,0.9965300160027056,0.996129497193991,0.9701504169589276,0.9709891509313444,0.6976089477539062,0.9731682327288013,0.968767028089932,0.9825946959077911,0.9786452037697653,0.9618018331256821,252.2542,905.38,14.148,,,,,
10
- 0.0786,0.7298774719238281,0.00014636546193317465,0.38919592122674557,2500,,,,,,,,,,,,,,,,,,,
11
- ,,,0.38919592122674557,2500,0.10587891191244125,0.9961595063461319,0.99574080839367,0.9633412681237827,0.9702281512250107,0.9585376977920532,0.9666222973387161,0.9727536699467656,0.965998382019592,0.9593414171599334,0.9673746124648683,252.014,906.243,14.162,,,,,
12
- 0.075,0.5279271006584167,0.00014114348980363213,0.46703510547209465,3000,,,,,,,,,,,,,,,,,,,
13
- ,,,0.46703510547209465,3000,0.08700015395879745,0.9963549115948438,0.9959508112193701,0.9664618832348054,0.9711828125530761,0.8791467547416687,0.9695953342148819,0.9720765232989261,0.9723101075716677,0.9666019379957298,0.9663218690541728,252.0033,906.282,14.163,,,,,
14
- 0.0718,0.8794483542442322,0.00013488041013280436,0.5448742897174438,3500,,,,,,,,,,,,,,,,,,,
15
- ,,,0.5448742897174438,3500,0.10059615969657898,0.9964150386194646,0.9960424913733611,0.9614705825931823,0.9711960816065123,0.954647421836853,0.9648227124254551,0.9732758550835028,0.9620655682555448,0.9548853558398507,0.9681472681791402,252.084,905.992,14.158,,,,,
16
- 0.0695,0.40731295943260193,0.00012767902898967842,0.6227134739627929,4000,,,,,,,,,,,,,,,,,,,
17
- ,,,0.6227134739627929,4000,0.10083704441785812,0.9961781307127794,0.995731264522598,0.9619906765054659,0.9692610702277147,0.9149009585380554,0.9653350030212009,0.9728877169710597,0.9634352447395612,0.9564208797922713,0.9676257255720067,252.2327,905.457,14.15,,,,,
18
- 0.0669,0.264863520860672,0.00011965755430477945,0.7005526582081419,4500,,,,,,,,,,,,,,,,,,,
19
- ,,,0.7005526582081419,4500,0.094916433095932,0.9967265965858536,0.9963598631028794,0.9650556403576777,0.972462604745369,0.9161096215248108,0.9682248474074593,0.9731915784051766,0.9685535094956227,0.962298576833695,0.9678285476970031,251.9541,906.459,14.165,,,,,
20
- 0.0672,0.7680786848068237,0.00011094765553198254,0.7783918424534911,5000,,,,,,,,,,,,,,,,,,,
21
- ,,,0.7783918424534911,5000,0.09323982149362564,0.9967850366539563,0.9963962011157831,0.9673139455667273,0.9710460087467818,0.8643104434013367,0.9703790950408518,0.972494317999936,0.9733433722876801,0.9678236488446292,0.9668047788755928,251.9538,906.46,14.165,,,,,
22
- 0.0647,0.4233705997467041,0.0001016923023445425,0.8562310266988402,5500,,,,,,,,,,,,,,,,,,,
23
- ,,,0.8562310266988402,5500,0.09126096963882446,0.9967433808226673,0.9963771581479701,0.9657864214107987,0.9726255234214075,0.9111796617507935,0.9688728731183173,0.9742069565497575,0.9687056957716245,0.9625206246882314,0.9690744550362665,252.0396,906.151,14.16,,,,,
24
- 0.0627,0.25091952085494995,9.204341784232336e-05,0.9340702109441893,6000,,,,,,,,,,,,,,,,,,,
25
- ,,,0.9340702109441893,6000,0.06995870172977448,0.9975342235086746,0.9972417957598282,0.973204830514238,0.9758824625579795,0.8918110132217407,0.9758479066142408,0.9732840473716358,0.9827949410077935,0.9790068315757582,0.9674711944291523,252.1038,905.921,14.157,,,,,
26
- 0.0606,0.3678501546382904,8.215938479193825e-05,1.011831556005293,6500,,,,,,,,,,,,,,,,,,,
27
- ,,,1.011831556005293,6500,0.07501858472824097,0.9975025002706578,0.997203865754845,0.971139887346844,0.9754012996088349,0.8902942538261414,0.9738863152732654,0.9744957013881682,0.9778208527237339,0.9731459660760525,0.9691420624112653,252.3023,905.208,14.146,,,,,
28
- 0.0538,0.6192397475242615,7.220244583391773e-05,1.0896707402506423,7000,,,,,,,,,,,,,,,,,,,
29
- ,,,1.0896707402506423,7000,0.09348879754543304,0.997100493345388,0.9967079581489795,0.9689502265506753,0.9725104874267961,0.9184802770614624,0.9718853169633865,0.97321942331053,0.9754099017197049,0.9702686474655716,0.967635383768435,252.0093,906.26,14.162,,,,,
30
- 0.0536,0.6146565675735474,6.233604033151736e-05,1.1675099244959912,7500,,,,,,,,,,,,,,,,,,,
31
- ,,,1.1675099244959912,7500,0.08755695074796677,0.9973925286764659,0.9970545509180517,0.9708156623418074,0.974796319089823,0.9324532747268677,0.9735754380741376,0.9747314921365554,0.9769878331077239,0.9721743341404359,0.9694607828934025,251.9972,906.304,14.163,,,,,
32
- 0.0563,0.24420885741710663,5.272212157577683e-05,1.2453491087413404,8000,,,,,,,,,,,,,,,,,,,
33
- ,,,1.2453491087413404,8000,0.07455883920192719,0.9975485047929438,0.9972585615075422,0.9726402255038238,0.9758848582753115,0.9136765599250793,0.9752655591848888,0.9750665582603982,0.9798072841157577,0.97550810243656,0.969789161571968,252.079,906.01,14.158,,,,,
34
- 0.0521,0.4137882590293884,4.351849838388919e-05,1.3231882929866896,8500,,,,,,,,,,,,,,,,,,,
35
- ,,,1.3231882929866896,8500,0.07783409208059311,0.9975517384131479,0.9972440455972116,0.9723962677736611,0.9749894769810089,0.9111796617507935,0.975077281444572,0.9738866219645087,0.9807043821637684,0.9765353333658013,0.9682921411255662,252.489,904.538,14.135,,,,,
36
- 0.0526,0.3627403974533081,3.4876244727530656e-05,1.4010274772320386,9000,,,,,,,,,,,,,,,,,,,
37
- ,,,1.4010274772320386,9000,0.0745643824338913,0.9975749102093014,0.9972690713005937,0.9723900247831475,0.9756176280729699,0.9046505093574524,0.9750247388193672,0.975305785387727,0.9791024213637493,0.9746829301427421,0.9701078820541053,252.119,905.866,14.156,,,,,
38
- 0.052,0.7762022614479065,2.693721991111627e-05,1.4788666614773878,9500,,,,,,,,,,,,,,,,,,,
39
- ,,,1.4788666614773878,9500,0.07452459633350372,0.9977328074904859,0.9974410894856215,0.9730508384452147,0.975759591492858,0.9124361872673035,0.9755895720403177,0.976881986981624,0.978501686063742,0.9740254712964038,0.9720781541254986,252.2439,905.417,14.149,,,,,
40
- 0.0507,0.3820905387401581,1.9831740005311437e-05,1.5567058457227367,10000,,,,,,,,,,,,,,,,,,,
41
- ,,,1.5567058457227367,10000,0.08428945392370224,0.9975299683849125,0.9972155352602694,0.9700319035460719,0.9748023112122028,0.9334307909011841,0.9728135700086695,0.9755487501803781,0.9746970291636964,0.9695218431614696,0.9705425008933831,252.2555,905.376,14.148,,,,,
42
- 0.0515,0.36162489652633667,1.3676438758331925e-05,1.634545029968086,10500,,,,,,,,,,,,,,,,,,,
43
- ,,,1.634545029968086,10500,0.07442453503608704,0.9977306493713021,0.997440190110606,0.9729125537103704,0.9761234031726127,0.9136765599250793,0.9754888653420087,0.9760021075993326,0.9792385880317509,0.9748654545454546,0.9709674615362327,252.2157,905.519,14.151,,,,,
44
- 0.0512,0.5945746302604675,8.572353097359252e-06,1.7123842142134351,11000,,,,,,,,,,,,,,,,,,,
45
- ,,,1.7123842142134351,11000,0.07524814456701279,0.9977685405428793,0.9974825248399177,0.9727850366057699,0.9761874492694766,0.9207897186279297,0.975357508778997,0.9763919857424856,0.978581784103743,0.9741039521978714,0.9714696877505095,252.2195,905.505,14.15,,,,,
46
- 0.0509,0.9120739698410034,4.603264645836933e-06,1.7902233984587843,11500,,,,,,,,,,,,,,,,,,,
47
- ,,,1.7902233984587843,11500,0.06851697713136673,0.9979216958335029,0.997654853113855,0.9747033543129303,0.9769461620177022,0.9124361872673035,0.9771395794838563,0.9764685264549843,0.9818417743317821,0.9779586201532299,0.9714696877505095,252.8119,903.383,14.117,,,,,
48
- 0.048,0.6627203822135925,1.834324480010042e-06,1.8680625827041333,12000,,,,,,,,,,,,,,,,,,,
49
- ,,,1.8680625827041333,12000,0.07031949609518051,0.9978716433195517,0.9975978286587303,0.9741437319971922,0.9765590576618682,0.9241418242454529,0.9766141532318093,0.9766512444160816,0.980664333143768,0.9765690214120707,0.9717304590540763,252.5244,904.411,14.133,,,,,
50
- 0.0488,0.7994762659072876,3.1098369880601253e-07,1.9459017669494822,12500,,,,,,,,,,,,,,,,,,,
51
- ,,,1.9459017669494822,12500,0.07445533573627472,0.9977876009119543,0.997504157410398,0.972836356080046,0.9761660160257996,0.929440438747406,0.9753881586436997,0.9769268924908395,0.9780771664517369,0.9735279325286289,0.9721457615004974,252.7495,903.606,14.121,,,,,
52
- ,,,2.0,12848,,,,,,,,,,,,,,,15236.3158,215.85,0.843,3.0700924448014336e+18,0.07117979241486355
 
1
  loss,grad_norm,learning_rate,epoch,step,eval_loss,eval_auroc,eval_ap,eval_f1,eval_max_f1,eval_best_threshold,eval_accuracy,eval_precision_human,eval_recall_human,eval_precision_ai,eval_recall_ai,eval_runtime,eval_samples_per_second,eval_steps_per_second,train_runtime,train_samples_per_second,train_steps_per_second,total_flos,train_loss
2
+ 0.1959,1.1447575092315674,0.00014045785291760075,0.1273277096928219,1000,,,,,,,,,,,,,,,,,,,
3
+ ,,,0.1273277096928219,1000,0.10616814345121384,0.9950148191129459,0.9952899350299439,0.962311290288237,0.9679296498026972,0.8289388418197632,0.961968199398367,0.9705350238550314,0.9528649190660364,0.953707741871949,0.9710714797306976,302.0769,924.4,14.447,,,,,
4
+ 0.0793,0.9442410469055176,0.00013808916945651123,0.2546554193856438,2000,,,,,,,,,,,,,,,,,,,
5
+ ,,,0.2546554193856438,2000,0.06762869656085968,0.99699523578959,0.9971745448574031,0.9747652814641683,0.9751815105140932,0.6486889719963074,0.9749068901303538,0.9696360146472409,0.9805185503509526,0.9802974220045925,0.969295229909755,301.8486,925.1,14.458,,,,,
6
+ 0.066,1.0335582494735718,0.0001327492586182366,0.3819831290784657,3000,,,,,,,,,,,,,,,,,,,
+ ,,,0.3819831290784657,3000,0.10242880135774612,0.9966748220521451,0.9968401495248937,0.9646262449071978,0.9739805993788537,0.9433475732803345,0.96418134937688,0.9761586387280689,0.9516043546769803,0.9527918285219238,0.9767583440767799,301.8298,925.157,14.458,,,,,
+ 0.0604,0.49253857135772705,0.00012467212143575104,0.5093108387712876,4000,,,,,,,,,,,,,,,,,,,
+ ,,,0.5093108387712876,4000,0.07713142782449722,0.9974895074288549,0.9976087876172619,0.9726526500782271,0.9769228267815947,0.896251380443573,0.9725827245380319,0.9750118785365643,0.9700257842715944,0.970178288939245,0.9751396648044692,302.0624,924.445,14.447,,,,,
+ 0.0562,1.2046138048171997,0.00011421170734780347,0.6366385484641095,5000,,,,,,,,,,,,,,,,,,,
+ ,,,0.6366385484641095,5000,0.06674948334693909,0.9978946111553613,0.9979933640808432,0.9756314294006572,0.9780440997353064,0.8354835510253906,0.9756446067898582,0.9751307495832469,0.9761853602635725,0.9761595766801224,0.9751038533161438,301.8554,925.079,14.457,,,,,
+ 0.0536,0.8311429619789124,0.0001018264037275262,0.7639662581569314,6000,,,,,,,,,,,,,,,,,,,
+ ,,,0.7639662581569314,6000,0.07356549799442291,0.9977181868917908,0.9978107356852736,0.973830491983889,0.9777105038793896,0.8824278712272644,0.9738003151410972,0.9748955476747692,0.9726471852170177,0.9727101227651456,0.9749534450651769,301.9925,924.659,14.451,,,,,
+ 0.0519,0.4782758951187134,8.80589488206726e-05,0.8912939678497533,7000,,,,,,,,,,,,,,,,,,,
+ ,,,0.8912939678497533,7000,0.05816827714443207,0.9981127635745546,0.9981834468144306,0.9776137884992947,0.9801179559592376,0.7745833992958069,0.9776643747314139,0.9755153260939315,0.97992407964475,0.9798329364194289,0.9754046698180776,302.2733,923.8,14.437,,,,,
+ 0.0477,0.4090639054775238,7.351264833162605e-05,1.018589845615152,8000,,,,,,,,,,,,,,,,,,,
+ ,,,1.018589845615152,8000,0.08029133081436157,0.9978708245246101,0.9979566865663798,0.9732345002905806,0.9780694791225678,0.9196425676345825,0.9731163157140811,0.9773316857797336,0.9687007592035525,0.9689747467217595,0.9775318722246097,302.0748,924.407,14.447,,,,,
+ 0.0429,0.7672129273414612,5.882493787372914e-05,1.145917555307974,9000,,,,,,,,,,,,,,,,,,,
+ ,,,1.145917555307974,9000,0.08352840691804886,0.9977365428060092,0.9978319247026995,0.9719582883547271,0.9772799491574283,0.9416541457176208,0.9718127775390345,0.9767606806059158,0.9666236928806761,0.9669665199299633,0.9770018621973929,302.0739,924.41,14.447,,,,,
+ 0.0405,0.9896750450134277,4.4639449807758265e-05,1.2732452650007957,10000,,,,,,,,,,,,,,,,,,,
+ ,,,1.2732452650007957,10000,0.061068352311849594,0.9983874067313616,0.9984549393455515,0.9787944744951835,0.9803829377847685,0.8933094143867493,0.9788568972926515,0.97605417182894,0.9818006016330039,0.9816928197812649,0.9759131929522991,302.0004,924.635,14.45,,,,,
+ 0.0406,0.47202062606811523,3.157780853180043e-05,1.4005729746936177,11000,,,,,,,,,,,,,,,,,,,
+ ,,,1.4005729746936177,11000,0.0718986839056015,0.9981721218474578,0.9982508526859964,0.9758077493875945,0.9788950827489958,0.9407896995544434,0.9757735281478298,0.9771233614652541,0.9743589743589743,0.9744313109309717,0.9771880819366853,302.0521,924.476,14.448,,,,,
+ 0.0395,0.7460657358169556,2.0212390185360698e-05,1.5279006843864396,12000,,,,,,,,,,,,,,,,,,,
+ ,,,1.5279006843864396,12000,0.06611855328083038,0.9983199319059504,0.9983858111318533,0.9774215058543476,0.9800705008735375,0.9111796617507935,0.9774459246526286,0.9764154314546676,0.9785274316000573,0.9784808854562942,0.9763644177051999,302.0129,924.596,14.45,,,,,
+ 0.0394,0.165859654545784,1.1041240468788348e-05,1.6552283940792614,13000,,,,,,,,,,,,,,,,,,,
+ ,,,1.6552283940792614,13000,0.05736415088176727,0.998453987731572,0.998514713572729,0.9790977044095155,0.9808756501587189,0.8652240633964539,0.9791505514969202,0.9767398771432236,0.9816788425726973,0.9815857293001425,0.9766222604211431,301.9939,924.655,14.451,,,,,
+ 0.0386,0.3515833616256714,4.466249708014854e-06,1.7825561037720834,14000,,,,,,,,,,,,,,,,,,,
+ ,,,1.7825561037720834,14000,0.06121337413787842,0.998390859685223,0.9984525070219366,0.9779097761600413,0.9804425977774784,0.8887588381767273,0.9779186362985246,0.9775355680874818,0.9783197249677696,0.9783023195802392,0.9775175476292794,302.6176,922.749,14.421,,,,,
+ 0.0374,0.2531239092350006,7.75541558362136e-07,1.9098838134649054,15000,,,,,,,,,,,,,,,,,,,
+ ,,,1.9098838134649054,15000,0.06407459080219269,0.998362819330903,0.9984245391184183,0.9778709076581417,0.980121108766156,0.9059898257255554,0.977875662512534,0.9776703894616265,0.9780905314424867,0.9780811120664947,0.9776607935825813,302.0272,924.553,14.449,,,,,
+ ,,,2.0,15708,,,,,,,,,,,,,,,15515.8764,259.158,1.012,3.758608241022468e+18,0.05829760437555704