hungnm committed on
Commit 143e205 · verified · 1 Parent(s): 59402c7

Model save

Files changed (3)
  1. README.md +16 -16
  2. model.safetensors +1 -1
  3. trainer_state.json +0 -0
README.md CHANGED
@@ -20,10 +20,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Loss: nan
- - F1: 0.0
- - Precision: 0.0
- - Recall: 0.0
+ - Loss: 0.4605
+ - F1: 81.8318
+ - Precision: 81.8361
+ - Recall: 81.8321
 
 ## Model description
 
@@ -43,14 +43,14 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
- - train_batch_size: 64
- - eval_batch_size: 64
+ - train_batch_size: 512
+ - eval_batch_size: 512
 - seed: 0
 - distributed_type: multi-GPU
 - num_devices: 2
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 1024
- - total_eval_batch_size: 128
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 2048
+ - total_eval_batch_size: 1024
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.01
@@ -58,13 +58,13 @@ The following hyperparameters were used during training:
 
 ### Training results
 
- | Training Loss | Epoch | Step | Validation Loss | F1 | Precision | Recall |
- |:-------------:|:-----:|:----:|:---------------:|:---:|:---------:|:------:|
- | 3407638.4 | 1.0 | 10 | nan | 0.0 | 0.0 | 0.0 |
- | 0.0 | 2.0 | 20 | nan | 0.0 | 0.0 | 0.0 |
- | 0.0 | 3.0 | 30 | nan | 0.0 | 0.0 | 0.0 |
- | 0.0 | 4.0 | 40 | nan | 0.0 | 0.0 | 0.0 |
- | 0.0 | 5.0 | 50 | nan | 0.0 | 0.0 | 0.0 |
+ | Training Loss | Epoch | Step | Validation Loss | F1 | Precision | Recall |
+ |:-------------:|:-----:|:----:|:---------------:|:-------:|:---------:|:-------:|
+ | 1.8064 | 1.0 | 1537 | 0.4447 | 80.9603 | 80.9710 | 81.0960 |
+ | 1.6408 | 2.0 | 3074 | 0.4309 | 81.7277 | 81.8109 | 81.6765 |
+ | 1.4703 | 3.0 | 4611 | 0.4252 | 82.1472 | 82.1151 | 82.1871 |
+ | 1.3121 | 4.0 | 6148 | 0.4393 | 82.0532 | 82.0873 | 82.0275 |
+ | 1.1733 | 5.0 | 7685 | 0.4605 | 81.8318 | 81.8361 | 81.8321 |
 
 
 ### Framework versions
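For reference, the changed totals in the hyperparameter diff are not independent knobs: the total train batch size is per-device batch size × number of devices × gradient accumulation steps. A minimal sketch checking that arithmetic against the values above (both the old and new configurations):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Total examples contributing to one optimizer step under
    data-parallel training with gradient accumulation."""
    return per_device * num_devices * grad_accum

# New config from the diff: 512 per device x 2 GPUs x 2 accumulation steps.
new_total = effective_batch_size(per_device=512, num_devices=2, grad_accum=2)
# Old config: 64 per device x 2 GPUs x 8 accumulation steps.
old_total = effective_batch_size(per_device=64, num_devices=2, grad_accum=8)
print(new_total, old_total)  # 2048 1024
```

Both results match the `total_train_batch_size` values recorded in the respective versions of the card (2048 new, 1024 old); the eval totals follow the same pattern without accumulation (512 × 2 = 1024).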
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:e362ec095b69e519f61bcdfe1f9db7392b6a0d31141e289685625c34423710dd
+ oid sha256:97da6e07086e607edd10c3221c41f1dd983682ce4aeea5c70c4c849558e150b5
 size 281299686
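What changed above is not the weights themselves but a Git LFS pointer file: a three-line stub recording the SHA-256 of the real blob and its byte size (the size is unchanged because the tensor shapes are identical; only the values, and hence the digest, differ). A minimal sketch of a pointer parser (a hypothetical helper, not part of any HF or git-lfs tooling), run on the new pointer from this commit:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The post-commit pointer contents, as shown in the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:97da6e07086e607edd10c3221c41f1dd983682ce4aeea5c70c4c849558e150b5
size 281299686"""

info = parse_lfs_pointer(pointer)
algo, digest = info["oid"].split(":")
print(algo, int(info["size"]))  # sha256 281299686
```

To verify a downloaded blob against the pointer, one would hash the local file with `hashlib.sha256` and compare the hex digest to `digest`.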
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
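Although the trainer_state.json diff is too large to render, the HF Trainer writes it with a `log_history` list from which the per-epoch table in the README can be recovered by filtering entries carrying eval metrics. A sketch over a toy excerpt (values copied from the training-results table above; the full schema has more fields):

```python
import json

# Toy excerpt in the Trainer's trainer_state.json shape ("log_history" list);
# the real file in this commit is too large to reproduce here.
state = json.loads("""
{"log_history": [
  {"epoch": 1.0, "step": 1537, "loss": 1.8064},
  {"epoch": 1.0, "step": 1537, "eval_loss": 0.4447, "eval_f1": 80.9603},
  {"epoch": 5.0, "step": 7685, "loss": 1.1733},
  {"epoch": 5.0, "step": 7685, "eval_loss": 0.4605, "eval_f1": 81.8318}
]}
""")

# Keep only entries that carry evaluation metrics.
eval_rows = [e for e in state["log_history"] if "eval_loss" in e]
for row in eval_rows:
    print(row["epoch"], row["step"], row["eval_loss"], row["eval_f1"])
```

The last eval entry matches the headline metrics on the card (loss 0.4605, F1 81.8318), consistent with the final checkpoint being the one saved in this commit.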