dewata_bert_gelu

This model is a fine-tuned version of pijarcandra22/dewata_bert_gelu (the auto-generated metadata lists the model itself as its base model) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3142

Model description

More information needed

Intended uses & limitations

More information needed
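
Pending official documentation, here is a minimal usage sketch with the standard transformers auto classes. The card does not state the training task; a masked-language-modeling head is assumed below, so swap the Auto class if the checkpoint was trained for something else.

```python
# Minimal usage sketch. The task is not documented in the card;
# a masked-LM head is assumed, so adjust the Auto class if needed.
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "pijarcandra22/dewata_bert_gelu"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)

# Use the tokenizer's own mask token rather than a hard-coded "[MASK]".
text = f"An example sentence with a {tokenizer.mask_token} token."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```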

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 80
  • eval_batch_size: 80
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
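
As a minimal sketch, the listed settings map onto Hugging Face TrainingArguments as follows; only the listed values come from the card, and output_dir is a placeholder.

```python
# Sketch of the listed hyperparameters as TrainingArguments.
# Only these values come from the card; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dewata_bert_gelu",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=80,
    per_device_eval_batch_size=80,
    seed=42,
    optim="adamw_torch_fused",      # fused AdamW (PyTorch implementation)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,                      # Native AMP mixed precision
)
```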

Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 3.0632        | 1.0   | 2153   | 2.7589          |
| 2.6534        | 2.0   | 4306   | 2.5378          |
| 2.4108        | 3.0   | 6459   | 2.4042          |
| 2.2169        | 4.0   | 8612   | 2.2892          |
| 2.0708        | 5.0   | 10765  | 2.2492          |
| 1.9505        | 6.0   | 12918  | 2.1956          |
| 1.8635        | 7.0   | 15071  | 2.1887          |
| 1.7806        | 8.0   | 17224  | 2.1658          |
| 1.7143        | 9.0   | 19377  | 2.1419          |
| 1.6472        | 10.0  | 21530  | 2.1155          |
| 1.5848        | 11.0  | 23683  | 2.1398          |
| 1.5304        | 12.0  | 25836  | 2.1151          |
| 1.4656        | 13.0  | 27989  | 2.1178          |
| 1.4215        | 14.0  | 30142  | 2.1389          |
| 1.375         | 15.0  | 32295  | 2.1276          |
| 1.3319        | 16.0  | 34448  | 2.1258          |
| 1.2921        | 17.0  | 36601  | 2.1140          |
| 1.2578        | 18.0  | 38754  | 2.1110          |
| 1.2087        | 19.0  | 40907  | 2.1127          |
| 1.17          | 20.0  | 43060  | 2.1248          |
| 1.1395        | 21.0  | 45213  | 2.1059          |
| 1.1081        | 22.0  | 47366  | 2.0860          |
| 1.0752        | 23.0  | 49519  | 2.1249          |
| 1.0412        | 24.0  | 51672  | 2.1316          |
| 1.0119        | 25.0  | 53825  | 2.1182          |
| 0.9852        | 26.0  | 55978  | 2.1331          |
| 0.9573        | 27.0  | 58131  | 2.1385          |
| 0.9291        | 28.0  | 60284  | 2.1583          |
| 0.9052        | 29.0  | 62437  | 2.1286          |
| 0.8812        | 30.0  | 64590  | 2.1635          |
| 0.8592        | 31.0  | 66743  | 2.1526          |
| 0.8311        | 32.0  | 68896  | 2.1488          |
| 0.8095        | 33.0  | 71049  | 2.1478          |
| 0.794         | 34.0  | 73202  | 2.1496          |
| 0.7727        | 35.0  | 75355  | 2.1671          |
| 0.7533        | 36.0  | 77508  | 2.1542          |
| 0.7335        | 37.0  | 79661  | 2.1502          |
| 0.714         | 38.0  | 81814  | 2.1448          |
| 0.6985        | 39.0  | 83967  | 2.1553          |
| 0.684         | 40.0  | 86120  | 2.1365          |
| 0.6648        | 41.0  | 88273  | 2.1962          |
| 0.652         | 42.0  | 90426  | 2.1734          |
| 0.6348        | 43.0  | 92579  | 2.1742          |
| 0.6184        | 44.0  | 94732  | 2.1718          |
| 0.6066        | 45.0  | 96885  | 2.1597          |
| 0.5907        | 46.0  | 99038  | 2.1823          |
| 0.5777        | 47.0  | 101191 | 2.1935          |
| 0.7347        | 48.0  | 103344 | 0.6094          |
| 0.7113        | 49.0  | 105497 | 0.6030          |
| 0.6928        | 50.0  | 107650 | 0.6279          |
| 0.6682        | 51.0  | 109803 | 0.6459          |
| 0.6589        | 52.0  | 111956 | 0.6777          |
| 0.639         | 53.0  | 114109 | 0.7150          |
| 0.6256        | 54.0  | 116262 | 0.7474          |
| 0.6079        | 55.0  | 118415 | 0.7618          |
| 0.5924        | 56.0  | 120568 | 0.7921          |
| 0.5813        | 57.0  | 122721 | 0.8130          |
| 0.5693        | 58.0  | 124874 | 0.8569          |
| 0.557         | 59.0  | 127027 | 0.8834          |
| 0.5445        | 60.0  | 129180 | 0.8872          |
| 0.5325        | 61.0  | 131333 | 0.8912          |
| 0.519         | 62.0  | 133486 | 0.9274          |
| 0.5115        | 63.0  | 135639 | 0.9433          |
| 0.5035        | 64.0  | 137792 | 0.9394          |
| 0.4905        | 65.0  | 139945 | 0.9600          |
| 0.4867        | 66.0  | 142098 | 0.9607          |
| 0.4712        | 67.0  | 144251 | 0.9685          |
| 0.4671        | 68.0  | 146404 | 0.9959          |
| 0.4574        | 69.0  | 148557 | 0.9948          |
| 0.4467        | 70.0  | 150710 | 1.0065          |
| 0.4391        | 71.0  | 152863 | 1.0075          |
| 0.4325        | 72.0  | 155016 | 1.0138          |
| 0.4233        | 73.0  | 157169 | 1.0218          |
| 0.4159        | 74.0  | 159322 | 1.0246          |
| 0.4127        | 75.0  | 161475 | 1.0275          |
| 0.4059        | 76.0  | 163628 | 1.0405          |
| 0.3993        | 77.0  | 165781 | 1.0482          |
| 0.3912        | 78.0  | 167934 | 1.0344          |
| 0.3858        | 79.0  | 170087 | 1.0452          |
| 0.3812        | 80.0  | 172240 | 1.0316          |
| 0.3778        | 81.0  | 174393 | 1.0595          |
| 0.3719        | 82.0  | 176546 | 1.0594          |
| 0.3658        | 83.0  | 178699 | 1.0671          |
| 0.3641        | 84.0  | 180852 | 1.0492          |
| 0.3574        | 85.0  | 183005 | 1.0752          |
| 0.3502        | 86.0  | 185158 | 1.0538          |
| 0.3491        | 87.0  | 187311 | 1.0669          |
| 0.3428        | 88.0  | 189464 | 1.0670          |
| 0.3409        | 89.0  | 191617 | 1.0697          |
| 0.3381        | 90.0  | 193770 | 1.0716          |
| 0.3344        | 91.0  | 195923 | 1.0750          |
| 0.4222        | 92.0  | 198076 | 0.3014          |
| 0.4101        | 93.0  | 200229 | 0.3137          |
| 0.4044        | 94.0  | 202382 | 0.3089          |
| 0.4022        | 95.0  | 204535 | 0.2992          |
| 0.3985        | 96.0  | 206688 | 0.3129          |
| 0.3979        | 97.0  | 208841 | 0.3149          |
| 0.3938        | 98.0  | 210994 | 0.3168          |
| 0.3923        | 99.0  | 213147 | 0.3147          |
| 0.3935        | 100.0 | 215300 | 0.3142          |
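
If the reported validation loss is a mean per-token cross-entropy (the usual Trainer convention for language modeling; the card does not confirm this), the corresponding perplexity is exp(loss):

```python
# Perplexity from the final validation loss, assuming it is a mean
# per-token cross-entropy (not confirmed by the card).
import math

final_val_loss = 0.3142
print(round(math.exp(final_val_loss), 2))  # 1.37
```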

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1