7bb2b7eec8155c11772ca5bd5212703b

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [fi-no] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8897
  • Data Size: 1.0
  • Epoch Runtime: 42.9410
  • Bleu: 0.3063

Model description

More information needed

Intended uses & limitations

More information needed
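
While the card is filled in, the snippet below sketches one way to load this checkpoint for Finnish-to-Norwegian translation with the standard transformers seq2seq classes. The repo id is taken from the original model page, and the absence of a task prefix is an assumption rather than something this card documents.

```python
# Minimal inference sketch (assumptions noted in comments; not taken from the card itself).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/7bb2b7eec8155c11772ca5bd5212703b"  # repo id as listed on the model page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Finnish source sentence; no task prefix is added (assumption: none was used during fine-tuning).
inputs = tokenizer("Hyvää huomenta!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```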

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
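
As a rough guide, the sketch below shows how these values might map onto transformers `Seq2SeqTrainingArguments`. It is a reconstruction from the list above, not the script that produced this model: the output directory is hypothetical, data preprocessing and the Trainer call are omitted, and the multi-GPU launch (4 devices) is handled by the launcher rather than these arguments.

```python
# Hypothetical reconstruction of the listed hyperparameters (not the original training script).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-fi-no",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size of 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size of 32
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999) and eps=1e-08 are the AdamW defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```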

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0    | 217.0048        | 0         | 3.4774        | 0.0032 |
| No log        | 1     | 85   | 203.5395        | 0.0078    | 4.6229        | 0.0033 |
| No log        | 2     | 170  | 180.0904        | 0.0156    | 5.6513        | 0.0034 |
| No log        | 3     | 255  | 157.5710        | 0.0312    | 7.9227        | 0.0035 |
| No log        | 4     | 340  | 120.4380        | 0.0625    | 10.2457       | 0.0031 |
| 9.7111        | 5     | 425  | 64.7623         | 0.125     | 13.1800       | 0.0031 |
| 9.7111        | 6     | 510  | 27.1282         | 0.25      | 17.7222       | 0.0082 |
| 13.3839       | 7     | 595  | 13.9549         | 0.5       | 26.4410       | 0.0139 |
| 19.0578       | 8     | 680  | 10.1067         | 1.0       | 43.9399       | 0.0169 |
| 14.0008       | 9     | 765  | 9.1114          | 1.0       | 42.6606       | 0.0201 |
| 12.0241       | 10    | 850  | 7.6959          | 1.0       | 41.8689       | 0.0206 |
| 11.2471       | 11    | 935  | 7.2508          | 1.0       | 42.4253       | 0.0270 |
| 10.277        | 12    | 1020 | 6.6612          | 1.0       | 41.9782       | 0.0294 |
| 9.3849        | 13    | 1105 | 6.1228          | 1.0       | 42.2462       | 0.0310 |
| 8.9436        | 14    | 1190 | 5.9232          | 1.0       | 42.1296       | 0.0332 |
| 8.355         | 15    | 1275 | 5.1790          | 1.0       | 42.2061       | 0.0625 |
| 7.854         | 16    | 1360 | 5.2055          | 1.0       | 42.0824       | 0.0583 |
| 7.5813        | 17    | 1445 | 5.1245          | 1.0       | 41.8080       | 0.0556 |
| 7.1551        | 18    | 1530 | 4.6861          | 1.0       | 42.1234       | 0.0616 |
| 6.7731        | 19    | 1615 | 4.7496          | 1.0       | 41.6617       | 0.0820 |
| 6.3516        | 20    | 1700 | 4.4754          | 1.0       | 42.6181       | 0.1157 |
| 6.269         | 21    | 1785 | 4.1332          | 1.0       | 42.2423       | 0.1222 |
| 5.9757        | 22    | 1870 | 4.1501          | 1.0       | 42.2093       | 0.0944 |
| 5.7066        | 23    | 1955 | 4.1085          | 1.0       | 42.1657       | 0.0834 |
| 5.5625        | 24    | 2040 | 3.8988          | 1.0       | 42.0781       | 0.1402 |
| 5.3586        | 25    | 2125 | 3.9551          | 1.0       | 42.2878       | 0.1207 |
| 5.2297        | 26    | 2210 | 3.9593          | 1.0       | 42.0483       | 0.1139 |
| 5.0567        | 27    | 2295 | 3.9023          | 1.0       | 42.3980       | 0.1030 |
| 4.9509        | 28    | 2380 | 3.5690          | 1.0       | 41.9953       | 0.1714 |
| 4.7542        | 29    | 2465 | 3.4864          | 1.0       | 41.3983       | 0.1692 |
| 4.5969        | 30    | 2550 | 3.4250          | 1.0       | 42.5190       | 0.1806 |
| 4.5217        | 31    | 2635 | 3.4395          | 1.0       | 41.9373       | 0.1787 |
| 4.4445        | 32    | 2720 | 3.3970          | 1.0       | 42.1893       | 0.2386 |
| 4.3017        | 33    | 2805 | 3.3271          | 1.0       | 42.1738       | 0.1921 |
| 4.2352        | 34    | 2890 | 3.2785          | 1.0       | 42.7255       | 0.1946 |
| 4.1246        | 35    | 2975 | 3.3240          | 1.0       | 41.9313       | 0.2306 |
| 4.0183        | 36    | 3060 | 3.2468          | 1.0       | 42.1121       | 0.1796 |
| 4.0072        | 37    | 3145 | 3.0986          | 1.0       | 42.3300       | 0.3186 |
| 3.8966        | 38    | 3230 | 3.0961          | 1.0       | 41.7341       | 0.2390 |
| 3.8021        | 39    | 3315 | 3.1354          | 1.0       | 42.1132       | 0.3648 |
| 3.7596        | 40    | 3400 | 3.0470          | 1.0       | 42.0877       | 0.2864 |
| 3.6984        | 41    | 3485 | 3.0078          | 1.0       | 41.7414       | 0.3351 |
| 3.5975        | 42    | 3570 | 3.0299          | 1.0       | 42.9425       | 0.2737 |
| 3.5246        | 43    | 3655 | 2.9564          | 1.0       | 41.7279       | 0.3126 |
| 3.4699        | 44    | 3740 | 2.9089          | 1.0       | 43.4539       | 0.3935 |
| 3.4073        | 45    | 3825 | 2.8996          | 1.0       | 42.2385       | 0.4156 |
| 3.3569        | 46    | 3910 | 2.9004          | 1.0       | 42.8962       | 0.3872 |
| 3.3143        | 47    | 3995 | 2.8684          | 1.0       | 42.3636       | 0.2784 |
| 3.2595        | 48    | 4080 | 2.8784          | 1.0       | 42.4358       | 0.4254 |
| 3.1805        | 49    | 4165 | 2.8394          | 1.0       | 41.8356       | 0.4363 |
| 3.1456        | 50    | 4250 | 2.8897          | 1.0       | 42.9410       | 0.3063 |
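
The Bleu column above is reported without its exact configuration; if it follows the usual setup for translation fine-tunes, it would have been computed roughly as in the sketch below. The `evaluate`/sacreBLEU pairing and the score scale are assumptions, not stated by this card.

```python
# Illustrative BLEU scoring of decoded predictions against references (assumed setup).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["God morgen!"]     # decoded model outputs (Norwegian); hypothetical example
references = [["God morgen!"]]    # one list of reference translations per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```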

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1