# 7bb2b7eec8155c11772ca5bd5212703b
This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [fi-no] dataset. It achieves the following results on the evaluation set:
- Loss: 2.8897
- Data Size: 1.0
- Epoch Runtime: 42.9410
- Bleu: 0.3063
## Model description
More information needed
## Intended uses & limitations
More information needed
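
Below is a minimal inference sketch for Finnish→Norwegian translation. It assumes the checkpoint and tokenizer are hosted at the repo id shown on this card and that no task prefix was used during fine-tuning; adjust the input formatting if your preprocessing differed.

```python
# Minimal inference sketch (assumptions: repo id below is correct, tokenizer files
# are stored with the model, and no task prefix was used during fine-tuning).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/7bb2b7eec8155c11772ca5bd5212703b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Hyvää huomenta."  # Finnish source sentence
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```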
## Training and evaluation data
More information needed
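
As a starting point, the dataset named above can be loaded with the `datasets` library. This is a hedged sketch: the config name is taken from this card, and the field layout follows the public opus_books dataset card (a single `train` split with a `translation` dict keyed by language code).

```python
# Hedged sketch of loading the dataset referenced above.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "fi-no")
example = dataset["train"][0]
print(example["translation"]["fi"])  # Finnish source
print(example["translation"]["no"])  # Norwegian target
```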
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch in code follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
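
A hedged sketch of how the settings above could be expressed with `Seq2SeqTrainingArguments`; the actual training script is not part of this card, and the per-epoch data-size schedule visible in the results table is not reproduced here. The `output_dir` is hypothetical.

```python
# Reproduction sketch only: mirrors the hyperparameters listed above, not the
# original training script (which also varied the data fraction per epoch).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-fi-no",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,           # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,            # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,              # needed so BLEU can be computed at eval time
)
```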
### Training results

Per-epoch validation metrics are shown below; a hedged BLEU-scoring sketch follows the table.
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 217.0048 | 0 | 3.4774 | 0.0032 |
| No log | 1 | 85 | 203.5395 | 0.0078 | 4.6229 | 0.0033 |
| No log | 2 | 170 | 180.0904 | 0.0156 | 5.6513 | 0.0034 |
| No log | 3 | 255 | 157.5710 | 0.0312 | 7.9227 | 0.0035 |
| No log | 4 | 340 | 120.4380 | 0.0625 | 10.2457 | 0.0031 |
| 9.7111 | 5 | 425 | 64.7623 | 0.125 | 13.1800 | 0.0031 |
| 9.7111 | 6 | 510 | 27.1282 | 0.25 | 17.7222 | 0.0082 |
| 13.3839 | 7 | 595 | 13.9549 | 0.5 | 26.4410 | 0.0139 |
| 19.0578 | 8.0 | 680 | 10.1067 | 1.0 | 43.9399 | 0.0169 |
| 14.0008 | 9.0 | 765 | 9.1114 | 1.0 | 42.6606 | 0.0201 |
| 12.0241 | 10.0 | 850 | 7.6959 | 1.0 | 41.8689 | 0.0206 |
| 11.2471 | 11.0 | 935 | 7.2508 | 1.0 | 42.4253 | 0.0270 |
| 10.277 | 12.0 | 1020 | 6.6612 | 1.0 | 41.9782 | 0.0294 |
| 9.3849 | 13.0 | 1105 | 6.1228 | 1.0 | 42.2462 | 0.0310 |
| 8.9436 | 14.0 | 1190 | 5.9232 | 1.0 | 42.1296 | 0.0332 |
| 8.355 | 15.0 | 1275 | 5.1790 | 1.0 | 42.2061 | 0.0625 |
| 7.854 | 16.0 | 1360 | 5.2055 | 1.0 | 42.0824 | 0.0583 |
| 7.5813 | 17.0 | 1445 | 5.1245 | 1.0 | 41.8080 | 0.0556 |
| 7.1551 | 18.0 | 1530 | 4.6861 | 1.0 | 42.1234 | 0.0616 |
| 6.7731 | 19.0 | 1615 | 4.7496 | 1.0 | 41.6617 | 0.0820 |
| 6.3516 | 20.0 | 1700 | 4.4754 | 1.0 | 42.6181 | 0.1157 |
| 6.269 | 21.0 | 1785 | 4.1332 | 1.0 | 42.2423 | 0.1222 |
| 5.9757 | 22.0 | 1870 | 4.1501 | 1.0 | 42.2093 | 0.0944 |
| 5.7066 | 23.0 | 1955 | 4.1085 | 1.0 | 42.1657 | 0.0834 |
| 5.5625 | 24.0 | 2040 | 3.8988 | 1.0 | 42.0781 | 0.1402 |
| 5.3586 | 25.0 | 2125 | 3.9551 | 1.0 | 42.2878 | 0.1207 |
| 5.2297 | 26.0 | 2210 | 3.9593 | 1.0 | 42.0483 | 0.1139 |
| 5.0567 | 27.0 | 2295 | 3.9023 | 1.0 | 42.3980 | 0.1030 |
| 4.9509 | 28.0 | 2380 | 3.5690 | 1.0 | 41.9953 | 0.1714 |
| 4.7542 | 29.0 | 2465 | 3.4864 | 1.0 | 41.3983 | 0.1692 |
| 4.5969 | 30.0 | 2550 | 3.4250 | 1.0 | 42.5190 | 0.1806 |
| 4.5217 | 31.0 | 2635 | 3.4395 | 1.0 | 41.9373 | 0.1787 |
| 4.4445 | 32.0 | 2720 | 3.3970 | 1.0 | 42.1893 | 0.2386 |
| 4.3017 | 33.0 | 2805 | 3.3271 | 1.0 | 42.1738 | 0.1921 |
| 4.2352 | 34.0 | 2890 | 3.2785 | 1.0 | 42.7255 | 0.1946 |
| 4.1246 | 35.0 | 2975 | 3.3240 | 1.0 | 41.9313 | 0.2306 |
| 4.0183 | 36.0 | 3060 | 3.2468 | 1.0 | 42.1121 | 0.1796 |
| 4.0072 | 37.0 | 3145 | 3.0986 | 1.0 | 42.3300 | 0.3186 |
| 3.8966 | 38.0 | 3230 | 3.0961 | 1.0 | 41.7341 | 0.2390 |
| 3.8021 | 39.0 | 3315 | 3.1354 | 1.0 | 42.1132 | 0.3648 |
| 3.7596 | 40.0 | 3400 | 3.0470 | 1.0 | 42.0877 | 0.2864 |
| 3.6984 | 41.0 | 3485 | 3.0078 | 1.0 | 41.7414 | 0.3351 |
| 3.5975 | 42.0 | 3570 | 3.0299 | 1.0 | 42.9425 | 0.2737 |
| 3.5246 | 43.0 | 3655 | 2.9564 | 1.0 | 41.7279 | 0.3126 |
| 3.4699 | 44.0 | 3740 | 2.9089 | 1.0 | 43.4539 | 0.3935 |
| 3.4073 | 45.0 | 3825 | 2.8996 | 1.0 | 42.2385 | 0.4156 |
| 3.3569 | 46.0 | 3910 | 2.9004 | 1.0 | 42.8962 | 0.3872 |
| 3.3143 | 47.0 | 3995 | 2.8684 | 1.0 | 42.3636 | 0.2784 |
| 3.2595 | 48.0 | 4080 | 2.8784 | 1.0 | 42.4358 | 0.4254 |
| 3.1805 | 49.0 | 4165 | 2.8394 | 1.0 | 41.8356 | 0.4363 |
| 3.1456 | 50.0 | 4250 | 2.8897 | 1.0 | 42.9410 | 0.3063 |
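
The card does not document which BLEU implementation (or scale) produced the numbers above. Purely as an illustration, the sketch below scores hypothetical generated translations with the `bleu` metric from the `evaluate` library, which reports values on a 0–1 scale.

```python
# Illustration only: one common way to compute BLEU for generated translations.
import evaluate

bleu = evaluate.load("bleu")
predictions = ["jeg gikk hjem tidlig i går kveld"]    # hypothetical model output (Norwegian)
references = [["jeg gikk hjem tidlig i går kveld"]]   # hypothetical reference translations
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```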
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1