german-jeopardy-mt5-base

This model is a fine-tuned version of google/mt5-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

Loss: 1.66
Brevity Penalty: 0.9025
System Length: 18860
Reference Length: 20793
ROUGE-1: 40.62
ROUGE-2: 21.49
ROUGE-L: 39.14
ROUGE-Lsum: 39.13
Exact Match: 2.72
BLEU: 14.56
F1: 39.53
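The brevity penalty above can be recovered from the system and reference lengths. The sketch below assumes the standard corpus-level BLEU definition: no penalty when the output is at least as long as the references, otherwise exp(1 − r/c), where c is the system length and r is the reference length. The function name is illustrative.

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Corpus-level BLEU brevity penalty (assumes the standard definition).

    Outputs at least as long as the references are not penalised; shorter
    outputs are scaled by exp(1 - r/c), with c = system length and
    r = reference length.
    """
    if system_length >= reference_length:
        return 1.0
    if system_length == 0:
        return 0.0
    return math.exp(1.0 - reference_length / system_length)


# Lengths reported for the evaluation set above.
bp = brevity_penalty(system_length=18860, reference_length=20793)
print(f"{bp:.4f}")
```

The result agrees with the reported 0.9025 to about four decimal places. The small remaining difference is consistent with truncation rather than rounding in the card.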

Model description

See google/mt5-base for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.

Training and evaluation data

See lmqg/qg_dequad.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 7
gradient_accumulation_steps: 16
total_train_batch_size: 64
optimizer: Adafactor
lr_scheduler_type: constant
num_epochs: 20
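The listed total_train_batch_size follows from the per-device batch size and the number of gradient-accumulation steps. Below is a minimal sketch of that arithmetic. It assumes a single device, since the model description mentions one RTX 3090. The helper name is illustrative and not part of any library.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "learning_rate": 1e-4,
    "train_batch_size": 4,
    "gradient_accumulation_steps": 16,
    "num_epochs": 20,
}


def effective_batch_size(per_device_batch: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Examples whose gradients are accumulated before each optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


total = effective_batch_size(
    HYPERPARAMETERS["train_batch_size"],
    HYPERPARAMETERS["gradient_accumulation_steps"],
    num_devices=1,  # assumption: single GPU, per the model description
)
print(total)  # 64
```

Under these assumptions, each optimizer step sees 4 × 16 = 64 examples. This matches total_train_batch_size and keeps per-step memory at the level of a batch of 4.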

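Training results

In the table below, Counts n and Totals n are the clipped n-gram matches and the total generated n-grams of order n, and Precisions n equals 100 × Counts n / Totals n. ROUGE, Exact Match, and F1 are given as fractions between 0 and 1, whereas the evaluation results at the top of this card report them as percentages.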

Training Loss	Epoch	Step	Validation Loss	Counts 1	Counts 2	Counts 3	Counts 4	Totals 1	Totals 2	Totals 3	Totals 4	Precisions 1	Precisions 2	Precisions 3	Precisions 4	Brevity Penalty	System Length	Reference Length	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum	Exact Match	BLEU	Mean Generated Length	F1
5.5131	1.0	145	1.8698	6032	1668	626	216	16023	13819	11615	9411	37.6459	12.0703	5.3896	2.2952	0.7216	16023	21250	0.2485	0.1011	0.2368	0.2366	0.0018	6.2485	12.6166	0.2406
2.3946	2.0	291	1.5888	7325	2554	1178	558	16853	14649	12445	10241	43.4641	17.4346	9.4656	5.4487	0.7704	16853	21250	0.3226	0.1585	0.31	0.31	0.0145	10.8315	12.2582	0.3148
2.0101	3.0	436	1.4997	7623	2764	1304	629	17042	14838	12634	10430	44.7307	18.6278	10.3214	6.0307	0.7812	17042	21250	0.3403	0.1723	0.3263	0.3263	0.0154	11.7891	12.6783	0.3315
1.8073	4.0	582	1.4610	7728	2916	1415	707	16654	14450	12246	10042	46.4033	20.1799	11.5548	7.0404	0.7588	16654	21250	0.3461	0.1818	0.3324	0.3326	0.0168	12.6068	12.2963	0.3387
1.6851	4.99	727	1.4357	7964	3059	1483	727	17381	15177	12973	10769	45.8201	20.1555	11.4314	6.7509	0.8004	17381	21250	0.3558	0.1888	0.3415	0.3414	0.0159	13.0784	12.7436	0.3483
1.5642	6.0	873	1.4003	8299	3224	1592	788	17351	15147	12943	10739	47.8301	21.2847	12.3001	7.3377	0.7987	17351	21250	0.3814	0.2025	0.3684	0.3685	0.0204	13.9065	12.9569	0.3736
1.4756	6.99	1018	1.3779	8640	3430	1712	879	17669	15465	13261	11057	48.8992	22.1791	12.91	7.9497	0.8165	17669	21250	0.3971	0.2133	0.3828	0.3826	0.025	14.9146	13.1084	0.3892
1.3792	8.0	1164	1.3624	8732	3417	1712	871	17996	15792	13588	11384	48.5219	21.6375	12.5994	7.6511	0.8346	17996	21250	0.4003	0.2131	0.3852	0.3849	0.0245	14.8859	13.3748	0.3917
1.3133	9.0	1310	1.3630	8804	3500	1754	920	17661	15457	13253	11049	49.85	22.6435	13.2347	8.3265	0.8161	17661	21250	0.4078	0.219	0.3932	0.3935	0.025	15.3264	13.2019	0.4
1.261	10.0	1455	1.3685	8910	3602	1849	1000	17709	15505	13301	11097	50.3134	23.2312	13.9012	9.0114	0.8188	17709	21250	0.4135	0.223	0.3991	0.3992	0.0295	16.0163	13.1892	0.4055
1.1897	11.0	1601	1.3639	9096	3690	1902	1012	18261	16057	13853	11649	49.8111	22.9806	13.7299	8.6874	0.849	18261	21250	0.4201	0.2289	0.4059	0.4057	0.0281	16.3202	13.5077	0.4121
1.1453	11.99	1746	1.3610	9106	3735	1932	1023	18329	16125	13921	11717	49.6808	23.1628	13.8783	8.7309	0.8527	18329	21250	0.4173	0.2303	0.4026	0.4025	0.0281	16.4772	13.8013	0.4099
1.0858	13.0	1892	1.3716	9245	3778	1955	1049	18556	16352	14148	11944	49.8222	23.1042	13.8182	8.7827	0.8649	18556	21250	0.4244	0.2327	0.409	0.409	0.0322	16.7204	13.8144	0.417
1.0472	13.99	2037	1.3770	9166	3756	1946	1054	18315	16111	13907	11703	50.0464	23.3133	13.993	9.0062	0.8519	18315	21250	0.4216	0.2311	0.4068	0.4067	0.0309	16.6825	13.8099	0.4143
0.9953	15.0	2183	1.3881	9342	3926	2046	1108	18132	15928	13724	11520	51.5222	24.6484	14.9082	9.6181	0.842	18132	21250	0.4328	0.2418	0.4171	0.4171	0.0327	17.3937	13.5023	0.4258
0.9509	16.0	2329	1.4016	9330	3894	2024	1084	18672	16468	14264	12060	49.9679	23.6459	14.1896	8.9884	0.871	18672	21250	0.4269	0.237	0.4123	0.4122	0.0313	17.1618	13.956	0.4198
0.9183	17.0	2474	1.4152	9303	3824	1979	1084	18476	16272	14068	11864	50.3518	23.5005	14.0674	9.1369	0.8606	18476	21250	0.4269	0.2345	0.4121	0.4122	0.0327	16.995	13.7854	0.4199
0.8696	18.0	2620	1.4404	9184	3798	1993	1085	18379	16175	13971	11767	49.9701	23.4807	14.2653	9.2207	0.8554	18379	21250	0.4218	0.2333	0.4076	0.4074	0.034	16.9541	13.726	0.4148
0.8389	19.0	2765	1.4360	9476	4000	2092	1139	19003	16799	14595	12391	49.8658	23.8109	14.3337	9.1922	0.8885	19003	21250	0.4307	0.2406	0.4161	0.416	0.0299	17.67	14.2064	0.4239
0.7993	19.92	2900	1.4545	9464	3970	2078	1126	18741	16537	14333	12129	50.4989	24.0068	14.498	9.2835	0.8747	18741	21250	0.4349	0.2424	0.4194	0.4192	0.0327	17.5799	13.9959	0.4269
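The BLEU column can be recomputed from the Counts, Totals, and length columns. The sketch below assumes the standard unsmoothed corpus BLEU: the brevity penalty times the geometric mean of the four n-gram precisions, scaled to 0 to 100. The function name is illustrative, and this is not necessarily the exact scorer used for this card.

```python
import math


def corpus_bleu(counts, totals, system_length, reference_length):
    """Recompute corpus BLEU (0-100) from matched/total n-gram counts.

    counts[i] / totals[i] is the precision for (i+1)-grams; the score is
    the brevity penalty times the geometric mean of those precisions.
    No smoothing is applied, so any zero count yields a score of 0.
    """
    if min(counts) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(counts, totals)) / len(counts)
    if system_length < reference_length:
        bp = math.exp(1.0 - reference_length / system_length)
    else:
        bp = 1.0
    return 100.0 * bp * math.exp(log_precision)


# Values from the last row of the table (epoch 19.92, step 2900).
score = corpus_bleu(
    counts=[9464, 3970, 2078, 1126],
    totals=[18741, 16537, 14333, 12129],
    system_length=18741,
    reference_length=21250,
)
print(f"{score:.2f}")
```

For the last row, this reproduces the logged BLEU of 17.5799 to within rounding. The same function can be applied to any other row.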

Framework versions

Transformers 4.32.1
Pytorch 2.1.0
Datasets 2.12.0
Tokenizers 0.13.3
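When reproducing the training setup, it may help to compare the installed package versions with those listed above. The sketch below uses the standard-library importlib.metadata module. The helper name is hypothetical, and the check assumes PyTorch is installed under its PyPI name, torch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; PyTorch's PyPI name is "torch".
EXPECTED_VERSIONS = {
    "transformers": "4.32.1",
    "torch": "2.1.0",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is missing. Local build tags such
    as "+cu121" are ignored, so "2.1.0+cu121" counts as "2.1.0".
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches().items():
    print(f"{package}: card lists {wanted}, found {installed}")
```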

GiantTreeG/german-jeopardy-mt5-base