pythia-410m-deduped

This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 1.6928
  • Original Losses: 1.7344
  • Weight: 1.0
  • Abs Diff: 0.3008
  • Rewards/chosen: -5.4375
  • Rewards/rejected: -5.4688
  • Rewards/accuracies: 0.4758
  • Rewards/margins: 0.0228
  • Logps/rejected: -2.1875
  • Logps/chosen: -2.1719
  • Logits/rejected: 5.7188
  • Logits/chosen: 5.7188
  • All Logps 1: -811.2697
  • All Logps 1 Values: -811.2697
  • All Logps 2: 447.4254
  • All Logps 2 Values: 447.4254
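
A minimal loading sketch for reference: the repository id RAY2L/pythia-410m-deduped-SimPOW-1 and the BF16 weights come from the model page, while the prompt and generation settings below are purely illustrative.

```python
# Minimal loading sketch; the prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RAY2L/pythia-410m-deduped-SimPOW-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```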

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
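
As a rough sketch only, the listed values map onto a transformers TrainingArguments object as shown below; the actual trainer and training script for this run are not documented in this card, so the class choice and output path are assumptions.

```python
# Hypothetical mapping of the listed hyperparameters onto transformers.TrainingArguments;
# the real training script for this run is not shown in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pythia-410m-deduped-simpow",  # illustrative output path
    learning_rate=1e-6,
    per_device_train_batch_size=2,            # train_batch_size
    per_device_eval_batch_size=4,             # eval_batch_size
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                                # published weights are BF16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# Effective batch size: 2 per device x 8 GPUs x 8 accumulation steps = 128.
```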

Training results

| Training Loss | Epoch | Step | Validation Loss | Original Losses | Weight | Abs Diff | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | All Logps 1 | All Logps 1 Values | All Logps 2 | All Logps 2 Values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9439 | 0.0427 | 20 | 1.7861 | 1.8125 | 1.0 | 0.3574 | -4.9688 | -5.0 | 0.4556 | 0.0187 | -1.9922 | -1.9844 | 5.1875 | 5.2188 | -694.3344 | -694.3344 | 447.4254 | 447.4254 |
| 1.8637 | 0.0855 | 40 | 1.7850 | 1.8125 | 1.0 | 0.3574 | -4.9688 | -4.9688 | 0.4637 | 0.0112 | -1.9922 | -1.9844 | 5.1875 | 5.25 | -694.3014 | -694.3014 | 447.4254 | 447.4254 |
| 1.8856 | 0.1282 | 60 | 1.7741 | 1.8125 | 1.0 | 0.3496 | -4.9375 | -4.9375 | 0.4435 | -0.0004 | -1.9766 | -1.9766 | 5.2188 | 5.25 | -695.6515 | -695.6515 | 447.4254 | 447.4254 |
| 1.8193 | 0.1710 | 80 | 1.7628 | 1.8047 | 1.0 | 0.3477 | -4.9375 | -4.9375 | 0.4637 | 0.0016 | -1.9844 | -1.9766 | 5.3125 | 5.3438 | -699.6716 | -699.6716 | 447.4254 | 447.4254 |
| 1.8542 | 0.2137 | 100 | 1.7501 | 1.7891 | 1.0 | 0.3340 | -4.9375 | -4.9688 | 0.4758 | 0.0138 | -1.9844 | -1.9766 | 5.4062 | 5.4375 | -707.3261 | -707.3261 | 447.4254 | 447.4254 |
| 1.7907 | 0.2565 | 120 | 1.7458 | 1.7891 | 1.0 | 0.3301 | -5.0 | -4.9688 | 0.4315 | -0.0052 | -1.9922 | -1.9922 | 5.4688 | 5.5 | -714.8251 | -714.8251 | 447.4254 | 447.4254 |
| 1.8332 | 0.2992 | 140 | 1.7375 | 1.7969 | 1.0 | 0.3281 | -5.0312 | -5.0 | 0.4637 | -0.0200 | -2.0 | -2.0156 | 5.5312 | 5.5625 | -723.8403 | -723.8403 | 447.4254 | 447.4254 |
| 1.7599 | 0.3420 | 160 | 1.7328 | 1.7969 | 1.0 | 0.3301 | -5.0938 | -5.0625 | 0.4355 | -0.0156 | -2.0312 | -2.0312 | 5.5625 | 5.5938 | -734.5149 | -734.5149 | 447.4254 | 447.4254 |
| 1.8462 | 0.3847 | 180 | 1.7246 | 1.7734 | 1.0 | 0.3184 | -5.125 | -5.125 | 0.4516 | -0.0015 | -2.0469 | -2.0469 | 5.5625 | 5.5938 | -745.0103 | -745.0103 | 447.4254 | 447.4254 |
| 1.8253 | 0.4275 | 200 | 1.7154 | 1.7656 | 1.0 | 0.3145 | -5.1562 | -5.1875 | 0.4476 | 0.0043 | -2.0625 | -2.0625 | 5.5625 | 5.5938 | -755.3181 | -755.3181 | 447.4254 | 447.4254 |
| 1.8056 | 0.4702 | 220 | 1.7119 | 1.7734 | 1.0 | 0.3203 | -5.2188 | -5.2188 | 0.4476 | 0.0032 | -2.0938 | -2.0938 | 5.5938 | 5.625 | -762.7902 | -762.7902 | 447.4254 | 447.4254 |
| 1.7958 | 0.5130 | 240 | 1.7096 | 1.7734 | 1.0 | 0.3164 | -5.25 | -5.25 | 0.4556 | -0.0002 | -2.1094 | -2.1094 | 5.5938 | 5.625 | -770.9695 | -770.9695 | 447.4254 | 447.4254 |
| 1.7141 | 0.5557 | 260 | 1.7073 | 1.7578 | 1.0 | 0.3086 | -5.2812 | -5.2812 | 0.4355 | 0.0052 | -2.1094 | -2.1094 | 5.625 | 5.625 | -775.2407 | -775.2407 | 447.4254 | 447.4254 |
| 1.7021 | 0.5985 | 280 | 1.7085 | 1.7656 | 1.0 | 0.3125 | -5.2812 | -5.2812 | 0.4597 | -0.0014 | -2.1094 | -2.1094 | 5.625 | 5.6562 | -778.4560 | -778.4560 | 447.4254 | 447.4254 |
| 1.7788 | 0.6412 | 300 | 1.7020 | 1.7578 | 1.0 | 0.3066 | -5.3125 | -5.3125 | 0.4677 | 0.0104 | -2.125 | -2.125 | 5.6562 | 5.6875 | -784.0049 | -784.0049 | 447.4254 | 447.4254 |
| 1.679 | 0.6839 | 320 | 1.7053 | 1.7578 | 1.0 | 0.3105 | -5.3438 | -5.3438 | 0.4476 | 0.0002 | -2.1406 | -2.1406 | 5.6562 | 5.6875 | -791.0703 | -791.0703 | 447.4254 | 447.4254 |
| 1.751 | 0.7267 | 340 | 1.7006 | 1.7578 | 1.0 | 0.3105 | -5.375 | -5.4062 | 0.4919 | 0.0085 | -2.1562 | -2.1562 | 5.6562 | 5.6875 | -797.0882 | -797.0882 | 447.4254 | 447.4254 |
| 1.7191 | 0.7694 | 360 | 1.6990 | 1.7656 | 1.0 | 0.3086 | -5.4375 | -5.4062 | 0.4476 | -0.0044 | -2.1719 | -2.1719 | 5.6875 | 5.6875 | -803.0909 | -803.0909 | 447.4254 | 447.4254 |
| 1.7226 | 0.8122 | 380 | 1.6993 | 1.7578 | 1.0 | 0.3086 | -5.4375 | -5.4375 | 0.4758 | 0.0093 | -2.1719 | -2.1719 | 5.6875 | 5.7188 | -806.9357 | -806.9357 | 447.4254 | 447.4254 |
| 1.7198 | 0.8549 | 400 | 1.6968 | 1.7578 | 1.0 | 0.3066 | -5.4688 | -5.4688 | 0.4556 | 0.0020 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -810.5368 | -810.5368 | 447.4254 | 447.4254 |
| 1.7057 | 0.8977 | 420 | 1.6963 | 1.75 | 1.0 | 0.3047 | -5.4688 | -5.4688 | 0.4718 | 0.0151 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -811.7772 | -811.7772 | 447.4254 | 447.4254 |
| 1.75 | 0.9404 | 440 | 1.6973 | 1.7578 | 1.0 | 0.3086 | -5.4688 | -5.4688 | 0.4677 | 0.0077 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -811.8970 | -811.8970 | 447.4254 | 447.4254 |
| 1.6912 | 0.9832 | 460 | 1.6928 | 1.7344 | 1.0 | 0.3008 | -5.4375 | -5.4688 | 0.4758 | 0.0228 | -2.1875 | -2.1719 | 5.7188 | 5.7188 | -811.2697 | -811.2697 | 447.4254 | 447.4254 |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
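
To confirm that a local environment matches these versions, a quick check might look like the following (package names correspond to the list above):

```python
# Sanity-check that installed packages match the versions listed above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected 4.42.3
print(torch.__version__)         # expected 2.2.2+cu121
print(datasets.__version__)      # expected 2.20.0
print(tokenizers.__version__)    # expected 0.19.1
```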