	---
	base_model: unsloth/mistral-7b-v0.3-bnb-4bit
	library_name: peft
	license: apache-2.0
	tags:
	- unsloth
	- generated_from_trainer
	model-index:
	- name: Mistral-7B-v0.3_pct_reverse
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Mistral-7B-v0.3_pct_reverse

	This model is a fine-tuned version of [unsloth/mistral-7b-v0.3-bnb-4bit](https://huggingface.co/unsloth/mistral-7b-v0.3-bnb-4bit) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 6.8605

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0003
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.02
	- num_epochs: 1
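
The hyperparameters above are related by simple arithmetic, sketched below in plain Python. The device count and the total number of optimizer steps are assumptions, since the card states neither: one device is consistent with 8 × 8 = 64, and the step count is estimated from the last row of the results table (step 384 at epoch 0.9903). The `lr_at` helper is a hypothetical illustration of the usual linear-warmup-then-cosine shape, not a copy of the Trainer's scheduler code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 3e-4
train_batch_size = 8             # per-device batch size
gradient_accumulation_steps = 8
warmup_ratio = 0.02

# Assumptions (not stated in the card): a single device, and roughly 388
# optimizer steps per epoch, estimated from step 384 landing at epoch 0.9903.
num_devices = 1
total_steps = round(384 / 0.9903)

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size)  # 64, matching total_train_batch_size
print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the warmup lasts about 8 optimizer steps, so the learning rate reaches its 0.0003 peak at roughly the first logged evaluation.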

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 2.1177 | 0.0206 | 8 | 2.6702 |
	| 8.9887 | 0.0413 | 16 | 9.0083 |
	| 7.777 | 0.0619 | 24 | 7.6913 |
	| 7.6327 | 0.0825 | 32 | 7.6181 |
	| 7.6585 | 0.1032 | 40 | 7.6409 |
	| 7.6813 | 0.1238 | 48 | 7.5593 |
	| 7.6016 | 0.1444 | 56 | 7.5868 |
	| 7.5595 | 0.1651 | 64 | 7.5960 |
	| 7.7069 | 0.1857 | 72 | 7.5984 |
	| 7.6285 | 0.2063 | 80 | 7.4589 |
	| 7.5374 | 0.2270 | 88 | 7.4251 |
	| 7.4161 | 0.2476 | 96 | 7.3111 |
	| 7.3713 | 0.2682 | 104 | 7.2864 |
	| 7.2921 | 0.2888 | 112 | 7.2224 |
	| 7.2529 | 0.3095 | 120 | 7.1938 |
	| 7.3559 | 0.3301 | 128 | 7.1139 |
	| 7.1657 | 0.3507 | 136 | 7.0930 |
	| 7.066 | 0.3714 | 144 | 7.0315 |
	| 7.1481 | 0.3920 | 152 | 7.0332 |
	| 7.0394 | 0.4126 | 160 | 7.0583 |
	| 7.0685 | 0.4333 | 168 | 7.0682 |
	| 6.9791 | 0.4539 | 176 | 6.9472 |
	| 7.1428 | 0.4745 | 184 | 7.0126 |
	| 7.1661 | 0.4952 | 192 | 6.9513 |
	| 6.9757 | 0.5158 | 200 | 7.0717 |
	| 6.9685 | 0.5364 | 208 | 6.9399 |
	| 7.0811 | 0.5571 | 216 | 6.8879 |
	| 7.0126 | 0.5777 | 224 | 6.9264 |
	| 6.9712 | 0.5983 | 232 | 6.8394 |
	| 6.9533 | 0.6190 | 240 | 6.9073 |
	| 6.9744 | 0.6396 | 248 | 6.9239 |
	| 7.1531 | 0.6602 | 256 | 6.9109 |
	| 6.9527 | 0.6809 | 264 | 6.8941 |
	| 7.1027 | 0.7015 | 272 | 6.9498 |
	| 7.1718 | 0.7221 | 280 | 6.9495 |
	| 7.0877 | 0.7427 | 288 | 6.9761 |
	| 6.9879 | 0.7634 | 296 | 6.9905 |
	| 6.9813 | 0.7840 | 304 | 6.9238 |
	| 7.0798 | 0.8046 | 312 | 6.8707 |
	| 7.0531 | 0.8253 | 320 | 6.8658 |
	| 7.0518 | 0.8459 | 328 | 6.8576 |
	| 7.127 | 0.8665 | 336 | 6.9017 |
	| 6.9259 | 0.8872 | 344 | 6.8581 |
	| 6.9477 | 0.9078 | 352 | 6.8727 |
	| 7.0367 | 0.9284 | 360 | 6.8629 |
	| 6.9114 | 0.9491 | 368 | 6.8469 |
	| 7.0537 | 0.9697 | 376 | 6.8627 |
	| 6.9656 | 0.9903 | 384 | 6.8605 |
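
To make the loss values easier to interpret, the sketch below converts a few validation losses from the table into perplexity. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state explicitly. The `perplexity` helper and the selection of rows are illustrative only.

```python
import math

# (step, validation loss) pairs copied from the table above; only a few rows are shown.
checkpoints = [(8, 2.6702), (16, 9.0083), (232, 6.8394), (384, 6.8605)]


def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss measured in nats (assumed)."""
    return math.exp(loss)


for step, loss in checkpoints:
    print(f"step {step:>3}: loss {loss:.4f} -> perplexity {perplexity(loss):,.1f}")
```

Under that assumption, the final validation loss of 6.8605 corresponds to a perplexity of roughly 950, compared with about 14 at step 8. The table shows the validation loss rising sharply between steps 8 and 16 and never returning to its earlier level.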


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.44.0
	- PyTorch 2.4.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
