Francois2511/cuad_qa_model

	---
	library_name: transformers
	license: apache-2.0
	base_model: answerdotai/ModernBERT-base
	tags:
	- generated_from_trainer
	datasets:
	- cuad-qa
	model-index:
	- name: cuad_qa_model
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# cuad_qa_model

	This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the cuad-qa dataset.
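	At the end of training (step 3700, epoch 3.9747) it achieves the following results on the evaluation set:
	- Loss: 56.3253
	- Jaccard: 0.1325

	These are final-step values, not the best of the run: validation Jaccard peaked at 0.2879 (step 3100), and validation loss was lowest at 48.8478 (step 2600).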
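The card does not say how the Jaccard score is computed. A minimal sketch, assuming a word-level Jaccard similarity between the predicted and reference answer strings; the function name `jaccard`, the lowercasing, and the whitespace tokenization are illustrative assumptions, not taken from the training script:

```python
def jaccard(prediction: str, reference: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    pred_tokens = set(prediction.lower().split())
    ref_tokens = set(reference.lower().split())
    # Assumed convention: two empty answers count as a perfect match.
    if not pred_tokens and not ref_tokens:
        return 1.0
    union = pred_tokens | ref_tokens
    return len(pred_tokens & ref_tokens) / len(union)


if __name__ == "__main__":
    score = jaccard(
        "the governing law of Delaware",
        "governing law of the State of Delaware",
    )
    print(f"{score:.4f}")
```

Under that assumption, the reported value would be the mean of this per-example score over the evaluation set.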

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 3
	- eval_batch_size: 4
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 12
	- optimizer: adamw_torch with betas=(0.9, 0.98) and epsilon=1e-06 (no additional optimizer arguments)
	- lr_scheduler_type: linear
	- num_epochs: 4
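As a sketch of how these values relate, the snippet below maps the listed hyperparameters onto `transformers.TrainingArguments` keyword names and checks the effective batch size. It assumes single-device training (the card lists no device count) and zero warmup steps (none are listed); `output_dir` is a placeholder and `linear_lr` is a hypothetical helper, not part of the library:

```python
# Keyword arguments for transformers.TrainingArguments that correspond to the
# hyperparameters listed above. output_dir is a placeholder value.
training_kwargs = {
    "output_dir": "cuad_qa_model",
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 3,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-6,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 4,
}

# Effective batch size per optimizer step (assuming one device).
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


if __name__ == "__main__":
    print(effective_batch_size)  # 3 * 4, matching total_train_batch_size
    # With transformers installed, the arguments could be built as:
    # from transformers import TrainingArguments
    # args = TrainingArguments(**training_kwargs)
```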

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Jaccard |
	|:-------------:|:------:|:----:|:---------------:|:-------:|
	| 2931.2947 | 0.1075 | 100 | 125.3868 | 0.0261 |
	| 114.0476 | 0.2149 | 200 | 98.0385 | 0.0225 |
	| 92.3046 | 0.3224 | 300 | 86.1094 | 0.0279 |
	| 83.4547 | 0.4299 | 400 | 80.0709 | 0.0403 |
	| 80.4591 | 0.5373 | 500 | 75.2658 | 0.0433 |
	| 76.238 | 0.6448 | 600 | 71.9617 | 0.0445 |
	| 73.2576 | 0.7523 | 700 | 68.1718 | 0.0463 |
	| 70.5061 | 0.8598 | 800 | 64.2118 | 0.0536 |
	| 72.0594 | 0.9672 | 900 | 82.5902 | 0.0243 |
	| 65.2249 | 1.0742 | 1000 | 59.8434 | 0.0647 |
	| 63.2437 | 1.1816 | 1100 | 60.3719 | 0.0932 |
	| 67.1502 | 1.2891 | 1200 | 63.5264 | 0.1114 |
	| 65.1003 | 1.3966 | 1300 | 60.7845 | 0.1243 |
	| 64.7538 | 1.5040 | 1400 | 66.3558 | 0.1200 |
	| 66.7688 | 1.6115 | 1500 | 69.2212 | 0.1149 |
	| 76.4721 | 1.7190 | 1600 | 69.5449 | 0.1458 |
	| 82.2733 | 1.8264 | 1700 | 82.1182 | 0.0449 |
	| 78.7475 | 1.9339 | 1800 | 62.4942 | 0.1581 |
	| 69.5967 | 2.0408 | 1900 | 63.3104 | 0.1507 |
	| 67.6753 | 2.1483 | 2000 | 56.4553 | 0.2238 |
	| 64.0365 | 2.2558 | 2100 | 60.3552 | 0.1978 |
	| 62.561 | 2.3632 | 2200 | 55.5222 | 0.2238 |
	| 62.0848 | 2.4707 | 2300 | 51.5148 | 0.2239 |
	| 59.3192 | 2.5782 | 2400 | 56.1338 | 0.1939 |
	| 63.3072 | 2.6857 | 2500 | 55.3624 | 0.2385 |
	| 63.0132 | 2.7931 | 2600 | 48.8478 | 0.2614 |
	| 61.0742 | 2.9006 | 2700 | 57.2687 | 0.2574 |
	| 63.7064 | 3.0075 | 2800 | 58.7552 | 0.2569 |
	| 61.3371 | 3.1150 | 2900 | 62.7214 | 0.2473 |
	| 66.2795 | 3.2225 | 3000 | 60.0179 | 0.2640 |
	| 65.9729 | 3.3299 | 3100 | 59.7260 | 0.2879 |
	| 67.5846 | 3.4374 | 3200 | 63.1864 | 0.2627 |
	| 65.6924 | 3.5449 | 3300 | 58.8332 | 0.2743 |
	| 64.2456 | 3.6523 | 3400 | 59.7355 | 0.1667 |
	| 64.9793 | 3.7598 | 3500 | 57.0126 | 0.1622 |
	| 63.8452 | 3.8673 | 3600 | 56.8423 | 0.1332 |
	| 65.2058 | 3.9747 | 3700 | 56.3253 | 0.1325 |
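Validation Jaccard and loss fluctuate noticeably across evaluation steps, so it can help to scan the log programmatically rather than read off the last row. A minimal sketch, with a few rows copied from the table above as `(step, epoch, validation_loss, jaccard)` tuples; the variable names are illustrative:

```python
# A subset of rows from the table above: (step, epoch, validation loss, Jaccard).
rows = [
    (100, 0.1075, 125.3868, 0.0261),
    (2600, 2.7931, 48.8478, 0.2614),
    (3100, 3.3299, 59.7260, 0.2879),
    (3300, 3.5449, 58.8332, 0.2743),
    (3700, 3.9747, 56.3253, 0.1325),
]

# Checkpoint candidates under two different selection criteria.
best_by_jaccard = max(rows, key=lambda r: r[3])
best_by_loss = min(rows, key=lambda r: r[2])

# Rough optimizer steps per epoch, estimated from the first logged row.
steps_per_epoch = rows[0][0] / rows[0][1]

if __name__ == "__main__":
    print("best Jaccard:", best_by_jaccard)
    print("lowest validation loss:", best_by_loss)
    print("approx. steps per epoch:", round(steps_per_epoch))
```

If the run is repeated, setting `load_best_model_at_end=True` together with `metric_for_best_model` in `TrainingArguments` is one way to keep the best checkpoint instead of the last; the exact metric key depends on the training script, which the card does not show.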


	### Framework versions

	- Transformers 4.48.2
	- PyTorch 2.2.1+cu121
	- Datasets 3.2.0
	- Tokenizers 0.21.0
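To compare a local environment against these versions, something like the following could be used. It is a sketch that relies only on the standard library's `importlib.metadata`; the helper names are hypothetical, and it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`:

```python
from importlib import metadata

# Versions listed in the card; keys are assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.48.2",
    "torch": "2.2.1+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def installed_version(name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (expected, found)."""
    return {
        name: (want, lookup(name))
        for name, want in expected.items()
        if lookup(name) != want
    }


if __name__ == "__main__":
    for name, (want, found) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {want}, found {found}")
```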