pszemraj/mega-small-2048-C1024-tk_id-simplewiki-MR50

+---
+license: apache-2.0
+base_model: pszemraj/random-mega-small-2048
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: PT-simple_wikipedia_LM-random-mega-small-2048-MR0.50-C1024-tk_id
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# PT-simple_wikipedia_LM-random-mega-small-2048-MR0.50-C1024-tk_id
+This model is a fine-tuned version of [pszemraj/random-mega-small-2048](https://huggingface.co/pszemraj/random-mega-small-2048) on an unspecified dataset (the Trainer recorded it as `None`; the model name suggests Simple Wikipedia).
+It achieves the following results on the evaluation set:
+- Loss: 3.4773
+- Accuracy: 0.4591
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 3208
+- gradient_accumulation_steps: 64
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.05
+- num_epochs: 3.0
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|
+| 7.2691        | 0.11  | 50   | 7.1000          | 0.0677   |
+| 7.1597        | 0.22  | 100  | 6.8388          | 0.0794   |
+| 6.5476        | 0.33  | 150  | 6.4004          | 0.1359   |
+| 6.5335        | 0.44  | 200  | 6.1776          | 0.1708   |
+| 5.7228        | 0.55  | 250  | 5.6106          | 0.2437   |
+| 5.4574        | 0.66  | 300  | 5.1391          | 0.2884   |
+| 5.2275        | 0.78  | 350  | 4.8626          | 0.3174   |
+| 4.9589        | 0.89  | 400  | 4.6454          | 0.3374   |
+| 4.6406        | 1.0   | 450  | 4.4498          | 0.3578   |
+| 4.8251        | 1.11  | 500  | 4.3055          | 0.3706   |
+| 4.4728        | 1.22  | 550  | 4.1877          | 0.3821   |
+| 4.3975        | 1.33  | 600  | 4.0709          | 0.3955   |
+| 4.4245        | 1.44  | 650  | 3.9909          | 0.4045   |
+| 4.2613        | 1.55  | 700  | 3.8976          | 0.4128   |
+| 4.1806        | 1.66  | 750  | 3.8515          | 0.4177   |
+| 3.9469        | 1.77  | 800  | 3.7883          | 0.4227   |
+| 3.9563        | 1.88  | 850  | 3.7314          | 0.4306   |
+| 4.0063        | 1.99  | 900  | 3.6975          | 0.4336   |
+| 3.9274        | 2.1   | 950  | 3.6561          | 0.4378   |
+| 3.788         | 2.21  | 1000 | 3.6280          | 0.4410   |
+| 3.8711        | 2.33  | 1050 | 3.5736          | 0.4467   |
+| 3.8623        | 2.44  | 1100 | 3.5535          | 0.4496   |
+| 3.8575        | 2.55  | 1150 | 3.5407          | 0.4521   |
+| 4.0079        | 2.66  | 1200 | 3.5172          | 0.4543   |
+| 3.8265        | 2.77  | 1250 | 3.4786          | 0.4591   |
+| 3.9513        | 2.88  | 1300 | 3.4741          | 0.4578   |
+| 3.554         | 2.99  | 1350 | 3.4773          | 0.4591   |
+### Framework versions
+- Transformers 4.33.1
+- Pytorch 2.2.0.dev20230907+cu118
+- Datasets 2.13.1
+- Tokenizers 0.13.3
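
The model card above lists `total_train_batch_size: 64` alongside a per-device batch size of 1 and 64 gradient accumulation steps, and reports a final evaluation loss of 3.4773. A minimal sketch of how those numbers relate, assuming a single training device and assuming the reported loss is a mean cross-entropy in nats (neither is stated in the card; `num_devices` is a hypothetical name introduced here):

```python
import math

# Values copied from the model card above.
train_batch_size = 1
gradient_accumulation_steps = 64
final_eval_loss = 3.4773

# Assumption: one device, so there is no extra data-parallel multiplier.
num_devices = 1

# Number of sequences that contribute to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: the loss is a mean cross-entropy in nats, so exp(loss)
# gives a perplexity over the tokens that were scored.
perplexity = math.exp(final_eval_loss)

print(f"effective batch size: {effective_batch_size}")
print(f"eval perplexity: {perplexity:.2f}")
```

If the `MR0.50` in the model name refers to a masking ratio, the perplexity here would cover only the masked positions and would not be directly comparable to a causal language model's perplexity.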

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:34b57042a19e6dc02908161fa9ca55788881587b4ac15ae81aa18008c72306fb
+oid sha256:1cde798b4b202c201046f490e78d3f5f60478aeb7dfaea85b414854329a37be2
 size 270274934
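
A Git LFS pointer stores the SHA-256 digest (`oid`) and byte size of the real file, so a downloaded copy of `pytorch_model.bin` can be checked against the new pointer shown above. The following is a rough sketch using only the standard library; the helper name `matches_lfs_pointer` and the local path are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path


def matches_lfs_pointer(path, expected_oid, expected_size, chunk_size=1 << 20):
    """Return True if the file has the given SHA-256 hex digest and byte size."""
    file_path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if file_path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Read in chunks so a ~270 MB checkpoint is never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values from the "+" side of the pointer diff above.
NEW_OID = "1cde798b4b202c201046f490e78d3f5f60478aeb7dfaea85b414854329a37be2"
NEW_SIZE = 270274934

if __name__ == "__main__":
    local_copy = Path("pytorch_model.bin")  # hypothetical local download path
    if local_copy.exists():
        print(matches_lfs_pointer(local_copy, NEW_OID, NEW_SIZE))
```

The size is unchanged between the two pointers (270274934 bytes), so only the digest distinguishes the old weights from the new ones.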