Crystalcareai committed (verified)
Commit 37bcac8 · 1 Parent(s): 632e4e6

Update README.md

Files changed (1):
  1. README.md +14 -51
README.md CHANGED
@@ -420,20 +420,20 @@ unfrozen_parameters:
  - model.layers.12.block_sparse_moe.experts.7.w3
  - model.layers.13.block_sparse_moe.experts.7.w3
  - model.layers.14.block_sparse_moe.experts.7.w3
- # - model.layers.0.block_sparse_moe.gate
- # - model.layers.1.block_sparse_moe.gate
- # - model.layers.2.block_sparse_moe.gate
- # - model.layers.3.block_sparse_moe.gate
- # - model.layers.4.block_sparse_moe.gate
- # - model.layers.5.block_sparse_moe.gate
- # - model.layers.6.block_sparse_moe.gate
- # - model.layers.7.block_sparse_moe.gate
- # - model.layers.8.block_sparse_moe.gate
- # - model.layers.9.block_sparse_moe.gate
- # - model.layers.10.block_sparse_moe.gate
- # - model.layers.11.block_sparse_moe.gate
- # - model.layers.12.block_sparse_moe.gate
- # - model.layers.13.block_sparse_moe.gate
+ - model.layers.0.block_sparse_moe.gate
+ - model.layers.1.block_sparse_moe.gate
+ - model.layers.2.block_sparse_moe.gate
+ - model.layers.3.block_sparse_moe.gate
+ - model.layers.4.block_sparse_moe.gate
+ - model.layers.5.block_sparse_moe.gate
+ - model.layers.6.block_sparse_moe.gate
+ - model.layers.7.block_sparse_moe.gate
+ - model.layers.8.block_sparse_moe.gate
+ - model.layers.9.block_sparse_moe.gate
+ - model.layers.10.block_sparse_moe.gate
+ - model.layers.11.block_sparse_moe.gate
+ - model.layers.12.block_sparse_moe.gate
+ - model.layers.13.block_sparse_moe.gate
 
 model_config:
   output_router_logits: true
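This hunk uncomments the fourteen `block_sparse_moe.gate` entries, bringing the MoE routers of layers 0-13 into the trainable set alongside the listed expert weights; `output_router_logits: true` in `model_config` makes the model return router logits so the auxiliary load-balancing loss can reach those gates during training. As a minimal sketch of what an `unfrozen_parameters` list does (assuming the entries act as name patterns matched against `named_parameters()`; this is not axolotl's actual code, and the helper name and regex patterns below are hypothetical):

```python
import re

import torch.nn as nn

# Hypothetical regex stand-ins for the explicit parameter names in the diff above.
UNFROZEN_PATTERNS = [
    r"^model\.layers\.\d+\.block_sparse_moe\.gate",
    r"^model\.layers\.\d+\.block_sparse_moe\.experts\.7\.w3",
]

def apply_unfrozen_parameters(model: nn.Module, patterns: list[str]) -> None:
    """Freeze everything, then unfreeze parameters whose names match a pattern."""
    compiled = [re.compile(p) for p in patterns]
    for name, param in model.named_parameters():
        param.requires_grad = any(rx.match(name) for rx in compiled)
```

Each gate is only a small linear projection from the hidden size to the number of experts, so unfreezing the routers adds almost no trainable-parameter overhead while letting routing adapt to the tuned experts.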
@@ -541,43 +541,6 @@ tokens:
 
  </details><br>
 
- # out
-
- This model is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5217
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2.7e-05
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 256
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 2
- - num_epochs: 3
-
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
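For the record, the totals in the hyperparameter list removed above are internally consistent: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 4 × 8 × 8 = 256, and total_eval_batch_size = eval_batch_size × num_devices = 4 × 8 = 32.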