Update README.md
README.md CHANGED
@@ -420,20 +420,20 @@ unfrozen_parameters:
   - model.layers.12.block_sparse_moe.experts.7.w3
   - model.layers.13.block_sparse_moe.experts.7.w3
   - model.layers.14.block_sparse_moe.experts.7.w3
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+  - model.layers.0.block_sparse_moe.gate
+  - model.layers.1.block_sparse_moe.gate
+  - model.layers.2.block_sparse_moe.gate
+  - model.layers.3.block_sparse_moe.gate
+  - model.layers.4.block_sparse_moe.gate
+  - model.layers.5.block_sparse_moe.gate
+  - model.layers.6.block_sparse_moe.gate
+  - model.layers.7.block_sparse_moe.gate
+  - model.layers.8.block_sparse_moe.gate
+  - model.layers.9.block_sparse_moe.gate
+  - model.layers.10.block_sparse_moe.gate
+  - model.layers.11.block_sparse_moe.gate
+  - model.layers.12.block_sparse_moe.gate
+  - model.layers.13.block_sparse_moe.gate
 
 model_config:
   output_router_logits: true
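For context: this hunk changes which parameters stay trainable, from a single expert's `w3` projections to the fourteen MoE router `gate` layers (the content of the fourteen removed lines was not captured in this view). Below is a minimal sketch of how an Axolotl-style `unfrozen_parameters` list is typically applied in PyTorch; the helper and its pattern handling are an illustration, not Axolotl's actual implementation.

```python
import re

# Patterns mirroring the updated list above: train only the router gates of
# layers 0-13. Parameter names follow the Transformers Mixtral layout, e.g.
# "model.layers.0.block_sparse_moe.gate.weight".
UNFROZEN_PATTERNS = [
    rf"^model\.layers\.{i}\.block_sparse_moe\.gate" for i in range(14)
]

def apply_unfrozen_parameters(model, patterns=UNFROZEN_PATTERNS):
    """Freeze every weight, then re-enable gradients for matching names."""
    compiled = [re.compile(p) for p in patterns]
    for name, param in model.named_parameters():
        param.requires_grad = any(rx.match(name) for rx in compiled)
    n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {n_train:,} / {n_total:,} parameters")
```

The accompanying `model_config: output_router_logits: true` maps to the Transformers Mixtral config option of the same name, which makes the model return per-layer router logits and, during training, add the load-balancing auxiliary loss, giving the unfrozen gates a balancing signal in addition to the language-modeling loss.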
@@ -541,43 +541,6 @@ tokens:
 
 </details><br>
 
-# out
-
-This model is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.5217
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 2.7e-05
-- train_batch_size: 4
-- eval_batch_size: 4
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 8
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 256
-- total_eval_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 2
-- num_epochs: 3
-
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
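One consistency check worth noting from the deleted hyperparameter block: the reported effective batch sizes follow directly from the per-device values. Plain arithmetic, with the numbers copied from the removed lines:

```python
# Values copied from the deleted hyperparameter list.
train_batch_size = 4                # per device
eval_batch_size = 4                 # per device
num_devices = 8
gradient_accumulation_steps = 8

# 4 * 8 * 8 = 256, matching the reported total_train_batch_size.
assert train_batch_size * num_devices * gradient_accumulation_steps == 256
# 4 * 8 = 32, matching the reported total_eval_batch_size.
assert eval_batch_size * num_devices == 32
```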