---
datasets:
  - crumb/Wizard-EvolInstruct70k-k4
language:
  - en
tags:
  - switch_transformers
  - llama
  - MoE
---

This is the first test SwitchLlama model from MoLora2. It starts from OpenLlama-3b-v2 and adds 4 experts to the MLP blocks of the model. The experts were trained with QLoRA, with adapters trained individually on gate_proj, up_proj, and down_proj and then merged (in 4-bit). Each of the 4 expert models was trained on a cluster from crumb/Wizard-EvolInstruct70k-k4; their trained MLP weights were then transplanted into a model initialized from OpenLlama-3b with 4 Switch Transformer-style experts. The routers are not trained in this version of the model and are randomly initialized.

Modeling code will not be included until this proof-of-concept is fully trained.
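
For orientation only, below is a minimal sketch of the kind of switch-style MLP block described above. This is not the actual modeling code for this checkpoint; the module names, shapes, LLaMA-style gated MLP, and top-1 routing are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaStyleMLP(nn.Module):
    """One expert: a LLaMA-style gated MLP (gate_proj, up_proj, down_proj)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class SwitchMLP(nn.Module):
    """Top-1 (switch) routing over a small set of expert MLPs.

    The router is randomly initialized and untrained, matching the
    description of this checkpoint above.
    """

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts, bias=False)  # random init, untrained
        self.experts = nn.ModuleList(
            LlamaStyleMLP(hidden_size, intermediate_size) for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        logits = self.router(x)              # (batch, seq_len, num_experts)
        probs = logits.softmax(dim=-1)       # router probabilities
        top1 = probs.argmax(dim=-1)          # top-1 expert index per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i                 # tokens routed to expert i
            if mask.any():
                # Switch-Transformer-style scaling by the selected expert's probability
                scale = probs[..., i][mask].unsqueeze(-1)
                out[mask] = scale * expert(x[mask])
        return out
```

In the scheme described above, the per-expert MLP weights would come from the merged QLoRA adapters, while the `router` weights are left at their random initialization.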