Upcycling Experiments
Collection
Models I pre-trained by initialising SMoE models from dense model weights, using the upcycling process used for Qwen1.5-MoE-A2.7B (or something similar).
6 items