Crystalcareai / Qwen1.5-8x7b

Crystalcareai committed on Mar 1

Commit 3c733f6 • 1 Parent(s): b5c954d

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -21,10 +21,10 @@ I curated a dataset, named Mixture of Data (MoD), from various sources, includin
 After training Qwen1.5-7b on 100k random samples from MoD over four epochs and merging the fine-tuned model 8x, I used an approach utilizing a random gate, without specialized fine-tuning done to any of the 8 experts. The result was a model that initially made no sense, lacking a base model and clear guidance on expert usage.
-Despite challenges, such as training interruptions via cuda errors with Runpod , the model showed promising adaptability to the rest of the MoD dataset, even with limited training (0.45/4 planned epochs were completed before my compute budget ran out). While I haven't been able to benchmark it fully (I will when I can get this runpod situation sorted) it appears to perform comparably to Mixtral in (admittedly naive) preliminary reasoning tests.
+Despite challenges, such as training interruptions via cuda errors with Runpod , the model showed promising adaptability to the rest of the MoD dataset, even with limited training (0.45/4 planned epochs were completed before my compute budget ran out). It performs comparably to Mixtral in (admittedly naive) preliminary reasoning tests.
 These weeks have been incredibly rewarding and educational, thanks to the contributions of Jon Durbin, Maxime Labonne, Teknium, Eric Hartford, and Charles Goddard. Their work has made these technologies accessible and inspired my project. A special thank you to Teknium and Eric Hartford, who have been generous with their time - answering my questions with kindness and humility.
-Thank you for your interest and support. Let's push the boundaries of what's possible together.
+I am currently training a 2.0 model - that I expect to beat Mixtral on most benchmarks. Thank you for your interest and support. Let's push the boundaries of what's possible together.
 Lucas
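The README describes merging one fine-tuned Qwen1.5-7b model 8x into a mixture of experts with "a random gate", meaning a router whose weights are not trained. The commit does not say which merge tool or routing details were used, so the following is only a minimal sketch of what an untrained top-2 router does for a single token, assuming Mixtral-style top-2 routing with the selected weights renormalized to sum to 1. The function names, the 0.02 initialization scale, and the toy "expert" are hypothetical illustrations, not the author's merge code.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def make_random_gate(hidden_size: int, num_experts: int, seed: int = 0) -> List[Vector]:
    """Hypothetical router: one weight row per expert, drawn at random and never trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)] for _ in range(num_experts)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: Vector,
    gate: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    top_k: int = 2,
) -> Tuple[Vector, List[int], List[float]]:
    # Router logits: dot product of the token with each expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate]
    # Keep the top_k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize over the selected experts only (Mixtral-style assumption).
    weights = softmax([logits[i] for i in top])
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(token)
    for weight, idx in zip(weights, top):
        y = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, top, weights


def shared_ffn(v: Vector) -> Vector:
    """Toy stand-in for the single fine-tuned feed-forward block copied into every expert slot."""
    return [2.0 * x for x in v]


if __name__ == "__main__":
    hidden, n_experts = 4, 8
    gate = make_random_gate(hidden, n_experts, seed=0)
    experts = [shared_ffn] * n_experts  # eight identical copies of one model's FFN
    out, top, weights = moe_forward([1.0, -0.5, 0.25, 0.0], gate, experts)
    print("selected experts:", top)
    print("weights:", [round(w, 3) for w in weights])
    print("output:", [round(o, 6) for o in out])
```

Because the toy experts are identical and the top-2 weights are renormalized, the mixed output in this sketch equals a single expert's output no matter which two experts the random gate selects; the router's choices only start to matter once the experts differ.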