Crystalcareai/Qwen1.5-8x7b

Crystalcareai committed on Feb 18

Commit 2ab3b14 • 1 Parent(s): 817b0d6

Update README.md

Files changed (1)

README.md +1 -1

@@ -10,7 +10,7 @@ datasets:
 I'm excited to share an early release of a project that has kept me busy for the last couple of weeks. Mixtral's release propelled me into a deep dive into MoEs. This led to my first experiments with post-training, starting with fine tuning using monsterapi around the middle of December, and later transitioning to axolotl as I got more comfortable with command lines and terminals.
-With the release of Qwen1.5, I was curious to see how it would compare to Mixtral. Thanks to Matthew Lebonne's lazymergekit, which simplifies the process for newcomers, I was able to give Qwen1.5-7B a unique twist.
+With the release of Qwen1.5, I was curious to see how it would compare to Mixtral. Thanks to lazymergekit, which simplifies the process for newcomers, I was able to give Qwen1.5-7B a unique twist.
 Coming from a background as an acting teacher and coach, I saw parallels between high-quality scripts' impact on performances and the importance of curating high-quality data for training models. This led me to explore data curation, especially for training Mixture of Experts (MoE) models. I looked into Teknium's OpenHermes dataset, Jon Durbin's collections on GitHub, and Eric Hartford's methods for achieving specific outcomes with models.