Sweaterdog committed · verified
Commit 8b9294e · 1 parent: 14eec2e

Update README.md

Files changed (1): README.md (+5 −8)
README.md CHANGED
@@ -47,10 +47,10 @@ The model *may* experience bugs, such as not saying your name, getting previous
 
  # What models can I choose?
 
- There are going to be 3 model sizes avaliable, Regular, Mini, and Teensy
+ There are going to be 2 *(maybe 3)* model sizes available: Regular and Mini *(and maybe Large)*
  * Regular is a 7B parameter model, tuned from [Deepseek-R1 Distilled](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
  * Mini is a 1.5B parameter model, also tuned from [Deepseek-R1 Distilled](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- * Teensy is a tiny little 360M parameter model, trained from [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct)
+ * Large *(might)* be a 32B parameter model, again tuned from [Deepseek-R1 Distilled](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) *- this model may not exist,* ***ever***
 
  Out of all of the models, Teensy had the largest percent of parameters tuned, being 1/2 the model's total size
 
@@ -70,11 +70,8 @@ A. No, if you are making a post about MindCraft, and using this model, you only
 
  ## Important notes and considerations
 
- The preview model of Andy-3.5-mini *(Andy-3.5-mini-preview)* was trained on a context length of 4096, this was meant to speed up training and VRAM usage.
+ The preview model of Andy-3.5 is Andy-3.5-teensy, a small tune with only 360 million parameters that ***"understands Minecraft"***.
+ I would not recommend Andy-3.5-teensy; I felt like making a joke, and a joke was made. *(The Andy-3.5-teensy model was a big hope, but it sucks, try out the q2_k model!)*
 
- The Base model of Andy-3.5-mini-preview was a distilled version of Deepseek-R1, which was a tuned model of Qwen-2.5-1.5b
 
- Since a context window of 4096 is not nearly enough for MindCraft, you can go higher, Qwen-2.5-1.5b was trained on a context length of 64,000, the distilled version of Deepseek-R1 was trained on a length of 128,000, the usable length may be closer to 32,000 tokens for Andy-3.5-mini-preview
-
-
- When the full versions of Andy-3.5 and Andy-3.5-preview release, they will both be trained on a context length of 128,000 to ensure proper usage during playing.
+ When the full versions of Andy-3.5 and Andy-3.5-mini *(and possibly Andy-3.5-large)* release, they will all be trained on a context length of 32,000 to ensure proper usage during playing.
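
The 32,000-token figure above matters in practice, because most local runners default to a much smaller window and will silently truncate MindCraft's long prompts. As a minimal sketch, assuming the models are run through Ollama (the tag `andy-3.5-mini` is a placeholder, not a published name — substitute the actual GGUF or model tag):

```
# Modelfile — hypothetical tag; replace FROM with the real Andy-3.5 model/GGUF
FROM andy-3.5-mini
# Raise the context window toward the advertised 32k limit
PARAMETER num_ctx 32768
```

Then `ollama create andy-3.5-mini-32k -f Modelfile` builds a variant with the larger window; other runners (llama.cpp's `--ctx-size`, for example) expose the same knob under a different name.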