Sweaterdog committed · verified
Commit 8b9294e · 1 parent: 14eec2e

Update README.md

Files changed (1): README.md (+5 −8)
README.md CHANGED
@@ -47,10 +47,10 @@ The model *may* experience bugs, such as not saying your name, getting previous
 
  # What models can I choose?
 
- There are going to be 3 model sizes avaliable, Regular, Mini, and Teensy
+ There are going to be 2 *(maybe 3)* model sizes available: Regular and Mini *(and maybe Large)*
  * Regular is a 7B parameter model, tuned from [Deepseek-R1 Distilled](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
  * Mini is a 1.5B parameter model, also tuned from [Deepseek-R1 Distilled](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- * Teensy is a tiny little 360M parameter model, trained from [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct)
+ * Large *(might)* be a 32B parameter model, again tuned from [Deepseek-R1 Distilled](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) *- this model may not exist,* ***ever***
 
  Out of all of the models, Teensy had the largest percent of parameters tuned, being 1/2 the model's total size
 
@@ -70,11 +70,8 @@ A. No, if you are making a post about MindCraft, and using this model, you only
 
  ## Important notes and considerations
 
- The preview model of Andy-3.5-mini *(Andy-3.5-mini-preview)* was trained on a context length of 4096, this was meant to speed up training and VRAM usage.
+ The preview model of Andy-3.5 is Andy-3.5-teensy, a small tune with only 360 million parameters that ***"understands Minecraft"***.
+ I would not recommend Andy-3.5-teensy; I felt like making a joke, and a joke was made. *(The Andy-3.5-teensy model was a big hope, but it sucks, try out the q2_k model!)*
 
- The Base model of Andy-3.5-mini-preview was a distilled version of Deepseek-R1, which was a tuned model of Qwen-2.5-1.5b
 
- Since a context window of 4096 is not nearly enough for MindCraft, you can go higher, Qwen-2.5-1.5b was trained on a context length of 64,000, the distilled version of Deepseek-R1 was trained on a length of 128,000, the usable length may be closer to 32,000 tokens for Andy-3.5-mini-preview
-
-
- When the full versions of Andy-3.5 and Andy-3.5-preview release, they will both be trained on a context length of 128,000 to ensure proper usage during playing.
+ When the full versions of Andy-3.5 and Andy-3.5-mini *(and possibly Andy-3.5-large)* release, they will all be trained on a context length of 32,000 to ensure proper usage during playing.
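
The 32,000-token figure above matters in practice, because most local runners default to a much smaller window and will silently truncate MindCraft's long prompts. As a minimal sketch, assuming the models are run through Ollama (the tag `andy-3.5-mini` is a placeholder, not a published name — substitute the actual GGUF or model tag):

```
# Modelfile — hypothetical tag; replace FROM with the real Andy-3.5 model/GGUF
FROM andy-3.5-mini
# Raise the context window toward the advertised 32k limit
PARAMETER num_ctx 32768
```

Then `ollama create andy-3.5-mini-32k -f Modelfile` builds a variant with the larger window; other runners (llama.cpp's `--ctx-size`, for example) expose the same knob under a different name.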