🚀 Welcome to a new generation of Minecraft with Andy 3.5 🚀

Andy 3.5 is a collection of local LLMs designed for playing Minecraft.

Andy 3.5 is designed to be used with MindCraft, and is neither designed nor intended for any other application.

How to Install

Select the model you would like to use (larger is better)
Download the Modelfile
Once downloaded, open the Modelfile in a text editor and change the path to the download location of the GGUF file
Save the file and open a command terminal
(Optional if CMD is opened via file explorer) Navigate to the correct directory using "cd"
Run the command ollama create Andy-3.5 -f Modelfile. If you want multiple models, include a tag after the name, for example Andy-3.5:mini-fp16 or Andy-3.5:q2_k
Go to a profile in MindCraft
Change the model to be Andy-3.5
Enjoy playing with an AI
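
The Modelfile edit and the create command from the steps above can also be scripted. The following is a minimal Python sketch, not an official installer. It assumes the downloaded Modelfile contains a standard Ollama FROM line that holds the model path; the function names point_modelfile_at and create_command are made up for this example.

```python
import re
from typing import List, Optional


def point_modelfile_at(modelfile_text: str, gguf_path: str) -> str:
    """Return the Modelfile text with its first FROM line pointing at gguf_path.

    Assumption: the Modelfile has a line that starts with FROM.
    """
    new_text, count = re.subn(
        r"^FROM[ \t]+.*$",
        # A function is used as the replacement so that backslashes in
        # Windows paths are inserted literally instead of being parsed
        # as regex escape sequences.
        lambda _match: f"FROM {gguf_path}",
        modelfile_text,
        count=1,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("no FROM line found in Modelfile")
    return new_text


def create_command(name: str = "Andy-3.5", tag: Optional[str] = None) -> List[str]:
    """Build the 'ollama create' command from the install steps, with an optional tag."""
    full_name = f"{name}:{tag}" if tag else name
    return ["ollama", "create", full_name, "-f", "Modelfile"]
```

To use it, read the Modelfile, pass its text through point_modelfile_at, write the result back, and then run the list returned by create_command (for example with subprocess.run) from the directory that contains the Modelfile.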

How was the model trained?

The model was trained on a dataset of ~9,000 messages taken directly from MindCraft, which ensures quality data. It was not trained on the newer version of the dataset, which has ~12,000 prompts.

What are the capabilities and limitations?

The smaller models (the preview ones, at least) had 1/3 of their parameters tuned; the larger preview model has not been released yet. Andy-3.5 was trained on EVERYTHING regarding Minecraft and MindCraft, so it knows how to use commands natively without a system prompt. Andy-3.5 also knows how to build and how to use !newAction to perform commands: it was trained on a lot of building, as well as on using !newAction for tasks such as manually making something or strip mining.

Note that this is a PREVIEW model; it is NOT finished!

Why a preview model?

Andy-3.5-preview was made to test the intelligence of a Minecraft AI with the current dataset. It was meant to show the progress of the training and which areas need work in the future. DO NOT expect this model to be able to do everything perfectly: it only knows as much as the dataset taught it, and as much as the other 2/3 of untouched parameters allow. The model may experience bugs, such as not saying your name, confusing previous messages, or other small things.

What models can I choose?

There are going to be 2 (maybe 3) model sizes available: Regular, Mini, and maybe Large.

Regular is a 7B parameter model, tuned from DeepSeek-R1 Distilled
Mini is a 1.5B parameter model, also tuned from DeepSeek-R1 Distilled
Large might be a 32B parameter model, again tuned from DeepSeek-R1 Distilled. This model may not exist, ever

Out of all of the models, Teensy had the largest percentage of parameters tuned: 1/2 of the model's total size.

Safety and FAQ

Q. Is this model safe to use? A. Yes, this model is non-volatile and cannot generate malicious content.

Q. Can this model be used on a server? A. Yes. In theory and in practice, the model is only capable of building and performing manual tasks via !newAction.

Q. Who is responsible if this model does generate malicious content? A. You are responsible. Even though the model was never trained to be able to make malicious content, there is a very slight chance it still generates malicious code.

Q. If I make media based on this model, like photos or videos, do I have to mention the creator? A. No. If you are making a post about MindCraft and using this model, you only have to mention the creator if you mention the model being used.

Important notes and considerations

The preview model of Andy-3.5 is Andy-3.5-teensy, a small tune with only 360 million parameters that "understands Minecraft". I would not recommend Andy-3.5-teensy: I felt like making a joke, and a joke was made. (The Andy-3.5-teensy model was a big hope, but it sucks. Try out the q2_k model!)

When the full versions of Andy-3.5 and Andy-3.5-mini (and possibly Andy-3.5-large) release, they will all be trained on a context length of 32,000 tokens to ensure proper usage during play.

Performance Metrics

These benchmarks are atypical, since most standard benchmarks don't apply to Minecraft. The benchmarks below include cheap models available via API and other fine-tuned local models (excluding Andy-v2 and Andy-v3, since they are bad).

Zero Info Prompting

How fast can a model collect 16 oak logs and convert them all into sticks?

Currently, Andy-3.5 and Andy-3.5-mini are the ONLY models that can play without command documentation, or any other instruction, and Andy-3.5-Mini sometimes fares better without the unnecessary data. Test this for yourself using this profile

Time to get a stone pickaxe

I am sure other models like DeepSeek-R1 may be faster at getting a stone pickaxe; however, the demo was meant to show the performance of Andy-3.5.

For Andy-3.5-mini, I used the FP16 model, since I had enough VRAM to do so
For Andy-3.5, I used the Q4_K_M quantization
For Andy-3.5-Teensy, I used the FP16 quantization
For Mineslayerv1 and Mineslayerv2, I used the default (and only) quantization, Q4_K_M
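
Whether a given quantization fits in VRAM can be estimated from the parameter count. The sketch below is a rough back-of-the-envelope calculation, not a measurement: it only counts the weights and ignores the context cache and runtime overhead. FP16 is 16 bits per weight; the 4.8 bits per weight used for Q4_K_M is an assumed ballpark figure, and the function name weight_size_gb is made up for this example.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts come from the model list in this README.
mini_fp16 = weight_size_gb(1.5e9, 16)      # Andy-3.5-mini at FP16
teensy_fp16 = weight_size_gb(360e6, 16)    # Andy-3.5-teensy at FP16
# Assumption: Q4_K_M averages about 4.8 bits per weight.
regular_q4 = weight_size_gb(7e9, 4.8)      # Andy-3.5 at Q4_K_M

print(f"mini FP16:      ~{mini_fp16:.1f} GB")
print(f"teensy FP16:    ~{teensy_fp16:.2f} GB")
print(f"regular Q4_K_M: ~{regular_q4:.1f} GB")
```

Actual file sizes and memory use will differ somewhat, so treat these numbers as a lower bound when choosing a model.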

Notes about the benchmarks

Zero Info Prompting

Andy-3.5-Teensy was able to use one command successfully, but was not able to afterwards
Andy-3.5-Mini collected 32 oak_log instead of 16 oak_log
Andy-3.5 attempted to continue playing, and make a wooden_pickaxe after the goal was done

Both Mineslayerv1 and Mineslayerv2 hallucinated commands, like !chop or !grab

Time to get a stone pickaxe

Andy-3.5-teensy hallucinates too much for stable gameplay (it is a 360M parameter model, what can be expected)
Andy-3.5-Mini was unable to make itself a stone pickaxe. It collected enough wood, but then got stuck on converting logs to planks: it kept trying !craftRecipe("wooden_planks", 6) instead of using oak_planks
Andy-3.5 made a stone pickaxe the fastest out of all models, including GPT-4o-mini and Claude-3.5-Haiku
Mineslayerv1 was unable to use !collectBlocks, and instead kept trying !collectBlock
Mineslayerv2 was unable to play; it kept hallucinating on the first command
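
Hallucinated command names like the ones in these notes can be spotted automatically. The following Python sketch is only an illustration and is not part of MindCraft. KNOWN_COMMANDS contains only the valid commands named in this README, not the full MindCraft command list, and find_unknown_commands is a made-up helper name.

```python
import re
from typing import List

# Assumption: only the valid commands mentioned in this README are listed;
# the real MindCraft command list is longer.
KNOWN_COMMANDS = {"!newAction", "!craftRecipe", "!collectBlocks"}

# A command is taken to be "!" followed by a letter and then word characters.
COMMAND_PATTERN = re.compile(r"![A-Za-z]\w*")


def find_unknown_commands(message: str) -> List[str]:
    """Return every command-like token in message that is not a known command."""
    return [cmd for cmd in COMMAND_PATTERN.findall(message)
            if cmd not in KNOWN_COMMANDS]


# Command names taken from the benchmark notes.
for text in ["!collectBlocks", "!collectBlock", "!chop", "!grab"]:
    print(text, "->", find_unknown_commands(text))
```

This only checks command names. A call such as !craftRecipe("wooden_planks", 6) would pass, because the command exists and only the item name is wrong.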

Sweaterdog/Andy-3.5