Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training

#3
by mradermacher - opened

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I assume that is in GB and not GiB, in which case the 474 GiB might fit, as we have 503 GiB of RAM (after subtracting the RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
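For the record, the arithmetic behind that guess, as a tiny sketch (assuming the 509G from ls is decimal gigabytes):

```python
# Minimal sketch of the unit conversion discussed above (numbers from the thread;
# the exact byte count is an assumption since only "509G" is shown).
size_gb = 509e9                      # 509 GB (decimal), as reported by ls
size_gib = size_gb / 2**30           # bytes -> GiB (binary)
ram_gib = 503                        # usable RAM after hardware reservations

print(f"{size_gib:.0f} GiB model vs {ram_gib} GiB RAM "
      f"-> {ram_gib - size_gib:.0f} GiB left for context/overhead")
# ~474 GiB vs 503 GiB: it fits on paper, but only ~29 GiB remain.
```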

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

Q6_K is fine for me. Q8_0 might not fit without offloading, and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally, you should give it a try and see if it fits, but if not, Q6_K is fine for me.

I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf, so please give it a try to see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.

474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I'll try an offload of 1 and 0, then Q6. Hopefully it does not crash.

I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
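To illustrate the distinction (not part of the llmjob tooling, just a rough sketch for inspecting a box): truly free RAM, reclaimable page cache and the kernel's "available" estimate can be read straight from /proc/meminfo:

```python
# Illustrative sketch (Linux only): how much RAM is truly free vs. merely
# cached and therefore reclaimable. Not part of llmjob; just for inspection.
def meminfo_gib():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            fields[key] = int(value.split()[0]) / 2**20  # kB -> GiB
    return fields

m = meminfo_gib()
print(f"MemFree:      {m['MemFree']:8.1f} GiB  (completely unused)")
print(f"Cached:       {m['Cached']:8.1f} GiB  (page cache, can be reclaimed)")
print(f"MemAvailable: {m['MemAvailable']:8.1f} GiB  (estimate of what a new process can get)")
```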

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

457.4g after warming up.

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)

llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1 and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?

grafik.png
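For illustration only: the same per-layer knob that -ngl controls is exposed as n_gpu_layers in the llama-cpp-python bindings. This is a hypothetical sketch (not the actual imatrix invocation used here) just to show why -ngl 1 puts a single whole layer on a single GPU instead of splitting work across both:

```python
# Hypothetical sketch using the llama-cpp-python bindings; the path and values
# are examples, not the real invocation from this thread.
from llama_cpp import Llama

llm = Llama(
    model_path="/tmp/snowflake-arctic-instruct.Q8_0.gguf",  # path from the thread
    n_gpu_layers=1,   # like -ngl 1: exactly one layer is offloaded,
                      # so only one GPU ends up holding weights
    n_ctx=512,
)
print(llm("Hello", max_tokens=8))
```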

I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

I'm not so sure about that. Keep in mind that imatrix uses mmap'ed memory, which can be taken away by other processes such as quantisation tasks that use reserved memory.

grafik.png
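Rough sketch of why that matters: llama.cpp mmaps the model file read-only, so those pages live in the page cache and the kernel can evict them under memory pressure, while a quantize job's working buffers are anonymous, reserved memory it cannot drop. A toy example of the two kinds of memory (file name is hypothetical):

```python
# Toy sketch of mmap-backed vs. anonymous memory (Linux; file name is hypothetical).
import mmap

# mmap-backed: pages come from the page cache and can be evicted and re-read
# from disk at any time -- which is exactly the "streaming from SSD" symptom.
with open("model.gguf", "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    first_byte = mapped[0]          # touching a page faults it into the cache

# Anonymous allocation: this memory is "reserved" for the process; the kernel
# cannot silently take it back, so it competes with the mmapped model pages.
buf = bytearray(512 * 1024 * 1024)  # 512 MiB of non-reclaimable working memory
```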

dstat shows a relatively high disk read rate so imatrix might now be streaming from SSD:

grafik.png

Yes it is clearly streaming from SSD now:

grafik.png

Once the quantisation tasks are interrupted it should work without SSD streaming again.
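For reference, the read rate dstat shows can be approximated straight from /proc/diskstats; a small sketch (the device name is an assumption, adjust for the actual SSD):

```python
# Rough equivalent of watching dstat's disk read column (Linux).
# Device name "nvme0n1" is an assumption.
import time

def sectors_read(device="nvme0n1"):
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == device:
                return int(parts[5])          # field 6: sectors read
    raise ValueError(f"device {device} not found")

prev = sectors_read()
while True:
    time.sleep(1)
    cur = sectors_read()
    print(f"read: {(cur - prev) * 512 / 2**20:7.1f} MiB/s")  # sectors are 512 bytes
    prev = cur
```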

wow, lots of, eh, mixed news :)

an empty /tmp folder on rich1 would be surprising, but we'll see what's going on when it's back up. shit happens :)

wow, never heard of minimax. but let's face it, if 4xxB models become commonplace, it might be prudent to use Q8_0 for imatrix. I don't have an issue with that.

Regarding rich1: we successfully installed Proxmox on it today. I unfortunately caused an IP conflict while setting up OpenWrt minutes after he went to bed, so I currently have to wait for him to use iKVM to fix this. I'm confident we can get rich1 working again tomorrow.

Regarding the reason why nico1 is currently offline: my ISP decided to do maintenance today from 01:00 to 06:00 and on the 3rd of February from 05:00 to 06:00. I wasn't aware of it and spent quite a while diagnosing the issue because they had not put it on their website, but I then figured it out from the website of their upstream ISP. They usually inform me weeks in advance, but it could be that I missed that.

nico1 is currently running a reasoning finetune of DeepSeek-R1-Distill-Llama-70B-Uncensored. This is scheduled to take almost a day, but I will probably interrupt it at 0.5 epochs so as not to block imatrix quants for too long. I wanted to test auto_resume_from_checkpoints for the first time anyway. It also happened to be good timing with the internet outage.

wow, never heard of minimax. but let's face it, if 4xxB models become commonplace, it might be prudent to use Q8_0 for imatrix. I don't have an issue with that.

minimax is a completely new base model and so probably warrants the effort of doing it in 16-bit, even if it realistically will barely make a difference. The minimax model is extremely good, getting close to the much larger DeepSeek-V3, likely because, while smaller in total parameters, it has more active parameters.

It suddenly felt so lonely... :)

That's a long maintenance interval, but it happens.

minimax is a completely new base model

So... you do kind of agree :) I don't expect minimax to suddenly become popular for fine-tunes, though, and I don't expect many finetunes of llama-405b either.

nico1 is currently running a reasoning finetune of DeepSeek-R1-Distill-Llama-70B-Uncensored.

btw., you could, if you wanted, let it quantize (if it doesn't do that already, most likely it will work on r1-zero) - if it stops, you could edit /llmjob/share/bin/llmjob, find this line:

} elsif ($cmd eq "slave-scheduler") {

and replace the rich1 a few lines below that with nico1:

if ($HOSTNAME eq "rich1") {

Then "llmjob slave-scheduler" will run the scheduler locally, which is currently disabled everywhere except on rich1.

I'm telling you this not so much because I really want you to do that, but more to trickle knowledge about the internal workings to you. llmjob slave-scheduler is invoked at the end of every job, and because of some bug I am hunting, it only tries to locally schedule jobs on rich1, not anywhere else. And oh my, it still uses bash to actually send a push to the scheduler afterwards; why did I look at that code.

The file will be overwritten automatically the next time kaos contacts rich1 (it replaces itself, so that only works if it actually compiles, though).

In other news, I have a good lead on the weird job scheduling failures I have seen in the last month.

rich1 is alive again! I recommend checking that everything with it is fine and that no work got lost. I forwarded TCP port 2222 for SSH and UDP port 7103 for WireGuard. rich1 now uses a Proxmox-with-OpenWrt-router setup similar to nico1's.
