Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training

#3
by mradermacher - opened

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I assume that is in GB and not GiB, in which case 474 GiB might fit, as we have 503 GiB of RAM (after subtracting the RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
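For reference, a quick sketch of the arithmetic (assuming, as above, that the 509G from the ls listing is decimal gigabytes and that 503 GiB is the usable RAM figure):

```python
# Sanity-check whether the Q8_0 file could fit in RAM.
# The 509 G figure comes from the ls listing above (assumed to be decimal GB);
# 503 GiB is the usable-RAM number quoted in this thread.
file_size_bytes = 509 * 10**9
file_size_gib = file_size_bytes / 2**30      # ~474 GiB
usable_ram_gib = 503

print(f"Q8_0 size: {file_size_gib:.0f} GiB")
print(f"Headroom:  {usable_ram_gib - file_size_gib:.0f} GiB left for context and buffers")
```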

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

Q6_K is fine for me. Q8_0 might not fit without offloading, and it is unclear whether offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally, you should give it a try and see if it fits, but if not, Q6_K is fine for me.
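For a rough sense of the Q6_K headroom, a sketch based on approximate bits-per-weight figures for the two quant types (assumed values, not measurements of this model):

```python
# Rough estimate of the Q6_K size from the known Q8_0 size, using approximate
# effective bits-per-weight for the two quant types (assumptions, not exact).
q8_0_gib = 474
bpw_q8_0 = 8.5       # 8-bit weights plus one scale per 32-weight block
bpw_q6_k = 6.5625    # approximate effective bits per weight for Q6_K

q6_k_gib = q8_0_gib * bpw_q6_k / bpw_q8_0
print(f"Estimated Q6_K size: {q6_k_gib:.0f} GiB")   # ~366 GiB out of 503 GiB usable
```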

I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf, so please give it a try and see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.

474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I'll try an offload of 1 and 0, then Q6_K. Hopefully it does not crash.

I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

457.4g after warming up.

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)

llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1 and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?
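As a rough illustration of the per-layer point: -ngl asks llama.cpp to offload that many whole layers, so with -ngl 1 there is only a single layer to place and the second GPU gets nothing. A toy sketch of that behaviour (the layer count and per-layer size are made-up placeholders, and the assignment logic is simplified compared to llama.cpp's real split):

```python
# Toy model of per-layer GPU offload: -ngl N moves N whole layers to the GPUs.
# n_layers and layer_size_gib are hypothetical placeholders for illustration.
def offload_plan(n_layers: int, layer_size_gib: float, ngl: int, n_gpus: int):
    per_gpu = [0.0] * n_gpus
    for layer in range(min(ngl, n_layers)):
        per_gpu[layer % n_gpus] += layer_size_gib  # layers are placed whole, never split
    return per_gpu

# With -ngl 1 only one layer is offloaded, so only GPU0 receives anything.
print(offload_plan(n_layers=35, layer_size_gib=13.0, ngl=1, n_gpus=2))   # [13.0, 0.0]
print(offload_plan(n_layers=35, layer_size_gib=13.0, ngl=4, n_gpus=2))   # [26.0, 26.0]
```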

[screenshot: grafik.png]

I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

I'm not so sure about that. Keep in mind that imatrix uses mmap memory, which can be taken away by other processes such as quantisation tasks that use reserved memory.
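To illustrate the distinction: file-backed mmap pages sit in the page cache and can be evicted by the kernel when other processes need RAM, forcing re-reads from disk, whereas anonymous allocations are reserved and cannot be silently reclaimed. A minimal sketch of the two allocation styles, not of llama.cpp's actual code:

```python
import mmap
import os

# File-backed mapping: pages live in the page cache; the kernel may drop them
# under memory pressure and the process transparently re-reads from disk
# (which shows up as the SSD streaming discussed below).
path = "/tmp/snowflake-arctic-instruct.Q8_0.gguf"   # path taken from this thread
fd = os.open(path, os.O_RDONLY)
weights = mmap.mmap(fd, 0, prot=mmap.PROT_READ)

# Anonymous allocation: counted as reserved memory of the process; it can only
# be freed by the process itself or pushed to swap, never silently dropped,
# which is why frozen quantisation tasks keep their RAM pinned.
scratch = bytearray(1 << 30)                        # 1 GiB of reserved memory
```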

[screenshot: grafik.png]

dstat shows a relatively high disk read rate, so imatrix might now be streaming from SSD:

[screenshot: grafik.png]

Yes, it is clearly streaming from SSD now:

[screenshot: grafik.png]

Once the quantisation tasks are interrupted it should work without SSD streaming again.

@mradermacher What is the opposite of llmc pause llmjob.nico1? I tried llmc resume llmjob.nico1 but that just returns fail.

@mradermacher Please unpause llmjob for nico1. I tried everything and nothing I do seems to work.

That's the right command, I'll have a look.

Should be fixed and therefore work in the future. Also, it's unpaused.

Wonderful. All model card uploads fail, and I do not think it has anything to do with us:

BadRequestError('Bad request for commit endpoint: Unexpected internal error hook: yaml. (Request ID: Root=1-67c58a3d-55c4e98b47b8948c61143a8f;a7008891-9698-4001-83c0-06b1f54a85da)')

@mradermacher Why is almost every worker idle and stuck at run/static README.md upload? Does this mean we reached the repository creation rate limit? But wouldn't it then be stuck at creating repositories instead?

No, it means huggingface fucked up; I think uploads were globally down. They seem to have fixed it.

The pause and llmjob. flags should now be communicated to the host, so once set (and once the scheduler has contacted it successfully, which should normally happen almost immediately), they should reliably prevent hosts from starting new jobs. This does not solve the problem of the scheduler trying to contact hosts, but that has to be a separate thing.

The upshot is that host-pause should now reliably be able to stop activity on a host. The next step would be to stop activity and then set another (already existing) flag that keeps the scheduler from contacting that host. But that's for another time.
