General discussion.

by Lewdiculous - opened Mar 22, 2024

Mar 22, 2024

Call me clueless, but I could have sworn the regular llama.cpp releases included at least some general prebuilt executables for Linux. Well, it's all macOS and Windows. My day is ruined.

Virt-io

Owner Mar 22, 2024

Can't blame them: it's too much overhead when people who use Linux should already know how to build their own packages.

Lewdiculous

Mar 22, 2024

•

edited Mar 22, 2024

So I didn't want to believe this...

After some testing, making the actual quants is really slow, so it is recommended to use it only for the initial FP16 GGUF and imatrix.dat generation.

...because I was thinking that "it can't be that bad".

If anyone also thought that, well...

It actually is very slow. I don't want to imagine what quanting the new smaller stuff like IQ3/2 would look like. I used free Colab but I don't think that would scale.

But!

It's really not a bad solution if you need to generate the imatrix data and don't have the hardware for it. That part is pretty fast, as it's GPU-bound.
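A rough sketch of that split, assuming a current llama.cpp checkout where the tools are named convert_hf_to_gguf.py, llama-imatrix and llama-quantize (older checkouts used different names). The model directory, file names and calibration text are placeholders, not anything from the notebook. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Placeholder paths, purely for illustration.
MODEL_DIR=./my-model
F16=model-f16.gguf
CALIB=calibration.txt

DRY_RUN="${DRY_RUN:-1}"   # default: print the commands, run nothing

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# GPU-bound part: reasonable on a Colab GPU instance.
colab_steps() {
  run python convert_hf_to_gguf.py "$MODEL_DIR" --outtype f16 --outfile "$F16"
  run ./llama-imatrix -m "$F16" -f "$CALIB" -o imatrix.dat -ngl 99
}

# CPU-bound part: better done locally, where more cores are available.
local_steps() {
  run ./llama-quantize --imatrix imatrix.dat "$F16" model-IQ3_XXS.gguf IQ3_XXS
}

colab_steps
local_steps
```

Download the FP16 GGUF and imatrix.dat from Colab before running the local step.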

Marcus-Arcadius

Jul 31, 2024

got this error running the notebook

Virt-io

Owner Jul 31, 2024

@Marcus-Arcadius

The script is broken because of upstream changes.
I don't have time to fix it at the moment.

This is not a good way to do it.

Colab has limited storage space for GPU instances
Colab only has 2 CPU cores

Recommended to do it locally or on another cloud provider (paid Colab isn't great).

Marcus-Arcadius

Jul 31, 2024

I'll probably do it locally, but I've got to figure out how to do it 😅

Virt-io

Owner Jul 31, 2024

If you are on Windows, give https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script a try.

Marcus-Arcadius

Jul 31, 2024

I am on Linux

Virt-io

Owner Jul 31, 2024

@Marcus-Arcadius

You will have to compile llama.cpp from source.

I was gone for a couple of months, so I am also unsure how to do it now.
Some of the build flags changed and I can't get it to compile with CUDA (most likely because I'm on Arch).

When I get it working, I will see if I can add Linux support to the script.
It will take a while, however: I am not a coder and I don't have much free time.

Marcus-Arcadius

Jul 31, 2024

I'm also on Arch, so it looks like we're in the same dilemma 😂

Virt-io

Owner Jul 31, 2024

I think it is either gcc or the new 555 NVIDIA drivers :|

Marcus-Arcadius

Jul 31, 2024

If I figure it out I'll let you know

Lewdiculous

Jul 31, 2024

•

edited Jul 31, 2024

got this error running the notebook

@Marcus-Arcadius

They have changed the naming for a few things. I made changes reflecting that in the Windows script, so it should be a good reference to start from: the convert script now uses underscores instead of hyphens, and the executables received a llama- prefix:

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/commit/234f95c659ecf10213bf0bb51344d098943dc641

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/commit/8d0a75b62cae4261ed71cd4dbdc396fc444b053b
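As a quick illustration of those renames, here is a small helper that maps an old tool name to the new one. The function name new_name is made up for this sketch and is not part of llama.cpp or the script. The main case is handled separately because that binary became llama-cli rather than just gaining the prefix.

```shell
# Hypothetical helper: translate a pre-rename llama.cpp tool name.
new_name() {
  case "$1" in
    # Python convert scripts: hyphens became underscores.
    *.py) printf '%s\n' "$1" | tr '-' '_' ;;
    # Special case: main was renamed to llama-cli.
    main) printf 'llama-cli\n' ;;
    # Everything else: same name with a llama- prefix.
    *)    printf 'llama-%s\n' "$1" ;;
  esac
}

for old in convert-hf-to-gguf.py quantize imatrix main; do
  printf '%s -> %s\n' "$old" "$(new_name "$old")"
done
```

Any notebook cell or script that still calls the old names needs the same substitutions.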

I think llama.cpp now provides prebuilt Linux binaries? They are tagged ubuntu, so I imagine they are meant for servers. I'm not too familiar with the Linux side of things or with broader compatibility across distributions; my experience is basically just Ubuntu on the server side.

Virt-io

Owner Aug 3, 2024

@Marcus-Arcadius

I finally got time to figure out the issue

You need to set your cuda architecture

Example

make -j 16 GGML_CUDA=1 CUDA_POWER_ARCH=75

Virt-io

Owner Aug 8, 2024

@Marcus-Arcadius

Never mind, you just need to run make -j 8 GGML_CUDA=1

I added Linux support to the script
https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/discussions/36
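For anyone building by hand, a minimal sketch that produces the same make invocation with the job count taken from the machine instead of hard-coded. The build_cmd name is invented here. It assumes nproc is available and falls back to a single job otherwise.

```shell
# Print the make command for a CUDA build.
# Optional first argument: number of parallel jobs (defaults to the core count).
build_cmd() {
  njobs="${1:-$(nproc 2>/dev/null || echo 1)}"
  printf 'make -j %s GGML_CUDA=1\n' "$njobs"
}

build_cmd 8

# To actually build, run the printed command from the llama.cpp source tree:
#   cd llama.cpp && sh -c "$(build_cmd)"
```

Using fewer jobs than cores can help if the CUDA compile runs out of memory.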
