can you share your quantization code?
Can you share your quantization code? I would like to have a 4.5 or 5 bit quantized model...
I ran the following command:
python convert.py -i C:\users\pc\Ex2bot\models\Athene\ -o C:\users\pc\exl2\ -cf C:\users\pc\atheneexl\ -b 3.5
This is using the convert.py program in the exllamav2 project as documented here: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md.
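If you want a 4.5 or 5.0 bit model instead, the same command should work with just the -b value changed (a sketch, reusing the same paths as above; -b sets the target average bits per weight):

python convert.py -i C:\users\pc\Ex2bot\models\Athene\ -o C:\users\pc\exl2\ -cf C:\users\pc\atheneexl\ -b 5.0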
It took about 3 hours on my machine.
Please let me know if you have any other questions.
Thanks, I did the 5.0 bit version; it just fits into an A6000 Ada GPU, and the results are much better...
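(For reference, a rough back-of-the-envelope estimate of why 5.0 bpw just fits in 48 GB; this sketch ignores the KV cache, activations, and any tensors kept at higher precision:)

# Rough VRAM estimate for a 70B model quantized to 5.0 bits per weight.
params = 70e9   # parameter count
bpw = 5.0       # average bits per weight in the exl2 quant
weight_bytes = params * bpw / 8
print(f"{weight_bytes / 2**30:.1f} GiB for weights")  # ~40.7 GiB

That leaves only a few GiB of headroom for the KV cache on a 48 GB card, which matches the "just fits" experience.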
Good to hear! So far I'm liking Llama 3.1 70B a bit better than Athene.
Really! Llama 3.1 is as good as they promised ;) I will do a 70B 5.0 bit exl2 and see if that improves my use case. Thanks for the knowledge!!! ;)
Turboderp has you covered: https://huggingface.co/turboderp/Llama-3.1-70B-Instruct-exl2
Thanks! Wow, they are fast. Thank you!
I have encountered the known error that others have raised as well, "Value for eos_token_id is not of expected type <class 'int'>", so I can't test the above Llama 3.1 (5bpw) model from the link... Did you not have that issue?
Yes, I should have mentioned that. You need to install the dev branch of exllamav2 to use Llama 3.1. If you're not sure how to do that, just wait a few days; the main exllamav2 branch should be fixed soon, and then you can simply update to that.
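If you'd rather not wait, installing straight from the dev branch should look something like this (a sketch, assuming the branch is named dev on the GitHub repo):

pip install git+https://github.com/turboderp/exllamav2@dev

or, equivalently, clone it and do an editable install:

git clone -b dev https://github.com/turboderp/exllamav2
cd exllamav2
pip install -e .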
Thanks!
In my use case (complex customer knowledge extraction), my old favorite (https://huggingface.co/gbueno86/Meta-LLama-3-Cat-A-LLama-70b-exl2-5.0bpw) is way better than Athene-70B-5.0bpw.
I will update you regarding Llama 3.1 in a few days...
I am searching for the best model to run on a single GPU (A6000 48GB); I had no luck with the Qwen2 model either...
I tried using Qwen2 before Athene and also had some serious problems with it.