FantasiaFoundry
committed on
Update README.md
README.md
CHANGED
@@ -11,15 +11,18 @@ tags:
> [!TIP]
> **Credits:**
>
- > Made with love by [**@Lewdiculous**](https://huggingface.co/Lewdiculous). <br>
> If this proves useful for you, feel free to credit and share the repository and authors.

> [!WARNING]
> **[Important] Llama-3:**
>
> For those converting Llama-3 BPE models, you'll have to read [**llama.cpp/#6920**](https://github.com/ggerganov/llama.cpp/pull/6920#issue-2265280504) for more context. <br>
> Basically, make sure you're on the latest llama.cpp repo commit, then run the new `convert-hf-to-gguf-update.py` script inside the repo (you will need to provide a huggingface-read-token, and you need access to the Meta-Llama-3 repositories – [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) – so, to be sure, fill out the access request form right away to be able to fetch the necessary files). Afterwards, you need to manually copy the config files from `llama.cpp\models\tokenizers\llama-bpe` into your downloaded **model** folder, replacing the existing ones. <br>
> Try again and the conversion process should work as expected.


Pull Requests with your own features and improvements to this script are always welcome.

> [!TIP]
> **Credits:**
>
+ > Made with love by [**@Lewdiculous**](https://huggingface.co/Lewdiculous) with handy contributions by [**@SolidSnacke**](https://huggingface.co/SolidSnacke). <br>
> If this proves useful for you, feel free to credit and share the repository and authors.

> [!WARNING]
> **[Important] Llama-3:**
>
> For those converting Llama-3 BPE models, you'll have to read [**llama.cpp/#6920**](https://github.com/ggerganov/llama.cpp/pull/6920#issue-2265280504) for more context. <br>
+ >
> Basically, make sure you're on the latest llama.cpp repo commit, then run the new `convert-hf-to-gguf-update.py` script inside the repo (you will need to provide a huggingface-read-token, and you need access to the Meta-Llama-3 repositories – [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) – so, to be sure, fill out the access request form right away to be able to fetch the necessary files). Afterwards, you need to manually copy the config files from `llama.cpp\models\tokenizers\llama-bpe` into your downloaded **model** folder, replacing the existing ones. <br>
> Try again and the conversion process should work as expected.
+ >
+ > There is a new experimental script, `gguf-imat-llama-3-lossless.py`, which performs the conversion directly from a BF16 GGUF to hopefully generate lossless Llama-3 model quantizations. It is more resource-intensive and will generate more writes to the drive, since there's a whole additional conversion step that isn't performed in the previous version. This should only be necessary until we have GPU support for BF16 to run directly without conversion; a rough sketch of the idea is included further below.
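
For reference, the BPE fix described in the warning amounts to two steps: regenerate the tokenizer configs with `convert-hf-to-gguf-update.py`, then copy them over the ones shipped in the model folder. Below is a minimal Python sketch of that procedure; the paths, the model folder name, and the token value are placeholders, and the exact way the update script expects the token may differ between llama.cpp versions, so check its usage first.

```python
# Sketch of the Llama-3 BPE tokenizer fix described in the warning above.
# LLAMA_CPP_DIR, MODEL_DIR and HF_READ_TOKEN are placeholders - point them
# at your own llama.cpp checkout, downloaded model folder and read token.
import shutil
import subprocess
from pathlib import Path

LLAMA_CPP_DIR = Path("llama.cpp")                      # latest llama.cpp checkout
MODEL_DIR = Path("models/Meta-Llama-3-8B-Instruct")    # your downloaded model folder
HF_READ_TOKEN = "hf_..."                               # read token with Meta-Llama-3 access

# 1. Regenerate the tokenizer configs (requires the read token and repo access).
subprocess.run(
    ["python", "convert-hf-to-gguf-update.py", HF_READ_TOKEN],
    cwd=LLAMA_CPP_DIR,
    check=True,
)

# 2. Copy the freshly generated llama-bpe configs into the model folder,
#    replacing the existing files.
tokenizer_dir = LLAMA_CPP_DIR / "models" / "tokenizers" / "llama-bpe"
for config_file in tokenizer_dir.iterdir():
    if config_file.is_file():
        shutil.copy2(config_file, MODEL_DIR / config_file.name)
```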
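The "lossless" idea boils down to one extra conversion: produce a BF16 GGUF first and quantize from it, rather than going through an FP16 intermediate, which is what costs the additional drive writes. The sketch below only illustrates that extra step; the converter flags, the `quantize` binary name and the paths are assumptions that depend on your llama.cpp version, and `gguf-imat-llama-3-lossless.py` is what wires the actual pipeline together.

```python
# Illustrative sketch of the extra BF16 step behind the lossless flow.
# All paths, flags and binary names are assumptions about a typical
# llama.cpp setup - adjust them to your own build.
import subprocess
from pathlib import Path

LLAMA_CPP_DIR = Path("llama.cpp")
MODEL_DIR = Path("models/Meta-Llama-3-8B-Instruct")              # downloaded HF model
BF16_GGUF = Path("models/Meta-Llama-3-8B-Instruct-BF16.gguf")    # extra intermediate file
IMATRIX = Path("imatrix.dat")                                    # previously generated imatrix
OUT_GGUF = Path("models/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf")

# 1. Additional conversion step: HF weights -> BF16 GGUF (more drive writes).
subprocess.run(
    ["python", str(LLAMA_CPP_DIR / "convert-hf-to-gguf.py"), str(MODEL_DIR),
     "--outtype", "bf16", "--outfile", str(BF16_GGUF)],
    check=True,
)

# 2. Quantize straight from the BF16 GGUF so no precision is dropped on the way in.
subprocess.run(
    [str(LLAMA_CPP_DIR / "quantize"), "--imatrix", str(IMATRIX),
     str(BF16_GGUF), str(OUT_GGUF), "Q4_K_M"],
    check=True,
)
```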
Pull Requests with your own features and improvements to this script are always welcome.
|