Original model: https://huggingface.co/brucethemoose/Yi-34B-200K-RPMerge

Steps:

1. Convert to GGUF using llama.cpp (clone from source and install requirements as sketched below, then run this)

> `python convert.py /mnt/d/LLM_Models/Yi-34B-200K-RPMerge/ --vocab-type hfft --outtype f32 --outfile Yi-34B-200K-RPMerge.gguf`
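
For reference, the clone/build step might look like this (a minimal sketch; `LLAMA_CUBLAS=1` assumes an NVIDIA GPU and the Makefile build that llama.cpp used at the time, so adjust to your setup):

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt   # Python dependencies for convert.py
make LLAMA_CUBLAS=1               # CUDA-enabled build, so imatrix can offload layers with -ngl
```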
2. Create the imatrix (offload as many layers to the GPU as you can with `-ngl`)

> `./imatrix -m /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf -f /mnt/d/LLM_Models/8k_random_data.txt -o /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat -ngl 20`
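
A full pass over the data file takes a while on a 34B model. For a quick sanity check of the pipeline first, `imatrix` accepts a `--chunks` option to cap how many ~512-token chunks it processes (an assumption: the flag's availability depends on your llama.cpp version):

```bash
# Trial run over only the first 100 chunks; rerun without --chunks for the final matrix
./imatrix -m /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf \
    -f /mnt/d/LLM_Models/8k_random_data.txt \
    -o /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.test.dat \
    -ngl 20 --chunks 100
```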
3. Quantize using the imatrix

> `./quantize --imatrix /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.IQ2_XXS.gguf IQ2_XXS`
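
The same imatrix file can drive several quant levels without being recomputed; a loop like the following is one way to batch them (a sketch; the list of target types is just an example of standard llama.cpp quant names):

```bash
# Reuse one imatrix for multiple quant types
for q in IQ2_XXS IQ2_XS IQ3_XXS Q4_K_M; do
    ./quantize --imatrix /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat \
        /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf \
        /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.${q}.gguf ${q}
done
```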
I have also uploaded [8k_random_data.txt from this GitHub discussion](https://github.com/ggerganov/llama.cpp/discussions/5006), along with the importance matrix I made (`Yi-34B-200K-RPMerge.imatrix.dat`).