RWKV Quantisation
@thefaheem, xzuyn has released some GGML quantizations: https://huggingface.co/models?sort=modified&search=xzuyn+rwkv+raven+ggml
I can't use it with llama-cpp-python; it raises a ValueError. I think that's because llama.cpp only handles the LLaMA-style decoder architecture, and RWKV is a different (RNN-based) architecture.
So, what should I use to run these?
Can anyone help?
It worked for me with koboldcpp: https://github.com/LostRuins/koboldcpp
I only had time to try the 14B q5_1.
I'm a bit of a noob. Can you please tell me how to run it on Linux or Colab?
Instructions for Windows:
- Download the latest release: https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp.exe
- Double-click koboldcpp.exe
- Click Launch and select your model's .bin file
That's it! You may be able to improve performance by launching it from the command prompt, setting the number of threads, and giving the process high priority. Run koboldcpp.exe --help to see all the options. I launch it with the following command: koboldcpp.exe ggml-model-q5_1.bin --launch --threads 16 --highpriority --smartcontext
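If you'd rather script against it than use the browser UI, koboldcpp also exposes a KoboldAI-compatible HTTP API once the server is running. A minimal sketch, assuming the default port 5001 and the standard /api/v1/generate endpoint (double-check against the koboldcpp docs):

```sh
# Minimal sketch: send a prompt to a running koboldcpp instance.
# Assumes the default port (5001) and the KoboldAI-style generate endpoint.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "The RWKV architecture is", "max_length": 64}'
```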
For Linux?
I have yet to try koboldcpp on Linux. Check the README.md on the GitHub page for Linux instructions; a rough sketch of the usual steps is below. I see that oobabooga's text-generation-webui should support RWKV as well: https://github.com/oobabooga/text-generation-webui/blob/main/docs/RWKV-model.md
Edit: I just realized that you specifically asked for Linux or Colab, and I gave you Windows instructions. Sorry about that. As for oobabooga, I may be wrong, but I don't think it supports quantized versions.
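For reference, the Linux flow from the koboldcpp README is roughly the following, as far as I recall; treat the build flag and file names as assumptions and verify them against the repo:

```sh
# Rough sketch of building and running koboldcpp on Linux.
# Assumes Python 3 and a C/C++ toolchain; the OpenBLAS flag is optional.
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make LLAMA_OPENBLAS=1          # plain `make` also works, just slower prompt processing
python3 koboldcpp.py /path/to/ggml-model-q5_1.bin --threads 8 --smartcontext
```

The same command-line options as on Windows (--threads, --smartcontext, and so on) should apply.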
No problem, mate. I found that it works well with rwkv.cpp; rough steps are below in case they help anyone else.
Anyway, thanks for your help!
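Here is roughly what running one of the prebuilt GGML files with rwkv.cpp looks like on Linux or Colab. The repo URL, script names, and dependency list are from memory, so treat them as assumptions and check the rwkv.cpp README:

```sh
# Rough sketch: build rwkv.cpp and run a quantized RWKV Raven ggml model.
# Script name (rwkv/generate_completions.py) and deps are assumptions; see the repo README.
git clone --recursive https://github.com/saharNooby/rwkv.cpp
cd rwkv.cpp
cmake .
cmake --build . --config Release
pip install numpy tokenizers           # needed by the Python helper scripts
python rwkv/generate_completions.py /path/to/rwkv-raven-q5_1-ggml.bin
```

There is also a chat-style script in the same rwkv/ directory (chat_with_bot.py, if I remember the name right) for talking to the model interactively.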