Update README.md
README.md CHANGED
@@ -26,6 +26,14 @@ GGML files are for CPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp)
 * [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/h2ogpt-oasst1-512-30B-GGML).
 * [float16 HF format unquantised model for GPU inference and further conversions](https://huggingface.co/TheBloke/h2ogpt-oasst1-512-30B-HF)
 
+## THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
+
+llama.cpp recently made a breaking change to its quantisation methods.
+
+I have quantised the GGML files in this repo with the latest version. Therefore you will require llama.cpp compiled on May 12th or later (commit `b9fd7ee` or later) to use them.
+
+If you are currently unable to update llama.cpp, e.g. because you use a UI which hasn't updated yet, you can find GGML files for the previous version of llama.cpp in the `previous_llama` branch.
+
 ## Provided files
 | Name | Quant method | Bits | Size | RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
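The note added above pins a minimum llama.cpp commit. Git can report whether a local checkout already contains `b9fd7ee`; the sketch below is one way to automate that check, assuming a clone of llama.cpp at a hypothetical path `./llama.cpp`.

```python
import subprocess

# Hypothetical path to a local llama.cpp checkout; adjust to your setup.
LLAMA_CPP_DIR = "llama.cpp"
REQUIRED_COMMIT = "b9fd7ee"  # quantisation format change, May 12th 2023

# `git merge-base --is-ancestor A B` exits 0 iff A is an ancestor of
# (or the same as) B, so exit code 0 means HEAD already includes the change.
# A nonzero code can also mean the commit is unknown locally (e.g. a stale
# or shallow clone), in which case fetching first is the fix.
check = subprocess.run(
    ["git", "merge-base", "--is-ancestor", REQUIRED_COMMIT, "HEAD"],
    cwd=LLAMA_CPP_DIR,
)
if check.returncode == 0:
    print("llama.cpp checkout is new enough for these GGML files.")
else:
    print("Checkout predates b9fd7ee (or the commit is unknown locally):")
    print("rebuild from a newer commit, or use the `previous_llama` branch.")
```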
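For readers staying on the older llama.cpp, the files on the `previous_llama` branch can also be fetched programmatically rather than through the web UI. A minimal sketch using `huggingface_hub`; the filename below is a placeholder for illustration, not a confirmed file in the repo.

```python
from huggingface_hub import hf_hub_download

# The filename is an assumption; substitute one of the GGML files
# actually listed under "Provided files" in the repo.
path = hf_hub_download(
    repo_id="TheBloke/h2ogpt-oasst1-512-30B-GGML",
    filename="h2ogpt-oasst1-512-30B.ggml.q4_0.bin",
    revision="previous_llama",  # branch holding files for older llama.cpp
)
print(f"Downloaded to {path}")
```

Passing `revision="previous_llama"` is what selects the branch; everything else matches a normal download from `main`.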