Update README.md
README.md
@@ -30,7 +30,9 @@ Support is also expected to come to llama.cpp, however work is still being done
 
 To use the increased context with KoboldCpp, use `--contextsize` to set the desired context, eg `--contextsize 4096` or `--contextsize 8192` or `--contextsize 16384`.
 
-**NOTE**:
+**NOTE 1**: Currently RoPE models can _only_ be used at a context size greater than 2048. At 2048 they will produce gibberish. Please make sure you always set `--contextsize` to a value higher than 2048, eg 3072, 4096, etc.
+
+**NOTE 2**: Increased context length is an area seeing rapid development and improvement. It is quite possible that these models will be superseded by new developments in the coming days. If that's the case, I will remove them, or update this README as appropriate.
 
 ## Repositories available
 
@@ -87,9 +89,11 @@ Refer to the Provided Files table below to see what files use which methods, and
 On Linux I use the following command line to launch the KoboldCpp UI with CUDA acceleration and a context size of 4096:
 
 ```
-python ./koboldcpp.py --stream --unbantokens --threads 8 --usecublas --gpulayers 100 longchat-13b-16k.ggmlv3.q4_K_M.bin
+python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 --usecublas --gpulayers 100 longchat-13b-16k.ggmlv3.q4_K_M.bin
 ```
 
+Change `--contextsize` to the context size you want - **it must be higher than 2048, else the model will produce gibberish**.
+
 Change `--gpulayers 100` to the number of layers you want/are able to offload to the GPU. Remove it if you don't have GPU acceleration.
 
 For OpenCL acceleration, change `--usecublas` to `--useclblast 0 0`. You may need to change the second `0` to `1` if you have both an iGPU and a discrete GPU.
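To make that last point concrete, here is a sketch of the same launch using OpenCL instead of CUDA. It simply applies the substitution described above to the example command, and assumes your discrete GPU sits at device `0 0`:

```
# As above, but with --usecublas swapped for --useclblast 0 0 (OpenCL via CLBlast)
# Use --useclblast 0 1 instead if an iGPU occupies the first device slot
python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 --useclblast 0 0 --gpulayers 100 longchat-13b-16k.ggmlv3.q4_K_M.bin
```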
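Likewise, a sketch of a CPU-only launch, following the `--gpulayers` note above and assuming no GPU backend is used at all:

```
# CPU only: --usecublas and --gpulayers removed, so all layers stay on the CPU
python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 longchat-13b-16k.ggmlv3.q4_K_M.bin
```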