Update README.md
README.md
@@ -30,7 +30,9 @@ Support is also expected to come to llama.cpp, however work is still being done
 
 To use the increased context with KoboldCpp, use `--contextsize` to set the desired context, eg `--contextsize 4096` or `--contextsize 8192` or `--contextsize 16384`.
 
-**NOTE**:
+**NOTE 1**: Currently RoPE models can _only_ be used at a context size greater than 2048. At 2048 they will produce gibberish. Please make sure you always set `--contextsize` to a value higher than 2048, eg 3072, 4096, etc.
+
+**NOTE 2**: Increased context length is an area seeing rapid development and improvement. It is quite possible that these models will be superseded by new developments in the coming days. If that's the case, I will remove them, or update this README as appropriate.
 
 ## Repositories available
 
@@ -87,9 +89,11 @@ Refer to the Provided Files table below to see what files use which methods, and
 On Linux I use the following command line to launch the KoboldCpp UI with CUDA acceleration and a context size of 4096:
 
 ```
-python ./koboldcpp.py --stream --unbantokens --threads 8 --usecublas --gpulayers 100 longchat-13b-16k.ggmlv3.q4_K_M.bin
+python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 --usecublas --gpulayers 100 longchat-13b-16k.ggmlv3.q4_K_M.bin
 ```
 
+Change `--contextsize` to the context size you want - **it must be higher than 2048, else the model will produce gibberish**.
+
 Change `--gpulayers 100` to the number of layers you want/are able to offload to the GPU. Remove it if you don't have GPU acceleration.
 
 For OpenCL acceleration, change `--usecublas` to `--useclblast 0 0`. You may need to change the second `0` to `1` if you have both an iGPU and a discrete GPU.
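To make that last point concrete, here is a sketch of the same launch using OpenCL instead of CUDA. It simply applies the substitution described above to the example command, and assumes your discrete GPU sits at device `0 0`:

```
# As above, but with --usecublas swapped for --useclblast 0 0 (OpenCL via CLBlast)
# Use --useclblast 0 1 instead if an iGPU occupies the first device slot
python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 --useclblast 0 0 --gpulayers 100 longchat-13b-16k.ggmlv3.q4_K_M.bin
```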
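Likewise, a sketch of a CPU-only launch, following the `--gpulayers` note above and assuming no GPU backend is used at all:

```
# CPU only: --usecublas and --gpulayers removed, so all layers stay on the CPU
python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 longchat-13b-16k.ggmlv3.q4_K_M.bin
```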