---
license:
- cc-by-nc-4.0
- llama2
language:
- en
library_name: ExLlamaV2
pipeline_tag: text-generation
tags:
- Mytho
- ReMM
- LLaMA 2
- Quantized Model
- exl2
base_model:
- Undi95/ReMM-v2.2-L2-13B
---

# exl2 quants for ReMM V2.2

This repository contains exl2 quantizations of the [ReMM V2.2](https://huggingface.co/Undi95/ReMM-v2.2-L2-13B) model by [Undi](https://huggingface.co/Undi95). ReMM is a model merge that attempts to recreate [MythoMax](https://huggingface.co/Gryphe/MythoMax-L2-13b) using the [SLERP](https://github.com/Undi95/LLM-SLERP-MergeTest) merge method and newer models.

## Current models

| exl2 Quant | Model Branch | Model Size | Minimum Recommended VRAM (4096 Context, fp16 cache) | BPW |
|-|-|-|-|-|
| 3-Bit | main | 5.44 GB | 8GB GPU | 3.14 |
| 3-Bit | 3bit | 6.36 GB | 10GB GPU | 3.72 |
| 4-Bit | 4bit | 7.13 GB | 12GB GPU (10GB with swap) | 4.2 |
| 4-Bit | 4.6bit | 7.81 GB | 12GB GPU | 4.63 |
| 5-Bit | [R136a1's Repo](https://huggingface.co/R136a1/ReMM-v2.2-L2-13B-exl2) | 8.96 GB | 16GB GPU (12GB with swap) | 5.33 |

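
As a rough sketch of how to read the table above, the hypothetical `pick_branch` helper below maps a GPU's total VRAM (in GB) to the largest quant whose minimum recommendation fits. It uses the non-swap figures, which assume 4096 context and an fp16 cache. The companion `gpu_vram_gb` helper assumes an NVIDIA GPU with `nvidia-smi` on your `PATH`. Neither helper is part of ExLlamaV2 or any of the loaders listed here.

```shell
# Hypothetical helpers (not part of ExLlamaV2 or any loader listed in this card).

# Total VRAM of the first NVIDIA GPU in GB, rounded to the nearest GB.
# Assumes nvidia-smi is installed; it reports memory.total in MiB.
gpu_vram_gb() {
  local mib
  mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
  echo $(( (mib + 512) / 1024 ))
}

# Prints the repo (and branch) of the largest quant whose minimum recommended
# VRAM in the "Current models" table fits in the given number of GB.
pick_branch() {
  local vram_gb="$1"
  if [ "$vram_gb" -ge 16 ]; then
    echo "R136a1/ReMM-v2.2-L2-13B-exl2"               # 5-bit, separate repo
  elif [ "$vram_gb" -ge 12 ]; then
    echo "Anthonyg5005/ReMM-v2.2-L2-13B-exl2:4.6bit"  # 4.63 bpw
  elif [ "$vram_gb" -ge 10 ]; then
    echo "Anthonyg5005/ReMM-v2.2-L2-13B-exl2:3bit"    # 3.72 bpw
  elif [ "$vram_gb" -ge 8 ]; then
    echo "Anthonyg5005/ReMM-v2.2-L2-13B-exl2:main"    # 3.14 bpw
  else
    echo "Less than 8 GB of VRAM: no quant in the table is recommended" >&2
    return 1
  fi
}
```

For example, `python download-model.py "$(pick_branch "$(gpu_vram_gb)")"` passes the result straight to oobabooga's downloader. The `4bit` branch is never chosen because `4.6bit` has the same 12GB recommendation. Pick `4bit` by hand if you want a smaller download or plan to rely on swap.
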
## Where to use

exl2 models can be loaded by several backends and frontends, including:

- [tabbyAPI](https://github.com/theroyallab/tabbyAPI)
- [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine)
- [ExUI](https://github.com/turboderp/exui)
- [oobabooga's Text Gen Webui](https://github.com/oobabooga/text-generation-webui)
    - When using the built-in downloader, enter the model as `Anthonyg5005/ReMM-v2.2-L2-13B-exl2:QuantBranch`, replacing `QuantBranch` with a branch from the table above (e.g. `Anthonyg5005/ReMM-v2.2-L2-13B-exl2:4bit`)
    - For the 5-bit quant, download [R136a1/ReMM-v2.2-L2-13B-exl2](https://huggingface.co/R136a1/ReMM-v2.2-L2-13B-exl2) instead
- [KoboldAI](https://github.com/henk717/KoboldAI) (clone the repo, don't use the snapshot)

## How to download

### oobabooga's downloader

Use a script such as [download-model.py](https://github.com/oobabooga/text-generation-webui/blob/main/download-model.py), which downloads over HTTP with Python `requests`. Install its requirements:

```shell
pip install requests tqdm
```

Example for downloading the `3bit` branch:

```shell
python download-model.py Anthonyg5005/ReMM-v2.2-L2-13B-exl2:3bit
```

### huggingface-cli

You can also use `huggingface-cli`, which comes with the `huggingface-hub` Python package:

```shell
pip install huggingface-hub
```

Example for the `3bit` branch:

```shell
huggingface-cli download Anthonyg5005/ReMM-v2.2-L2-13B-exl2 --local-dir ReMM-v2.2-L2-13B-exl2-3bpw --revision 3bit
```

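
If you want to try more than one quant, a small wrapper keeps branch and folder names consistent. `download_quant` is a hypothetical helper, not part of `huggingface-cli`. It only uses the `download`, `--revision` and `--local-dir` options from the example above, and it names the folder after the branch (e.g. `ReMM-v2.2-L2-13B-exl2-4bit`) rather than the `-3bpw` suffix used there. The 5-bit quant lives in a separate repo and is not handled.

```shell
# Hypothetical helper (not part of huggingface-cli): downloads one quant
# branch of this repo into a folder named after that branch.
download_quant() {
  local branch="$1"
  case "$branch" in
    main|3bit|4bit|4.6bit) ;;  # branches listed in the "Current models" table
    *)
      echo "unknown branch: $branch (expected main, 3bit, 4bit or 4.6bit)" >&2
      return 1
      ;;
  esac
  huggingface-cli download Anthonyg5005/ReMM-v2.2-L2-13B-exl2 \
    --revision "$branch" \
    --local-dir "ReMM-v2.2-L2-13B-exl2-$branch"
}
```

Usage: `download_quant 4.6bit`.
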
### Git LFS (not recommended)

I recommend the HTTP downloaders above over git: they can resume failed downloads and are much easier to work with. If you do use git, make sure both git and Git LFS are installed.

Example for downloading the `3bit` branch with git:

First, make sure LFS file skipping is disabled so the model files are actually downloaded:

```shell
# Windows (cmd)
set GIT_LFS_SKIP_SMUDGE=0
# Linux/macOS
export GIT_LFS_SKIP_SMUDGE=0
```

Then clone the branch:

```shell
git clone https://huggingface.co/Anthonyg5005/ReMM-v2.2-L2-13B-exl2 -b 3bit
```
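
If `GIT_LFS_SKIP_SMUDGE` was still set to `1` when you cloned, the `.safetensors` files will be small text pointer files instead of model weights. Below is a minimal bash sketch (Linux/macOS, or Git Bash on Windows) that detects this and fetches the real files with `git lfs pull`. The helper names are hypothetical. The check assumes a Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`.

```shell
# Hypothetical helper: succeeds if the file is a Git LFS pointer stub
# (its first line is the LFS spec version line) rather than real content.
is_lfs_pointer() {
  head -c 100 "$1" 2>/dev/null | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: checks every *.safetensors file in a cloned directory
# and runs `git lfs pull` there if any of them is still a pointer stub.
ensure_lfs_files() {
  local dir="$1" f
  for f in "$dir"/*.safetensors; do
    [ -e "$f" ] || continue  # no match: the glob stays literal, so skip it
    if is_lfs_pointer "$f"; then
      echo "$f is an LFS pointer, fetching real files..."
      (cd "$dir" && git lfs pull)
      return
    fi
  done
  echo "All .safetensors files in $dir look fully downloaded."
}
```

Usage after the clone above: `ensure_lfs_files ReMM-v2.2-L2-13B-exl2`.
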