metadata
language:
  - en

Information

This is an Exl2 quantized version of Norobara-ZLoss-8x7B.

Please refer to the original creator for more information.

Calibration dataset: Exllamav2 default

Branches:

  • main: Measurement files
  • 3.5bpw-h6: 3.5 bits per weight, 6 head bits, for 24 GB VRAM
  • 6.0bpw-h6: 6 bits per weight, 6 head bits, for 48 GB VRAM
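
As a rough sanity check on the VRAM figures above, the size of the quantized weights can be estimated from the bits-per-weight value. The sketch below is only an approximation: the parameter count of about 46.7 billion is an assumption (the commonly cited figure for Mixtral 8x7B, which this model is assumed to share), the helper name weight_size_gib is hypothetical, and the estimate ignores the head bits, the KV cache, and activation memory, all of which add to real usage.

```python
def weight_size_gib(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB.

    n_params: total number of model parameters
    bpw:      average bits per weight of the quantization
    """
    total_bytes = n_params * bpw / 8  # 8 bits per byte
    return total_bytes / 1024**3


# Assumption: ~46.7e9 parameters (published figure for Mixtral 8x7B).
N_PARAMS = 46.7e9

for branch, bpw in {"3.5bpw-h6": 3.5, "6.0bpw-h6": 6.0}.items():
    size = weight_size_gib(N_PARAMS, bpw)
    print(f"{branch}: roughly {size:.1f} GiB of weights")
```

Whatever VRAM remains after the weights are loaded has to hold the context cache, so a branch whose weights only just fit will leave little room for long contexts.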

Notes

  • 6.0bpw-h6 is recommended for the best ratio of quality to VRAM usage (assuming you have enough VRAM).
  • Please ask for more bpws in the community tab if necessary.

Run in TabbyAPI

TabbyAPI is a pure exllamav2 FastAPI server developed by us. You can find TabbyAPI's source code here: https://github.com/theroyallab/TabbyAPI

If you don't have huggingface-cli, please run pip install huggingface_hub.

To run this model, follow these steps:

  1. Make a directory inside your models folder called Norobara-ZLoss-8x7B-exl2

  2. Open a terminal inside your models folder

  3. Run huggingface-cli download royallab/Norobara-ZLoss-8x7B-exl2 --revision 6.0bpw-h6 --local-dir Norobara-ZLoss-8x7B-exl2 --local-dir-use-symlinks False

    1. The --revision flag corresponds to the branch name on the model repo. Please select the appropriate bpw branch for your system.
  4. Inside TabbyAPI's config.yml, set model_name to Norobara-ZLoss-8x7B-exl2. Alternatively, use the /model/load endpoint after launching.

  5. Launch TabbyAPI inside your python env by running python main.py
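
If you would rather script the download step than use huggingface-cli, the same branch can be fetched from Python with the huggingface_hub library. This is a minimal sketch, not part of TabbyAPI: the helper names download_kwargs and download_branch are hypothetical, and the code assumes it is run from inside your models folder with huggingface_hub installed.

```python
def download_kwargs(branch: str) -> dict:
    """Build the arguments for snapshot_download for a given bpw branch."""
    return {
        "repo_id": "royallab/Norobara-ZLoss-8x7B-exl2",
        # Equivalent to the --revision flag: the branch name on the model repo.
        "revision": branch,
        # Equivalent to --local-dir: the folder created in step 1.
        "local_dir": "Norobara-ZLoss-8x7B-exl2",
    }


def download_branch(branch: str = "6.0bpw-h6") -> str:
    """Download one branch and return the local folder path."""
    # Imported here so the helper above works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(branch))
```

Calling download_branch("3.5bpw-h6") from your models folder would fetch the 3.5 bpw branch instead of the default. Nothing is downloaded until the function is called.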

Donate?

All my infrastructure and cloud expenses are paid out of pocket. If you'd like to donate, you can do so here: https://ko-fi.com/doctorshotgun

You should not feel obligated to donate, but if you do, I'd appreciate it.