metadata
license: other
inference: false

WizardLM: An Instruction-following LLM Using Evol-Instruct

These files are the result of merging the delta weights with the original LLaMA 7B model.

The code for merging is provided in the WizardLM official Github repo.

WizardLM-7B 4bit GPTQ

This repo contains 4bit GPTQ models for GPU inference, quantised using GPTQ-for-LLaMa.

Other repositories available

How to easily download and use this model in text-generation-webui

Make sure text-generation-webui is updated to the latest version.

  1. Click the Model tab.
  2. Under Download custom model or LoRA, enter TheBloke/wizardLM-7B-GPTQ.
  3. Click Download.
  4. The model will start downloading. Once it's finished, it will say "Done".
  5. In the top left, click the refresh icon next to Model.
  6. In the Model dropdown, choose the model you just downloaded: wizardLM-7B-GPTQ
  7. The model will automatically load, and is now ready for use!
  8. If you want any custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
     • Note that you do not need to set GPTQ parameters any more. These are set automatically from the file quantize_config.json.
  9. Once you're ready, click the Text Generation tab and enter a prompt to get started!
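
As a rough illustration of the note under step 8, here is a minimal sketch of the kind of information a quantize_config.json carries. The field names (`bits`, `group_size`, `desc_act`) follow the AutoGPTQ-style convention and are an assumption here, as are the sample values; they are not copied from the file in this repo. The helper name `describe_quantize_config` is hypothetical.

```python
import json

# Illustrative contents only: the field names follow the AutoGPTQ-style
# quantize_config.json convention and the values are assumed, not copied
# from this repository.
SAMPLE_QUANTIZE_CONFIG = """
{
  "bits": 4,
  "group_size": 128,
  "desc_act": false
}
"""


def describe_quantize_config(raw_json: str) -> str:
    """Summarise the GPTQ parameters stored in a quantize_config.json string."""
    cfg = json.loads(raw_json)
    bits = cfg["bits"]
    # Fall back to -1 when no group size is recorded (assumed convention).
    group_size = cfg.get("group_size", -1)
    # desc_act is assumed to correspond to the --act-order option.
    act_order = "act-order" if cfg.get("desc_act", False) else "no act-order"
    return f"{bits}bit, groupsize {group_size}, {act_order}"


summary = describe_quantize_config(SAMPLE_QUANTIZE_CONFIG)
print(summary)
```

To inspect a real download, read the quantize_config.json from the model folder and pass its text to the same function.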

Provided files

Two files are provided. The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa.

Specifically, the 'latest' file uses --act-order for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with text-generation-webui one-click installers.

The 'compat' file will be used by default in text-generation-webui, so you don't need to do anything special to use it. If you want to use the 'latest' file, please remove the 'compat' file, but only do this if you are able to use the latest GPTQ-for-LLaMa code.
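
If you are unsure which kind of file you have, the file name carries the quantisation parameters. The following is a minimal sketch that reads them back out, assuming the naming pattern of the file name listed in this README (`...-4bit-128g.compat.no-act-order.safetensors`). The helper `parse_gptq_filename` and the `GptqFileInfo` type are hypothetical names, and other repos may name their files differently.

```python
import re
from typing import NamedTuple, Optional


class GptqFileInfo(NamedTuple):
    bits: int
    groupsize: int
    act_order: bool


# Pattern assumed from the file name listed in this README:
#   <name>-<bits>bit-<groupsize>g.[<tag>.]<no-act-order|act-order>.safetensors
_NAME_RE = re.compile(
    r"-(\d+)bit-(\d+)g\.(?:.*\.)?(no-act-order|act-order)\.safetensors$"
)


def parse_gptq_filename(name: str) -> Optional[GptqFileInfo]:
    """Return the parameters encoded in a GPTQ file name, or None if it does not match."""
    match = _NAME_RE.search(name)
    if match is None:
        return None
    return GptqFileInfo(
        bits=int(match.group(1)),
        groupsize=int(match.group(2)),
        act_order=(match.group(3) == "act-order"),
    )


info = parse_gptq_filename(
    "wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors"
)
print(info)
```

A file for which `act_order` is true is the kind that needs recent GPTQ-for-LLaMa code, as described above.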

  • wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
    • Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
    • Works with text-generation-webui one-click installers
    • Parameters: Groupsize = 128g. No act-order.
    • Command used to create the GPTQ:
      CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors
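
To make the flags in the command above easier to vary, here is a minimal sketch that assembles the same command line from parameters. It only builds the string and does not run anything. The function name `build_gptq_command` is hypothetical, and the position of `--act-order` among the other flags is an assumption, since this README names that flag but does not show a full command using it.

```python
import shlex
from typing import List


def build_gptq_command(
    model_dir: str,
    output_file: str,
    *,
    wbits: int = 4,
    groupsize: int = 128,
    act_order: bool = False,
    gpu: int = 0,
) -> str:
    """Assemble a GPTQ-for-LLaMa llama.py command line as a single string."""
    args: List[str] = [
        "python3", "llama.py", model_dir, "c4",
        "--wbits", str(wbits),
        "--true-sequential",
    ]
    if act_order:
        # Flag placement is an assumption; the README only names the flag.
        args.append("--act-order")
    args += ["--groupsize", str(groupsize), "--save_safetensors", output_file]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return f"CUDA_VISIBLE_DEVICES={gpu} " + shlex.join(args)


command = build_gptq_command(
    "wizardLM-7B-HF",
    "wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors",
)
print(command)
```

With the default arguments this reproduces the command shown above; passing `act_order=True` adds the `--act-order` flag.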
      

Discord

For further support, and discussions on these models and AI in general, join us at:

TheBloke AI's Discord server

Thanks, and how to contribute.

Thanks to the chirper.ai team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

Patreon special mentions: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.

Thank you to all my generous patrons and donors!

Original model info

Overview of Evol-Instruct

Evol-Instruct is a novel method that uses LLMs instead of humans to automatically mass-produce open-domain instructions across a range of difficulty levels and skills, in order to improve the performance of LLMs.
