Pygmalion 6B GGML
This repository contains quantized conversions of the current Pygmalion 6B checkpoints.
They are intended for use with frontends that support GGML quantized GPT-J models, such as KoboldCpp and Oobabooga (with the CTransformers loader).
Last updated on 2023-09-26.
Description:
- The pygmalion-6b-main files are quantized from the main branch of Pygmalion 6B, also known as "experiment 2", released on January 13th.
- The pygmalion-6b-dev files are quantized from the dev branch of Pygmalion 6B, also known as "part 4/10 of experiment 7", released on March 12th.
- The motivation behind these quantizations was to have one repository for both the main and dev versions of Pygmalion, as well as all available quantization formats. Some users may prefer the prose and creativity of Pygmalion 6B (and its lack of synthetic GPT-4 data) over newer models, or find 6B's requirements more affordable than 7B. For a modern alternative, Pygmalion 2 7B is worth investigating.
RAM usage:
Model | Startup RAM usage (KoboldCpp) | Startup RAM usage (Oobabooga) |
---|---|---|
pygmalion-6b-dev.q4_0.bin | 3.7 GiB | 3.7 GiB |
pygmalion-6b-dev.q4_1.bin | 4.1 GiB | 4.1 GiB |
pygmalion-6b-dev.q5_0.bin | 4.4 GiB | 4.4 GiB |
pygmalion-6b-dev.q5_1.bin | 4.8 GiB | 4.8 GiB |
pygmalion-6b-dev.q8_0.bin | 6.5 GiB | 6.6 GiB |
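As a rough aid for reading the table above, the following sketch picks the largest file whose startup RAM usage fits a given budget. The figures are copied from the KoboldCpp column; the function name and the budget parameter are hypothetical, and the sketch assumes that a larger quantization is preferable when it fits. These are startup figures only, so actual usage during generation will likely be higher.

```python
from typing import Optional

# Startup RAM usage in GiB, copied from the KoboldCpp column of the table.
STARTUP_RAM_GIB = {
    "pygmalion-6b-dev.q4_0.bin": 3.7,
    "pygmalion-6b-dev.q4_1.bin": 4.1,
    "pygmalion-6b-dev.q5_0.bin": 4.4,
    "pygmalion-6b-dev.q5_1.bin": 4.8,
    "pygmalion-6b-dev.q8_0.bin": 6.5,
}


def largest_fitting_file(budget_gib: float) -> Optional[str]:
    """Return the file with the highest startup RAM usage that is still
    within budget_gib, or None if no file fits (hypothetical helper)."""
    fitting = {name: ram for name, ram in STARTUP_RAM_GIB.items() if ram <= budget_gib}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_fitting_file(5.0))
```

Leave some headroom in the budget for the context and for the frontend itself.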
Notes:
- Tested with these SillyTavern settings:
- The gpt-j conversion script from ggerganov/ggml [8ca2c19] was used for conversion and quantization. The checkpoints were first converted to f16 GGML files, then quantized.
The original model can be found here, and the original model card can be found below.
Pygmalion 6B
Model description
Pygmalion 6B is a proof-of-concept dialogue model based on EleutherAI's GPT-J-6B.
Warning: This model is NOT suitable for use by minors. It will output X-rated content under certain circumstances.
Training data
The fine-tuning dataset consisted of 56MB of dialogue data gathered from multiple sources, which includes both real and partially machine-generated conversations.
Training procedure
Model weights were initialized from the uft-6b ConvoGPT model made available in this commit.
The model was then further fine-tuned on ~48.5 million tokens for ~5k steps on 4 NVIDIA A40s using DeepSpeed.
Intended use
The easy way
We provide a notebook with a Gradio UI for playing around with the model without having to manually format inputs. This notebook can be found here.
The manual way
The model can be used as a regular text generation model, but it'll perform best if the input prompt adheres to the following format:
[CHARACTER]'s Persona: [A few sentences about the character you want the model to play]
<START>
[DIALOGUE HISTORY]
You: [Your input message here]
[CHARACTER]:
Where [CHARACTER] is, as you can probably guess, the name of the character you want the model to portray, <START> should be used verbatim as a delimiter token to separate persona and scenario data from the dialogue, and [DIALOGUE HISTORY] is chat history so the model can have some conversational context to draw from. Ideally it'll be pairs of messages like:
[CHARACTER]: [some dialogue here]
You: [your response to the dialogue above]
Apart from chat history, you can also just add example conversations in [DIALOGUE HISTORY] to show how the character should speak - ideally at the beginning, so it doesn't get confused as to what's conversation history vs. character definition.
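The format described above can be assembled programmatically. The following is a minimal sketch, not part of the original model card: the function name, its parameters, and the example character "Aria" are all hypothetical, and the only things taken from the source are the line layout and the verbatim <START> delimiter.

```python
from typing import List, Tuple


def build_prompt(character: str, persona: str,
                 history: List[Tuple[str, str]], user_message: str) -> str:
    """Assemble a prompt in the persona / <START> / dialogue layout.

    history is a list of (speaker, text) pairs; the speaker is either the
    character's name or "You". Hypothetical helper for illustration.
    """
    lines = [f"{character}'s Persona: {persona}", "<START>"]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"You: {user_message}")
    # The prompt ends with the character's name so the model continues as them.
    lines.append(f"{character}:")
    return "\n".join(lines)


example = build_prompt(
    character="Aria",
    persona="Aria is a cheerful librarian who loves riddles.",
    history=[("Aria", "Welcome to the library!"), ("You", "Thanks, nice place.")],
    user_message="Do you have any riddles for me?",
)
print(example)
```

The resulting string can be passed as the input prompt to whichever frontend or loader you use. Example conversations, if any, would go at the start of the history list.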
Known issues
We haven't played around with the model enough to enumerate them. Feel free to give us some feedback!