metadata
language:
- en
pipeline_tag: text-generation
tags:
- not-for-all-audiences
Notes
There is no template, just BOS+text
It can also start from nothing
Temperature, repetition penalty, etc should all be left as defaults
It will not go lewd immediately, it will try to form a coherent story
It's best to generate 1~3 paragraphs at a time, it loses coherence if you try to make it generate the full context all at once
LLaMA-3-8B base
RoPEd to 16k context
Name | Quant | Size | VRAM (With FA) | VRAM (No FA) |
---|---|---|---|---|
llama-3-8b-lewd-stories-v6-16k.F16 | F16 | 14.9 | 16.6 | 17.4 |
llama-3-8b-lewd-stories-v6-16k.Q8_0 | Q8_0 | 8.0 | 10.1 | 10.5 |
llama-3-8b-lewd-stories-v6-16k.Q6_K | Q6_K | 6.1 | 8.4 | 9.2 |
llama-3-8b-lewd-stories-v6-16k.Q5_K_M | Q5_K_M | 5.3 | 7.6 | 8.1 |
llama-3-8b-lewd-stories-v6-16k.Q4_K_M | Q4_K_M | 4.6 | 6.9 | 7.8 |
Yi-1.5-9B-32K
Native 32k context
Name | Quant | Size | VRAM (With FA) | VRAM (No FA) |
---|---|---|---|---|
yi-lewd-stories-32k.F16 | F16 | 16.4 | ||
yi-lewd-stories-32k.Q8_0 | Q8_0 | 8.7 | ||
yi-lewd-stories-32k.Q6_K | Q6_K | 6.7 | ||
yi-lewd-stories-32k.Q5_K_M | Q5_K_M | 5.8 | ||
yi-lewd-stories-32k.Q4_K_M | Q4_K_M | 5.0 |
Mistral-7B-v0.3
Native 32k context
Name | Quant | Size | VRAM (With FA) | VRAM (No FA) |
---|---|---|---|---|
mistral-lewd-stories-32k.F16 | F16 | 13.5 | ||
mistral-lewd-stories-32k.Q8_0 | Q8_0 | 7.2 | ||
mistral-lewd-stories-32k.Q6_K | Q6_K | 5.5 | ||
mistral-lewd-stories-32k.Q5_K_M | Q5_K_M | 4.8 | ||
mistral-lewd-stories-32k.Q4_K_M | Q4_K_M | 4.0 |