ExLLaMA V2 quant of goliath-120b-2.18bpw-h6-exl2
metadata
license: llama2
language:
  - en
pipeline_tag: conversational

Goliath 120B

An auto-regressive causal LM created by combining two fine-tuned Llama-2 70B models into one.

Please check out the quantized formats provided by @TheBloke and @Panchovix:

  • GGUF (llama.cpp)
  • GPTQ (KoboldAI, TGW, Aphrodite)
  • AWQ (TGW, Aphrodite, vLLM)
  • Exllamav2 (TGW, KoboldAI)

Prompting Format

Both Vicuna and Alpaca will work, but because the initial and final layers belong primarily to Xwin, I expect Vicuna to work best.
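For reference, here is a minimal sketch of how the two prompt formats are commonly assembled. The card does not spell out the exact templates, so the system prompt and instruction preamble below are assumptions taken from the widely used Vicuna v1.1 and Alpaca conventions, and the helper names `vicuna_prompt` and `alpaca_prompt` are hypothetical.

```python
# Assumed Vicuna v1.1 system prompt; the model card does not specify one.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def vicuna_prompt(user_message: str, system: str = VICUNA_SYSTEM) -> str:
    """Build a single-turn Vicuna-style prompt (hypothetical helper)."""
    return f"{system} USER: {user_message} ASSISTANT:"


def alpaca_prompt(instruction: str) -> str:
    """Build a single-turn Alpaca-style prompt (hypothetical helper)."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


if __name__ == "__main__":
    print(vicuna_prompt("Summarize the merge process in one sentence."))
    print(alpaca_prompt("Summarize the merge process in one sentence."))
```

Either string can be passed as the raw prompt to whichever backend loads the quant. If your frontend already applies a template, do not wrap the message a second time.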

Merge process

The models used in the merge are Xwin and Euryale.

The layer ranges used are as follows:

- range 0, 16
  Xwin
- range 8, 24
  Euryale
- range 17, 32
  Xwin
- range 25, 40
  Euryale
- range 33, 48
  Xwin
- range 41, 56
  Euryale
- range 49, 64
  Xwin
- range 57, 72
  Euryale
- range 65, 80
  Xwin
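As a rough sanity check on the ranges above, the sketch below tallies how many layers each source model contributes. It assumes each range is half-open (start inclusive, end exclusive), which is how I understand mergekit's slice ranges to work. The `SLICES` list and the function names are illustrative only and are not part of mergekit.

```python
from collections import Counter

# (source model, start layer, end layer), copied from the list above.
# Assumption: ranges are half-open, i.e. [start, end).
SLICES = [
    ("Xwin", 0, 16),
    ("Euryale", 8, 24),
    ("Xwin", 17, 32),
    ("Euryale", 25, 40),
    ("Xwin", 33, 48),
    ("Euryale", 41, 56),
    ("Xwin", 49, 64),
    ("Euryale", 57, 72),
    ("Xwin", 65, 80),
]


def layers_per_model(slices):
    """Count the layers each source model contributes to the merged stack."""
    counts = Counter()
    for model, start, end in slices:
        counts[model] += end - start
    return counts


def total_layers(slices):
    """Total depth of the merged model under the half-open assumption."""
    return sum(end - start for _, start, end in slices)


if __name__ == "__main__":
    print(dict(layers_per_model(SLICES)))
    print(total_layers(SLICES))
```

Under that assumption the merged stack is 137 layers deep, 76 from Xwin and 61 from Euryale, compared with 80 layers in each 70B source model. The first and last slices both come from Xwin, which matches the prompting note above.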

Screenshots

image/png

Benchmarks

Coming soon.

Acknowledgements

Credit goes to @chargoddard for developing mergekit, the framework used to merge the model.

Special thanks to @Undi95 for helping with the merge ratios.