CoCorticalStack/pastiche-crown-clown-7b-dare-awq

CoCorticalStack/pastiche-crown-clown-7b-dare-awq is an AWQ quantised version of CorticalStack/pastiche-crown-clown-7b-dare.

About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.

AWQ models are currently supported on Linux and Windows, on NVIDIA GPUs only. macOS users should use GGUF models instead.

It is supported by:

AWQ configuration

  • Zero point: True
  • Q group size: 128
  • W bit: 4
  • Version: GEMM
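The configuration above describes how each weight matrix is stored. Weights are split into groups of 128. Each group gets one scale and one integer zero point, so every weight becomes a 4-bit code in the range 0 to 15. The sketch below shows only that final round-to-nearest step in plain Python, mirroring the min/max-with-zero-point scheme commonly used for AWQ. It is not AutoAWQ's implementation. It also omits two things: the activation-aware per-channel scale search that distinguishes AWQ from plain round-to-nearest quantization, and the GEMM packing layout. The function names and the synthetic weights are hypothetical.

```python
# Hypothetical sketch: group-wise, zero-point 4-bit round-to-nearest quantization
# using the settings listed above (w_bit=4, q_group_size=128, zero_point=True).
# It omits AWQ's activation-aware scale search and the GEMM packing layout.
import random
from typing import List, Tuple

W_BIT = 4
GROUP_SIZE = 128
MAX_INT = 2**W_BIT - 1  # largest 4-bit code: 15


def quantize_group(ws: List[float]) -> Tuple[List[int], float, int]:
    """Map one group of weights to 4-bit codes plus a shared scale and zero point."""
    w_min, w_max = min(ws), max(ws)
    scale = max(w_max - w_min, 1e-5) / MAX_INT  # avoid division by zero for flat groups
    zero = min(max(-round(w_min / scale), 0), MAX_INT)  # integer offset so w_min maps near code 0
    codes = [min(max(round(w / scale) + zero, 0), MAX_INT) for w in ws]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Recover approximate weights: w ~ (code - zero) * scale."""
    return [(q - zero) * scale for q in codes]


def quantize_row(row: List[float]) -> List[Tuple[List[int], float, int]]:
    """Split a weight row into GROUP_SIZE chunks and quantize each chunk independently."""
    if len(row) % GROUP_SIZE != 0:
        raise ValueError("row length must be a multiple of GROUP_SIZE")
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


random.seed(0)
row = [random.uniform(-1.0, 1.0) for _ in range(2 * GROUP_SIZE)]  # synthetic weights, two groups
groups = quantize_row(row)
recon = [w for codes, s, z in groups for w in dequantize_group(codes, s, z)]
max_err = max(abs(a - b) for a, b in zip(row, recon))
print(f"groups={len(groups)} max_abs_error={max_err:.4f}")
```

Each weight is reconstructed to within about half of its group's scale. A smaller group size tightens that bound, but it adds more stored scales and zero points per weight.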
Safetensors
Model size: 1.2B params
Tensor type: I32 · FP16
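The parameter count above is much smaller than the "7b" in the model name. One plausible reading is that the Hub counts stored tensor elements rather than original weights. The sketch below assumes the files follow AutoAWQ's GEMM packing layout. Under that layout, each I32 element of the packed weight tensor holds eight 4-bit weights. In addition, each group of 128 input rows contributes, per output column, one FP16 scale and one 4-bit zero point, with the zero points also packed into I32. The 4096 x 4096 layer shape and the function name are hypothetical and used only for illustration. Any tensors left unquantized in FP16 would count one element per weight.

```python
# Hypothetical worked arithmetic: stored tensor elements for one AWQ GEMM linear
# layer, assuming AutoAWQ-style packing (int32 qweight/qzeros, fp16 scales).
W_BIT = 4
GROUP_SIZE = 128
PACK = 32 // W_BIT  # eight 4-bit codes fit in one 32-bit integer


def awq_gemm_elements(in_features: int, out_features: int) -> dict:
    """Count stored elements per tensor for a quantized in_features x out_features layer."""
    n_groups = in_features // GROUP_SIZE
    return {
        "qweight (I32)": in_features * (out_features // PACK),
        "qzeros (I32)": n_groups * (out_features // PACK),
        "scales (FP16)": n_groups * out_features,
    }


dense = 4096 * 4096  # hypothetical 4096 x 4096 projection: 16,777,216 FP16 weights
counts = awq_gemm_elements(4096, 4096)
stored = sum(counts.values())
print(counts)
print(f"stored={stored:,} dense={dense:,} ratio={stored / dense:.4f}")
# stored=2,244,608 dense=16,777,216 ratio=0.1338
```

Under these assumptions, a quantized layer stores roughly 13% as many elements as it has weights. This is consistent with a 7B-parameter model showing a stored element count around 1.2B.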