using quants with pipeline

#1
by supercharge19 - opened

Is it possible to use a quantized version of a model through Hugging Face's (`transformers`) pipeline? Can a model be loaded in int4 or even fp4 (instead of fp16, as this model is) through the pipeline? If so, how does the model behave, and how much do accuracy and output quality degrade when quantizing through the pipeline?
