---
language:
- ja
license: mit
tags:
- ja
- japanese
- gpt_neox
- gpt
- text-generation
- lm
- nlp
- int8
- neural-compressor
- Intel® Neural Compressor
- PostTrainingStatic
datasets:
- oscar
model-index:
- name: gpt-neox-japanese-2.7b-int8
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: oscar
      type: oscar
      args: unshuffled_original_ast
    metrics:
    - name: Accuracy
      type: loss
      value: 4.992
---
# INT8 gpt-neox-japanese-2.7b-int8

## Post-training static quantization

### PyTorch
This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original FP32 model comes from the fine-tuned model [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b).

The calibration dataloader is the train dataloader. Because the default calibration sampling size of 100 is not exactly divisible by the batch size of 8, the real sampling size is rounded up to 104 (13 batches × 8).
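For reference, the post-training static quantization flow looks roughly like the sketch below, written against the Intel® Neural Compressor 2.x Python API. The calibration dataloader here is a stand-in built from random token ids; the actual checkpoint was calibrated on the oscar train dataloader, so treat this as illustrative rather than the exact recipe.

```python
# A minimal sketch of post-training static quantization with the
# Intel® Neural Compressor 2.x API. The calibration data below is a
# random stand-in for the real oscar train dataloader.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForCausalLM
from neural_compressor import PostTrainingQuantConfig, quantization

fp32_model = AutoModelForCausalLM.from_pretrained("abeja/gpt-neox-japanese-2.7b")

# Batches of token ids shaped like the train dataloader (batch size 8);
# 104 samples matches the effective sampling size described above.
dummy_ids = torch.randint(0, fp32_model.config.vocab_size, (104, 128))
calib_loader = DataLoader(TensorDataset(dummy_ids), batch_size=8)

conf = PostTrainingQuantConfig(approach="static", calibration_sampling_size=[100])
int8_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_loader)
int8_model.save("./gpt-neox-japanese-2.7b-int8")
```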
## Test result
|                      | INT8   | FP32   |
|----------------------|--------|--------|
| Accuracy (eval-loss) | 4.9920 | 3.5219 |
| Model size (MB)      | 2570   | 5360   |
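For intuition, assuming the eval-loss is the mean per-token cross-entropy in nats (the original card does not state this), it maps to perplexity via exp(loss):

```python
# Hypothetical reading of the eval-loss: if it is mean per-token
# cross-entropy in nats, then perplexity = exp(loss).
import math
print(math.exp(4.9920))  # INT8 -> ~147.2
print(math.exp(3.5219))  # FP32 -> ~33.8
```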
## Load with Intel® Neural Compressor
```python
from optimum.intel import INCModelForCausalLM

# INCModelForCausalLM restores the INT8 weights and quantization
# configuration produced by Intel® Neural Compressor.
model_id = "Intel/gpt-neox-japanese-2.7b-int8"
int8_model = INCModelForCausalLM.from_pretrained(model_id)
```
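Once loaded, the model can be used like any causal LM through `generate`. A minimal usage sketch, assuming the tokenizer of the original FP32 model and an illustrative Japanese prompt:

```python
from transformers import AutoTokenizer

# The INT8 checkpoint reuses the tokenizer of the original FP32 model.
tokenizer = AutoTokenizer.from_pretrained("abeja/gpt-neox-japanese-2.7b")

inputs = tokenizer("人とAIが協調するためには、", return_tensors="pt")
outputs = int8_model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```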