---
language: en
tags:
- bert
- rte
- glue
- torchdistill
- nlp
- int8
- neural-compressor
- Intel® Neural Compressor
- text-classification
- PostTrainingStatic
license: apache-2.0
datasets:
- rte
metrics:
- f1
---

# INT8 bert-large-uncased-rte-int8-static

## Post-training static quantization

### PyTorch

This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original FP32 model comes from the fine-tuned model [yoshitomo-matsubara/bert-large-uncased-rte](https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte).

#### Test result

|                        |  INT8  |  FP32  |
|------------------------|:------:|:------:|
| **Accuracy (eval-f1)** | 0.7365 | 0.7401 |
| **Model size (MB)**    |  1244  |  1349  |

#### Load with Intel® Neural Compressor:

```python
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
    "Intel/bert-large-uncased-rte-int8-static",
)
```
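Once loaded, the quantized model can be used like a standard `transformers` sequence-classification model. Below is a minimal inference sketch for the RTE sentence-pair task; it assumes the tokenizer files are available in this repository (loading the tokenizer from the original FP32 checkpoint would also work) and that the label mapping in `config.id2label` follows the fine-tuned model. The premise/hypothesis pair is illustrative.

```python
import torch
from transformers import AutoTokenizer

# Assumption: the tokenizer ships with this repo; otherwise use the tokenizer
# of the original FP32 model, yoshitomo-matsubara/bert-large-uncased-rte.
tokenizer = AutoTokenizer.from_pretrained("Intel/bert-large-uncased-rte-int8-static")

# RTE is a sentence-pair task: does the premise entail the hypothesis?
premise = "A man is playing a guitar on stage."
hypothesis = "A man is performing music."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = int8_model(**inputs).logits

# Map the highest-scoring class index back to its label name.
predicted_class = logits.argmax(dim=-1).item()
print(int8_model.config.id2label[predicted_class])
```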