jojo1899 committed on
Commit
9fbf3ab
·
1 Parent(s): 9226a0e

Improved quantization using Openvino 2024.5.0rc1

README.md CHANGED
@@ -7,21 +7,20 @@ tags:
 
 This is an INT4 quantized version of the `meta-llama/Llama-3.1-8B-Instruct` model. The Python packages used in creating this model are as follows:
 ```
-openvino==2024.4.0
+openvino==2024.5.0rc1
 optimum==1.23.3
 optimum-intel==1.20.1
 nncf==2.13.0
 torch==2.5.1
-transformers==4.46.1
+transformers==4.46.2
 ```
 This quantized model is created using the following command:
 ```
-optimum-cli export openvino -m "meta-llama/Llama-3.1-8B-Instruct" --task text-generation-with-past --weight-format int4 --group-size 128 --trust-remote-code ./llama-3_1-8b-instruct-ov-int4
+optimum-cli export openvino --model "meta-llama/Llama-3.1-8B-Instruct" --weight-format int4 --group-size 128 --sym --ratio 1 --all-layers ./llama-3_1-8b-instruct-ov-int4
 ```
 For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
 
 INFO:nncf:Statistics of the bitwidth distribution:
-| Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
-|--------------|---------------------------|--------------------------------------|
-| 8            | 13% (2 / 226)             | 0% (0 / 224)                         |
-| 4            | 87% (224 / 226)           | 100% (224 / 224)                     |
+| Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
+|--------------|---------------------------|--------------------------------------|
+| 4            | 100% (226 / 226)          | 100% (226 / 226)                     |
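The new export flags (`--weight-format int4 --group-size 128 --sym --ratio 1 --all-layers`) ask NNCF to quantize every layer symmetrically to 4-bit integers, with one shared scale per group of 128 weights. A minimal conceptual sketch of that scheme, in plain Python (this is an illustration of the technique, not NNCF's actual implementation):

```python
# Sketch of symmetric group-wise INT4 quantization, as selected by
# "--weight-format int4 --group-size 128 --sym". Each group of 128 weights
# shares one scale; weights are rounded to signed 4-bit ints in [-8, 7].
# Illustrative only -- NNCF's real implementation differs in detail.

def quantize_group_sym_int4(weights, group_size=128):
    """Quantize a flat list of floats, group by group, to signed INT4."""
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric: the largest magnitude in the group maps to the int4 edge.
        scale = max(abs(w) for w in group) / 7 or 1.0  # guard all-zero groups
        scales.append(scale)
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize(qweights, scales, group_size=128):
    """Reconstruct approximate float weights from INT4 values and scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]

weights = [0.05 * i for i in range(-64, 64)]
q, s = quantize_group_sym_int4(weights)
approx = dequantize(q, s)
```

Because the scheme is symmetric, no zero-point is stored, and the per-weight reconstruction error is bounded by half a scale step.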
 
config.json CHANGED
@@ -35,7 +35,7 @@
   "rope_theta": 500000.0,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.46.1",
+  "transformers_version": "4.46.2",
   "use_cache": true,
   "vocab_size": 128256
 }
generation_config.json CHANGED
@@ -8,5 +8,5 @@
   ],
   "temperature": 0.6,
   "top_p": 0.9,
-  "transformers_version": "4.46.1"
+  "transformers_version": "4.46.2"
 }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e1b6e71c049ddaf58b55a03172c5dce21ea4848f2960f5fa1faf44382c308661
-size 4678484032
+oid sha256:3e412669f665674fd8cab4953934a2de259a0d25efb6a6b4bd114f781efab67e
+size 4141531712
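The new, smaller binary (4,141,531,712 bytes vs 4,678,484,032) is roughly what all-layers INT4 with fp16 group scales predicts. A back-of-the-envelope check, assuming ~8.03B total weights for Llama-3.1-8B and one 2-byte scale per 128-weight group (both assumptions, not stated in this diff):

```python
# Rough sanity check on the new openvino_model.bin size.
# Assumptions (not stated in the diff): Llama-3.1-8B has ~8.03e9 weights,
# and symmetric group-wise INT4 stores one fp16 (2-byte) scale per group.

NUM_PARAMS = 8_030_261_248  # assumed total weight count for Llama-3.1-8B
GROUP_SIZE = 128            # matches --group-size 128 in the export command

packed_weights = NUM_PARAMS // 2         # 4 bits per weight -> 2 weights/byte
scales = (NUM_PARAMS // GROUP_SIZE) * 2  # one fp16 scale per group
estimate = packed_weights + scales

print(f"estimated size: {estimate:,} bytes")
print(f"actual size:    {4_141_531_712:,} bytes")
```

The estimate lands within about 0.1% of the actual file size, consistent with every layer now being stored at 4 bits.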
openvino_model.xml CHANGED
The diff for this file is too large to render. See raw diff