jojo1899 committed on
Commit
9fbf3ab
·
1 Parent(s): 9226a0e

Improved quantization using Openvino 2024.5.0rc1

README.md CHANGED
@@ -7,21 +7,20 @@ tags:
 
 This is an INT4 quantized version of the `meta-llama/Llama-3.1-8B-Instruct` model. The Python packages used in creating this model are as follows:
 ```
-openvino==2024.4.0
+openvino==2024.5.0rc1
 optimum==1.23.3
 optimum-intel==1.20.1
 nncf==2.13.0
 torch==2.5.1
-transformers==4.46.1
+transformers==4.46.2
 ```
 This quantized model is created using the following command:
 ```
-optimum-cli export openvino -m "meta-llama/Llama-3.1-8B-Instruct" --task text-generation-with-past --weight-format int4 --group-size 128 --trust-remote-code ./llama-3_1-8b-instruct-ov-int4
+optimum-cli export openvino --model "meta-llama/Llama-3.1-8B-Instruct" --weight-format int4 --group-size 128 --sym --ratio 1 --all-layers ./llama-3_1-8b-instruct-ov-int4
 ```
 For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
 
 INFO:nncf:Statistics of the bitwidth distribution:
-| Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
-|--------------|---------------------------|--------------------------------------|
-| 8            | 13% (2 / 226)             | 0% (0 / 224)                         |
-| 4            | 87% (224 / 226)           | 100% (224 / 224)                     |
+| Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
+|--------------|---------------------------|--------------------------------------|
+| 4            | 100% (226 / 226)          | 100% (226 / 226)                     |
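The new export flags (`--weight-format int4 --group-size 128 --sym --ratio 1 --all-layers`) ask NNCF to quantize every layer symmetrically to 4-bit integers, with one shared scale per group of 128 weights. A minimal conceptual sketch of that scheme, in plain Python (this is an illustration of the technique, not NNCF's actual implementation):

```python
# Sketch of symmetric group-wise INT4 quantization, as selected by
# "--weight-format int4 --group-size 128 --sym". Each group of 128 weights
# shares one scale; weights are rounded to signed 4-bit ints in [-8, 7].
# Illustrative only -- NNCF's real implementation differs in detail.

def quantize_group_sym_int4(weights, group_size=128):
    """Quantize a flat list of floats, group by group, to signed INT4."""
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric: the largest magnitude in the group maps to the int4 edge.
        scale = max(abs(w) for w in group) / 7 or 1.0  # guard all-zero groups
        scales.append(scale)
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize(qweights, scales, group_size=128):
    """Reconstruct approximate float weights from INT4 values and scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]

weights = [0.05 * i for i in range(-64, 64)]
q, s = quantize_group_sym_int4(weights)
approx = dequantize(q, s)
```

Because the scheme is symmetric, no zero-point is stored, and the per-weight reconstruction error is bounded by half a scale step.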
 
config.json CHANGED
@@ -35,7 +35,7 @@
   "rope_theta": 500000.0,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.46.1",
+  "transformers_version": "4.46.2",
   "use_cache": true,
   "vocab_size": 128256
 }
generation_config.json CHANGED
@@ -8,5 +8,5 @@
   ],
   "temperature": 0.6,
   "top_p": 0.9,
-  "transformers_version": "4.46.1"
+  "transformers_version": "4.46.2"
 }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e1b6e71c049ddaf58b55a03172c5dce21ea4848f2960f5fa1faf44382c308661
-size 4678484032
+oid sha256:3e412669f665674fd8cab4953934a2de259a0d25efb6a6b4bd114f781efab67e
+size 4141531712
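The new, smaller binary (4,141,531,712 bytes vs 4,678,484,032) is roughly what all-layers INT4 with fp16 group scales predicts. A back-of-the-envelope check, assuming ~8.03B total weights for Llama-3.1-8B and one 2-byte scale per 128-weight group (both assumptions, not stated in this diff):

```python
# Rough sanity check on the new openvino_model.bin size.
# Assumptions (not stated in the diff): Llama-3.1-8B has ~8.03e9 weights,
# and symmetric group-wise INT4 stores one fp16 (2-byte) scale per group.

NUM_PARAMS = 8_030_261_248  # assumed total weight count for Llama-3.1-8B
GROUP_SIZE = 128            # matches --group-size 128 in the export command

packed_weights = NUM_PARAMS // 2         # 4 bits per weight -> 2 weights/byte
scales = (NUM_PARAMS // GROUP_SIZE) * 2  # one fp16 scale per group
estimate = packed_weights + scales

print(f"estimated size: {estimate:,} bytes")
print(f"actual size:    {4_141_531_712:,} bytes")
```

The estimate lands within about 0.1% of the actual file size, consistent with every layer now being stored at 4 bits.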
openvino_model.xml CHANGED
The diff for this file is too large to render. See raw diff