justinthelaw committed • Commit 0799a23 • 1 Parent(s): 73b9f2c

fix README links

README.md CHANGED
@@ -15,7 +15,10 @@ tags:
 # Phi-3-mini-128k-instruct GPTQ 4-bit 128g Group Size
 
 - Model creator: [Microsoft](https://huggingface.co/microsoft)
-- Original model: [Phi-3-mini-128k-instruct](https://huggingface.co/
+- Original model: [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
+
+- Quantization code: [justinthelaw's GitHub](https://github.com/justinthelaw/quantization-pipeline-experiments)
+- Quantization creator: [Justin Law](https://huggingface.co/justinthelaw)
 
 <!-- description start -->
 ## Description
@@ -37,7 +40,7 @@ Models are released as sharded safetensors files.
 
 The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
 This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
-The model belongs to the Phi-3 family with the Mini version in two variants [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/
+The model belongs to the Phi-3 family with the Mini version in two variants [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) which is the context length (in tokens) that it can support.
 
 After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures.
 When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning, the Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.
@@ -239,7 +242,7 @@ Developers should apply responsible AI best practices and are responsible for en
 
 ### Model
 
-- Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety
+- Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 - Inputs: Text. It is best suited for prompts using chat format.
 - Context length: 128K tokens
 - GPUs: 512 H100-80G
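Since the spec list in this hunk notes that prompts should use the chat format, here is a minimal sketch (not taken from the model card) of loading this GPTQ 4-bit, 128g-group-size checkpoint with `transformers` and prompting it through the tokenizer's chat template. The repository id is an assumption standing in for this quantized repo, and a GPTQ-capable backend (e.g. `optimum` with `auto-gptq` or `gptqmodel`) is assumed to be installed.

```python
# Illustrative sketch only -- not code from the model card or the quantization repo.
# Assumes: a GPU, transformers with GPTQ support installed, and that `model_id`
# below points at this 4-bit 128g quantized repository (adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "justinthelaw/Phi-3-mini-128k-instruct-4bit-128g"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the sharded 4-bit safetensors on available GPUs
    torch_dtype=torch.float16,  # GPTQ kernels run with fp16 activations
    trust_remote_code=True,     # Phi-3 repos may ship custom modeling code
)

# The card recommends chat-format prompts; the tokenizer's chat template
# renders the Phi-3 turn markers so they do not need to be hand-written.
messages = [{"role": "user", "content": "Explain 4-bit GPTQ quantization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```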
@@ -262,7 +265,7 @@ We are focusing on the quality of data that could potentially improve the reason
 
 ### Fine-tuning
 
-A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided [here](https://huggingface.co/
+A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided [here](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/sample_finetune.py).
 
 ## Benchmarks
 
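The linked sample_finetune.py is the canonical example; purely as orientation, a single-file TRL SFT script might look like the sketch below. It assumes a recent TRL release (with `SFTConfig`/`SFTTrainer`), and the dataset, output path, and hyperparameters are placeholders, not values from the model card. Multi-GPU execution comes from launching the script with Accelerate rather than from the code itself.

```python
# Rough sketch of TRL-based SFT; this is not the linked sample_finetune.py.
# Assumes a recent TRL release and a chat-style dataset with a "messages" column;
# dataset name, split, output path, and hyperparameters are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any chat-format dataset works; this one is used purely as an example.
train_dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:1%]")

trainer = SFTTrainer(
    model="microsoft/Phi-3-mini-128k-instruct",  # fine-tune the original, unquantized model
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="phi-3-mini-128k-sft",        # placeholder output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```

A typical multi-GPU launch, after running `accelerate config`, would be `accelerate launch sft.py` (the file name `sft.py` is a placeholder), which is roughly what the TRL-plus-Accelerate combination mentioned in the hunk above refers to.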