justinthelaw committed • Commit 0799a23 • 1 Parent(s): 73b9f2c

fix README links

README.md CHANGED
@@ -15,7 +15,10 @@ tags:
 # Phi-3-mini-128k-instruct GPTQ 4-bit 128g Group Size
 
 - Model creator: [Microsoft](https://huggingface.co/microsoft)
-- Original model: [Phi-3-mini-128k-instruct](https://huggingface.co/
+- Original model: [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
+
+- Quantization code: [justinthelaw's GitHub](https://github.com/justinthelaw/quantization-pipeline-experiments)
+- Quantization creator: [Justin Law](https://huggingface.co/justinthelaw)
 
 <!-- description start -->
 ## Description
@@ -37,7 +40,7 @@ Models are released as sharded safetensors files.
 
 The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
 This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
-The model belongs to the Phi-3 family with the Mini version in two variants [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/
+The model belongs to the Phi-3 family with the Mini version in two variants [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) which is the context length (in tokens) that it can support.
 
 After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures.
 When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning, the Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.
@@ -239,7 +242,7 @@ Developers should apply responsible AI best practices and are responsible for en
 
 ### Model
 
-- Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety
+- Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 - Inputs: Text. It is best suited for prompts using chat format.
 - Context length: 128K tokens
 - GPUs: 512 H100-80G
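Since the spec list in this hunk notes that prompts should use the chat format, here is a minimal sketch (not taken from the model card) of loading this GPTQ 4-bit, 128g-group-size checkpoint with `transformers` and prompting it through the tokenizer's chat template. The repository id is an assumption standing in for this quantized repo, and a GPTQ-capable backend (e.g. `optimum` with `auto-gptq` or `gptqmodel`) is assumed to be installed.

```python
# Illustrative sketch only -- not code from the model card or the quantization repo.
# Assumes: a GPU, transformers with GPTQ support installed, and that `model_id`
# below points at this 4-bit 128g quantized repository (adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "justinthelaw/Phi-3-mini-128k-instruct-4bit-128g"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the sharded 4-bit safetensors on available GPUs
    torch_dtype=torch.float16,  # GPTQ kernels run with fp16 activations
    trust_remote_code=True,     # Phi-3 repos may ship custom modeling code
)

# The card recommends chat-format prompts; the tokenizer's chat template
# renders the Phi-3 turn markers so they do not need to be hand-written.
messages = [{"role": "user", "content": "Explain 4-bit GPTQ quantization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```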
@@ -262,7 +265,7 @@ We are focusing on the quality of data that could potentially improve the reason
 
 ### Fine-tuning
 
-A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided [here](https://huggingface.co/
+A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided [here](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/sample_finetune.py).
 
 ## Benchmarks
 
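The linked sample_finetune.py is the canonical example; purely as orientation, a single-file TRL SFT script might look like the sketch below. It assumes a recent TRL release (with `SFTConfig`/`SFTTrainer`), and the dataset, output path, and hyperparameters are placeholders, not values from the model card. Multi-GPU execution comes from launching the script with Accelerate rather than from the code itself.

```python
# Rough sketch of TRL-based SFT; this is not the linked sample_finetune.py.
# Assumes a recent TRL release and a chat-style dataset with a "messages" column;
# dataset name, split, output path, and hyperparameters are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any chat-format dataset works; this one is used purely as an example.
train_dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:1%]")

trainer = SFTTrainer(
    model="microsoft/Phi-3-mini-128k-instruct",  # fine-tune the original, unquantized model
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="phi-3-mini-128k-sft",        # placeholder output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```

A typical multi-GPU launch, after running `accelerate config`, would be `accelerate launch sft.py` (the file name `sft.py` is a placeholder), which is roughly what the TRL-plus-Accelerate combination mentioned in the hunk above refers to.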