justinthelaw committed
Commit 0799a23
Parent: 73b9f2c

fix README links

Files changed (1)
  1. README.md +7 -4
README.md CHANGED

@@ -15,7 +15,10 @@ tags:
 # Phi-3-mini-128k-instruct GPTQ 4-bit 128g Group Size
 
 - Model creator: [Microsoft](https://huggingface.co/microsoft)
-- Original model: [Phi-3-mini-128k-instruct](https://huggingface.co/justinthelaw/Phi-3-mini-128k-instruct-4bit-128g)
+- Original model: [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
+
+- Quantization code: [justinthelaw's GitHub](https://github.com/justinthelaw/quantization-pipeline-experiments)
+- Quantization creator: [Justin Law](https://huggingface.co/justinthelaw)
 
 <!-- description start -->
 ## Description
@@ -37,7 +40,7 @@ Models are released as sharded safetensors files.
 
 The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
 This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
-The model belongs to the Phi-3 family with the Mini version in two variants [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/justinthelaw/Phi-3-mini-128k-instruct-4bit-128g) which is the context length (in tokens) that it can support.
+The model belongs to the Phi-3 family with the Mini version in two variants [4K](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [128K](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) which is the context length (in tokens) that it can support.
 
 After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures.
 When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning, the Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.
@@ -239,7 +242,7 @@ Developers should apply responsible AI best practices and are responsible for en
 
 ### Model
 
-- Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidlines.
+- Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 - Inputs: Text. It is best suited for prompts using chat format.
 - Context length: 128K tokens
 - GPUs: 512 H100-80G
@@ -262,7 +265,7 @@ We are focusing on the quality of data that could potentially improve the reason
 
 ### Fine-tuning
 
-A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided [here](https://huggingface.co/justinthelaw/Phi-3-mini-128k-instruct-4bit-128g/resolve/main/sample_finetune.py).
+A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided [here](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/sample_finetune.py).
 
 ## Benchmarks
 
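The fine-tuning section of the README points to a multi-GPU SFT example built on TRL and Accelerate. The sketch below is not that `sample_finetune.py`; it only illustrates the general TRL `SFTTrainer` pattern, assuming a recent TRL release that provides `SFTConfig`, with a placeholder in-memory dataset and placeholder hyperparameters.

```python
# Minimal sketch of the TRL SFT pattern the README's fine-tuning section points to.
# This is NOT the linked sample_finetune.py; the tiny in-memory dataset and the
# hyperparameters are placeholders for illustration only.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# SFTTrainer reads the "text" column by default, so a toy dataset is enough here.
train_dataset = Dataset.from_dict({
    "text": [
        "<|user|>\nWhat is GPTQ?<|end|>\n<|assistant|>\nA post-training quantization method.<|end|>",
        "<|user|>\nWhy quantize to 4 bits?<|end|>\n<|assistant|>\nTo fit large models on smaller GPUs.<|end|>",
    ]
})

config = SFTConfig(
    output_dir="phi3-mini-128k-sft",   # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    logging_steps=1,
)

trainer = SFTTrainer(
    model="microsoft/Phi-3-mini-128k-instruct",  # fine-tune the original, unquantized model
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

For multiple GPUs, such a script is typically launched through Accelerate, e.g. `accelerate launch <script>.py` (script name is a placeholder), with device placement handled by the Accelerate configuration.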