Commit 54af1f1
xrsrke committed
Parent: 1355d8e

link nanotron's fp8 implementation

dist/bibliography.bib CHANGED
@@ -510,4 +510,10 @@ url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
   archivePrefix={arXiv},
   primaryClass={cs.LG},
   url={https://arxiv.org/abs/2309.14322},
+}
+@software{nanotronfp8,
+  title = {nanotron's FP8 implementation},
+  author = {nanotron},
+  url = {https://github.com/huggingface/nanotron/pull/70},
+  year = {2024}
 }
dist/index.html CHANGED
@@ -2215,7 +2215,7 @@
   </tbody>
   </table>
 
-  <p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon replacing bf16 mixed-precision. To follow public implementations of this, please head to the nanotron’s implementation in [TODO: link to appendix]. </p>
+  <p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon, replacing bf16 mixed-precision. To follow public implementations of this, please head to nanotron’s implementation<d-cite bibtex-key="nanotronfp8"></d-cite>.</p>
 
   <p>In the future, Blackwell, the next generation of NVIDIA chips, <a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/">have been announced </a> to support FP4 training, further speeding up training but without a doubt also introducing a new training stability challenge.</p>
 
src/bibliography.bib CHANGED
@@ -510,4 +510,10 @@ url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
   archivePrefix={arXiv},
   primaryClass={cs.LG},
   url={https://arxiv.org/abs/2309.14322},
+}
+@software{nanotronfp8,
+  title = {nanotron's FP8 implementation},
+  author = {nanotron},
+  url = {https://github.com/huggingface/nanotron/pull/70},
+  year = {2024}
 }
src/index.html CHANGED
@@ -2215,7 +2215,7 @@
   </tbody>
   </table>
 
-  <p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon replacing bf16 mixed-precision. To follow public implementations of this, please head to the nanotron’s implementation in [TODO: link to appendix]. </p>
+  <p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon, replacing bf16 mixed-precision. To follow public implementations of this, please head to nanotron’s implementation<d-cite bibtex-key="nanotronfp8"></d-cite>.</p>
 
   <p>In the future, Blackwell, the next generation of NVIDIA chips, <a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/">have been announced </a> to support FP4 training, further speeding up training but without a doubt also introducing a new training stability challenge.</p>
 
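
For readers who want a rough feel for what the FP8 paragraph above is referring to, here is a minimal PyTorch sketch of per-tensor scaling into the float8 e4m3 range. It is an illustration only, not nanotron's implementation (see the linked PR for that): the helpers `to_fp8_e4m3` and `fp8_matmul` are invented for this example, and real FP8 kernels keep the operands in FP8 on the GPU rather than dequantizing them back to bf16 as done here.

```python
import torch

# Sketch only: per-tensor scaling into the FP8 e4m3 range, then a bf16 matmul
# on the dequantized operands. NOT nanotron's implementation; production FP8
# paths (e.g. Transformer Engine) run the matmul directly on FP8 operands.

def to_fp8_e4m3(x: torch.Tensor):
    """Scale a tensor into the representable range of float8 e4m3 and cast."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max      # 448.0
    scale = fp8_max / x.abs().max().clamp(min=1e-12)    # per-tensor scale
    x_fp8 = (x * scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_fp8, scale

def fp8_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Quantize both operands to FP8, then dequantize and matmul in bf16."""
    a_fp8, scale_a = to_fp8_e4m3(a)
    b_fp8, scale_b = to_fp8_e4m3(b)
    return (a_fp8.to(torch.bfloat16) / scale_a) @ (b_fp8.to(torch.bfloat16) / scale_b)

if __name__ == "__main__":
    a = torch.randn(128, 256, dtype=torch.bfloat16)
    b = torch.randn(256, 64, dtype=torch.bfloat16)
    out = fp8_matmul(a, b)
    print(out.shape, out.dtype)  # torch.Size([128, 64]) torch.bfloat16
```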