Spaces: Running
xrsrke committed on
Commit · 54af1f1
1 Parent(s): 1355d8e
link nanotron's fp8 implementation
Browse files
- dist/bibliography.bib +6 -0
- dist/index.html +1 -1
- src/bibliography.bib +6 -0
- src/index.html +1 -1
dist/bibliography.bib
CHANGED
@@ -510,4 +510,10 @@ url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2309.14322},
 }
+@software{nanotronfp8,
+title = {nanotron's FP8 implementation},
+author = {nanotron},
+url = {https://github.com/huggingface/nanotron/pull/70},
+year = {2024}
+}
dist/index.html
CHANGED
@@ -2215,7 +2215,7 @@
 </tbody>
 </table>

-<p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon replacing bf16 mixed-precision. To follow public implementations of this, please head to the nanotron’s implementation
+<p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon, replacing bf16 mixed-precision. To follow public implementations of this, please head to nanotron’s implementation<d-cite bibtex-key="nanotronfp8"></d-cite>.</p>

 <p>In the future, Blackwell, the next generation of NVIDIA chips, <a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/">has been announced</a> to support FP4 training, further speeding up training but without a doubt also introducing a new training stability challenge.</p>
src/bibliography.bib
CHANGED
@@ -510,4 +510,10 @@ url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2309.14322},
 }
+@software{nanotronfp8,
+title = {nanotron's FP8 implementation},
+author = {nanotron},
+url = {https://github.com/huggingface/nanotron/pull/70},
+year = {2024}
+}
src/index.html
CHANGED
@@ -2215,7 +2215,7 @@
 </tbody>
 </table>

-<p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon replacing bf16 mixed-precision. To follow public implementations of this, please head to the nanotron’s implementation
+<p>Overall, FP8 is still an experimental technique and methods are evolving, but will likely become the standard soon, replacing bf16 mixed-precision. To follow public implementations of this, please head to nanotron’s implementation<d-cite bibtex-key="nanotronfp8"></d-cite>.</p>

 <p>In the future, Blackwell, the next generation of NVIDIA chips, <a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/">has been announced</a> to support FP4 training, further speeding up training but without a doubt also introducing a new training stability challenge.</p>