bourdoiscatie committed
Update dist/index.html
dist/index.html CHANGED +1 -1
@@ -58,7 +58,7 @@
 
 That's why we've decided to focus on the T5 <d-cite bibtex-key="JMLR:v21:20-074"></d-cite>.<br><br>
 
-This article presents the optimisations we have implemented to efficiently pre-train a T5 with 147M
+This article presents the optimisations we have implemented to efficiently pre-train a T5 in French with 147M parameters in a reasonable time (1,461 H for 419B tokens) and with limited resources (1 single A100; i.e. a computing budget of around 2,200 euros).
 To achieve this, we designed CUDA/Triton kernels to make Flash Attention compatible with T5 and provide linear inference, thus extending the context size that can be taken into account by the model.<br><br>
 <strong>The pre-training code is available in our <a class="link" href="https://github.com/catie-aq/flashT5">GitHub repository</a> under Apache-2.0 license and weights on our <a class="link" href="https://hf.co/CATIE-AQ">Hugging Face</a> account.</strong>
 <p class="width_125"><br><br><br></p>