bourdoiscatie committed "Add DOI"
dist/index.html (+11 -10)
@@ -1256,7 +1256,8 @@
 <h4 id="passage-l-chelle">Model size</h4>
 <p class="width_125"> T5/FLAN-T5 have been trained to 11 billion parameters, demonstrating that this architecture can scale.<br>
 We would like to offer larger models with a FAT5-base and a FAT5-large with 305M and 973M parameters respectively, which we would then like to distil. The aim is to offer models that consume as little as possible in routine/inference.<br>
-We also expect the distilled models to perform better than models of equivalent size trained from scratch
+We also expect the distilled models to perform better than models of equivalent size trained from scratch.<br>
+This should also allow us to propose models that will be used in practice. Indeed, in the current state for French, if the user is more motivated by performance than by the memory size of the model, he has more interest in using a CamemBERTa 2.0 for classification tasks. The present FAT5 should therefore be seen more as a proof of concept before being scaled up to make it competitive.
 <br><br></p>

 <h4 id="modeles-specialises">Training data</h4>
@@ -1277,7 +1278,7 @@
 <p class="width_125">
 We introduced the FAT5 (Flash Attention T5) model, detailing our approach to optimizing various elements of the pre-training and finetuning processes.
 This is based on kernels that enable Flash Attention to be used with a T5 and give the model a linear memory.
-In particular, we've applied our work to French, and made sure that it can also be used in any other language.
+In particular, we've applied our work to French as a proof of concept, and made sure that it can also be used in any other language.
 We hope that our method, which enables a model with 147M parameters to be pre-trained from scratch for €1,600, will be useful for people with limited computational resources.
 It also opens the way for a possible comeback of encoder-decoder models, rather than only decoder models.<br>
 <p class="width_125"><br><br></p>
@@ -1302,14 +1303,14 @@
 </style>

 <h3 id="citation">Citation</h3>
-<pre class="citation long">@misc{
-
-
-
-
-
-
-
+<pre class="citation long">@misc {FAT5,
+  title = { FAT5: Flash Attention T5 },
+  author = { Boris ALBAR and Loïck BOURDOIS },
+  organization = { Centre Aquitain des Technologies de l'Information et Electroniques },
+  year = 2025,
+  url = { https://huggingface.co/spaces/CATIE-AQ/FAT5-report },
+  doi = { 10.57967/hf/4160 },
+  publisher = { Hugging Face }
 }</pre>

 <d-appendix style="color: #9CA3AF;" >
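
The "Model size" paragraph in the first hunk mentions distilling the planned FAT5-base and FAT5-large into smaller models. As background only, the snippet below is a minimal, generic sketch of a knowledge-distillation objective in PyTorch (soft teacher targets blended with the hard-label loss); the temperature and weighting values are illustrative assumptions and are not settings taken from the FAT5 report.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation objective.

    Assumes classification-style shapes: logits of (batch, num_classes)
    and integer labels of (batch,). `temperature` and `alpha` are
    illustrative defaults, not values from the FAT5 report.
    """
    # Soft targets: teacher and student distributions softened by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions, rescaled by T^2
    # so its gradient magnitude stays comparable to the hard-label loss.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Usual cross-entropy of the student against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha controls how much the student follows the teacher.
    return alpha * kd + (1.0 - alpha) * ce

In such a setup the teacher logits would come from a trained larger model (here, hypothetically, a FAT5-base or FAT5-large), while the student is the smaller model intended for routine inference.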