bourdoiscatie committed · Commit 8e8433c · verified · 1 Parent(s): 677554f
Files changed (1): dist/index.html (+11 −10)
dist/index.html CHANGED
@@ -1256,7 +1256,8 @@
  <h4 id="passage-l-chelle">Model size</h4>
  <p class="width_125"> T5/FLAN-T5 have been trained to 11 billion parameters, demonstrating that this architecture can scale.<br>
  We would like to offer larger models with a FAT5-base and a FAT5-large with 305M and 973M parameters respectively, which we would then like to distil. The aim is to offer models that consume as little as possible in routine/inference.<br>
- We also expect the distilled models to perform better than models of equivalent size trained from scratch.
+ We also expect the distilled models to perform better than models of equivalent size trained from scratch.<br>
+ This should also allow us to propose models that will be used in practice. Indeed, in the current state for French, if the user is more motivated by performance than by the memory size of the model, he has more interest in using a CamemBERTa 2.0 for classification tasks. The present FAT5 should therefore be seen more as a proof of concept before being scaled up to make it competitive.
  <br><br></p>
 
  <h4 id="modeles-specialises">Training data</h4>
@@ -1277,7 +1278,7 @@
  <p class="width_125">
  We introduced the FAT5 (Flash Attention T5) model, detailing our approach to optimizing various elements of the pre-training and finetuning processes.
  This is based on kernels that enable Flash Attention to be used with a T5 and give the model a linear memory.
- In particular, we've applied our work to French, and made sure that it can also be used in any other language.
+ In particular, we've applied our work to French as a proof of concept, and made sure that it can also be used in any other language.
  We hope that our method, which enables a model with 147M parameters to be pre-trained from scratch for €1,600, will be useful for people with limited computational resources.
  It also opens the way for a possible comeback of encoder-decoder models, rather than only decoder models.<br>
  <p class="width_125"><br><br></p>
@@ -1302,14 +1303,14 @@
  </style>
 
  <h3 id="citation">Citation</h3>
- <pre class="citation long">@misc{FAT5_blogpost,
-   title={ FAT5: Flash Attention T5 },
-   author={ Boris ALBAR and Loïck BOURDOIS },
-   organization={ Centre Aquitain des Technologies de l'Information et Electroniques },
-   year={2024},
-   url={ https://huggingface.co/spaces/CATIE-AQ/FAT5-report },
-   doi={ 10.57967/hf/0821 },
-   publisher= { Hugging Face }
+ <pre class="citation long">@misc {FAT5,
+   title = { FAT5: Flash Attention T5 },
+   author = { Boris ALBAR and Loïck BOURDOIS },
+   organization = { Centre Aquitain des Technologies de l'Information et Electroniques },
+   year = 2025,
+   url = { https://huggingface.co/spaces/CATIE-AQ/FAT5-report },
+   doi = { 10.57967/hf/4160 },
+   publisher = { Hugging Face }
  }</pre>
 
  <d-appendix style="color: #9CA3AF;" >