bourdoiscatie committed "Add DOI"
dist/index.html (+11 -10)
@@ -1256,7 +1256,8 @@
 <h4 id="passage-l-chelle">Model size</h4>
 <p class="width_125"> T5/FLAN-T5 have been trained to 11 billion parameters, demonstrating that this architecture can scale.<br>
 We would like to offer larger models with a FAT5-base and a FAT5-large with 305M and 973M parameters respectively, which we would then like to distil. The aim is to offer models that consume as little as possible in routine/inference.<br>
-We also expect the distilled models to perform better than models of equivalent size trained from scratch
+We also expect the distilled models to perform better than models of equivalent size trained from scratch.<br>
+This should also allow us to propose models that will be used in practice. Indeed, in the current state for French, if the user is more motivated by performance than by the memory size of the model, he has more interest in using a CamemBERTa 2.0 for classification tasks. The present FAT5 should therefore be seen more as a proof of concept before being scaled up to make it competitive.
 <br><br></p>

 <h4 id="modeles-specialises">Training data</h4>
@@ -1277,7 +1278,7 @@
 <p class="width_125">
 We introduced the FAT5 (Flash Attention T5) model, detailing our approach to optimizing various elements of the pre-training and finetuning processes.
 This is based on kernels that enable Flash Attention to be used with a T5 and give the model a linear memory.
-In particular, we've applied our work to French, and made sure that it can also be used in any other language.
+In particular, we've applied our work to French as a proof of concept, and made sure that it can also be used in any other language.
 We hope that our method, which enables a model with 147M parameters to be pre-trained from scratch for €1,600, will be useful for people with limited computational resources.
 It also opens the way for a possible comeback of encoder-decoder models, rather than only decoder models.<br>
 <p class="width_125"><br><br></p>
@@ -1302,14 +1303,14 @@
 </style>

 <h3 id="citation">Citation</h3>
-<pre class="citation long">@misc{
-
-
-
-
-
-
-
+<pre class="citation long">@misc {FAT5,
+  title = { FAT5: Flash Attention T5 },
+  author = { Boris ALBAR and Loïck BOURDOIS },
+  organization = { Centre Aquitain des Technologies de l'Information et Electroniques },
+  year = 2025,
+  url = { https://huggingface.co/spaces/CATIE-AQ/FAT5-report },
+  doi = { 10.57967/hf/4160 },
+  publisher = { Hugging Face }
 }</pre>

 <d-appendix style="color: #9CA3AF;" >
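
The "Model size" paragraph in the first hunk mentions distilling the planned FAT5-base and FAT5-large into smaller models. As background only, the snippet below is a minimal, generic sketch of a knowledge-distillation objective in PyTorch (soft teacher targets blended with the hard-label loss); the temperature and weighting values are illustrative assumptions and are not settings taken from the FAT5 report.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation objective.

    Assumes classification-style shapes: logits of (batch, num_classes)
    and integer labels of (batch,). `temperature` and `alpha` are
    illustrative defaults, not values from the FAT5 report.
    """
    # Soft targets: teacher and student distributions softened by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions, rescaled by T^2
    # so its gradient magnitude stays comparable to the hard-label loss.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Usual cross-entropy of the student against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha controls how much the student follows the teacher.
    return alpha * kd + (1.0 - alpha) * ce

In such a setup the teacher logits would come from a trained larger model (here, hypothetically, a FAT5-base or FAT5-large), while the student is the smaller model intended for routine inference.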