Update README.md
#2
by
zmadscientist
- opened
README.md
CHANGED
@@ -1,3 +1,20 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
+
|
5 |
+
QuaLA-MiniLM: a Quantized Length Adaptive
|
6 |
+
MiniLM
|
7 |
+
|
8 |
+
The article discusses the challenge of making transformer-based models efficient enough for practical use,
|
9 |
+
given their size and computational requirements. The authors propose a new approach called QuaLA-MiniLM,
|
10 |
+
which combines knowledge distillation, the length-adaptive transformer (LAT) technique,
|
11 |
+
and low-bit quantization. This approach trains a single model that can adapt to any
|
12 |
+
inference scenario with a given computational budget, achieving a superior accuracy-efficiency
|
13 |
+
trade-off on the SQuAD1.1 dataset. The authors compare this approach to other efficient methods
|
14 |
+
and find that it achieves up to an 8.8x speedup with less than 1% accuracy loss.
|
15 |
+
The authors make their code publicly available on GitHub. The article also discusses related work
|
16 |
+
in the field, including dynamic transformers and other knowledge distillation approaches.
|
17 |
+
|
18 |
+
|
19 |
+
|
20 |
+
|
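The summary names low-bit quantization as one ingredient but does not say which scheme is used. As a rough illustration only, the sketch below shows generic symmetric 8-bit quantization of a list of weights in plain Python. The function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and this is not the authors' implementation.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] with one shared scale.

    Assumption: symmetric, per-tensor quantization. The README does not
    state which scheme QuaLA-MiniLM actually uses.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from its original by at most about half of `scale`. That rounding error is the accuracy cost traded for smaller storage and cheaper integer arithmetic.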