Intel/dynamic-minilmv2-L6-H384-squad1.1-int8-static

Question Answering


Update README.md

#2

by zmadscientist - opened May 4, 2023

base: refs/heads/main ← from: refs/pr/2

Files changed (1)

README.md +17 -0

README.md CHANGED

@@ -1,3 +1,20 @@
 ---
 license: apache-2.0
 ---
+QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
+The article addresses the challenge of making transformer-based models efficient enough for practical use,
+given their size and computational requirements. The authors propose a new approach called QuaLA-MiniLM,
+which combines knowledge distillation, the Length-Adaptive Transformer (LAT) technique,
+and low-bit quantization. This approach trains a single model that can adapt to any
+inference scenario with a given computational budget, achieving a superior accuracy-efficiency
+trade-off on the SQuAD1.1 dataset. The authors compare this approach with other efficient methods
+and find that it achieves up to an 8.8x speedup with less than 1% accuracy loss.
+The authors have released their code publicly on GitHub. The article also covers related work
+in the field, including dynamic transformers and other knowledge distillation approaches.
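The two run-time efficiency levers named above, length adaptivity and low-bit quantization, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: `sample_length_config`, `keep_top_tokens`, `relative_cost`, `quantize_int8` and `dequantize_int8` are hypothetical helpers. The sketch assumes a simplified LengthDrop-style rule (each layer keeps a random number of tokens between `(1 - drop_ratio)` times the previous layer's length and that length), ranks tokens by a generic per-token significance score, uses a linear token-count cost proxy, and applies symmetric per-tensor int8 quantization with a max-abs scale. Knowledge distillation, a training-time step, is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_length_config(seq_len: int, num_layers: int, drop_ratio: float,
                         rng: random.Random) -> List[int]:
    """Hypothetical, simplified LengthDrop-style sampling: each layer keeps a
    random number of tokens in [ceil((1 - drop_ratio) * prev), prev]."""
    config = []
    length = seq_len
    for _ in range(num_layers):
        low = max(1, math.ceil((1.0 - drop_ratio) * length))
        length = rng.randint(low, length)  # never longer than the previous layer
        config.append(length)
    return config


def keep_top_tokens(scores: Sequence[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def relative_cost(config: Sequence[int], seq_len: int) -> float:
    """Assumed linear cost proxy: tokens processed across all layers relative to
    running every layer at full length (ignores attention's quadratic term)."""
    return sum(config) / (seq_len * len(config))


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization using a max-abs scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes chosen for illustration only.
    config = sample_length_config(seq_len=128, num_layers=6, drop_ratio=0.2, rng=rng)
    print("length config:", config)
    print("relative cost:", round(relative_cost(config, 128), 3))
    print("kept tokens:", keep_top_tokens([0.1, 0.9, 0.3, 0.7], keep=2))  # [1, 3]
    q, scale = quantize_int8([0.5, -1.27, 0.02])
    print("int8 codes:", q, "scale:", scale)
```

Token dropping shortens the sequence that each later layer processes, and int8 quantization stores each value in one byte plus a shared scale, with a reconstruction error of at most half the scale per value. The source states only that combining these levers with distillation yields the reported accuracy-efficiency trade-off.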