[WIP] Optimized q4f16 ONNX export

#3
Opened by Xenova (HF staff)
Files changed (2) hide show
  1. config.json +3 -0
  2. onnx/model_q4f16.onnx +2 -2
config.json CHANGED
@@ -25,6 +25,9 @@
25
  "tie_word_embeddings": true,
26
  "torch_dtype": "bfloat16",
27
  "transformers_version": "4.42.3",
 
 
 
28
  "use_cache": true,
29
  "vocab_size": 49152
30
  }
 
25
  "tie_word_embeddings": true,
26
  "torch_dtype": "bfloat16",
27
  "transformers_version": "4.42.3",
28
+ "transformers.js_config": {
29
+ "kv_cache_dtype": "float16"
30
+ },
31
  "use_cache": true,
32
  "vocab_size": 49152
33
  }
onnx/model_q4f16.onnx CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e1a788453e1393e8642f43ca729b7f2301ba61cc1f8ac1f1904c809869fc1ffb
3
- size 272513495
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bc95f62ea740d675d75c0f263ecf467c950f4002d18428dce832cb2fd5705b9e
3
+ size 298430898