Initialized

Files changed (8)

README.md ADDED

+---
+language: fa
+license: apache-2.0
+---
+# DistilBERT
+This model handles the zero-width non-joiner (ZWNJ) character used in Persian writing. It was also trained on new multi-type corpora with a new vocabulary.
+## Questions?
+Post a GitHub issue on the [ParsBERT Issues](https://github.com/hooshvare/parsbert/issues) repo.
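
The README's mention of the zero-width non-joiner can be illustrated with a small sketch. The example word is an assumption chosen for illustration and does not come from this commit, and the snippet only inspects the raw string; it does not run the model's tokenizer.

```python
import unicodedata

ZWNJ = "\u200c"  # U+200C, the zero-width non-joiner

# Illustrative Persian word (an assumption, not taken from the commit):
# a prefix and a stem written as one word, separated by a ZWNJ.
with_zwnj = "می" + ZWNJ + "خواهم"
with_space = with_zwnj.replace(ZWNJ, " ")   # ZWNJ replaced by a space
without_zwnj = with_zwnj.replace(ZWNJ, "")  # ZWNJ stripped entirely

print(unicodedata.name(ZWNJ))

# The ZWNJ is a real code point, so stripping it changes the string.
print(len(with_zwnj), len(without_zwnj))

# It is not whitespace: whitespace splitting keeps the word whole,
# while the space-separated variant breaks into two pieces.
print(len(with_zwnj.split()), len(with_space.split()))
```

Text pipelines that drop or replace this character change the word before it reaches a tokenizer, which is presumably why the README calls out ZWNJ support.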

config.json ADDED

+{
+  "activation": "gelu",
+  "architectures": [
+    "DistilBertForMaskedLM"
+  ],
+  "attention_dropout": 0.1,
+  "dim": 768,
+  "dropout": 0.1,
+  "hidden_dim": 3072,
+  "initializer_range": 0.02,
+  "max_position_embeddings": 512,
+  "model_type": "distilbert",
+  "n_heads": 12,
+  "n_layers": 6,
+  "output_past": true,
+  "pad_token_id": 0,
+  "qa_dropout": 0.1,
+  "seq_classif_dropout": 0.2,
+  "sinusoidal_pos_embds": false,
+  "tie_weights_": true,
+  "transformers_version": "4.2.2",
+  "vocab_size": 42000
+}

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:6fb06d71fdc2d51b9d43e80dd0255e85505f88d0adea623e1559706161284b40
+size 303291090
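
As a sanity check, the values in config.json can be related to the reported size of pytorch_model.bin. The sketch below is an estimate under stated assumptions: weights stored as 32-bit floats, the standard Hugging Face DistilBERT layout (no token-type embeddings), and a masked-LM output projection that shares its weights with the word embeddings, as `"tie_weights_": true` suggests. The helper names are hypothetical.

```python
# Values copied from the config.json in this diff.
config = {
    "dim": 768,
    "hidden_dim": 3072,
    "n_layers": 6,
    "max_position_embeddings": 512,
    "vocab_size": 42000,
}

def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out

def layer_norm(n):
    """Parameters of a LayerNorm: scale plus shift."""
    return 2 * n

def estimate_params(cfg):
    d, h = cfg["dim"], cfg["hidden_dim"]
    # Word embeddings, position embeddings, embedding LayerNorm.
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]) * d + layer_norm(d)
    # One transformer block: q/k/v/out projections, two-layer FFN, two LayerNorms.
    block = 4 * linear(d, d) + linear(d, h) + linear(h, d) + 2 * layer_norm(d)
    # Masked-LM head: transform, LayerNorm, and only the bias of the
    # output projection (its weight is assumed tied to the embeddings).
    mlm_head = linear(d, d) + layer_norm(d) + cfg["vocab_size"]
    return embeddings + cfg["n_layers"] * block + mlm_head

n_params = estimate_params(config)
estimated_bytes = n_params * 4   # assumption: float32 weights
reported_bytes = 303291090       # size from the Git LFS pointer above

print(f"parameters: {n_params:,}")
print(f"estimated:  {estimated_bytes:,} bytes")
print(f"reported:   {reported_bytes:,} bytes")
```

Under these assumptions the estimate comes to about 75.8 million parameters and lands within 0.1% of the reported file size; the small remainder is consistent with serialization overhead.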

special_tokens_map.json ADDED

+{
+    "unk_token": "[UNK]",
+    "sep_token": "[SEP]",
+    "pad_token": "[PAD]",
+    "cls_token": "[CLS]",
+    "mask_token": "[MASK]"
+}

tf_model.h5 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:4dff42295d516084101a09f1c730bce340a945660de493dbdfc9a4357f03d498
+size 433990736

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

+{
+    "do_lower_case": false,
+    "unk_token": "[UNK]",
+    "sep_token": "[SEP]",
+    "pad_token": "[PAD]",
+    "cls_token": "[CLS]",
+    "mask_token": "[MASK]",
+    "tokenize_chinese_chars": true,
+    "strip_accents": false,
+    "special_tokens_map_file": null
+}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff