This model has been trained on the EntityCS corpus, a multilingual corpus built from Wikipedia in which entities are replaced with their counterparts in different languages.
The corpus can be found at [https://huggingface.co/huawei-noah/entity_cs](https://huggingface.co/huawei-noah/entity_cs); see the link for more details.

Firstly, we employ the conventional 80-10-10 MLM objective, where 15% of sentence subwords are considered as masking candidates. Of those, we replace subwords with [MASK] 80% of the time, with Random subwords (from the entire vocabulary) 10% of the time, and leave the remaining 10% unchanged (Same).
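The 80-10-10 split above can be sketched as follows. This is a minimal illustration over a plain list of subword ids, not the actual training code; the function name, the `-100` ignore label, and the special-token ids are assumptions for the example.

```python
import random

def mlm_mask(token_ids, vocab_size, mask_id, mask_prob=0.15, seed=0):
    """Apply conventional 80-10-10 MLM masking to a list of subword ids.

    Returns (masked_ids, labels), where labels is -100 at positions that
    were not selected as masking candidates (a value commonly ignored by
    the cross-entropy loss), and the original subword id elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:       # 15% of subwords are candidates
            labels.append(tok)
            r = rng.random()
            if r < 0.8:                    # 80%: replace with [MASK]
                masked.append(mask_id)
            elif r < 0.9:                  # 10%: replace with a Random subword
                masked.append(rng.randrange(vocab_size))
            else:                          # 10%: leave unchanged (Same)
                masked.append(tok)
        else:
            labels.append(-100)            # not a candidate: no prediction
            masked.append(tok)
    return masked, labels
```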
To integrate entity-level cross-lingual knowledge into the model, we propose Entity Prediction objectives, where we only mask subwords belonging to an entity. By predicting the masked entities in EntityCS sentences, we expect the model to capture the semantics of the same entity in different languages.
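As a rough sketch of the idea, entity-only masking restricts masking candidates to subwords inside known entity spans. The span format `(start, end)` and the helper below are hypothetical (the paper describes several Entity Prediction variants); this shows only the simplest case of masking every subword of each entity.

```python
def entity_mask(token_ids, entity_spans, mask_id):
    """Mask only the subwords that belong to an entity.

    entity_spans: list of (start, end) half-open index pairs marking
    which subwords belong to entities (assumed provided by the corpus).
    Returns (masked_ids, labels) with labels = -100 outside entities.
    """
    masked = list(token_ids)
    labels = [-100] * len(token_ids)       # non-entity subwords: no prediction
    for start, end in entity_spans:
        for i in range(start, end):
            labels[i] = token_ids[i]       # predict the original entity subword
            masked[i] = mask_id            # replace it with [MASK]
    return masked, labels
```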