---
language: en
tags:
- distilbert
- long context
---

**This model relies on a custom modeling file, you need to add trust_remote_code=True**\
**See [\#13467](https://github.com/huggingface/transformers/pull/13467)**

Conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).

* [Usage](#usage)
* [Parameters](#parameters)
* [Sparse selection type](#sparse-selection-type)

This model is adapted from [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased).

This model can handle long sequences faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).

The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad to a multiple of the block size (pad_to_multiple_of=...).
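The block-size constraint above can be sketched with a small helper. The value block_size=128 is an assumed example; real checkpoints define it in their config, and in practice the tokenizer's pad_to_multiple_of argument does this rounding for you.

```python
# Sketch of the "length must be a multiple of the block size" requirement.
# block_size=128 is an assumed example value, not taken from a real config.

def padded_length(seq_len: int, block_size: int = 128) -> int:
    """Round seq_len up to the nearest multiple of block_size."""
    return -(-seq_len // block_size) * block_size  # ceiling division

print(padded_length(1000))  # a 1000-token input is padded to 1024
print(padded_length(1024))  # already a multiple, stays at 1024
```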

Encoder-decoder and causal masking are supported, but I didn't test them extensively.
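As a minimal sketch of the custom-code note above: loading an LSG checkpoint needs trust_remote_code=True so Transformers can use the custom modeling file. The checkpoint id and block size below are placeholders for illustration, not real values; this snippet also downloads model weights, so it needs network access.

```python
# Minimal loading sketch. "ccdv/lsg-distilbert-example" is a placeholder
# checkpoint id; substitute the actual model repository name.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "ccdv/lsg-distilbert-example",  # placeholder id
    trust_remote_code=True,         # required: custom modeling file
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-distilbert-example")

# Truncate, and pad to a multiple of the block size as recommended above
# (128 is an assumed block size for illustration).
inputs = tokenizer(
    "A long document ...",
    truncation=True,
    padding=True,
    pad_to_multiple_of=128,
    return_tensors="pt",
)
output = model(**inputs)
```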