ccdv committed
Commit 647c3ff · 1 Parent(s): 686895e
Files changed (1)
README.md +4 -2
README.md CHANGED
```diff
@@ -1,6 +1,7 @@
 ---
 language: en
 tags:
+- distilbert
 - long context
 ---
 
@@ -9,6 +10,8 @@ tags:
 **This model relies on a custom modeling file, you need to add trust_remote_code=True**\
 **See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
 
+Conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
+
 * [Usage](#usage)
 * [Parameters](#parameters)
 * [Sparse selection type](#sparse-selection-type)
@@ -20,8 +23,7 @@ This model is adapted from [distilbert-base-uncased](https://huggingface.co/dist
 
 This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
 
-
-The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...). \
+The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...).
 
 
 Support encoder-decoder and causal masking but I didnt test it extensively.\
```
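In practice, the trust_remote_code requirement stated in the README means loading the checkpoint along these lines; a minimal sketch, where the repo id is a hypothetical placeholder (substitute the actual model path):

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical repo id for illustration; use the actual model card path.
model_name = "ccdv/lsg-distilbert-base-uncased-4096"

# trust_remote_code=True is required because the checkpoint ships a custom
# modeling file (see huggingface/transformers#13467).
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```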
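The block-size note in the re-added paragraph translates to tokenizer usage like the following sketch; the block size of 128 is an assumed illustrative value, read the real one from the model's config:

```python
# Long input text; truncate to the model's max length and pad the sequence
# length up to a multiple of the block size.
text = "some very long document " * 500

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,            # padding must be enabled for pad_to_multiple_of
    pad_to_multiple_of=128,  # assumed block size; check the model config
)
outputs = model(**inputs)
```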