---
language:
- en
tags:
- summarization
datasets:
- scientific_papers
metrics:
- rouge
model-index:
- name: ccdv/lsg-bart-base-4096-arxiv
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
**Transformers >= 4.23.1**\
**This model relies on a custom modeling file, so you need to pass `trust_remote_code=True` when loading it**\
**See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
LSG ArXiv [paper](https://arxiv.org/abs/2210.15497). \
The GitHub repository and conversion script are available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# The checkpoint ships custom modeling code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096-arxiv", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096-arxiv", trust_remote_code=True)

text = "Replace by what you want."

# device=0 selects the first GPU; use device=-1 to run on CPU.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)
```
# ccdv/lsg-bart-base-4096-arxiv
This model is a fine-tuned version of [ccdv/lsg-bart-base-4096](https://huggingface.co/ccdv/lsg-bart-base-4096) on the [scientific_papers arxiv](https://huggingface.co/datasets/scientific_papers) dataset. \
It achieves the following results on the test set:
| Length | Sparse Type | Block Size | Sparsity | Connections | R1 | R2 | RL | RLsum |
|:------ |:------------ |:---------- |:-------- | :--------- |:----- |:----- |:----- |:----- |
| 4096 | Local | 256 | 0 | 768 | 46.65 | 18.91 | 26.90 | 42.18 |
| 4096 | Local | 128 | 0 | 384 | 46.18 | 18.57 | 26.71 | 41.69 |
| 4096 | Pooling | 128 | 4 | 644 | 46.27 | 18.68 | 26.87 | 41.82 |
| 4096 | Stride | 128 | 4 | 644 | 46.34 | 18.64 | 26.69 | 41.87 |
| 4096 | Block Stride | 128 | 4 | 644 | 46.23 | 18.62 | 26.62 | 41.80 |
| 4096 | Norm | 128 | 4 | 644 | 45.96 | 18.46 | 26.52 | 41.51 |
| 4096 | LSH | 128 | 4 | 644 | 46.19 | 18.72 | 26.89 | 41.76 |
With smaller block sizes (lower resource usage):
| Length | Sparse Type | Block Size | Sparsity | Connections | R1 | R2 | RL | RLsum |
|:------ |:------------ |:---------- |:-------- | :--------- |:----- |:----- |:----- |:----- |
| 4096 | Local | 64 | 0 | 192 | 44.71 | 17.53 | 26.03 | 40.23 |
| 4096 | Local | 32 | 0 | 96 | 39.67 | 14.34 | 23.81 | 35.00 |
| 4096 | Pooling | 32 | 4 | 160 | 42.75 | 16.34 | 25.20 | 38.23 |
| 4096 | Stride | 32 | 4 | 160 | 44.23 | 17.21 | 25.71 | 39.72 |
| 4096 | Block Stride | 32 | 4 | 160 | 44.15 | 17.10 | 25.68 | 39.60 |
| 4096 | Norm | 32 | 4 | 160 | 42.02 | 15.65 | 24.56 | 37.45 |
| 4096 | LSH | 32 | 4 | 160 | 42.58 | 16.21 | 25.10 | 38.04 |
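To make the effect of block size easier to read off, the following sketch compares the R1 scores at block size 128 and block size 32 for each sparse type. The numbers are copied from the two tables above; the variable names are illustrative only.

```python
# R1 scores copied from the two tables above, keyed by (sparse type, block size).
r1 = {
    ("Local", 128): 46.18, ("Local", 32): 39.67,
    ("Pooling", 128): 46.27, ("Pooling", 32): 42.75,
    ("Stride", 128): 46.34, ("Stride", 32): 44.23,
    ("Block Stride", 128): 46.23, ("Block Stride", 32): 44.15,
    ("Norm", 128): 45.96, ("Norm", 32): 42.02,
    ("LSH", 128): 46.19, ("LSH", 32): 42.58,
}

sparse_types = ["Local", "Pooling", "Stride", "Block Stride", "Norm", "LSH"]

# R1 lost when the block size shrinks from 128 to 32, per sparse type.
r1_drop = {t: round(r1[(t, 128)] - r1[(t, 32)], 2) for t in sparse_types}

smallest_drop = min(r1_drop, key=r1_drop.get)
largest_drop = max(r1_drop, key=r1_drop.get)

for name in sorted(r1_drop, key=r1_drop.get):
    print(f"{name:<12} R1 drop: {r1_drop[name]:.2f}")
```

On these figures, the Stride and Block Stride variants lose the least R1 at the smaller block size, while purely local attention loses the most.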
## Model description
The model relies on Local-Sparse-Global attention to handle long sequences:
(Figure "attn": illustration of the attention pattern.)
The model has about 145 million parameters (6 encoder layers and 6 decoder layers). \
The model is warm-started from BART-base, converted to handle long sequences (encoder only), and then fine-tuned.
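As a rough illustration of the local component only, the sketch below assumes that each token attends to its own block plus one adjacent block on each side. This is an assumption chosen because it reproduces the connection counts listed in the Local rows of the results tables (for example 768 for a block size of 256); it is not taken from the model's modeling file, and the sparse and global components are left out. The function name `local_block_neighbors` is hypothetical.

```python
def local_block_neighbors(position, seq_len, block_size):
    """Return the key positions visible to `position` under block-local attention.

    Assumption for illustration: a token sees its own block and the
    neighbouring block on each side, clipped at the sequence boundaries.
    """
    block = position // block_size
    start = max(0, (block - 1) * block_size)
    end = min(seq_len, (block + 2) * block_size)
    return range(start, end)


# A token in the middle of a 4096-token sequence with block size 256.
middle = len(local_block_neighbors(position=2048, seq_len=4096, block_size=256))

# A token in the first block has no left neighbour, so it sees fewer keys.
edge = len(local_block_neighbors(position=0, seq_len=4096, block_size=256))

print(middle, edge)
```

Under this assumption, the per-token cost depends on the block size and not on the sequence length, which is the property that lets the encoder process long inputs.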
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 6.0
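The relationship between these values can be checked with a short sketch. It assumes a single device (the card does not state the device count) and uses a hypothetical `total_steps`; the schedule function only mirrors the general shape of a linear schedule with warmup and is not the Trainer's own code.

```python
# Values copied from the hyperparameter list above.
learning_rate = 8e-05
train_batch_size = 8
gradient_accumulation_steps = 4
warmup_ratio = 0.1
num_devices = 1  # assumption: the card does not state the number of devices

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_warmup_lr(step, total_steps, peak_lr=learning_rate, ratio=warmup_ratio):
    """Learning rate at `step`: linear warmup to `peak_lr`, then linear decay to 0."""
    warmup_steps = int(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; depends on dataset size and num_epochs
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```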
### Generation hyperparameters
The following hyperparameters were used during generation:
- dataset_name: scientific_papers
- dataset_config_name: arxiv
- eval_batch_size: 8
- eval_samples: 6440
- early_stopping: True
- ignore_pad_token_for_loss: True
- length_penalty: 2.0
- max_length: 320
- min_length: 32
- num_beams: 5
- no_repeat_ngram_size: None
- seed: 123
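For anyone trying to reproduce the evaluation setup, the decoding-related entries above could be gathered into keyword arguments along the following lines. This is a sketch: the helper `to_generate_kwargs` is hypothetical, and dropping the `None` entry is a choice made here so that the library default applies.

```python
# Decoding settings copied from the list above.
generation_settings = {
    "early_stopping": True,
    "length_penalty": 2.0,
    "max_length": 320,
    "min_length": 32,
    "num_beams": 5,
    "no_repeat_ngram_size": None,
}


def to_generate_kwargs(settings):
    """Keep only the settings that have an explicit value."""
    return {key: value for key, value in settings.items() if value is not None}


generate_kwargs = to_generate_kwargs(generation_settings)
print(generate_kwargs)

# Hypothetical usage with a summarization pipeline named `pipe`:
# summary = pipe(text, truncation=True, **generate_kwargs)
```

Note that these settings differ from the shorter `max_length=64`, `num_beams=2` values in the quick-start snippet, so outputs from the two configurations are not directly comparable.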
### Framework versions
- Transformers 4.18.0
- Pytorch 1.10.1+cu102
- Datasets 2.1.0
- Tokenizers 0.11.6