arxiv:2201.11838

Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Published on Jan 27, 2022

Abstract

Transformer-based models, such as BERT, have dramatically improved performance on a wide range of natural language processing tasks. The clinical-knowledge-enriched model ClinicalBERT likewise achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. One core limitation of these transformers is their substantial memory consumption, caused by the full self-attention mechanism. To overcome this, long-sequence transformer models such as Longformer and BigBird were proposed, using sparse attention mechanisms to reduce memory usage from quadratic to linear in the sequence length. These models extend the maximum input length from 512 to 4,096 tokens, which improves their ability to model long-term dependencies and consequently yields stronger results on a variety of tasks. Inspired by the success of these long-sequence transformers, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora. We evaluate both pre-trained models on 10 baseline tasks spanning named entity recognition, question answering, and document classification. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers on all downstream tasks. Our source code is available at https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models are available for public download at https://huggingface.co/yikuan8/Clinical-Longformer.
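
The released checkpoint can be loaded with the Hugging Face transformers library. The sketch below is a minimal illustration rather than the authors' evaluation setup: the clinical note text is hypothetical, and the global attention on the first token plus the mean-pooling step are common Longformer conventions, not details taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the released checkpoint (repo id taken from the paper's download link)
tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
model = AutoModel.from_pretrained("yikuan8/Clinical-Longformer")

# Hypothetical long clinical note; the model accepts sequences up to 4,096 tokens
note = "Patient admitted with progressive shortness of breath and lower-extremity edema ..."

inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=4096)

# Longformer-style sparse attention: local windows everywhere,
# global attention only on the first token (a common default choice)
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

# Mean-pool token embeddings into a single note-level vector (illustrative only)
note_embedding = outputs.last_hidden_state.mean(dim=1)
print(note_embedding.shape)  # e.g. torch.Size([1, 768])
```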
