|
--- |
|
license: apache-2.0 |
|
tags: |
|
- TDA |
|
metrics: |
|
- accuracy |
|
- matthews_correlation |
|
model-index: |
|
- name: roberta-large-cased-en-cola_32_2e-05_lr_0.0001_decay_balanced_frozen |
|
results: [] |
|
datasets: |
|
- shivkumarganesh/CoLA |
|
language: |
|
- en |
|
widget: |
|
- text: The book was by John written. |
|
--- |
|
|
|
[**Official repository**](https://github.com/upunaprosk/la-tda) |
|
|
|
# RoBERTa-large-TDA-frozen |
|
|
|
This model is [roberta-large](https://huggingface.co/roberta-large) with frozen pre-trained weights and a linear classification layer

trained over [CLS]-pooled text representations on [CoLA](https://nyu-mll.github.io/CoLA/).
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.6793 |
|
- Accuracy: 0.7400 |
|
- Mcc: 0.3172 |
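
A minimal usage sketch with the `transformers` pipeline API; the Hub namespace below is a placeholder, so replace `<namespace>` with the repository id this card is published under:

```python
from transformers import pipeline

# Placeholder Hub id: the model name is taken from the metadata above,
# but the namespace is an assumption and must be replaced.
MODEL_ID = "<namespace>/roberta-large-cased-en-cola_32_2e-05_lr_0.0001_decay_balanced_frozen"

classifier = pipeline("text-classification", model=MODEL_ID)
# The widget example above: a sentence with unacceptable word order.
print(classifier("The book was by John written."))
```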
|
|
|
## Features extracted from the Transformer
|
|
|
The features extracted from attention maps include the following: |
|
|
|
1. **Topological features** are properties of attention graphs. Features of directed attention graphs include the number of strongly connected components, edges and simple cycles, and the average vertex degree. The properties of undirected graphs include

the first two Betti numbers (the number of connected components and the number of independent cycles), the matching number and the chordality; a toy sketch of these computations follows the list.
|
|
|
2. **Features derived from barcodes** include descriptive characteristics of 0/1-dimensional barcodes and reflect the survival (birth and death) of

connected components and edges throughout the filtration.
|
|
|
3. **Distance-to-pattern** features measure the distance between attention matrices and the matrices of pre-defined attention patterns, such as attention to the first token [CLS] and to the last

token [SEP] of the sequence, attention to the previous and

next token, and attention to punctuation marks (also illustrated in the sketch below).
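
The sketch below illustrates items 1 and 3 for a single attention head. It is a simplified illustration, not the released feature extractor: the binarization threshold, the Frobenius distance, and the random example matrix are all illustrative assumptions, and barcode features (item 2) are omitted because they require a persistent-homology library.

```python
import networkx as nx
import numpy as np

def attention_graph_features(attn: np.ndarray, threshold: float = 0.1) -> dict:
    """Toy versions of the graph features above for one attention head.

    Edges are kept where the attention weight reaches `threshold`;
    the threshold value here is an illustrative assumption.
    """
    directed = nx.from_numpy_array((attn >= threshold).astype(int),
                                   create_using=nx.DiGraph)
    # Drop self-loops (a token attending to itself) before graph analysis.
    directed.remove_edges_from(list(nx.selfloop_edges(directed)))
    undirected = directed.to_undirected()

    n = undirected.number_of_nodes()
    components = nx.number_connected_components(undirected)
    return {
        # Directed-graph properties (simple-cycle counting via
        # nx.simple_cycles is omitted: it is exponential in general).
        "strongly_connected_components": nx.number_strongly_connected_components(directed),
        "edges": directed.number_of_edges(),
        "average_degree": sum(d for _, d in directed.degree()) / n,  # in + out
        # Undirected-graph properties.
        "betti_0": components,  # connected components
        "betti_1": undirected.number_of_edges() - n + components,  # independent cycles
        "matching_number": len(nx.max_weight_matching(undirected)),
        "chordality": int(nx.is_chordal(undirected)),  # binary: is the graph chordal
    }

def distance_to_pattern(attn: np.ndarray, pattern: np.ndarray) -> float:
    """Distance-to-pattern feature; the Frobenius norm is an illustrative choice."""
    return float(np.linalg.norm(attn - pattern))

# Example on a random row-stochastic "attention map" for a 10-token sequence.
rng = np.random.default_rng(42)
attn = rng.dirichlet(np.ones(10), size=10)
print(attention_graph_features(attn))
print(distance_to_pattern(attn, np.eye(10, k=-1)))  # attention-to-previous-token pattern
```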
|
|
|
The computed features and barcodes can be found in the subdirectories of the repository. The *test_sub* features and barcodes were computed on the out-of-domain [CoLA test set](https://www.kaggle.com/c/cola-out-of-domain-open-evaluation/overview).
|
Refer to notebooks 4* and 5* from the [repository](https://github.com/upunaprosk/la-tda) to construct the classification pipeline with TDA features; a condensed sketch of that step is given below.
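
As a rough outline of what those notebooks do, the snippet below fits a linear classifier on precomputed TDA feature matrices. The file paths and the scikit-learn classifier choice are placeholders, not the notebooks' exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder paths: point these at the precomputed feature files.
X_train = np.load("features/train_features.npy")
y_train = np.load("features/train_labels.npy")
X_dev = np.load("features/dev_features.npy")
y_dev = np.load("features/dev_labels.npy")

# Standardize the heterogeneous TDA features, then fit a linear classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("dev accuracy:", clf.score(X_dev, y_dev))
```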
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
Only the linear layer over [CLS]-pooled text representations was trained; the encoder weights were kept frozen. A minimal sketch of this setup follows the hyperparameter list below.
|
The following hyperparameters were used during training: |
|
- learning_rate: 2e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 5.0 |
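
A minimal sketch of this setup, assuming the standard `transformers` API. The dataloader and training loop are omitted, the steps-per-epoch value is an estimate, and the head here is `transformers`' standard RoBERTa classification head rather than a bare linear layer:

```python
import torch
from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup

torch.manual_seed(42)
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

# Freeze the pre-trained encoder; only the classification head stays trainable.
for param in model.roberta.parameters():
    param.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=2e-05, betas=(0.9, 0.999), eps=1e-08)

num_epochs = 5
steps_per_epoch = 268  # ~8,551 CoLA train sentences / batch size 32
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_epochs * steps_per_epoch
)
# Training loop (omitted): forward, loss.backward(), optimizer.step(), scheduler.step().
```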
|
|
|
### Framework versions |
|
|
|
- Transformers 4.27.0.dev0 |
|
- Pytorch 1.13.1+cu116 |
|
- Datasets 2.9.0 |
|
- Tokenizers 0.13.2 |