---
license: apache-2.0
tags:
- TDA
metrics:
- accuracy
- matthews_correlation
model-index:
- name: roberta-large-cased-en-cola_32_2e-05_lr_0.0001_decay_balanced_frozen
  results: []
datasets:
- shivkumarganesh/CoLA
language:
- en
---

[**Official repository**](https://github.com/upunaprosk/la-tda)

# RoBERTa-large-TDA

This model is the pre-trained [roberta-large](https://huggingface.co/roberta-large) checkpoint with frozen weights and a linear layer
trained over [CLS]-pooled text representations on [CoLA](https://nyu-mll.github.io/CoLA/).
It achieves the following results on the evaluation set:
- Loss: 0.6793
- Accuracy: 0.7400
- MCC (Matthews correlation coefficient): 0.3172
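
Accuracy and MCC can diverge when the classes are imbalanced, which is why both are reported. The sketch below computes the binary Matthews correlation coefficient from a confusion matrix. The labels are toy data chosen for illustration, not outputs of this model, and the function name `matthews_corrcoef` is only a local helper.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: MCC is 0 when any row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy, imbalanced labels (assumption: 1 = acceptable, 0 = unacceptable).
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
always_acceptable = [1] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, always_acceptable)) / len(y_true)
mcc = matthews_corrcoef(y_true, always_acceptable)
print(accuracy, mcc)
```

A classifier that always predicts the majority class scores well on accuracy here but gets an MCC of zero, so a gap between the two metrics is expected on skewed data.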

## Features extracted from Transformer

The features extracted from attention maps include the following:

1. **Topological features** are properties of attention graphs. Features of directed attention graphs include the number of strongly connected components, edges, and simple cycles, and the average vertex degree. Features of undirected graphs include
the first two Betti numbers (the number of connected components and the number of simple cycles), the matching number, and the chordality.

2. **Features derived from barcodes** include descriptive characteristics of 0- and 1-dimensional barcodes and reflect the survival (birth and death) of
connected components and edges throughout the filtration.

3. **Distance-to-pattern** features measure the distance between attention matrices and identity matrices of pre-defined attention patterns, such as attention to the first token [CLS] and to the last token [SEP] of the sequence, attention to the previous and next tokens, and attention to punctuation marks.
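
As a rough illustration of the first and third feature groups, the sketch below thresholds a toy attention matrix into an undirected graph, computes its first two Betti numbers, and measures a distance to the "attention to the first token" pattern. Several details are assumptions rather than the repository's implementation: the symmetrization by the maximum of the two directions, the single threshold value, the Frobenius distance on norm-scaled matrices, and all function names. The first Betti number is computed as the cycle rank `E - V + b0`.

```python
import math


def betti_numbers(attention, threshold):
    """Return (b0, b1) of the undirected graph that links tokens i and j
    when the attention weight in either direction is >= threshold.
    Self-loops (the diagonal) are ignored."""
    n = len(attention)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(attention[i][j], attention[j][i]) >= threshold:
                edges += 1
                parent[find(i)] = find(j)

    b0 = len({find(i) for i in range(n)})  # connected components
    b1 = edges - n + b0                    # independent cycles (cycle rank)
    return b0, b1


def frobenius_norm(matrix):
    return math.sqrt(sum(v * v for row in matrix for v in row))


def first_token_pattern(n):
    """Pattern in which every token attends only to position 0."""
    return [[1.0 if j == 0 else 0.0 for j in range(n)] for _ in range(n)]


def distance_to_pattern(attention, pattern):
    """Frobenius distance between the two matrices, each scaled to unit norm."""
    na, npat = frobenius_norm(attention), frobenius_norm(pattern)
    diff = [
        [a / na - p / npat for a, p in zip(row_a, row_p)]
        for row_a, row_p in zip(attention, pattern)
    ]
    return frobenius_norm(diff)


# Toy 4-token attention map (each row sums to 1); not taken from the model.
attention = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.1, 0.1, 0.5, 0.3],
]

print(betti_numbers(attention, threshold=0.5))
print(distance_to_pattern(attention, first_token_pattern(4)))
```

Lowering the threshold adds edges, which merges components and creates cycles. Tracking when those components and cycles appear and disappear across thresholds is what the barcode features summarize.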

The computed features and barcodes can be found in the subdirectories of the repository. The *test_sub* features and barcodes were computed on the out-of-domain test set of the [CoLA dataset](https://www.kaggle.com/c/cola-out-of-domain-open-evaluation/overview).
Refer to notebooks 4* and 5* in the [repository](https://github.com/upunaprosk/la-tda) to construct the classification pipeline with TDA features.

## Training procedure

### Training hyperparameters

Only a linear layer over [CLS]-pooled text representations was trained; the RoBERTa weights stayed frozen.
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5.0
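
For reference, a `linear` scheduler decays the learning rate from its initial value to zero over the course of training. The sketch below reproduces that shape in plain Python. The `total_steps` value is hypothetical, and zero warmup is an assumption because the card does not list warmup steps.

```python
def linear_lr(step, total_steps, base_lr=2e-05, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; depends on dataset size, batch size and epochs
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, total_steps))
```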

### Framework versions

- Transformers 4.27.0.dev0
- Pytorch 1.13.1+cu116
- Datasets 2.9.0
- Tokenizers 0.13.2