iproskurina/tda-roberta-large-frozen-en-cola

@@ -1,41 +1,52 @@
 ---
-license: mit
 tags:
-- generated_from_trainer
 metrics:
 - accuracy
 model-index:
 - name: roberta-large-cased-en-cola_32_2e-05_lr_0.0001_decay_balanced_frozen
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# roberta-large-cased-en-cola_32_2e-05_lr_0.0001_decay_balanced_frozen
-This model is a fine-tuned version of [roberta-large](https://huggingface.co/roberta-large) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6793
 - Accuracy: 0.7400
 - Mcc: 0.3172
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 32
@@ -45,13 +56,9 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 5.0
-### Training results
 ### Framework versions
 - Transformers 4.27.0.dev0
 - Pytorch 1.13.1+cu116
 - Datasets 2.9.0
-- Tokenizers 0.13.2

 ---
+license: apache-2.0
 tags:
+- TDA
 metrics:
 - accuracy
+- matthews_correlation
 model-index:
 - name: roberta-large-cased-en-cola_32_2e-05_lr_0.0001_decay_balanced_frozen
   results: []
+datasets:
+- shivkumarganesh/CoLA
+language:
+- en
 ---
+[**Official repository**](https://github.com/upunaprosk/la-tda)
+# RoBERTa-large-TDA
+This model is [roberta-large](https://huggingface.co/roberta-large) with frozen pre-trained weights and a linear classification layer
+trained over [CLS]-pooled text representations on [CoLA](https://nyu-mll.github.io/CoLA/).
 It achieves the following results on the evaluation set:
 - Loss: 0.6793
 - Accuracy: 0.7400
 - Mcc: 0.3172
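The gap between accuracy and MCC above is typical for a skewed label distribution. As a rough illustration only, the sketch below computes both metrics from binary predictions. The labels are hypothetical toy values, not CoLA data, and the helper names are not taken from the repository.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient from confusion counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical, skewed toy labels (1 = acceptable, 0 = unacceptable).
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]

acc = accuracy(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"accuracy={acc:.4f} mcc={mcc:.4f}")
```

On this toy input the classifier gets most examples right while MCC stays much lower, because MCC also penalizes errors on the minority class.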
+## Features extracted from Transformer
+The features extracted from attention maps include the following:
+1. **Topological features** are properties of attention graphs. Features of directed attention graphs include the number of strongly connected components, edges, and simple cycles, and the average vertex degree. The properties of undirected graphs include
+the first two Betti numbers (the number of connected components and the number of simple cycles), the matching number, and the chordality.
+2. **Features derived from barcodes** include descriptive characteristics of 0/1-dimensional barcodes and reflect the survival (death and birth) of
+connected components and edges throughout the filtration.
+3. **Distance-to-pattern** features measure the distance between attention matrices and identity matrices of pre-defined attention patterns, such as attention to the first ([CLS]) and last ([SEP]) tokens of the sequence,
+attention to the previous and next tokens, and attention to punctuation marks.
+The computed features and barcodes can be found in the subdirectories of the repository. The *test_sub* features and barcodes were computed on the out-of-domain [CoLA test set](https://www.kaggle.com/c/cola-out-of-domain-open-evaluation/overview).
+Refer to the notebooks 4* and 5* from the [repository](https://github.com/upunaprosk/la-tda) to construct the classification pipeline with TDA features.
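To make the topological and distance-to-pattern features more concrete, here is a minimal sketch for a single attention matrix. It is not the repository's implementation. It assumes that the undirected graph keeps an edge when the attention weight in either direction reaches a threshold, that the first Betti number is computed as the cycle rank `E - V + C`, and that the distance is a Frobenius norm. The function names and the toy matrix are hypothetical.

```python
import math

def undirected_graph_features(attn, threshold):
    """Simple features of the undirected graph obtained by thresholding attn."""
    n = len(attn)
    # Assumption: keep edge {i, j} if either direction reaches the threshold.
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if max(attn[i][j], attn[j][i]) >= threshold}

    # Union-find to count connected components (Betti number b0).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    components = len({find(v) for v in range(n)})

    return {
        "edges": len(edges),
        "b0": components,
        # Cycle rank: number of independent cycles (Betti number b1).
        "b1": len(edges) - n + components,
        "avg_degree": 2 * len(edges) / n,
    }

def first_token_pattern(n):
    """Pattern in which every token attends only to the first token."""
    return [[1.0 if j == 0 else 0.0 for j in range(n)] for _ in range(n)]

def distance_to_pattern(attn, pattern):
    """Frobenius distance between an attention matrix and a pattern matrix."""
    return math.sqrt(sum((a - p) ** 2
                         for row_a, row_p in zip(attn, pattern)
                         for a, p in zip(row_a, row_p)))

# Hypothetical 4-token attention matrix (each row sums to 1).
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.5, 0.3, 0.2, 0.0],
    [0.7, 0.1, 0.1, 0.1],
]

feats = undirected_graph_features(attn, threshold=0.3)
dist = distance_to_pattern(attn, first_token_pattern(len(attn)))
print(feats, round(dist, 4))
```

In a full pipeline such features would be computed per layer and per head, possibly at several thresholds, and concatenated into one feature vector for the classifier.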
 ## Training procedure
 ### Training hyperparameters
+Only a linear layer over [CLS]-pooled text representations was trained; the RoBERTa weights stayed frozen.
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 32
 - lr_scheduler_type: linear
 - num_epochs: 5.0
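The frozen-encoder setup can be pictured as a linear probe: the encoder output is treated as a fixed feature vector and only the weights of the linear layer receive gradient updates. The sketch below shows that idea in plain Python with logistic regression trained by SGD. The 2-dimensional "[CLS] vectors", the learning rate, and the epoch count are hypothetical toy values chosen so the example runs quickly; they are not the hyperparameters listed above, and the actual model was trained with the Transformers Trainer.

```python
import math

def train_linear_head(features, labels, lr=0.1, epochs=200):
    """Fit weights w and bias b of a linear layer on fixed feature vectors."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid probability of class 1
            g = p - y                        # gradient of the log loss w.r.t. z
            # Only the head parameters are updated; the features never change.
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Predict class 1 when the linear score is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical precomputed [CLS] vectors from a frozen encoder (toy 2-d values).
cls_vectors = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]
labels = [1, 1, 0, 0]

w, b = train_linear_head(cls_vectors, labels)
preds = [predict(w, b, x) for x in cls_vectors]
print(preds)
```

Because the encoder is frozen, its outputs can be computed once and cached, which makes this kind of training much cheaper than full fine-tuning.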
 ### Framework versions
 - Transformers 4.27.0.dev0
 - Pytorch 1.13.1+cu116
 - Datasets 2.9.0
+- Tokenizers 0.13.2