Text Classification
PyTorch
English
bert
rabuahmad committed on
Commit
3eaae64
1 Parent(s): eac4ff4

Update README.md

  1. README.md +56 -3
README.md CHANGED
@@ -1,3 +1,56 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - G11/climate_adaptation_abstracts
+ - pierre-pessarossi/wikipedia-climate-data
+ - rlacombe/ClimateX
+ language:
+ - en
+ base_model:
+ - google-bert/bert-base-uncased
+ pipeline_tag: text-classification
+ ---
+
+ ## Social Media Style Classifier for Climate Change Text
+
+
+ This model is bert-base-uncased fine-tuned on a binary classification task: determining whether an English text about climate change is written in a social media style.
+
+ Social media texts were gathered from [ClimaConvo](https://github.com/shucoll/ClimaConvo) and [DEBAGREEMENT](https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/hash/6f3ef77ac0e3619e98159e9b6febf557-Abstract-round2.html).
+
+ Non-social media texts were gathered from diverse sources, including article abstracts (G11/climate_adaptation_abstracts), Wikipedia articles (pierre-pessarossi/wikipedia-climate-data), and IPCC reports (rlacombe/ClimateX).
+
+ The dataset contained about 60K instances, split 50/50 between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
+ Training ran for three epochs with a batch size of 8 on an NVIDIA V100 16GB GPU. All other hyperparameters were the Hugging Face Trainer defaults.
+
+ The model was trained to evaluate a text style transfer task: converting formal-language texts to tweets.
+
+ ### How to use
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline
+
+ model_name = "rabuahmad/cc-tweets-classifier"
+
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
+
+ # Truncate inputs that exceed the 512-token limit
+ classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)
+
+ text = "Yesterday was a great day!"
+
+ result = classifier(text)
+ print(result)
+ ```
+ Label 1 indicates that the text is predicted to be a tweet.
+
+ ### Evaluation
+
+ Evaluation results on the test set:
+
+ | Metric    | Score |
+ |-----------|-------|
+ | Accuracy  |       |
+ | Precision |       |
+ | Recall    |       |
+ | F1        |       |
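
The score cells in the table are empty in this commit. As a rough sketch of how the four listed metrics are usually computed for a binary task, the snippet below assumes that label 1 (tweet) is the positive class and that gold labels and predictions are available as 0/1 integers. The helper `binary_metrics` and the toy labels are hypothetical and are not the author's evaluation code or results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1, treating label 1 as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are NOT outputs of the model.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In practice the predictions would come from running the classifier over the held-out 20% test split and mapping each predicted label to 0 or 1 before calling the helper.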