metadata

license: apache-2.0
base_model: bert-base-uncased
tags:
  - generated_from_trainer
  - sentiment_analysis
datasets:
  - ckandemir/bitcoin_tweets_sentiment_kaggle
metrics:
  - accuracy
  - f1
model-index:
  - name: crypto_sentiment
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: ckandemir/bitcoin_tweets_sentiment_kaggle
          type: ckandemir/bitcoin_tweets_sentiment_kaggle
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.7150837988826816
          - name: F1
            type: f1
            value: 0.7212944928862212
language:
  - en
library_name: transformers
widget:
  - text: Sold all btc, tethered up before the correction.
pipeline_tag: text-classification

crypto_sentiment

This model is a fine-tuned version of bert-base-uncased on the ckandemir/bitcoin_tweets_sentiment_kaggle dataset. It achieves the following results on the evaluation set:

Loss: 0.4542
Accuracy: 0.7151
F1: 0.7213

Model description

crypto_sentiment is a sentiment analysis classifier for Bitcoin-related tweets, fine-tuned from bert-base-uncased on the ckandemir/bitcoin_tweets_sentiment_kaggle dataset. It classifies each tweet into one of the dataset's sentiment categories and returns a score for the prediction. These predictions can support market sentiment analysis, social media monitoring, and other applications that track public opinion about Bitcoin.

Intended uses

This model is intended for sentiment analysis of Bitcoin-related text, particularly tweets. Researchers, analysts, and developers can use it to gauge public sentiment about Bitcoin on social media.
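A minimal usage sketch, assuming the model is published on the Hugging Face Hub and loaded through the `transformers` `pipeline` API. The repo id below is a placeholder and is not taken from this card, and `label_shares` is a hypothetical helper. The label names returned depend on the model's configuration.

```python
from collections import Counter

# Assumption: replace with the actual Hub repo id of this model.
MODEL_ID = "your-namespace/crypto_sentiment"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline for the fine-tuned model."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def label_shares(predictions):
    """Return the fraction of predictions per label.

    `predictions` is a list of dicts with "label" and "score" keys,
    which is the shape a text-classification pipeline returns.
    """
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Example usage (downloads the model, so it is left commented out):
# classifier = load_classifier()
# preds = classifier(["Sold all btc, tethered up before the correction."])
# print(label_shares(preds))
```

Aggregating labels over many tweets, rather than reading single predictions, is one way to use the model for the market-level monitoring described above.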

Limitations

The model was trained only on Bitcoin-related tweets, so it may perform poorly on text that differs significantly from them in context or structure.
The model may misclassify tweets with nuanced or sarcastic tones.

Training and evaluation data

The model was trained and evaluated on the ckandemir/bitcoin_tweets_sentiment_kaggle dataset. This dataset comprises tweets related to Bitcoin, labeled with sentiment scores.

Data preparation

The initial dataset contained tweets in multiple languages. To keep the training language consistent, the following steps were performed for data preparation:
Language detection: identified and kept only the tweets written in English.
Data cleaning: removed special characters.
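The two steps above can be sketched as follows. This is an illustration only: the card does not say which language detector was used or which characters counted as special, so the `is_english` callable and the character whitelist below are assumptions.

```python
import re

# Assumption: keep letters, digits, whitespace, and basic punctuation;
# everything else (emoji, '#', '@', and so on) is treated as "special".
_SPECIAL = re.compile(r"[^A-Za-z0-9\s.,!?'$%-]")
_SPACES = re.compile(r"\s+")


def clean_tweet(text):
    """Replace special characters with spaces and collapse whitespace."""
    text = _SPECIAL.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


def prepare(tweets, is_english):
    """Keep English tweets only, then clean them.

    `is_english` is a hypothetical callable (str -> bool) standing in
    for whatever language detector was actually used.
    """
    return [clean_tweet(t) for t in tweets if is_english(t)]
```

Passing the detector in as a function keeps the sketch independent of any specific language-identification library.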

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 24
eval_batch_size: 24
seed: 42
gradient_accumulation_steps: 3
total_train_batch_size: 72
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_steps: 1000
training_steps: 1000
mixed_precision_training: Native AMP

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.8941	0.65	50	0.8733	0.5698	0.5654
0.8565	1.3	100	0.8042	0.6690	0.6031
0.7896	1.96	150	0.7219	0.6802	0.5740
0.7174	2.61	200	0.6379	0.7514	0.6955
0.633	3.26	250	0.5745	0.7514	0.6930
0.5824	3.91	300	0.5303	0.75	0.6919
0.5365	4.57	350	0.4997	0.7514	0.7014
0.5089	5.22	400	0.4766	0.7458	0.6991
0.4893	5.87	450	0.4596	0.7486	0.7174
0.463	6.52	500	0.4446	0.7514	0.7127
0.4496	7.17	550	0.4407	0.7165	0.7048
0.4357	7.83	600	0.4364	0.7277	0.7246
0.4257	8.48	650	0.4324	0.7067	0.7115
0.4029	9.13	700	0.4314	0.7277	0.7180
0.3955	9.78	750	0.4354	0.7151	0.7164
0.3886	10.43	800	0.4396	0.7221	0.7244
0.3788	11.09	850	0.4363	0.7235	0.7194
0.366	11.74	900	0.4528	0.7179	0.7215
0.3298	12.39	950	0.4766	0.7053	0.7107
0.3423	13.04	1000	0.4542	0.7151	0.7213
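The table can be scanned programmatically to compare checkpoints. The sketch below copies the step, validation loss, accuracy, and F1 columns from the table and finds the rows with the lowest validation loss and the highest F1. The helper `best_by` is a hypothetical name introduced for illustration.

```python
# (step, validation_loss, accuracy, f1) copied from the table above.
RESULTS = [
    (50, 0.8733, 0.5698, 0.5654),
    (100, 0.8042, 0.6690, 0.6031),
    (150, 0.7219, 0.6802, 0.5740),
    (200, 0.6379, 0.7514, 0.6955),
    (250, 0.5745, 0.7514, 0.6930),
    (300, 0.5303, 0.75, 0.6919),
    (350, 0.4997, 0.7514, 0.7014),
    (400, 0.4766, 0.7458, 0.6991),
    (450, 0.4596, 0.7486, 0.7174),
    (500, 0.4446, 0.7514, 0.7127),
    (550, 0.4407, 0.7165, 0.7048),
    (600, 0.4364, 0.7277, 0.7246),
    (650, 0.4324, 0.7067, 0.7115),
    (700, 0.4314, 0.7277, 0.7180),
    (750, 0.4354, 0.7151, 0.7164),
    (800, 0.4396, 0.7221, 0.7244),
    (850, 0.4363, 0.7235, 0.7194),
    (900, 0.4528, 0.7179, 0.7215),
    (950, 0.4766, 0.7053, 0.7107),
    (1000, 0.4542, 0.7151, 0.7213),
]


def best_by(rows, column, lowest=False):
    """Return the row with the best value in `column` (first one on ties)."""
    pick = min if lowest else max
    return pick(rows, key=lambda row: row[column])


best_loss = best_by(RESULTS, 1, lowest=True)
best_f1 = best_by(RESULTS, 3)
print("lowest validation loss:", best_loss)
print("highest F1:", best_f1)
print("final step:", RESULTS[-1])
```

On these numbers, validation loss is lowest at step 700 and F1 is highest at step 600, while the metrics reported at the top of the card match the final row at step 1000.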

Framework versions

Transformers 4.35.0
PyTorch 2.1.0+cu118
Datasets 2.14.6
Tokenizers 0.14.1