---
license: apache-2.0
tags:
- generated_from_trainer
- financial
- stocks
- sentiment
- sentiment-analysis
- financial-news
widget:
- text: The company's quarterly earnings surpassed all estimates, indicating strong growth.
datasets:
- financial_phrasebank
metrics:
- accuracy
model-index:
- name: AnkitAI/distilbert-base-uncased-financial-news-sentiment-analysis
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: financial_phrasebank
      type: financial_phrasebank
      args: sentences_allagree
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.96688
language:
- en
base_model:
- distilbert/distilbert-base-uncased-finetuned-sst-2-english
pipeline_tag: text-classification
library_name: transformers
---

# DistilBERT Fine-Tuned for Financial Sentiment Analysis

## Model Description

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased), tailored for sentiment analysis in the financial domain. It was trained on the [Financial PhraseBank](https://huggingface.co/datasets/financial_phrasebank) dataset to classify financial text into three sentiment categories:

- Negative (label `0`)
- Neutral (label `1`)
- Positive (label `2`)

## Model Performance

The model was trained for 5 epochs and evaluated on a held-out test set comprising 20% of the dataset.

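The figures reported in this card can be cross-checked with a little arithmetic. The sketch below takes the dataset size, batch size, epoch count, training time, and accuracy from this card, and assumes that the test split size is rounded up, that no gradient accumulation was used, and that training ran on a single device. None of these assumptions is stated in the card, and the variable names are illustrative, so treat this as a consistency check rather than a record of the actual training setup.

```python
import math

# Values reported in this model card.
total_sentences = 2264      # sentences_allagree configuration
test_percent = 20           # held-out share of the dataset
train_batch_size = 16
num_epochs = 5
training_seconds = 3869
reported_accuracy = 0.96688

# Assumption: the test split size is rounded up to a whole sentence.
n_test = math.ceil(total_sentences * test_percent / 100)   # 453
n_train = total_sentences - n_test                         # 1811

# Assumption: one optimizer step per batch, single device.
steps_per_epoch = math.ceil(n_train / train_batch_size)
total_steps = steps_per_epoch * num_epochs                 # 570

steps_per_second = total_steps / training_seconds
samples_per_second = n_train * num_epochs / training_seconds

# Number of correct test predictions implied by the reported accuracy.
implied_correct = round(reported_accuracy * n_test)

print(f"train/test sizes: {n_train}/{n_test}")
print(f"total steps: {total_steps}")
print(f"steps per second: {steps_per_second:.3f}")
print(f"samples per second: {samples_per_second:.2f}")
print(f"implied correct predictions: {implied_correct} of {n_test}")
```

Under these assumptions the derived split sizes and throughput agree with the values listed in the Data and Training Metrics sections of this card.
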
### Evaluation Metrics

| Epoch | Eval Loss | Eval Accuracy |
|-----------|---------------|-------------------|
| 1 | 0.2210 | 94.26% |
| 2 | 0.1997 | 95.81% |
| 3 | 0.1719 | 96.69% |
| 4 | 0.2073 | 96.03% |
| 5 | 0.1941 | **96.69%** |

### Training Metrics

- **Final Training Loss**: 0.0797
- **Total Training Time**: Approximately 3869 seconds (~1.07 hours)
- **Training Samples per Second**: 2.34
- **Training Steps per Second**: 0.147

## Training Procedure

### Data

- **Dataset**: [Financial PhraseBank](https://huggingface.co/datasets/financial_phrasebank)
- **Configuration**: `sentences_allagree` (sentences where all annotators agreed on the sentiment)
- **Dataset Size**: 2264 sentences
- **Data Split**: 80% training (1811 samples), 20% testing (453 samples)

### Model Configuration

- **Base Model**: [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased)
- **Number of Labels**: 3 (negative, neutral, positive)
- **Tokenizer**: Same as the base model's tokenizer

### Hyperparameters

- **Number of Epochs**: 5
- **Batch Size**: 16 (training), 64 (evaluation)
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **Evaluation Metric**: Accuracy
- **Seed**: 42 (for reproducibility)

## Usage

You can load and use the model with the Hugging Face `transformers` library as follows:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "AnkitAI/distilbert-base-uncased-financial-news-sentiment-analysis"

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "The company's revenue declined significantly due to market competition."
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class along the label dimension
logits = outputs.logits
predicted_class_id = logits.argmax(dim=-1).item()

# Label ids as documented in the Model Description
label_mapping = {0: "Negative", 1: "Neutral", 2: "Positive"}
predicted_label = label_mapping[predicted_class_id]

print(f"Text: {text}")
print(f"Predicted Sentiment: {predicted_label}")
```

## License

This model is licensed under the **Apache 2.0 License**. You are free to use, modify, and distribute this model in your applications.

## Citation

If you use this model in your research or applications, please cite it as:

```
@misc{AnkitAI_2024_financial_sentiment_model,
  title={DistilBERT Fine-Tuned for Financial Sentiment Analysis},
  author={Ankit Aglawe},
  year={2024},
  howpublished={\url{https://huggingface.co/AnkitAI/distilbert-base-uncased-financial-news-sentiment-analysis}},
}
```

## Acknowledgments

- **Hugging Face**: For providing the Transformers library and model hosting.
- **Data Providers**: Thanks to the creators of the Financial PhraseBank dataset.
- **Community**: Appreciation to the open-source community for continual support and contributions.

## Contact Information

For questions, feedback, or collaboration opportunities, please contact:

- **Name**: Ankit Aglawe
- **Email**: [aglawe.ankit@gmail.com]
- **GitHub**: [GitHub Profile](https://github.com/ankit-aglawe)
- **LinkedIn**: [LinkedIn Profile](https://www.linkedin.com/in/ankit-aglawe)
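
## Appendix: Class Probabilities from Logits

The snippet in the Usage section reports only the top label. When a confidence score per class is useful, the logits can be passed through a softmax. The following is a minimal pure-Python sketch of that step; the helper names and the example logits are hypothetical, and in practice the list would come from something like `outputs.logits[0].tolist()`.

```python
import math

# Label ids as documented in this model card.
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(logits):
    """Pair each label name with its softmax probability."""
    return {LABELS[i]: p for i, p in enumerate(softmax(logits))}


# Hypothetical logits for a single sentence (not real model output).
example_logits = [2.0, 0.5, -1.0]
probs = label_probabilities(example_logits)
best = max(probs, key=probs.get)

print(f"Predicted Sentiment: {best}")
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Softmax preserves the ordering of the logits, so the most probable label is the same one that `argmax` selects.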