---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---

# Model Card for Fake News Detection Model

## Model Summary

This is a fine-tuned DistilBERT model for **fake news detection**. It classifies news articles as either **real** or **fake** based on textual content. The model has been trained on a labeled dataset consisting of true and false news articles collected from various sources.

## Model Details

### Model Description

- **Developed by:** Dhruv Pal
- **Finetuned from:** `distilbert-base-uncased`
- **Language:** English
- **Model type:** Transformer-based text classification model
- **License:** MIT
- **Intended Use:** Fake news detection on social media and news websites

### Model Sources

- **Repository:** [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- **Paper (if applicable):** N/A
- **Demo (if applicable):** N/A

## Uses

### Direct Use

- This model can be used to detect whether a given news article is **real or fake**.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.

### Downstream Use

- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.

### Out-of-Scope Use

- This model is **not designed for generating news content**.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims requiring external knowledge.

## Bias, Risks, and Limitations

### Risks

- The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
- There is a possibility of **false positives (real news misclassified as fake)** or **false negatives (fake news classified as real)**.
- Model performance can degrade on out-of-distribution samples.

### Recommendations

- Users should **not rely solely** on this model for determining truthfulness.
- It is recommended to **use human verification** and **cross-check information** from multiple sources.

## How to Use the Model

You can load the model with `transformers` and run inference as shown below:

```python
import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

# Replace "your-model-id" with the model's Hub ID or a local path.
tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()  # make sure dropout is disabled for inference


def predict(text):
    # Truncate inputs to DistilBERT's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # gradients are not needed at inference time
        logits = model(**inputs).logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    # Class index 1 means fake; any other index is treated as real.
    return "Fake News" if torch.argmax(probs, dim=-1).item() == 1 else "Real News"


text = "Breaking: Scientists discover a new element!"
print(predict(text))
```

## Training Details

### Training Data

The model was trained on a dataset of **news articles labeled as real or fake**. The dataset includes articles from reputable sources and from misinformation websites.

### Training Procedure

- **Preprocessing:**
  - Tokenization using `DistilBertTokenizerFast`
  - Removal of stop words and punctuation
  - Converting text to lowercase
- **Training Configuration:**
  - **Model:** `distilbert-base-uncased`
  - **Optimizer:** AdamW
  - **Batch size:** 16
  - **Epochs:** 3
  - **Learning rate:** 2e-5

### Compute Resources

- **Hardware:** NVIDIA Tesla T4 (Google Colab)
- **Training Time:** ~2 hours

## Evaluation

### Testing Data

- The model was evaluated on a held-out test set of **10,000 news articles**.
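The scores in this section are standard binary classification metrics. The following is a minimal sketch of how such metrics can be computed from test-set predictions. It assumes that label `1` (fake) is the positive class, matching the inference code above. The helper name `binary_metrics` and the toy labels are hypothetical, and the card does not state how the reported figures were actually computed.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: `positive` (1 = fake) is the class of interest.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = fake, 0 = real); not the real test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, the reported figures can be sanity-checked against each other: 2 × 0.91 × 0.89 / (0.91 + 0.89) ≈ 0.90, which agrees with the reported F1 score.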
### Metrics

The model was evaluated using accuracy, F1 score, precision, and recall.

### Results

| Metric    | Score |
|-----------|-------|
| Accuracy  | 92%   |
| F1 Score  | 90%   |
| Precision | 91%   |
| Recall    | 89%   |

## Environmental Impact

- **Hardware Used:** NVIDIA Tesla T4
- **Total Compute Time:** ~2 hours
- **Carbon Emissions:** Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)

## Technical Specifications

### Model Architecture

- The model is based on **DistilBERT**, a distilled version of BERT that its authors report to be about 40% smaller and 60% faster than BERT while retaining roughly 97% of its language-understanding performance.

### Dependencies

- `transformers`
- `torch`
- `datasets`
- `scikit-learn`

## Citation

If you use this model, please cite it as:

```bibtex
@misc{DhruvPal2025FakeNewsDetection,
  title={Fake News Detection with DistilBERT},
  author={Dhruv Pal},
  year={2025},
  howpublished={\url{https://huggingface.co/your-model-id}}
}
```

## Contact

For any queries, feel free to reach out:

- **Author:** Dhruv Pal
- **Email:** dhruv416pal@gmail.com
- **GitHub:** [dhruvpal05](https://github.com/dhruvpal05)
- **LinkedIn:** [idhruvpal](https://linkedin.com/in/idhruvpal)