AI Text Detection Model

A Random Forest classifier over TF-IDF features that predicts whether a text is human-written or AI-generated (trained on GPT-4 and DeepSeek Chat outputs).

Overview

Task: Binary classification (Human vs AI text)
Architecture: Random Forest with TF-IDF features
Input: Text string
Output: Classification label (Human/AI) with confidence score
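
To make the architecture concrete, here is a minimal sketch of a TF-IDF plus Random Forest pipeline. It assumes scikit-learn, which this README does not name explicitly, and it is trained on four invented toy sentences, so it illustrates the shape of the approach rather than the shipped model or its hyperparameters. The names toy_texts, toy_labels, toy_pipeline and toy_result are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; NOT the model's training set.
toy_texts = [
    "honestly i dunno, we just winged it and it kinda worked out",
    "my gran used to say the rain always knows when you've hung washing",
    "In conclusion, it is important to note that several factors contribute.",
    "Overall, this comprehensive approach ensures optimal and robust results.",
]
toy_labels = ["Human-written", "Human-written", "AI-generated", "AI-generated"]

# TF-IDF turns each text into a sparse weighted term vector;
# the Random Forest then classifies those vectors.
# n_estimators and random_state are arbitrary choices for this sketch.
toy_pipeline = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
toy_pipeline.fit(toy_texts, toy_labels)

# Per-class probabilities for one new text, keyed by class label.
sample = "It is important to note that results may vary."
probs = toy_pipeline.predict_proba([sample])[0]
toy_result = {str(c): float(p) for c, p in zip(toy_pipeline.classes_, probs)}
print(toy_result)
```

The per-class probabilities from predict_proba are the kind of values that a confidence score and a probabilities dictionary could be built from; whether predict_text does exactly this is an assumption.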

Installation

# Clone the repository (notebook syntax: "!" runs a shell command, "%cd" changes the working directory; in a regular shell, drop both prefixes)
!git clone https://huggingface.co/polygraf-ai/ai-text-detector-random-forest-supplementary
%cd ai-text-detector-random-forest-supplementary

# Install the package
!pip install -e .

# Install requirements
!pip install -r requirements.txt

Usage

# Single text prediction
from inference import predict_text

text = "Your text here to analyze"
result = predict_text(text, model_path="model_artifacts")
print(result)

# Output format:
{
    'label': 'Human-written',  # or 'AI-generated'
    'confidence': 0.85,  # confidence score between 0 and 1
    'probabilities': {
        'Human-written': 0.85,
        'AI-generated': 0.15
    }
}

# Multiple texts
texts = [
    "First text to analyze",
    "Second text to analyze"
]
results = [predict_text(text, model_path="model_artifacts") for text in texts]
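
For larger batches it can help to keep each input next to its prediction and to skip empty strings. The sketch below assumes only the output format documented above; predict_many and fake_predict are hypothetical names, and fake_predict is a stand-in so the example runs without the model artifacts. In real use, pass predict_text (or a lambda that fixes model_path) instead.

```python
from typing import Callable, Dict, List


def predict_many(texts: List[str], predict_fn: Callable[[str], Dict]) -> List[Dict]:
    """Run predict_fn on each non-empty text and keep the input with its result."""
    rows = []
    for text in texts:
        if not text.strip():
            continue  # skip empty or whitespace-only inputs
        result = predict_fn(text)
        rows.append(
            {
                "text": text,
                "label": result["label"],
                "confidence": result["confidence"],
            }
        )
    return rows


def fake_predict(text: str) -> Dict:
    """Stand-in for predict_text; always returns the documented example output."""
    return {
        "label": "Human-written",
        "confidence": 0.85,
        "probabilities": {"Human-written": 0.85, "AI-generated": 0.15},
    }


rows = predict_many(["First text to analyze", "   ", "Second text to analyze"], fake_predict)
for row in rows:
    print(f"{row['label']} ({row['confidence']:.2f}): {row['text']}")
```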

Limitations

Not suitable as a standalone, high-accuracy detector (see Metrics)
Should be used as a supplementary tool only
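
One way to act on these limitations is to treat low-confidence predictions as inconclusive instead of forcing a label. This is a sketch, not part of the package: triage is a hypothetical helper, and the 0.9 threshold is an arbitrary illustration that should be tuned on your own data.

```python
def triage(result: dict, min_confidence: float = 0.9) -> str:
    """Return the predicted label, or 'Inconclusive' when confidence is below the threshold."""
    if result["confidence"] < min_confidence:
        return "Inconclusive"
    return result["label"]


# Example using the documented output format.
example = {
    "label": "Human-written",
    "confidence": 0.85,
    "probabilities": {"Human-written": 0.85, "AI-generated": 0.15},
}
print(triage(example))                      # below the 0.9 threshold
print(triage(example, min_confidence=0.8))  # above the 0.8 threshold
```

Inconclusive results can then be passed to another detector or to human review.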

Training Data

Text samples from:

Human writers
GPT-4 outputs
DeepSeek Chat outputs

Metrics

Accuracy: 0.87
Precision: 0.87
Recall: 0.84
F1: 0.85
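
As a consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two figures above. The inputs here are the rounded values from this section, so the result is only accurate to about two decimal places.

```python
precision = 0.87
recall = 0.84

# F1 = 2 * P * R / (P + R)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.85
```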