
0xnu/AGTD-v0.1

The "0xnu/AGTD-v0.1" model distinguishes between human-written and AI-generated text. Built on hybrid deep learning techniques, it offers strong accuracy and efficiency in text analysis and classification. Full details are in the accompanying study (see Citation below).

Training Details

Precision: 0.6269
Recall: 1.0000
F1-score: 0.7707
Accuracy: 0.7028
Confusion Matrix:
[[197 288]
 [  0 484]]
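The reported scores follow directly from the confusion matrix, reading it as [[TN, FP], [FN, TP]] with AI-generated text as the positive class. A quick sanity check in plain Python:

```python
# Confusion matrix from the card: [[TN FP], [FN TP]],
# positive class = AI-generated text.
tn, fp = 197, 288
fn, tp = 0, 484

precision = tp / (tp + fp)                           # 484/772
recall = tp / (tp + fn)                              # 484/484
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tn + tp) / (tn + fp + fn + tp)           # 681/969

print(f"Precision: {precision:.4f}")  # 0.6269
print(f"Recall:    {recall:.4f}")     # 1.0000
print(f"F1-score:  {f1:.4f}")         # 0.7707
print(f"Accuracy:  {accuracy:.4f}")   # 0.7028
```

Note that recall is 1.0000 because there are zero false negatives: every AI-generated sample in the test set was flagged, at the cost of 288 human texts misclassified as AI-generated.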

Training History

Run the model

import os
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras
import tensorflow as tf
import pickle
from huggingface_hub import hf_hub_download

# Hugging Face repository details
REPO_ID = "0xnu/AGTD-v0.1"
MODEL_FILENAME = "human_ai_text_classification_model.keras"
TOKENIZER_FILENAME = "tokenizer.pkl"

# Download the model and tokenizer
model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME)
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename=TOKENIZER_FILENAME)

# Load the model
model = keras.models.load_model(model_path)

# Load the tokenizer
with open(tokenizer_path, 'rb') as tokenizer_file:
    tokenizer = pickle.load(tokenizer_file)

# Input text
text = "This model trains on a diverse dataset and serves functions in applications requiring a mechanism for distinguishing between human and AI-generated text."

# Parameters (these should match the training parameters)
MAX_LENGTH = 100000

# Tokenization function
def tokenize_text(text, tokenizer, max_length):
    sequences = tokenizer.texts_to_sequences([text])
    padded_sequence = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
    return padded_sequence

# Prediction function
def predict_text(text, model, tokenizer, max_length):
    processed_text = tokenize_text(text, tokenizer, max_length)
    prediction = model.predict(processed_text)[0][0]
    return prediction

# Make prediction
prediction = predict_text(text, model, tokenizer, MAX_LENGTH)

# Interpret results
if prediction >= 0.5:
    print(f"The text is likely AI-generated (confidence: {prediction:.2f})")
else:
    print(f"The text is likely human-written (confidence: {1-prediction:.2f})")

print(f"Raw prediction value: {prediction}")
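When classifying several texts, the thresholding logic at the end of the script can be factored into a small helper. This is an illustrative sketch (the function name is ours); the 0.5 threshold matches the script above:

```python
def interpret(prediction: float, threshold: float = 0.5) -> tuple[str, float]:
    """Map the model's sigmoid output to a label and a confidence score.

    Scores >= threshold are treated as AI-generated, matching the
    interpretation step in the script above.
    """
    if prediction >= threshold:
        return "ai-generated", prediction
    return "human-written", 1 - prediction

# Example with dummy scores (no model call needed):
for score in (0.92, 0.12):
    label, confidence = interpret(score)
    print(f"{score:.2f} -> {label} (confidence: {confidence:.2f})")
```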

Citation

@misc{agtd2024,
  author       = {Oketunji, A.F.},
  title        = {Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text},
  year         = {2023},
  version      = {v3},
  publisher    = {arXiv},
  doi          = {10.48550/arXiv.2311.15565},
  url          = {https://arxiv.org/abs/2311.15565}
}

Copyright

(c) 2024 Finbarrs Oketunji. All Rights Reserved.

Model size: 109M parameters (Safetensors, F32 tensors)