SegmentCNN Model for Toxic Text Classification

Overview

The SegmentCNN model, also known as SpanCNN, is designed for toxic text classification, distinguishing safe from toxic content. It is part of the research presented in the paper CMD: A Framework for Context-aware Model Self-Detoxification.

Model Details

  • Input: Text data
  • Output: Integer
    • 0 represents safe content
    • 1 represents toxic content
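
For readability, the integer output can be mapped to human-readable labels. The snippet below is a minimal sketch of that mapping; the TOXICITY_LABELS dictionary and label_of helper are illustrative assumptions, not part of the released model.

# Hypothetical helper mapping the documented integer outputs to labels
TOXICITY_LABELS = {0: "safe", 1: "toxic"}

def label_of(prediction: int) -> str:
    """Return a human-readable label for a SegmentCNN prediction."""
    return TOXICITY_LABELS[prediction]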

Usage

To use the SegmentCNN model for toxic text classification, follow the example below:

from transformers import pipeline

# Load the SpanCNN model
classifier = pipeline("spancnn-classification", model="ZetangForward/SegmentCNN", trust_remote_code=True)

# Example 1: Safe text
pos_text = "You look good today~!"
result = classifier(pos_text)
print(result)  # Output: 0 (safe)

# Example 2: Toxic text
neg_text = "You're too stupid, you're just like a fool"
result = classifier(neg_text)
print(result)  # Output: 1 (toxic)
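
Assuming the custom pipeline returns the raw integer label for each input, as in the examples above, it can also be applied to a list of texts to filter out toxic items before further processing. The sketch below is illustrative only; the texts list and the filtering logic are assumptions, not part of the released model.

# Hypothetical batch usage: keep only texts the classifier labels as safe (0)
texts = [
    "Have a wonderful day!",
    "You're too stupid, you're just like a fool",
]

safe_texts = [t for t in texts if classifier(t) == 0]
print(safe_texts)  # Expected to keep only the first sentence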

Citation

If you find this model useful, please consider citing the original paper:

@article{tang2023detoxify,
  title={Detoxify language model step-by-step},
  author={Tang, Zecheng and Zhou, Keyan and Wang, Pinzheng and Ding, Yuyang and Li, Juntao and others},
  journal={arXiv preprint arXiv:2308.08295},
  year={2023}
}

Disclaimer

While the SegmentCNN model is effective in detecting toxic segments within text, we strongly recommend that users carefully review the results and exercise caution when applying this method in real-world scenarios. The model is not infallible, and its outputs should be validated in context-sensitive applications.
