DeBERTa v3 for Text Quality Assessment

Model Details

Model Architecture: DeBERTa v3 (xsmall and base variants)
Task: Text quality assessment (regression)
Training Data: Text Quality Meta-Analysis Dataset at agentlans/text-quality-v2
Output: Single continuous value representing text quality

Intended Use

These models are designed to assess the quality of English text, where "quality" refers to legible sentences that are not spam and contain useful information. They can be used for:

Content moderation
Spam detection
Information quality assessment
Text filtering

Usage

The models accept text input and return a single continuous value representing the assessed quality. Higher values indicate higher perceived quality. The code snippet below shows example usage.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "agentlans/deberta-v3-base-quality-v2"

# Load the tokenizer and model, then move the model to the GPU if one is available, else the CPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

def quality(text):
    """Scores a string or a list of strings with the model.
    Returns a list of logits, each interpreted as the combined quality score for one text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():
        # squeeze(-1) drops only the label dimension, so a single input still yields a list
        logits = model(**inputs).logits.squeeze(-1).cpu()
    return logits.tolist()

# Example usage
text = [x.strip() for x in """
Congratulations! You've won a $1,000 gift card! Click here to claim your prize now!!!
Page 1 2 3 4 5 Next Last>>
Urgent: Your account has been compromised! Click this link to verify your identity and secure your account immediately!!!
Today marks a significant milestone in our journey towards sustainability! 🌍✨ We’re excited to announce our partnership with local organizations to plant 10,000 trees in our community this fall. Join us in making a positive impact on our environment!
In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.
The mitochondria is the powerhouse of the cell.
Exclusive discount on Super MitoMax Energy Boost! Recharge your mitochondria today!
Everyone is talking about this new diet that guarantees weight loss without exercise!
Discover five tips for improving your productivity while working from home.
""".strip().split("\n")]

result = quality(text)
for x, s in zip(text, result):
    print(f"Text: {x}\nQuality: {round(s, 2)}\n")

Example output for the base size model:

Text: Congratulations! You've won a $1,000 gift card! Click here to claim your prize now!!!
Quality: -1.25

Text: Page 1 2 3 4 5 Next Last>>
Quality: -1.54

Text: Urgent: Your account has been compromised! Click this link to verify your identity and secure your account immediately!!!
Quality: -2.01

Text: Today marks a significant milestone in our journey towards sustainability! 🌍✨ We’re excited to announce our partnership with local organizations to plant 10,000 trees in our community this fall. Join us in making a positive impact on our environment!
Quality: -1.72

Text: In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.
Quality: 0.45

Text: The mitochondria is the powerhouse of the cell.
Quality: 1.32

Text: Exclusive discount on Super MitoMax Energy Boost! Recharge your mitochondria today!
Quality: -1.16

Text: Everyone is talking about this new diet that guarantees weight loss without exercise!
Quality: -0.27

Text: Discover five tips for improving your productivity while working from home.
Quality: -0.42
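
One way to use these scores for text filtering is to keep only the texts at or above a cutoff. The sketch below is a minimal illustration, not part of the model's API: the helper name `filter_by_quality` and the threshold of 0.0 are assumptions chosen for the example, and a suitable cutoff depends on your data. It works on scores that were already computed, so it runs without loading the model.

```python
def filter_by_quality(texts, scores, threshold=0.0):
    """Return the (text, score) pairs whose score is at least `threshold`.

    `threshold=0.0` is an arbitrary illustrative cutoff, not a recommended value.
    """
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    return [(t, s) for t, s in zip(texts, scores) if s >= threshold]


# Three texts and their scores copied from the example output above
texts = [
    "Page 1 2 3 4 5 Next Last>>",
    "In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.",
    "The mitochondria is the powerhouse of the cell.",
]
scores = [-1.54, 0.45, 1.32]

kept = filter_by_quality(texts, scores)
for text, score in kept:
    print(f"{score:+.2f}  {text}")
```

Raising the threshold makes the filter stricter. Because the scores are continuous, it is worth inspecting texts near the cutoff by hand before settling on a value.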

Performance Metrics

Root mean squared error (RMSE) on the 20% held-out evaluation set (lower is better):

xsmall 0.7668
base 0.7096

The base model has a lower error than the xsmall variant on this evaluation set.
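
For reference, RMSE is the square root of the mean squared difference between predicted and reference scores. The following is a minimal sketch of that calculation. The function name `rmse` and the numbers in the usage lines are made up for illustration and are not taken from the evaluation set.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("need at least one prediction")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, only to show the call
predicted = [1.0, -1.0, 0.5]
reference = [1.0, 1.0, -0.5]
print(round(rmse(predicted, reference), 4))
```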

Limitations and Biases

The models are trained on a specific dataset and may not generalize well to all types of text or domains.
"Quality" is a subjective concept, and the models' assessments may not align with all human judgments.
The models may exhibit biases present in the training data. For example, there is a bias against self-help, promotional, and public relations material.
They do not assess factual correctness or grammatical accuracy.

Ethical Considerations

These models should not be used as the sole determinant for content moderation or censorship.
Care should be taken to avoid reinforcing existing biases in content selection or promotion.
The models' outputs should be interpreted as suggestions rather than definitive judgments.

Caveats and Recommendations

Use these models in conjunction with other tools and human oversight for content moderation.
Regularly evaluate the models' performance on your specific use case and data.
Be aware that the models may not perform equally well across all text types or domains.
Consider fine-tuning the models on domain-specific data for improved performance in specialized applications.
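
To check how the models behave on your own data, one option is to score a small hand-labeled sample and compare the error across text types. The sketch below assumes you already have model scores and your own reference labels on the same scale. The helper name `rmse_by_group`, the group names, and all numbers are hypothetical placeholders.

```python
import math
from collections import defaultdict


def rmse_by_group(records):
    """Compute RMSE separately for each group.

    `records` is an iterable of (group, predicted_score, reference_score) tuples.
    Returns a dict mapping each group name to its RMSE.
    """
    squared_errors = defaultdict(list)
    for group, predicted, reference in records:
        squared_errors[group].append((predicted - reference) ** 2)
    return {
        group: math.sqrt(sum(errs) / len(errs))
        for group, errs in squared_errors.items()
    }


# Hypothetical placeholder data: (text type, model score, your own label)
records = [
    ("news", 0.5, 0.5),
    ("news", 1.0, 0.0),
    ("forum", -1.0, -1.0),
    ("forum", -0.5, 0.5),
    ("forum", 0.0, 1.0),
]

for group, value in rmse_by_group(records).items():
    print(f"{group}: RMSE = {value:.3f}")
```

A group with a noticeably higher error than the others is a candidate for closer human review or domain-specific fine-tuning.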
