
Model Card for Model ID (work in progress)

This model is a fine-tuned version of BETO (uncased) for detecting offensive and discriminatory language against the LGBT community. It could be used as a moderation service in forums and other digital spaces.

Model Card Contact

[[email protected]]

Model Details

Model description process

  • Collection of phrases discriminating against the LGBTQIA+ community from X/Twitter, Instagram and TikTok (197 phrases).
  • Labeling by three raters as non-lgbtphobic (0) or lgbtphobic (1).
  • Text augmentation through back-translation and random synonym replacement.
  • Translation into Spanish of part of the McGiff, J., & Nikolov, N. S. (2024) dataset, which was added to the corpus (under the CC-BY-4.0 license).

Finally, we obtained 1,234 labeled phrases for version 1.0.1 of LGBTQIAphobia_augmented. Please cite the dataset as:
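The card does not state how the three raters' labels were combined into a single label per phrase. A minimal sketch, assuming a simple majority vote over binary labels (0 = non-lgbtphobic, 1 = lgbtphobic), could look like this; `majority_label` and `agreement` are hypothetical helpers, not part of the dataset's tooling.

```python
from collections import Counter

# Label scheme described in this card: 0 = non-lgbtphobic, 1 = lgbtphobic.
LABELS = {0: "non-lgbtphobic", 1: "lgbtphobic"}


def majority_label(ratings: list[int]) -> int:
    """Return the label chosen by most raters.

    Assumption: raters give binary labels (0 or 1). With an odd number of
    raters (three in this card) a strict majority always exists. The actual
    aggregation rule used for the dataset is not documented here.
    """
    if len(ratings) % 2 == 0:
        raise ValueError("use an odd number of raters to avoid ties")
    if any(r not in LABELS for r in ratings):
        raise ValueError("ratings must be 0 or 1")
    label, _count = Counter(ratings).most_common(1)[0]
    return label


def agreement(ratings: list[int]) -> float:
    """Fraction of raters who agree with the majority label."""
    label = majority_label(ratings)
    return ratings.count(label) / len(ratings)


if __name__ == "__main__":
    # Hypothetical ratings from three raters for two phrases.
    for ratings in ([1, 1, 0], [0, 0, 0]):
        label = majority_label(ratings)
        print(ratings, "->", LABELS[label], f"(agreement {agreement(ratings):.2f})")
```

Reporting the agreement alongside the label makes it easy to spot phrases on which the raters disagreed, which may deserve a second review.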

Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024). LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166

  • Developed by: Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian
  • Model type: text-classification
  • Language(s) (NLP): Spanish
  • License: CC-BY-4.0
  • Fine-tuned from model: dccuchile/bert-base-spanish-wwm-uncased (more information on the base model: https://github.com/dccuchile/beto)

Model Sources [optional]

Uses

This model can be used to detect offensive and discriminatory language against the LGBT community, for example as a moderation service in forums and other digital spaces.
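To illustrate how the classifier might sit inside a moderation service, here is a sketch that flags texts whose score for the lgbtphobic class (label 1) reaches a threshold. The `moderate` function, the `score_fn` hook and the 0.5 default threshold are assumptions for illustration only; a real deployment would wrap the fine-tuned model in `score_fn` and choose the threshold from validation data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ModerationResult:
    text: str
    score: float  # probability of the lgbtphobic class (label 1)
    flagged: bool


def moderate(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> list[ModerationResult]:
    """Flag every text whose lgbtphobic score is at or above `threshold`.

    `score_fn` is a hypothetical hook: in practice it would run the
    fine-tuned model on one text and return P(label == 1).
    """
    results = []
    for text in texts:
        score = score_fn(text)
        results.append(ModerationResult(text, score, score >= threshold))
    return results


if __name__ == "__main__":
    # Hypothetical scores standing in for model output; NOT real predictions.
    demo_scores = {"Buenos días a todos": 0.05, "<offensive comment placeholder>": 0.93}
    for result in moderate(demo_scores, lambda t: demo_scores[t]):
        print(result.flagged, f"{result.score:.2f}", result.text)
```

Keeping the scorer as a plain function separates the moderation policy (threshold, what to do with flagged texts) from the model itself, so either can change independently.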

Direct Use

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

This model may carry biases stemming from having been fine-tuned on a small dataset (1,234 labeled phrases).

[More Information Needed]

Recommendations

How to Get Started with the Model

    # Libraries
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Define the path from which the model will be loaded
    load_directory = "./lgbetO"

    # Load the trained model
    model = AutoModelForSequenceClassification.from_pretrained(load_directory)

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(load_directory)
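Once loaded, the model returns raw logits for the two classes; in PyTorch these can be obtained with `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()`. The sketch below shows one way to turn such a two-element logit list into a label and a probability using only the standard library. It assumes that class id 0 is non-lgbtphobic and 1 is lgbtphobic, as in the dataset description; this mapping is worth confirming against `model.config.id2label` of the saved model.

```python
import math

# Assumed label ids, following the dataset description: 0 = non-lgbtphobic, 1 = lgbtphobic.
ID2LABEL = {0: "non-lgbtphobic", 1: "lgbtphobic"}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: list[float]) -> tuple[str, float]:
    """Map a two-class logit list to (label name, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


if __name__ == "__main__":
    # Hypothetical logits, not produced by the real model.
    label, prob = decode([-1.2, 2.3])
    print(label, f"{prob:.3f}")
```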

Training Details

The training process consists of collecting offensive/non-offensive and discriminatory/non-discriminatory phrases related to the LGBT community from Twitter, Instagram and TikTok; preprocessing them; having three raters label them; augmenting them with back-translation and synonym replacement; and fine-tuning the BETO base model (dccuchile/bert-base-spanish-wwm-uncased) to detect phrases that discriminate against the LGBT community.
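Random synonym replacement, one of the two augmentation techniques mentioned, can be sketched as follows. The synonym table is a toy, hypothetical stand-in (the card does not name the lexical resource actually used), words are matched by whitespace splitting only, so punctuation attached to a word prevents a match, and replacements are lowercase, which is consistent with the uncased base model.

```python
import random

# Hypothetical toy synonym table; the real augmentation resource is not specified.
SYNONYMS = {
    "rápido": ["veloz", "ligero"],
    "casa": ["hogar", "vivienda"],
    "feliz": ["contento", "alegre"],
}


def synonym_replace(sentence: str, n: int, rng: random.Random) -> str:
    """Replace up to `n` randomly chosen words that have an entry in SYNONYMS."""
    words = sentence.split()
    # Positions of words we know a synonym for.
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the augmented set is reproducible
    print(synonym_replace("estoy feliz en mi casa", n=1, rng=rng))
```

Passing an explicit `random.Random` instance, rather than using the global generator, keeps augmentation reproducible across runs.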

Training Data

LGBTQIAphobia dataset (augmented), version 1.0.1. Citation: Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024). LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: T4 GPU
  • Hours used: 10
  • Cloud Provider: Google Cloud Platform
  • Compute Region: southamerica-east1
  • Carbon Emitted: 0.14 kgCO$_2$eq

Experiments were conducted using Google Cloud Platform in region southamerica-east1, which has a carbon efficiency of 0.2 kgCO$_2$eq/kWh. A cumulative total of 10 hours of computation was performed on T4 hardware (TDP of 70 W).

Total emissions are estimated to be 0.14 kgCO$_2$eq, 100 percent of which were directly offset by the cloud provider.
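The 0.14 kgCO$_2$eq figure can be reproduced from the reported values, assuming the GPU draws its full 70 W TDP for all 10 hours: 70 W × 10 h = 0.7 kWh, and 0.7 kWh × 0.2 kgCO$_2$eq/kWh = 0.14 kgCO$_2$eq. A small helper makes the arithmetic explicit; `estimated_emissions_kg` is a hypothetical name, and the estimate ignores CPU, memory and datacenter overhead.

```python
def estimated_emissions_kg(tdp_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Energy (kWh) = power (kW) x time (h); emissions = energy x carbon intensity.

    Assumes the device runs at its full TDP for the whole period.
    """
    energy_kwh = (tdp_watts / 1000.0) * hours
    return energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    # Values reported in this card: one T4 (70 W TDP), 10 hours,
    # southamerica-east1 at 0.2 kgCO2eq/kWh.
    print(f"{estimated_emissions_kg(70, 10, 0.2):.2f} kgCO2eq")  # prints "0.14 kgCO2eq"
```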

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

Google Compute Engine backend (GPU), Python 3

Hardware

  • RAM: 3.87 GB / 12.67 GB
  • Disk: 33.96 GB / 112.64 GB

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]
