---
license: mit
datasets:
- FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary
language:
- pt
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
base_model: neuralmind/bert-base-portuguese-cased
base_dataset: FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary
widget:
- text: 'Bom dia, flor do dia!!'
---

## Introduction

Tupi-BERT-Base is a BERT model fine-tuned specifically for binary classification of hate speech in Portuguese. Derived from [BERTimbau Base](https://huggingface.co/neuralmind/bert-base-portuguese-cased), TuPi-Base is a refined solution for addressing hate speech concerns. For more details or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The efficacy of language models can vary notably when there is a domain shift between training and test data. To create a specialized Portuguese language model for hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech Dataset](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which was sourced from diverse social networks.

## Available models

| Model                                                                | Arch.      | #Layers | #Params |
| -------------------------------------------------------------------- | ---------- | ------- | ------- |
| `FpOliveira/tupi-bert-base-portuguese-cased`                         | BERT-Base  | 12      | 109M    |
| `FpOliveira/tupi-bert-large-portuguese-cased`                        | BERT-Large | 24      | 334M    |
| `FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel`   | BERT-Base  | 12      | 109M    |
| `FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel`  | BERT-Large | 24      | 334M    |

## Example usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax

def classify_hate_speech(model_name, text):
    # Load the fine-tuned model, its tokenizer, and its config (for label names)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize the input text and prepare the model input
    model_input = tokenizer(text, padding=True, return_tensors="pt")

    # Get model output scores
    with torch.no_grad():
        output = model(**model_input)
        scores = softmax(output.logits.numpy(), axis=1)
        ranking = np.argsort(scores[0])[::-1]

    # Print the labels ranked from most to least likely
    for i, rank in enumerate(ranking):
        label = config.id2label[rank]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

# Example usage
model_name = "FpOliveira/tupi-bert-base-portuguese-cased"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
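The post-processing step above (converting logits to probabilities and ranking the labels) can be illustrated on its own, without downloading the model. The logit values below are made up for illustration; a real model would produce its own:

```python
import numpy as np
from scipy.special import softmax

# Synthetic logits for a binary classifier, shape (batch=1, num_labels=2).
# These values are illustrative only.
logits = np.array([[1.2, -0.8]])

# Softmax turns logits into probabilities that sum to 1 per row
scores = softmax(logits, axis=1)

# Rank label indices from most to least likely
ranking = np.argsort(scores[0])[::-1]

print(ranking)   # label with the larger logit comes first
print(scores[0]) # the corresponding probabilities
```

With `id2label` from the model config, each index in `ranking` maps to a human-readable label name, exactly as in the loop above.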