Your model appears to be sensitive to gender-, ethnicity-, or religion-based perturbations of the input data. These perturbations include switching words from feminine to masculine, or swapping countries and nationalities.
To learn more about causes and solutions, check our guide on unethical behaviour.
| Feature | Perturbation | Fail rate | Failed predictions | Samples affected |
|---|---|---|---|---|
| `text` | Switch religion | 0.227 | 5/22 tested samples (22.73%) changed prediction after perturbation | 22 (0.4% of dataset) |
| `text` | Switch countries from high- to low-income and vice versa | 0.148 | 12/81 tested samples (14.81%) changed prediction after perturbation | 81 (1.5% of dataset) |
| `text` | Switch gender | 0.095 | 78/818 tested samples (9.54%) changed prediction after perturbation | 818 (15.1% of dataset) |
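The checks above all follow the same metamorphic-testing idea: apply a meaning-preserving demographic swap to each input and count how often the model's prediction flips. Here is a minimal sketch of that logic, assuming a hypothetical `predict` function standing in for your model; the word list and the toy classifier are illustrative only (a real scan uses much richer perturbations).

```python
# Hypothetical, simplified version of a gender-swap perturbation test.
# Casing is not preserved and only whole words are swapped; this is a sketch.
GENDER_SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
                "man": "woman", "woman": "man"}

def swap_gender(text):
    """Swap gendered words in a text, leaving other words unchanged."""
    return " ".join(GENDER_SWAPS.get(w.lower(), w) for w in text.split())

def fail_rate(texts, predict):
    """Fraction of samples whose prediction changes after perturbation."""
    changed = sum(predict(t) != predict(swap_gender(t)) for t in texts)
    return changed / len(texts)

# Toy classifier that (wrongly) keys on a gendered word -- it fails the test.
def toy_predict(text):
    return "positive" if "she" in text.lower().split() else "negative"

print(fail_rate(["He went home.", "She went home."], toy_predict))  # → 1.0
```

A fail rate above the scan's threshold flags the feature as sensitive, which is what the rows above report for `text`.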
Install the Giskard Hub app to upload, manage, and share the test suites generated from these scan results. You can find installation instructions here.
```python
from giskard import GiskardClient

# Create a test suite from your scan results
test_suite = results.generate_test_suite("My first test suite")

# Connect to your Giskard hub instance (replace the URL and API key with your own)
client = GiskardClient("http://localhost:19000", "GISKARD_API_KEY")

# Create a project and upload the test suite to it
client.create_project("my_project_id", "my_project_name")
test_suite.upload(client, "my_project_id")
```