Report for cardiffnlp/twitter-roberta-base-irony
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 3 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset irony, split validation).
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.06% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
medium 🟡 | Fail rate = 0.061 | Switch countries from high- to low-income and vice versa | 2/33 tested samples (6.06%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
 | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
485 | @user @user it's like you're in the Maldives #seaandwhitesands | @user @user it's like you're in the Burkina Faso #seaandwhitesands | irony (p = 0.61) | non_irony (p = 0.61) |
686 | AAP said will declare AK candidate in last list but declared it before.This issue affecting India's GDP is termed as U-Turn by BJP #AK4Delhi | AAP said will declare AK candidate in last list but declared it before.This issue affecting United States's GDP is termed as U-Turn by BJP #AK4Delhi | irony (p = 0.50) | non_irony (p = 0.52) |
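The fail rate above is the share of tested samples whose prediction flips after the perturbation. The following is a minimal sketch of that calculation, not the scanner's actual implementation: `perturbation_fail_rate`, `predict`, and `perturb` are hypothetical names, and the sketch assumes that only samples the perturbation actually alters count as tested.

```python
from typing import Callable, Sequence


def perturbation_fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    perturb: Callable[[str], str],
) -> tuple[int, int, float]:
    """Return (changed, tested, fail_rate) for an invariance check.

    Assumption: a sample is "tested" only if the perturbation changes its text.
    """
    changed = 0
    tested = 0
    for text in texts:
        perturbed = perturb(text)
        if perturbed == text:
            continue  # perturbation did not apply, so the sample is skipped
        tested += 1
        if predict(text) != predict(perturbed):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Arithmetic check against the reported figures: 2 of 33 tested samples changed.
reported_fail_rate = 2 / 33
print(f"{reported_fail_rate:.3f}")  # 0.061
print(f"{reported_fail_rate:.2%}")  # 6.06%
```

With only 33 tested samples, each flipped prediction moves the fail rate by about three percentage points, so the figure is worth reading alongside the examples rather than on its own.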
👉Performance issues (1)
For records in the dataset where text contains "user", the Recall is 22.76% lower than the global Recall.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | text contains "user" | Recall = 0.556 | -22.76% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
 | text | label | Predicted label |
---|---|---|---|
35 | @user hahaha such a 1% town | non_irony | irony (p = 0.58) |
53 | @user Just abt 2 say d same :) I'm not sure whether Oxford Brookes Uni is part of Oxford Uni. yet his CV is impressive still! | irony | non_irony (p = 0.83) |
64 | @user even your link to the service alert is down. | irony | non_irony (p = 0.65) |
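To make the slice comparison concrete, here is a small sketch of how Recall on a data slice can be compared with the global Recall. It rests on assumptions the report does not confirm: that "irony" is the positive class and that the deviation is relative to the global value. The helper names and the toy data are hypothetical and are not taken from tweet_eval.

```python
def recall(labels, preds, positive="irony"):
    """Recall for one class: true positives / (true positives + false negatives)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def relative_deviation(slice_metric, global_metric):
    """Relative difference of the slice metric from the global metric."""
    return (slice_metric - global_metric) / global_metric


# Toy data for illustration only.
texts = ["@user a", "@user b", "c", "d", "e"]
labels = ["irony", "irony", "irony", "irony", "non_irony"]
preds = ["irony", "non_irony", "irony", "irony", "irony"]

# Slice: records whose text contains "user".
in_slice = [i for i, t in enumerate(texts) if "user" in t]

global_recall = recall(labels, preds)
slice_recall = recall([labels[i] for i in in_slice], [preds[i] for i in in_slice])
deviation = relative_deviation(slice_recall, global_recall)

print(f"global={global_recall:.3f} slice={slice_recall:.3f} deviation={deviation:+.2%}")
```

Since most tweets in this dataset mention `@user`, a slice defined by that token may cover a large part of the data, which is worth checking before treating the gap as specific to mentions.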
👉Overconfidence issues (1)
For records in the dataset where text_length(text) < 87.500, we found a significantly higher number of overconfident wrong predictions (64 samples, corresponding to 55.17% of the wrong predictions in the data slice).
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | text_length(text) < 87.500 | Overconfidence rate = 0.552 | +12.47% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
 | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
470 | Today has been a blast | 22 | non_irony | irony (p = 0.98), non_irony (p = 0.02) |
771 | My dad's such a big kid on Christmas morning waking everyone up so bloody early | 79 | non_irony | irony (p = 0.97), non_irony (p = 0.03) |
902 | When one ear breaks on your headphones it's so frustrating! #today | 67 | non_irony | irony (p = 0.97), non_irony (p = 0.03) |
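The overconfidence rate is the fraction of wrong predictions that the model made with high confidence. The sketch below shows one way to compute such a rate. It assumes a prediction counts as overconfident when the probability of the predicted label exceeds the probability of the true label by more than a threshold; the `threshold` value of 0.5 and the function name are illustrative assumptions, not the scanner's documented settings.

```python
def overconfidence_rate(labels, probs, threshold=0.5):
    """Share of wrong predictions whose confidence gap exceeds `threshold`.

    labels: true labels, one per sample.
    probs:  one dict per sample mapping each label to its predicted probability.
    The threshold is a hypothetical value chosen for illustration.
    """
    wrong = 0
    overconfident = 0
    for y, p in zip(labels, probs):
        predicted = max(p, key=p.get)
        if predicted == y:
            continue  # correct predictions are not counted
        wrong += 1
        if p[predicted] - p[y] > threshold:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


# Toy data: the first entry mirrors sample 470 from the table above.
toy_labels = ["non_irony", "non_irony", "irony", "irony"]
toy_probs = [
    {"irony": 0.98, "non_irony": 0.02},  # wrong, gap 0.96: overconfident
    {"irony": 0.55, "non_irony": 0.45},  # wrong, gap 0.10: not overconfident
    {"irony": 0.90, "non_irony": 0.10},  # correct
    {"irony": 0.35, "non_irony": 0.65},  # wrong, gap 0.30: not overconfident
]
toy_rate = overconfidence_rate(toy_labels, toy_probs)
print(f"{toy_rate:.3f}")
```

The reported figures are consistent with roughly 116 wrong predictions in the slice, since 64 / 116 ≈ 0.5517. That count is inferred from the percentages and is not stated in the report.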
Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.