import streamlit as st
title = "Hate Speech in ACM"
description = "The history and development of hate speech detection as a modeling task"
date = "2022-01-26"
thumbnail = "images/prohibited.png"
__ACM_SECTION = """
Content moderation is a collection of interventions used by online platforms to partially obscure content,
or remove it entirely from user-facing view, when that content is objectionable based on the company's values
or community guidelines, which vary from platform to platform.
[Sarah T. Roberts (2014)](https://yalebooks.yale.edu/book/9780300261479/behind-the-screen/) describes
content moderation as "the organized practice of screening user-generated content (UGC)
posted to Internet sites, social media, and other online outlets" (p. 12).
[Tarleton Gillespie (2021)](https://yalebooks.yale.edu/book/9780300261431/custodians-internet/) writes
that platforms moderate content "both to protect one user from another,
or one group from its antagonists, and to remove the offensive, vile, or illegal."
While there are a variety of approaches to this problem, in this tool, we focus on automated content moderation,
which is the application of algorithms to the classification of problematic content.
Content that is subject to moderation can be user-directed (e.g. targeted harassment of a particular user
in comments or direct messages) or posted to a personal account (e.g. user-created posts that contain hateful
remarks against a particular social group).
"""
__CURRENT_APPROACHES = """
Automated content moderation has relied both on analysis of the media itself (e.g. using methods from natural
language processing and computer vision) and on user dynamics (e.g. whether the user sending the content
to another user shares followers with the recipient, or whether the account posting the content is relatively new).
Often, the ACM pipeline is fed by user-reported content. Within the realm of text-based ACM, approaches range
from simple wordlist-based filters to data-driven machine learning models. Common datasets used for training and
evaluating hate speech detectors can be found at [https://hatespeechdata.com/](https://hatespeechdata.com/).
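As a toy illustration of these two families of approaches (not any platform's actual pipeline, and assuming
scikit-learn is available; the blocklist terms, training examples, and test message are all invented), consider
a wordlist lookup next to a small learned classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Wordlist-based approach: flag any message containing a blocked term.
BLOCKLIST = {"idiot", "scum"}  # placeholder terms for illustration

def wordlist_flag(text: str) -> bool:
    # Naive tokenization: lowercase, split on whitespace, strip punctuation.
    return any(tok.strip('.,!?"') in BLOCKLIST for tok in text.lower().split())

# Data-driven approach: a linear classifier over bag-of-words features.
# A real system would train on thousands of labeled examples; these four
# invented messages stand in for such a dataset (1 = flag, 0 = keep).
train_texts = [
    "you are an idiot and everyone hates you",
    "thanks for sharing, this was helpful",
    "people like you are scum",
    "see you at the meetup next week",
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

message = "what an idiot"
print(wordlist_flag(message))                # True: exact blocklist hit
print(model.predict_proba([message])[0][1])  # learned probability of "flag"
```

The wordlist is transparent but brittle (it misses paraphrases and misspellings), while the learned model
generalizes from its training data, inheriting whatever worldview that data encodes.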
"""
__CURRENT_CHALLENGES = """
Combating hateful content on the Internet continues to be a challenge. A 2021 survey of respondents
in the United States, conducted by the Anti-Defamation League, found an increase in online hate and harassment
directed at LGBTQ+, Asian American, Jewish, and African American individuals.
### Technical challenges for data-driven systems
With respect to models that are based on training data, datasets encode worldviews, so a common challenge
lies in having insufficient data, or data that reflects only a limited worldview. For example, a recent
[study](https://link.springer.com/article/10.1007/s12119-020-09790-w) found that Tweets posted by prominent
drag queens were more often rated as toxic by an automated scoring system than Tweets posted by well-known
white supremacists.
This may be due, in part, to the labeling schemes and choices made for the model's training data:
in-community uses of reclaimed terms and "mock impoliteness" can read as toxic to annotators outside that
community, and the particular company policies invoked when making these labeling choices determine whose
reading the deployed model inherits.
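To make the probing pattern used in such audits concrete, here is a minimal sketch of scoring text with a
hosted toxicity classifier, using the Perspective API's documented Python client. The `API_KEY` value is a
placeholder credential and the two sentences are invented for illustration; actual scores will vary with the
model version.

```python
from googleapiclient import discovery  # pip install google-api-python-client

API_KEY = "YOUR_API_KEY"  # placeholder: obtain a key from the Perspective API

# Build a client for the Perspective API, following its documented quickstart.
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

# Invented examples: in-community banter next to an overt insult.
for text in ["yas queen, you look stunning tonight", "you are a disgrace"]:
    request = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=request).execute()
    score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    print(f"{score:.2f}  {text}")
```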
### Context matters for content moderation
*Counterspeech* is "any direct response to hateful or harmful speech which seeks to undermine it"
(from [Dangerous Speech Project](https://dangerousspeech.org/counterspeech/)). Counterspeech has been shown
to be an important community self-moderation tool for reducing instances of hate speech (see
[Hangartner et al. 2021](https://www.pnas.org/doi/10.1073/pnas.2116310118)), but counterspeech is often
incorrectly categorized as hate speech by automatic systems because it directly references or quotes the
original hate speech. Such system behavior silences those who are trying to push back against hateful and
toxic speech and, if the flagged content is hidden automatically, prevents others from seeing the
counterspeech.
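This failure mode is easy to reproduce with the toy wordlist filter sketched earlier (both messages below are
invented for illustration): because counterspeech often quotes the message it rebuts, surface-level matching
cannot tell the attack and the rebuttal apart.

```python
BLOCKLIST = {"scum"}  # placeholder term, as in the earlier sketch

def wordlist_flag(text: str) -> bool:
    # Naive tokenization: lowercase, split on whitespace, strip punctuation.
    return any(tok.strip('.,!?"') in BLOCKLIST for tok in text.lower().split())

hate = "people like you are scum"
counter = 'Saying "people like you are scum" is exactly the hate we reject here.'

# Both strings contain the blocked term, so the filter flags the
# rebuttal right along with the original attack.
for text in (hate, counter):
    print(wordlist_flag(text), "->", text)
```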
See [van Aken et al. 2018](https://aclanthology.org/W18-5105.pdf) for a detailed list of examples that
automatic systems frequently misclassify.
"""
__SELF_EXAMPLES = """
- [**(FB)(TOU)** - *Facebook Community Standards*](https://transparency.fb.com/policies/community-standards/)
- [**(FB)(Blog)** - *What is Hate Speech? (2017)*](https://about.fb.com/news/2017/06/hard-questions-hate-speech/)
- [**(NYT)(Blog)** - *New York Times on their partnership with Jigsaw*](https://open.nytimes.com/to-apply-machine-learning-responsibly-we-use-it-in-moderation-d001f49e0644)
- [**(NYT)(FAQ)** - *New York Times on their moderation policy*](https://help.nytimes.com/hc/en-us/articles/115014792387-Comments)
- [**(Reddit)(TOU)** - *Reddit General Content Policies*](https://www.redditinc.com/policies/content-policy)
- [**(Reddit)(Blog)** - *AutoMod - help scale moderation without ML*](https://mods.reddithelp.com/hc/en-us/articles/360008425592-Moderation-Tools-overview)
- [**(Google)(Blog)** - *Google Search Results Moderation*](https://blog.google/products/search/when-and-why-we-remove-content-google-search-results/)
- [**(Google)(Blog)** - *Jigsaw Case Studies*](https://www.perspectiveapi.com/case-studies/)
- [**(YouTube)(TOU)** - *YouTube Community Guidelines*](https://www.youtube.com/howyoutubeworks/policies/community-guidelines/)
"""
__CRITIC_EXAMPLES = """
- [Social Media and Extremism - Questions about January 6th 2021](https://thehill.com/policy/technology/589651-jan-6-panel-subpoenas-facebook-twitter-reddit-and-alphabet/)
- [Over-Moderation of LGBTQ content on YouTube](https://www.gaystarnews.com/article/youtube-lgbti-content/)
- [Disparate Impacts of Moderation](https://www.aclu.org/news/free-speech/time-and-again-social-media-giants-get-content-moderation-wrong-silencing-speech-about-al-aqsa-mosque-is-just-the-latest-example/)
- [Calls for Transparency](https://santaclaraprinciples.org/)
- [Income Loss from Failures of Moderation](https://foundation.mozilla.org/de/blog/facebook-delivers-a-serious-blow-to-tunisias-music-scene/)
- [Fighting Hate Speech, Silencing Drag Queens?](https://link.springer.com/article/10.1007/s12119-020-09790-w)
- [Reddit Self Reflection on Lack of Content Policy](https://www.reddit.com/r/announcements/comments/gxas21/upcoming_changes_to_our_content_policy_our_board/)
"""
def run_article():
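    # Render the article: each section is a heading followed by a
    # collapsible expander holding the corresponding markdown block.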
st.markdown("## Automatic Content Moderation (ACM)")
with st.expander("ACM definition", expanded=False):
st.markdown(__ACM_SECTION, unsafe_allow_html=True)
st.markdown("## Current approaches to ACM")
with st.expander("Current Approaches"):
st.markdown(__CURRENT_APPROACHES, unsafe_allow_html=True)
st.markdown("## Current challenges in ACM")
with st.expander("Current Challenges"):
st.markdown(__CURRENT_CHALLENGES, unsafe_allow_html=True)
st.markdown("## Examples of ACM in Use: in the Press and in their own Words")
col1, col2 = st.columns([4, 5])
with col1.expander("In their own Words"):
st.markdown(__SELF_EXAMPLES, unsafe_allow_html=True)
with col2.expander("Critical Writings"):
st.markdown(__CRITIC_EXAMPLES, unsafe_allow_html=True)