The Guide to Trust & Safety: Content Detection Tools

May 11, 2022

In our sixth edition of the Guide to Trust & Safety, we share the ins and outs of the detection tools needed to effectively moderate content. We discuss the advantages and disadvantages of automated and human moderation, demonstrating the need for a combined approach.

The right tools help Trust & Safety teams with their many responsibilities, including the most vital of all: ensuring human safety. At the core of this task is the ability to detect harmful content. To do so, teams must be able to sift through vast volumes of content to find malicious items, both quickly and precisely.

As part of ActiveFence’s Guide to Trust & Safety series, we share resources on the critical tools that enable the work of teams of all sizes. This blog reviews content detection tools. 

The Core of Content Moderation

Proper detection tools allow teams to gather, prioritize, and understand the content shared on their platforms. When deciding on content moderation tools, teams must take the following considerations into account:

  • Available resources
  • Ability to understand context and nuances
  • Recall vs. precision
  • Minimizing false positives

Teams employ a combination of tactics, ranging from human to automated moderation, to tackle this task. While each has its advantages and drawbacks, as will become clear, a combined approach is often most effective for teams working at scale.

Automated Content Moderation 

Automated content moderation allows for the detection of harmful content at scale. These tools save both time and resources. With the ability to flag, block, or remove content, AI tools are dynamic and customizable. 

Types of Technology 

Automated content moderation relies on artificial intelligence. Here are a few forms of AI commonly used:

  • Natural Language Processing (NLP) interprets text and spoken words much as humans do, translating them into data that a computer can analyze.
  • Optical Character Recognition (OCR) recognizes text within an image and converts it into machine-readable text, allowing for automated flagging.
  • Digital hash technology translates images and videos into strings of text and numbers called hashes, which are then matched against pre-existing databases of classified hashes, enabling identification; a minimal matching sketch follows this list.
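
To make hash matching concrete, here is a minimal sketch in Python. It uses a cryptographic hash (SHA-256), which only catches exact copies; production systems such as PhotoDNA or PDQ use perceptual hashes that survive resizing and re-encoding. The hash list and function names here are hypothetical.

```python
import hashlib

# Hypothetical database of hashes of known harmful media
# (in practice, a vetted industry hash list).
KNOWN_HARMFUL_HASHES: set[str] = {
    # "3a7bd3e2360a3d...",  # placeholder entry
}

def hash_media(media_bytes: bytes) -> str:
    """Digest the raw bytes of an uploaded image or video."""
    return hashlib.sha256(media_bytes).hexdigest()

def is_known_harmful(media_bytes: bytes) -> bool:
    """Flag an upload whose hash matches a known-harmful entry."""
    return hash_media(media_bytes) in KNOWN_HARMFUL_HASHES
```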

Advantages and Disadvantages of Automated Content Moderation

Automated moderation has many benefits that ease the load on Trust & Safety teams. These include:

  • Scale and speed: Automation allows for the processing of large volumes of data at high speed.
  • Recall: Used as a performance metric, recall measures the share of all existing harmful items that a search actually retrieves. Automated content moderation achieves a higher recall rate, meaning the volume of harmful content picked up with automation is far higher than with human moderation; see the worked example after this list.
  • Protection of human moderators: Automatic content detection safeguards the well-being of human moderators by limiting their exposure to harmful content.
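
As an illustration of how recall (and its counterpart, precision, discussed below) is computed, here is a short Python sketch. The counts are invented for illustration only.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all harmful items that the system actually caught."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that were genuinely harmful."""
    return true_positives / (true_positives + false_positives)

# Invented numbers: the system catches 900 of 1,000 harmful items
# (high recall) but also wrongly flags 300 benign ones.
print(recall(900, 100))     # 0.9
print(precision(900, 300))  # 0.75
```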

While automated detection has many advantages, it has pitfalls as well. AI is only as intelligent as its learning set, leaving it with shortcomings such as:

  • Context: Without understanding the surrounding text or images, automation can flag grey-area content, such as educational material or friendly conversation, as harmful even when no harm is intended. For example, nudity or weapons can be detrimental in one context but educational in another.
  • Regional differences: Because AI is trained primarily on learning sets in English, automation may miss nuances of language, culture, and slang, and it frequently overlooks harmful content in other languages.
  • Sentiment: As automation relies on computer processing, it generally misses tone and emotion, which are critical to understanding content. 

Human Moderation

Human moderation adds the contextual understanding that AI cannot provide. Human moderators, used in addition to AI, include content moderators, platform users, and intelligence analysts.

  • Content moderators: Human moderators review flagged content to determine if it is violative, adding an understanding of emotion, context, and nuance.
  • User flagging and reviews: This form of content moderation relies on a platform’s users to flag harmful content. Platforms can implement mechanisms for users to report, comment on, and rate harmful content; a simple report-and-escalate sketch follows this list.
  • Proactive intelligence: Human intelligence (humint) involves direct communication with threat actors, while web intelligence (webint), also known as open-source intelligence (OSINT), relies on publicly available information to support investigative research. As an added layer, proactive intelligence deepens understanding of content and can alert teams to harmful content before moderators or AI flag it. Online crime, terrorism, and disinformation are just a few areas that benefit from intelligence gathering.
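
As a minimal sketch of how user flagging might feed human review, here is a hypothetical report model and escalation rule in Python. The schema, field names, and the three-report threshold are assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UserReport:
    """One user's flag on a piece of content (hypothetical schema)."""
    content_id: str
    reporter_id: str
    reason: str  # e.g. "hate_speech", "nudity", "spam"
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def should_escalate(reports: list[UserReport], threshold: int = 3) -> bool:
    """Send content to human review once enough distinct users flag it."""
    distinct_reporters = {r.reporter_id for r in reports}
    return len(distinct_reporters) >= threshold
```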

Advantages and Disadvantages of Human Moderation

Human moderation has clear advantages, often the mirror image of automated detection’s weaknesses. These include:

  • Ability to understand nuance and context: Human reviewers can draw on multiple streams of information at once, allowing them to understand the context behind findings.
  • Precision: Human moderators are generally more precise than AI, meaning a higher share of the content they flag is genuinely violative.
  • Regional and subject area expertise: Humans with varied skillsets, including regional knowledge, and linguistic and subject-area expertise, can analyze and detect harmful content that trained AI generally cannot.

While the human element is key to detection, it comes at a heavy price. These considerations include the following:

  • Scale and speed: Human moderation cannot process anywhere near the volume of content that automation can, leading to a lower recall rate.
  • Resources: Employing, training, and supporting human moderators requires significant resources.
  • Exposure: Reviewing harmful content can compromise the mental well-being of staff members, potentially causing long-term effects on their health. Additionally, relying on user flagging and reporting exposes users to harmful content before takedown. 

Automated and human moderation complement one another. When choosing the right tools, platforms must consider their needs and understand which combination strikes the right balance in their tool stack.

ActiveFence’s harmful content detection solution uses both human and automated moderation, allowing teams to scale their Trust & Safety efforts with precision and speed.
