The LLM Safety Review: Benchmarks & Analysis

GenAI tools, and the Large Language Models (LLMs) that underpin them, are impacting the day-to-day lives of billions of users across the globe. But can these technologies be trusted to keep users safe?

This report examines how this new technology can be used by bad actors and vulnerable users to create dangerous content. By testing LLM responses to risky prompts, we are able to assess their relative safety, identify weaknesses, and, most importantly, define actionable steps to improve LLM safety.
