When it comes to evaluating the performance of detection models for content moderation, precision and recall are often the most cited metrics. However, in the complex and evolving world of trust and safety, relying solely on these metrics can provide an incomplete picture, especially when considering operational needs and user safety. This blog post will explore different ways of evaluating detection models and how other trust and safety solution components can help optimize performance, efficiency, and ultimately, user safety.
While precision and recall are essential for understanding how well a model performs in a lab setting, they are only the starting point for measuring operational efficiency.
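As a quick refresher on the two metrics discussed throughout this post, here is a minimal sketch of how precision and recall are computed from a labeled evaluation set. This is illustrative only (not ActiveFence code), and the function name and sample data are invented for the example.

```python
# Illustrative only: computing precision and recall for a detection model
# from boolean predictions (True = flagged as harmful) and ground-truth labels.
def precision_recall(predictions, labels):
    tp = sum(p and l for p, l in zip(predictions, labels))       # correctly flagged
    fp = sum(p and not l for p, l in zip(predictions, labels))   # flagged but benign
    fn = sum(not p and l for p, l in zip(predictions, labels))   # harmful but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy evaluation set: 2 true positives, 1 false positive, 1 false negative.
preds  = [True, True, False, True, False]
labels = [True, False, False, True, True]
p, r = precision_recall(preds, labels)
```

Precision answers "of everything we flagged, how much was truly harmful?" while recall answers "of everything harmful, how much did we catch?" Both are fractions of a confusion matrix, which is exactly why they say little about moderator workload or cost.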
One way we evaluate and address efficiency beyond precision and recall is through our Health Assessments, which provide a tailored analysis of a potential partner platform’s moderation needs by applying ActiveFence policies to a representative sample of their data. This helps uncover the greatest areas of potential efficiency gains and confirms the prevalence of different types of harm.
Having conducted dozens of these health assessments over the years, we have aggregated some of the collected data to gain a deeper understanding of core metrics for operational efficiency.
A key insight from our Health Assessments is that only 5% of platform data is actually harmful. This underscores the importance of operational efficiency in trust and safety: most content isn't harmful, so optimizing how we handle the small percentage that is problematic can lead to significant improvements in both workload and resource allocation.
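The 5% prevalence figure also explains why lab precision can mislead. When harmful content is rare, even a small false-positive rate on the benign 95% generates a large share of the flags. A short back-of-the-envelope sketch, using assumed recall and false-positive-rate values (only the 5% prevalence comes from the text above):

```python
# Illustrative base-rate arithmetic: expected precision in production
# given prevalence, recall, and the false positive rate on benign content.
def expected_precision(prevalence, recall, false_positive_rate):
    tp = prevalence * recall                      # share of traffic correctly flagged
    fp = (1 - prevalence) * false_positive_rate   # share of traffic wrongly flagged
    return tp / (tp + fp)

# Assumed model: catches 90% of harmful content, wrongly flags 5% of benign content.
# With only 5% of content harmful, under half of all flags are truly harmful.
p = expected_precision(prevalence=0.05, recall=0.90, false_positive_rate=0.05)
```

In other words, a model that looks strong on a balanced test set can still send moderators mostly benign content at real-world prevalence, which is why workload-oriented metrics matter alongside precision and recall.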
One of the key challenges in trust and safety operations is dealing with both high-risk harms (e.g., child exploitation, terrorist content) and high-prevalence harms (e.g., spam, misinformation). Evaluating detection models in this context means balancing the need to catch harmful content with the potential risk of false positives.
Spam is an example of a high-prevalence harm that can be hard to detect and action manually. It is often hidden behind coded terminology and relies on a high frequency of messages that are frequently not violative when viewed in isolation, making it impractical for moderators to review such volumes by hand. This is where automation can kick in.
In one example, a client we worked with found that 50% of the spam shared on its platform was generated by just 1% of its flagged users. One spammer sent the same message 70,000 times in less than two weeks. By understanding these user patterns, the client was able to use automation to reduce the burden of manually handling repetitive and disruptive spam by 80%.
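A user-level pattern like the one above can be caught with very simple automation. The sketch below is a hypothetical illustration (the function, threshold, and normalization are all invented, not ActiveFence's implementation): count near-identical messages per user and surface heavy repeaters for automated action instead of item-by-item review.

```python
# Hypothetical sketch: flag users whose most-repeated message crosses a
# threshold, so repetitive spam is handled per-user rather than per-message.
from collections import Counter, defaultdict

REPEAT_THRESHOLD = 50  # assumed cutoff for auto-escalation

def heavy_repeaters(messages):
    """messages: iterable of (user_id, text) pairs.
    Returns the set of users whose single most repeated (case-insensitive)
    message appears at least REPEAT_THRESHOLD times."""
    per_user = defaultdict(Counter)
    for user_id, text in messages:
        per_user[user_id][text.strip().lower()] += 1
    return {
        user_id
        for user_id, counts in per_user.items()
        if counts.most_common(1)[0][1] >= REPEAT_THRESHOLD
    }
```

A spammer sending one message 70,000 times would trip this rule almost immediately, even though no single copy of the message might be violative on its own.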
When looking at both standard efficiency metrics and different violation types, it becomes clear that precision and recall don’t fully capture the operational efficiency that trust and safety teams need to achieve.
Alternatively, operational efficiency can be evaluated by looking at how effectively a moderation system handles content at scale and supports moderators in managing harms.
It's clear from the above that detection model metrics should not be evaluated in isolation. The broader trust and safety ecosystem, which includes automation, moderator tooling, and user-level risk scoring, plays a crucial role in determining the overall effectiveness of harmful content mitigation strategies.
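One way these components can work together is to blend a per-item model score with a user-level risk signal before deciding how to route content. The sketch below is a speculative illustration of that idea; the weights, thresholds, and routing labels are all assumptions, not a description of any particular system.

```python
# Speculative sketch: route content using both the model's score for the
# item and a crude user-level risk signal derived from prior violations.
def route(model_score, user_prior_violations):
    """model_score: 0-1 harmfulness score; user_prior_violations: int."""
    user_risk = min(user_prior_violations / 5, 1.0)   # assumed normalization
    combined = 0.7 * model_score + 0.3 * user_risk    # assumed weighting
    if combined >= 0.9:
        return "auto_remove"    # high-confidence harm: automate
    if combined >= 0.5:
        return "human_review"   # uncertain: queue for a moderator
    return "allow"
```

The design point is that a borderline model score from a repeat offender can justify review or removal, while the same score from a clean account may not, shifting moderator attention to where it matters most.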
Ultimately, evaluating detection models should be about understanding their impact on both safety outcomes and operational efficiency.
The integration of models into a larger trust and safety strategy means we move beyond a narrow focus on model precision and recall. By including metrics that focus on impact, efficiency, and holistic safety, we gain a clearer picture of how well a model performs in the real world—ensuring that we not only detect harm but also mitigate it effectively and sustainably.
Precision and recall are important metrics for evaluating detection models, but they are only part of the story. Trust and safety teams must also consider the operational efficiency of their solutions, the balance between automation and human intervention, and the user-level actions enabled by their detection models. By expanding the evaluation criteria, platforms can ensure that their trust and safety operations are strategic, scalable, and capable of keeping their communities safe in a constantly evolving landscape.
Enhance moderation efficiency with precise models and efficiency-building features in ActiveOS.