Cohere, a leader in AI language technology, leverages ActiveFence’s Generative AI Safety solution to enhance model safety and accelerate release timelines.
As with any large language model, the novelty of the technology meant that Cohere faced a broad range of unknown threats. Cohere’s broad linguistic coverage added a further challenge: detecting these threats in languages that traditional detection systems do not cover.
Particularly concerning for Seraphina Goldfarb-Tarrant, Cohere’s Head of AI Safety, was the potential for harmful activity in non-Romance languages. She was concerned about malicious actors using Cohere’s models to create sophisticated attacks and harmful content such as misinformation, hate speech, and CSAM, as well as the inadvertent generation of offensive or biased content and of suicide and self-harm content.
“Because this technology is so new and constantly evolving, the potential for harm by malicious users is enormous, and we don’t fully understand how they will do it, which makes it very hard to detect”
Seraphina looked for a partner with true domain expertise across a wide range of abuse areas, who could identify threats and work with her to find solutions. She knew that she didn’t have the time or resources to develop this domain-level expertise in-house, so she turned to ActiveFence.
To support Cohere’s AI safety team, ActiveFence provided two distinct services: targeted data feeds and red teaming.
Targeted Data Feeds: Using specialized domain-area knowledge across abuse areas and languages, ActiveFence provides the team with feeds of risky prompts and annotations. This data is then used to train Cohere’s models, enabling them to better recognize and appropriately respond to similar content, reducing the risk of harmful outputs.
“ActiveFence is one of our main streams of data that we use for safety evaluation. It's especially important for threat actor evaluation because of the domain expertise.”
Red Teaming: ActiveFence’s team of experts conducts specialized red teaming exercises to test specific features and model releases. These exercises mimic real-world risks by simulating attacks or problematic scenarios that a malicious user might attempt, and assessing Cohere’s resilience against these threats. This proactive approach helps the team discover weaknesses before they can be exploited maliciously in deployed applications.
By harnessing ActiveFence’s specialized domain expertise across several abuse areas and multiple languages, the team gains real insight into Cohere’s safety challenges and then, through a collaborative relationship, develops targeted solutions.
“My experience working with ActiveFence has been distinct from my experiences with other partnerships, in that it is much more of a collaborative discussion where we take ActiveFence’s domain expertise in different types of content and combine that with what we know about machine learning and our models to come up with what we should do from there.”
By leveraging ActiveFence’s red teaming insights and targeted data, the AI safety team is able to improve model safety and reliability, accelerate model release timelines, and be proactive about regulatory compliance.
Applying ActiveFence’s domain expertise allows the team to develop more sophisticated safety mechanisms within Cohere’s models. The result is more reliable AI models that are less likely to generate harmful content, particularly within high-risk abuse areas like misinformation, hate speech, and child safety.
"ActiveFence has significantly impacted our iteration speed and confidence in our evaluations and mitigations. It has enabled us to develop a faster evaluation suite, allowing us to release models more quickly and safely."
Recently, the company released several major models, each of which involved multiple iterations. As part of the release process, the AI safety team had to find a good balance between performance and safety. ActiveFence data helped the team with these evaluations.
The partnership also enables Cohere to be proactive about safety. By using the outcomes of red teaming exercises, Seraphina is able to identify what the AI safety team should focus on next, directing its efforts to the areas that need them most. Moreover, by using verified malicious prompts to train models, she is able to tackle harmful content before it reaches the model organically.
Seraphina Goldfarb-Tarrant
Head of AI Safety