Improve your detection and simplify moderation - in one AI-powered platform.
Stay ahead of novel risks and bad actors with proactive, on-demand insights.
Proactively stop safety gaps to produce safe, reliable, and compliant models.
Deploy generative AI in a safe and scalable way with active safety guardrails.
Online abuse has countless forms. Understand the types of risks Trust & Safety teams must keep users safe from on-platform.
Protect your most vulnerable users with a comprehensive set of child safety tools and services.
Our out-of-the-box solutions support platform transparency and compliance.
Keep up with T&S laws, from the Online Safety Bill to the Online Safety Act.
Over 70 elections will take place in 2024: don't let your platform be abused to harm election integrity.
Protect your brand integrity before the damage is done.
From privacy risks, to credential theft and malware, the cyber threats to users are continuously evolving.
Stay ahead of industry news in our exclusive T&S community.
The generative AI race is on. But the question of who will create the safest model remains unanswered.
Soon after the November, 2022 public launch of ChatGPT, Microsoft unveiled an AI-fueled facelift to its Bing search engine, only to be followed by Google’s announcement of its own version, Bard. But while the race so far has focused on performance and accuracy – safety has mostly been an afterthought.
Advancements in generative AI over the last few years have taught us that while people love playing with AI tools to create unique new memes, write poems, or explain concepts in quirky narrative tones, these models can also be easily manipulated to cause harm. Case in point, the 4chan chatbot was famously offensive, the Replika version was misused by men creating AI ‘girlfriends’ and then abusing them, and Microsoft’s 2016 AI chatbot, Tay, turned racist after just one day. Moreover, AI image generators have a tendency to regularly produce imagery so offensive and racist that Craiyon, a popular one, has an explicit warning for users that it may produce images that “reinforce or exacerbate social biases” and “may contain harmful stereotypes.”
Millions of people have used AI chatbots and image generators, and have done so, generally speaking, with positive intentions. However, like every digital plaything, there are loopholes, and some users have gone on a quest – maliciously and otherwise – to find out where they lie. Machine learning and artificial intelligence, as incredible as they may be, present serious challenges for Trust & Safety teams, as the engineers that create them have yet to perfect their ability to protect these technologies that use them from misuse.
Any company making an AI tool available for public use needs to have an adequately robust conduct and content moderation policy in place to attempt to reduce its misuse. Any policy needs to address not only the obvious types of violations that users might be tempted to ask a bot or image generator to produce but the workarounds they might use to get the same information. The prompt injections that users so gleefully use to get bots to give them violative information in seemingly harmless ways, for example, need to be considered in content policies as well: asking a bot to provide instructions on how to build a bomb is an obvious no-no, but so too, should asking it to provide the same information as a re-enactment of a movie scene or via a script.
The same goes for image generators, which can be easily manipulated for abuse abusive purposes. While the use of these tools to create sexually explicit or violent imagery isn’t new, they also have the potential to create seemingly innocuous photos that can be used for malicious purposes. They can easily produce fake images that align with a disinformation narrative, or support hateful tropes. It’s been said that an image is worth a thousand words, and when used incorrectly, AI-generated ones have the power to do incredible damage. As smart as artificial intelligence is, the Trust & Safety teams involved in projects producing it need to be one step ahead.
The concerns from the Trust & Safety industry are apparent: a tool has been made publicly available that has the ability to produce content that can be used for violative purposes. Its open access means that any individual with internet access can now create malicious code used in phishing campaigns, craft professional-looking articles that spread disinformation, or write scripts to be used in grooming conversations.
It’s imperative that any platform offering a public-facing AI tool have robust community guidelines and content policies in place. The risks with technology like this stretch beyond our imagination’s limits; just like 3D printers were seen as a novel technological innovation, bound to create endless interesting and helpful tools, so too, have they been used for malicious purposes, like printing gun parts used to carry out shootings. Trust & Safety teams guarding these types of models need to consider the worst possible use cases for the features their platforms offer, and implement rules that prohibit users from testing the limits. Models that produce text specifically present an even more complex problem when they’re unable to moderate incoming content: how can they moderate their own output?
The list goes on: for an AI chatbot that understands multiple languages, Trust & Safety teams will need to consider the potential linguistic idiosyncrasies and the context behind them to be able to decipher what’s allowed and what’s not. For products that can be used to inflict offline harm, the case of who’s ultimately accountable needs to be parsed out. Can an AI tool or its creators be held liable for inadvertently providing harmful information to an individual seeking out some sort of attack or illegal operation?
Trust & Safety teams on platforms across the digital world will need to consider the full spectrum of effects of not just the tech itself, but the content it can produce. These concerns are distinct from those surrounding typical UGC platforms, where the lines between host and user are clear, and users are less able to manipulate a tool to their own design. As AI tools become more and more ubiquitous, it becomes increasingly clear that they present a new frontier in terms of Trust & Safety, moderation, and law.
Like other types of user-facing, interactive technology, Trust & Safety solutions will require two main elements: agility and intelligence. Product teams need to be able to make adequate adjustments on the fly to repair new and apparent weaknesses, and policy teams need to be in the know about off-platform goings-on in order to be able to prevent the spread of malicious activity on platforms themselves. ActiveFence’s full-stack solution affords platforms both of these key features, granting Trust & Safety teams access to a constantly updated feed of intelligence which is in turn used to train our AI. At this point in society’s technological history, it’s clear that machine learning and AI will make up a significant chunk of future innovations; all that’s left now to do is make sure they’re as secure as they can be.
Curious about the risks of GenAI deployment? Watch our webinar to explore potential challenges and solutions.
ActiveFence provides cutting-edge AI Content Safety solutions, specifically designed for LLM-powered applications. By integrating with NVIDIA NeMo Guardrails, we’re making AI safety more accessible to businesses of all sizes.
Explore how gaming has evolved over 40 years, balancing safety and player experience in the face of new content moderation challenges.
ActiveFence experts discuss key Trust and Safety trends for 2025, reflecting on 2024's challenges and offering insights on tackling emerging risks.