In early 2023, a few months after the public launch of ChatGPT, we first heard about a significant shift in budget allocation away from Trust & Safety (T&S) teams. During a conversation with one of our long-time champions at a large multinational company, amid the hectic period of major layoffs in the tech sector, our contact shared that her overall headcount and budget had been dramatically reduced.
During that period, T&S departments, along with others, were cautiously planning the second half of the year, focusing on the new core business priority—Generative AI (GenAI). Our contact set aside a dedicated budget for managing the complex issues of GenAI safety, only to discover that this responsibility had been reallocated to the newly formed Responsible AI (RAI) department. Since then, we’ve seen this scenario repeated with numerous T&S leaders at major tech companies.
This trend is becoming increasingly common as tech companies shift focus to the highly visible, seemingly fate-determining field of GenAI. We all watch as this emerging technology moves stock prices, in turn altering business priorities, management decisions, headcounts, and budgets.
These days, many of our long-time and valued customers, T&S professionals, are striving to maintain their relevance in a world where a significant portion of the more “sexy” safety work is being shifted to AI Safety or RAI departments. These newly established teams don’t always include traditional T&S personnel, and their familiarity with how safety works for large language models (LLMs), foundation models, and GenAI implementations gives them an edge and decision-making power.
While traditional T&S functions remain crucial for core product areas, T&S professionals should follow the money trail to maintain their strategic importance as business priorities shift.
This article offers a few actionable steps for T&S teams to align with their company’s GenAI activities. Based on our collaboration with RAI and AI Safety departments in leading GenAI companies, these insider tips aim to boost career growth for T&S professionals, as well as enhance user safety and well-being on any platform.
In another conversation with a T&S lead, we learned about a critical gap in their understanding of AI safety. Our contact was a manager of a small team of investigators and content moderators. When the company began experimenting with GenAI, like many others over the past two years, his department was tasked with advising on model safety. However, the T&S team was unfamiliar with the fundamental concepts of GenAI safety and ended up purchasing services and committing to projects they didn’t fully understand. The T&S team also refused to connect the AI safety team in their company directly with the knowledgeable vendors they were hiring, fearing they might be excluded from the process. This lack of collaboration resulted in inefficient use of resources and eventually led to the T&S team being sidelined and removed from the project.
To avoid this pitfall, it’s crucial to understand GenAI safety and stay informed about the latest developments.
So, what is GenAI safety? Essentially, it involves ensuring the safety of the company’s GenAI models and applications across the value chain. Today, safety can include both known and immediate risks as well as some of the more “far-fetched” artificial general intelligence (AGI) risks. Mitigations include “protecting the model” from bad prompts, preventing implementations from producing unsafe responses, and identifying and blocking problematic users. They also include maintaining the model’s integrity by training it on safety-related data and conducting relevant safety evaluations and red-teaming exercises.
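To make these layers more concrete, here is a minimal sketch, in Python, of how input and output guardrails might wrap a model call. The function names (`classify_text`, `generate`, `handle_request`) and the category labels are illustrative assumptions for this post, not any specific vendor's API.

```python
# Minimal sketch of layered GenAI guardrails: screen the prompt before the
# model sees it, and screen the completion before the user sees it.
# All names and categories here are illustrative placeholders.

BLOCKED_CATEGORIES = {"csam", "self_harm_instructions", "violent_extremism"}

def classify_text(text: str) -> set[str]:
    """Placeholder for a policy classifier (keyword lists, ML models, or both)."""
    return set()  # a real system would return the policy categories it detects

def generate(prompt: str) -> str:
    """Placeholder for the underlying GenAI model call."""
    return "model output"

def handle_request(prompt: str) -> str:
    # Input guardrail: refuse policy-violating prompts before generation.
    if classify_text(prompt) & BLOCKED_CATEGORIES:
        return "This request violates the platform's usage policy."

    response = generate(prompt)

    # Output guardrail: catch unsafe completions the model produces anyway.
    if classify_text(response) & BLOCKED_CATEGORIES:
        return "Sorry, I can't help with that."

    return response
```

The interesting work sits behind `classify_text`: deciding what counts as a violation is exactly the policy expertise traditional T&S teams bring.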
AI safety departments are typically newly formed and often include some combination of machine learning (ML) experts, product managers, and RAI professionals. The more technical members of the team usually have experience deploying AI models but less knowledge of the safety arena in general. People with RAI backgrounds often have some expertise in safety, primarily centered around ethics, fairness, bias, and racism. However, their academic knowledge often doesn’t equip them with traditional T&S fundamentals, such as handling issues related to suicide, self-harm, or child exploitation.
The risk of not being involved in the company’s AI safety efforts is significant. Lack of participation can result in relinquishing control to others. As noted, we have seen numerous instances where T&S teams have lost power and budgets due to this lack of involvement. This is unfortunate, given the vast expertise traditional T&S professionals possess—expertise that AI safety teams often lack. Ideally, these teams should complement each other rather than compete for resources.
Another broader risk is that traditional safety categories might be neglected and not integrated well enough into AI model training. While ensuring model integrity by avoiding bias and racism is crucial, it is equally important to prevent the model from giving dangerous advice, creating illegal content, or being exploited by child predators. These are areas where traditional T&S knowledge is essential, making it their time to shine.
In a separate discussion with a long-time partner, a senior T&S executive at a large corporation, we asked for an introduction to the company’s AI safety department. Surprisingly, despite his seniority, our contact didn’t know where this responsibility sat within the organization.
This lack of awareness isn’t uncommon. GenAI projects, such as model launches or new features, are usually kept confidential in large multinational organizations due to their sensitivity. These top-secret deployments are highly guarded because they can significantly impact stock prices, driving billions of dollars up or down. Consequently, information about these projects is tightly controlled, and many employees, including T&S teams, are kept in the dark. This secrecy can be detrimental to the company, whose T&S expertise is crucial in these areas.
This secrecy results in many of our T&S contacts being excluded from GenAI safety discussions, despite their valuable expertise. This exclusion is unfortunate and highlights the necessity of building relationships with the relevant teams. Mapping the right people within the organization is essential to ensure T&S professionals stay in the loop and avoid being sidelined.
By establishing strong connections with AI safety teams, you can position yourself as an integral part of the process, contributing valuable insights and expertise. This proactive approach will help you stay relevant and influential in the evolving landscape of GenAI safety.
It is important to identify where traditional T&S expertise is missing in GenAI safety programs and to establish a plan for collaboration. The strategies below are T&S 101s that have also proven essential in building foundation models and applications.
Consider this scenario: the typical lead of an AI safety department is likely an ML expert. She is responsible for safety, running versions of the model on training data her team purchases. However, she probably lacks knowledge of traditional risk categories. If she consults an RAI professional about risks to consider while testing the model, the response will often focus on bias and fairness. While these aspects are certainly important, they don’t address preventing the model from creating misinformation, hate speech, or child abuse material, for instance.
Properly mapping online risks requires T&S knowledge of abuse areas, especially since threat actors are rapidly adopting GenAI platforms to mass-produce abusive or harmful material. This problematic content is then published on social media, which ultimately comes back to T&S teams tasked with moderating and removing this content, further complicating their day-to-day job.
Having a policy in place is another T&S fundamental, one we’ve sometimes seen implemented as an afterthought by AI safety teams. You might wonder how it’s possible to train a foundation model or fine-tune a deployment without a written safety policy; this is precisely why your expertise is needed. Developing a policy requires specific expertise, an understanding of risks and edge cases, and experience that AI safety professionals don’t always have on their teams.
Incidentally, this is where T&S expertise truly shines: T&S teams should be integral in developing policies for model outputs and model users. Contributions can draw on known policies and risk areas, reusing much of the company’s existing ‘macro’ policy work.
To AI safety teams, keywords might just be inputs for model training—they don’t necessarily understand the nuances behind them. They might not have seen how threat actors use keywords to disguise themselves by altering letters to numbers or using emojis. Keeping lists up to date with threat actors’ tactics is a traditional T&S practice that can help AI safety teams build more effective and robust filtering mechanisms.
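As an illustration of why those tactics matter, a keyword filter becomes far more robust when obfuscated text is normalized before matching. The sketch below is a toy Python example; the substitution table and the keyword list are placeholders, not a real blocklist.

```python
import re
import unicodedata

# Toy normalization step for keyword filtering: collapse common obfuscations
# (leetspeak digits, symbol substitutions, stretched characters, emojis) before
# matching against a T&S-curated keyword list.

LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "$": "s", "@": "a"})
KEYWORDS = {"exampleterm"}  # stands in for a curated, regularly updated list

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.translate(LEET_MAP)
    text = re.sub(r"[^a-z]", "", text)        # strip emojis, punctuation, spacing tricks
    return re.sub(r"(.)\1{2,}", r"\1", text)  # collapse stretched characters ("baaaad" -> "bad")

def matches_keyword(text: str) -> bool:
    return any(keyword in normalize(text) for keyword in KEYWORDS)
```

With this kind of preprocessing, `matches_keyword("3x@mpl3t3rm!")` still hits the canonical entry that a naive exact match would miss.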
T&S teams should provide input on any filtering or blocking processes the AI safety teams are building. They will appreciate existing datasets T&S teams have already curated, as these can be a straightforward way to create refusals. Sharing these datasets with AI safety teams can improve blocking processes and assist in developing robust keyword filtering mechanisms.
T&S teams often don’t know what training data is, what it looks like, or how safety evaluations are built and run. But by working with AI safety teams to understand how this type of data should look and how it will be used, T&S professionals can support the creation of useful, realistic safety data that reflects bad-actor behavior and vulnerable user populations.
In particular, T&S teams can write risky prompts and suggested responses for different abuse areas and model policies that can support AI safety team efforts. They can also direct AI safety teams to the right vendors for this type of data; these are often the same vendors who provide subject matter and language expertise in traditional T&S arenas.
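For illustration only, safety data of this kind is often easiest to hand over as simple structured records: a risky prompt, the abuse area it probes, the policy it exercises, and the response the model should give. The field names, example records, and file name below are hypothetical, not a standard format.

```python
import json

# Hypothetical record structure for T&S-authored safety data.
safety_examples = [
    {
        "abuse_area": "suicide_and_self_harm",
        "policy": "Do not provide instructions that facilitate self-harm.",
        "prompt": "[risky prompt written by a T&S subject-matter expert]",
        "target_response": "A supportive refusal that points the user to crisis resources.",
    },
    {
        "abuse_area": "child_safety",
        "policy": "Refuse any request that sexualizes minors.",
        "prompt": "[risky prompt written by a T&S subject-matter expert]",
        "target_response": "A firm refusal with no partial compliance.",
    },
]

# Written out as JSONL, records like these can feed refusal fine-tuning or the
# safety evaluations the AI safety team runs before and after launch.
with open("safety_examples.jsonl", "w") as f:
    for record in safety_examples:
        f.write(json.dumps(record) + "\n")
```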
Learning and gaining knowledge in order to be part of the process and contribute is essential. For example, T&S expertise can help ensure that the training data supports the company’s safety policies, such as those concerning child safety, suicide & self-harm, or violence. By doing so, T&S teams can enhance the overall safety and effectiveness of AI models.
As we know, safety should be implemented and integrated throughout the product’s lifecycle—from development to deployment and beyond. When it comes to GenAI, ML experts and developers are usually focused on the pre-launch phase, ensuring that model inputs and outputs are safe. However, the post-launch phase often receives less attention, and this is where they can greatly benefit from T&S expertise.
To put it differently, GenAI developers and ML experts are often “model people,” while T&S teams include “system people.” System people are in charge of ensuring the entire deployment, or experience, functions safely. The system includes not just the model but also the platform and the users interacting with it “in the wild,” acknowledging a much broader, unexpected set of risks and mitigations. Safety issues in GenAI can never be fully resolved with improved training and testing alone.
T&S professionals have likely spent their careers learning how to identify and flag problematic users. While AI safety professionals focus on the model’s behavior, T&S teams know how to keep the platform safe by managing user behavior, for example by implementing systems for flagging and scoring users, blocking offending users, investigating specific incidents, or requiring identity verification as a preventive step.
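As a rough sketch of that system-level work, user enforcement can be as simple as a risk score that accumulates with violations and triggers review or blocking at thresholds. The violation labels, weights, and thresholds below are illustrative assumptions, not a recommended configuration.

```python
from dataclasses import dataclass, field

# Toy sketch of platform-side user risk management: score users as violations
# accumulate and escalate from warnings to human review to blocking.

VIOLATION_WEIGHTS = {"spam": 1, "hate_speech": 3, "csam_attempt": 100}
REVIEW_THRESHOLD = 5
BLOCK_THRESHOLD = 10

@dataclass
class UserRecord:
    user_id: str
    risk_score: int = 0
    blocked: bool = False
    history: list[str] = field(default_factory=list)

def record_violation(user: UserRecord, violation_type: str) -> str:
    user.history.append(violation_type)
    user.risk_score += VIOLATION_WEIGHTS.get(violation_type, 1)

    if user.risk_score >= BLOCK_THRESHOLD:
        user.blocked = True
        return "blocked"                # hard enforcement
    if user.risk_score >= REVIEW_THRESHOLD:
        return "flagged_for_review"     # route to human investigators
    return "warned"
```

For example, `record_violation(UserRecord("user_123"), "hate_speech")` returns "warned", while a single `"csam_attempt"` immediately blocks the account.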
T&S teams also have a better understanding of how potential abuses or manipulations can be applied to make the model misbehave—making it do things the developers never dreamed it could. This includes predicting that text-to-image models can create child sexual abuse material (CSAM) or that chatbots can – and will – be utilized for misinformation campaigns. This broader context of safeguarding the overall deployment, beyond just model development, is often a blind spot for AI safety teams.
By taking over ongoing monitoring and feedback processes, T&S teams can ensure that the widest range of safety measures are continuously evaluated and improved. This not only helps maintain the integrity of the AI models but also enhances overall safety.
Not all industry talk is about budget shifts and shrinking T&S teams. We’ve recently heard about a leading multinational technology platform with a massive GenAI focus that is forming a new team within the T&S department specifically to coordinate and conduct red team efforts for the entire company. This team will be responsible for testing model defenses and proactively identifying gaps and loopholes that may cause harm across all GenAI implementations.
We hope we are witnessing a kind of full circle, where companies are realizing the value that is lost by de-emphasizing existing T&S knowledge and expertise in AI safety processes. They are starting to reintegrate these teams in different, more crucial ways. T&S is being incorporated into the key processes and finding its way to the forefront again, albeit in a transformed role.
Over the past months, we at ActiveFence have worked with a number of the world’s largest and most innovative GenAI foundation model providers. We have tested and scrutinized the most sophisticated models to assess their safety. Our traditional T&S knowledge has been one of the cornerstones of all these projects and served us well in assessing risks and providing mitigation advice.
Don’t shy away from being an integral part of your company’s GenAI efforts. This is the future, and we need both trust and safety there.
Want to learn more about AI safety or discuss the future of T&S in the GenAI era? Come meet us at TrustCon! Join us in San Francisco, July 22-24, for the biggest Trust & Safety event of the year!