Improve your detection and simplify moderation - in one AI-powered platform.
Stay ahead of novel risks and bad actors with proactive, on-demand insights.
Proactively stop safety gaps to produce safe, reliable, and compliant models.
Deploy generative AI in a safe and scalable way with active safety guardrails.
Online abuse has countless forms. Understand the types of risks Trust & Safety teams must keep users safe from on-platform.
Protect your most vulnerable users with a comprehensive set of child safety tools and services.
Our out-of-the-box solutions support platform transparency and compliance.
Keep up with T&S laws, from the Online Safety Bill to the Online Safety Act.
Over 70 elections will take place in 2024: don't let your platform be abused to harm election integrity.
Protect your brand integrity before the damage is done.
From privacy risks, to credential theft and malware, the cyber threats to users are continuously evolving.
Stay ahead of industry news in our exclusive T&S community.
Learn more about ActiveFence's Generative AI Safety solutions
In 2023, we saw Generative AI exploitation surge, as bad actors got a host of new tools to use to create and distribute harmful content. With the lessons learned last year, here’s what we’re looking out for in 2024.
Over the past year, our team, like many others, has been entrenched in GenAI. Unlike other teams, however, we’ve spent the year studying how GenAI is abused by bad actors, as we work with our partners, including seven leading foundation model organizations and GenAI applications, to ensure AI safety by design. As we’ve done so – we’ve learned a lot about bad actor tactics, foundation model loopholes, and how their convergence allows harmful content creation and distribution – at scale.
With that knowledge, here are the seven generative AI risks we are preparing for in 2024:
2024 will see an acceleration of AI model releases, with multimodal capabilities that allow various combinations of inputs (such as text and images in the same prompt), creating a new suite of Generative AI risks. Threat actors can combine two prompts that are benign in isolation to generate harmful materials not detected by existing safety mitigations.
To illustrate, consider a non-violative prompt involving adult speech, combined with a childlike audio or image. Taken separately, these would not raise any flags, it is only when looking at both inputs together, that harmful content with CSAM implications may be identified.
This threat extends far beyond CSAM production, for example, our testing also showed that combinations of this type can generate personal information of social media users, such as home addresses or phone numbers.
Text-to-Audio models will grow more prevalent in 2024, and their use in fraud, misinformation, and other risky contexts will grow.
Already being used in voice-cloning fraud schemes, this type of abuse will become more prominent in other abuse areas as well. For example, in hate speech and disinformation, threat actors may clone an individual’s voice to falsely claim that they made a controversial or misleading statement and use this to achieve malicious aims.
Concerningly, ActiveFence has also already documented voice cloning being used by child predators for malicious purposes.
Half of the world will vote in 2024, and AI-generated mis- and disinformation will affect the electoral results. ActiveFence’s work in 2023 showed how this process is already in motion.
In Q4 2023, ActiveFence detected both foreign and domestic actors using Generative AI to target American voters with divisive narratives related to US foreign and economic policy. The potential scale for this type of abuse is dramatic: in the past month alone, the ActiveFence team reviewed politically charged AI-generated content related to one model, with over 83M impressions.
Demonstrated early last year, the use of GenAI tools to create CSAM and sexually explicit content continues to grow. We expect this trend to continue well into next year.
Additionally, changes in the sexual extortion landscape, alongside the rise in generative AI adoption, point to an upcoming explosion of nude image creation by organized crime organizations, people known to victims and other threat actors for use in extortion. This growth has already been documented by ActiveFence in 2023, consider the following statistics:
Copyright issues are a known challenge in the Generative AI space due to the fundamentals of the technology. In the context of recent lawsuits, we expect continued legal scrutiny, attempts to transfer liability across the ecosystem, and new standards and policies adopted by the major players.
Generative AI players are proactively preparing for this, as in Q4, many of our AI safety customers requested deeper work in this arena.
In 2023, we saw widespread abuse of foundation models to create bad AI models and chatbots with few to no safety restrictions (e.g., WormGPT and FraudGPT). We expect this trend to continue in 2024, as threat actors uncover more ways to exploit new— and open-source—technologies.
An example of this abuse was uncovered in December 2023, when uncensored chatbots named TrumpAI, BidenAI, and BibiBot AI, which claimed to emulate politicians but actually promoted far-right and antisemitic content, were released on Gab.
As the launch of LLM-based enterprise applications moves beyond the early adopter phase, ActiveFence expects more incidents related to privacy, security, and Generative AI risks applicable in a corporate context.
ActiveFence customers and prospects are increasingly concerned about brand risk, the provision of problematic financial, legal, and medical advice, PII, and model jailbreaking for fraudulent purposes.
Knowing what we know about AI risks in 2023, it is doubly crucial to prepare for 2024’s AI risks by proactively identifying emerging threats.
Our proactive approach to AI safety includes AI red teaming, risky prompt feeds, threat intelligence, prompt and output filters, and a safety management platform – enabling robust AI Safety programs and providing ongoing support and services.
Click below to learn more about how ActiveFence ensures AI safety by design.
Learn about ActiveFence's approach to AI safety
The decision to build or buy content moderation tools is a crucial one for many budding Trust and Safety teams. Learn about our five-point approach to this complex decision.
Recognized by Frost & Sullivan, ActiveFence addresses AI-generated content while enhancing Trust and Safety in the face of generative AI.
Building your own Trust and Safety tool? This blog breaks down for Trust and Safety teams the difference between building, buying, or using a hybrid approach.