User flagging is the most basic level of content moderation, and it’s one important piece of the content detection toolkit. Not only does it bring users into the fold and give them a sense of ownership in platforms that are essentially defined by user-generated content, but it also helps platforms be more trustworthy and transparent. As user bases grow and threat actors become more agile, Trust & Safety teams need policies and content detection tools that adequately protect their users, and an intuitive flagging process is just one piece of that puzzle.
Just like it sounds, user flagging is a feature on most user-generated content platforms that allows individuals to report content they feel is violative in some way. The flagged post or account will be reported to the platform, and the Trust & Safety team will decide whether it warrants any action, and if so, what that action may be.
The form that flagging takes varies between platforms and can be as simple as clicking a button without providing any additional context, or as complex as a multi-step process with multiple-choice questions and options for users to share more information. The simpler the feature, the more reports there’ll be, but also the higher the chance of a significant number of false positives; the longer or more complex the feature, the fewer reports there’ll be, though those reports may describe the violations more accurately.
For example, on Facebook, the process is quite straightforward: upon clicking ‘report,’ a user is given a list of violation categories to choose from, and once picked, the offending content is sent on its way. On Twitter, it’s more involved: users can report a tweet, message, or account, and in the process, are asked a series of questions aimed at understanding who the violation applies to, what type of violation is being committed, and how the offending account or content is committing the violation. The process also provides users with the company’s Community Guidelines and offers explanations about rules, enabling users to be more informed about the platform’s policy and what it counts as violative. While still a basic function, Twitter’s flagging feature gives moderators a lot more information and context, simplifying the work they have to do.
Instagram, like Facebook, has a simple reporting mechanism, with the standard options available for users to choose from when reporting. It is, however, unique in that it gives users the option to simply report that they “just don’t like” the content of a post or account. Like Twitter’s relatively in-depth process, Instagram’s also makes detail-sharing an important factor.
So what exactly is the ideal flow for these mechanisms? The key is to have options that provide enough detail, yet not so many that users feel overwhelmed. Trust & Safety teams should consider formulating questions in a way that enables users to understand what a platform actually considers violations, since doing so may reduce false positives.
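To make this concrete, here is a minimal sketch of what such a tiered reporting flow might look like behind the scenes. It is illustrative only: the category names, follow-up prompts, and field names are assumptions, not any real platform’s reporting schema or API.

```python
# Illustrative sketch of a tiered user-flagging form.
# Categories, follow-up prompts, and field names are assumptions for illustration.
from dataclasses import dataclass, field
from typing import Optional

# Each top-level category carries at most a short list of follow-up prompts,
# keeping the common case (e.g. spam) to a single click.
REPORT_CATEGORIES = {
    "spam": [],
    "harassment": ["Who is being targeted?"],
    "hate_speech": ["Which group is being targeted?"],
    "other": ["Tell us what's wrong (optional)"],
}

@dataclass
class UserReport:
    content_id: str
    reporter_id: str
    category: str
    answers: dict = field(default_factory=dict)
    free_text: Optional[str] = None

def build_report(content_id: str, reporter_id: str, category: str,
                 answers: Optional[dict] = None) -> UserReport:
    """Validate the category and keep only the follow-ups that category defines."""
    if category not in REPORT_CATEGORIES:
        raise ValueError(f"Unknown report category: {category}")
    expected = set(REPORT_CATEGORIES[category])
    kept = {q: a for q, a in (answers or {}).items() if q in expected}
    return UserReport(content_id, reporter_id, category, kept)

# Example: a one-click spam report and a harassment report with one follow-up answer.
spam_report = build_report("post_123", "user_456", "spam")
harassment_report = build_report(
    "post_789", "user_456", "harassment",
    {"Who is being targeted?": "Someone else"},
)
```

The design choice mirrors the trade-off described above: a single-click path for high-volume categories, and one or two follow-ups only where the extra context materially helps moderators.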
While automated content detection solutions like ActiveFence’s account for the vast majority of reported posts and accounts, users still play an important role in moderation and should continue to be empowered to play it. By bringing users into the fold, flagging gives them a sense of agency on the platform. However, it’s still a basic tool and should be seen as a supporting element in a broader platform policy.
Since users aren’t professionals, don’t know the ins and outs of policies, and can’t accurately flag things the way moderators can, this feature can’t necessarily be relied upon for weeding out violative content. Even beyond these obvious reasons, it’s also possible that some users who encounter violative content are consumers of it and thus will not report it.
The purpose of flagging isn’t to turn users into moderators: it gives users some control over their own safety and allows them to feel like valued members of their community. The feature adds a layer of trust and transparency that users value. It gives users a sense of ownership, agency, and empowerment in their digital community and its management, so teams should include it in their platform policy and ensure it’s optimized so users can use it effectively.
The ActiveFence philosophy on Trust & Safety is that all tools should work in concert: off-platform intelligence helps train AI algorithms by feeding them new keywords to detect, keeping technologies consistently updated to stay ahead of bad actors while easing the burden on human moderators. User flagging, too, can be seen as a tool that keeps the gears turning. In the broader scope, it provides Trust & Safety teams with new information that can be used to improve platform policies. By understanding the basis for flags and reports, teams can analyze gaps in policy and find new ways to help users understand the ins and outs of a platform’s rules. Flags are also a way for users to communicate the issues they’re seeing on a small scale; Trust & Safety teams can use that information to influence policy decisions and direction, understand user needs and concerns, and integrate learnings into moderation tools.
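As a simple illustration of that feedback loop, the sketch below aggregates flag reasons to show where reports cluster and how many fall outside the defined categories, which can be one signal that the policy or the report form is missing something. The report shape and field names are assumptions, not a real schema.

```python
# Illustrative sketch: aggregate user flags to surface possible policy gaps.
# Reports are plain dicts with a 'category' key; this is an assumed shape.
from collections import Counter
from typing import Iterable, Mapping

def policy_gap_signals(reports: Iterable[Mapping[str, str]]) -> dict:
    """Summarize where flags cluster and how many land outside defined categories."""
    counts = Counter(r["category"] for r in reports)
    total = sum(counts.values()) or 1
    return {
        "by_category": counts.most_common(),
        # A high share of 'other' reports often means users are seeing something
        # the report form (or the policy behind it) doesn't yet name.
        "share_other": round(counts.get("other", 0) / total, 3),
    }

# Example usage:
sample = [{"category": "spam"}, {"category": "other"}, {"category": "other"}]
print(policy_gap_signals(sample))
# {'by_category': [('other', 2), ('spam', 1)], 'share_other': 0.667}
```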
As it’s still a supporting feature, Trust & Safety teams should consider implementing a wide variety of technologies to ensure platform and user safety. Flagging is a supplementary element to broad-scale features like automated keyword detection, which tracks content in real time so offending posts can be removed before they gain traction. Technology-based tools like AI and NLP enable teams to be proactive about detection. ActiveFence offers solutions that account for the various layers of technology necessary to protect platforms, allowing Trust & Safety teams to stay ahead of bad actors. That technology makes labeling and prioritizing flags and other detected content possible at scale, which is why teams can benefit from implementing tools like these.
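For a sense of how that automated layer fits around flagging, here is a minimal keyword pre-screening sketch that combines automated hits with user flags to prioritize a review queue. The blocklist terms, scoring weights, and thresholds are illustrative assumptions and do not represent ActiveFence’s detection models.

```python
# Minimal sketch of keyword-based pre-screening and review-queue prioritization.
# Terms, labels, weights, and thresholds are illustrative assumptions only.
import re

BLOCKLIST = {
    "buy followers now": "spam",
    "scam-link.example": "phishing",
    # terms surfaced by policy and threat-intelligence work would live here
}

def screen(text: str) -> list:
    """Return (term, label) pairs for each blocklisted term found in the text."""
    lowered = text.lower()
    return [
        (term, label)
        for term, label in BLOCKLIST.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]

def queue_priority(hits: list, user_flag_count: int) -> str:
    """Blend automated hits with user flags into a coarse review priority."""
    score = 2 * len(hits) + user_flag_count
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"

# Example: two keyword hits plus two user flags land in the high-priority queue.
hits = screen("Buy followers now at scam-link.example")
print(queue_priority(hits, user_flag_count=2))  # "high"
```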
By and large, other means of content detection, like AI and automation, catch violations quickly, before content even makes it onto a user’s feed. While that’s true, and fantastic, it doesn’t account for users’ desire to take some responsibility for the content in their own digital space. Having user flagging as one part of the Trust & Safety toolkit is a win for platforms and users alike. On its own, however, it isn’t enough; to keep platforms safe and secure, technology companies should treat flagging as a supplement to advanced content moderation tools like automation and threat detection. As part of a greater strategy, flagging can be a beneficial addition to a platform’s best practices.
Want to build safer, more inclusive platforms? Download our Trust & Safety Buyer’s Guide to learn how.