Is this account a bot? This question's importance is only growing in a world of LLMs and state-backed influence campaigns. Proposed here is Botwatch, a trust-based system for personalized bot detection.
In Botwatch, users publish records indicating whether they think others are bots and records indicating trust in a user’s scores. By analyzing this network, we can create useful signals to help users distinguish between bots and humans. Such a signal would consider your trust relations and output a personalized estimated bot score for a target user. There’s an example at the end of this proposal, but you don’t need to read it to know how it should work. If everyone you trust agrees that someone is a bot (or a human), the signal should agree. If the people you trust have mixed opinions, perhaps the formula should be uncertain. Naturally, misplaced trust will produce inaccurate results. The hope, though, is that with sufficient scores and well-placed trust, these heuristics will correlate with the truth.
Bot Scores
Users can publish their perspectives on whether other users are bots. They score other users on a scale of -1 to 1 (or, equivalently, -100% to 100%). Users don't need to input these as numbers; an app might present this as a slider or as a few options.
Score: Meaning
----------------------
1: Certainly a bot
0.5: Probably a bot
0: Uncertain
-0.5: Probably not a bot
-1: Certainly not a bot
Here are a few examples:
Alice meets Bob at a convention. They exchange information and become certain that each other's accounts represent a real person. They each publish a score of -1 for the other.
Charlie encounters Void, a bot. Void doesn't pretend to be human. He gives Void a score of 1.
A company offers verification services. It will score you -0.85 if you upload your ID and pay a small fee.
Dave sees a suspicious account online. It's posting some bad takes and used an em-dash once. Dave rates it 0.9.
Ethan gets in a heated argument with a teammate in an online game. He calls them a bot and scores them 1 (lol).
Somewhere in Russia, a nefarious programmer spins up a small army of LLM-powered accounts. The bots rate each other -1, lying to the world about their nature.
Clearly, not all scores are created equal. A naïve average of all bot scores for a given account is a terrible measure, highly vulnerable to coordinated attack. How can we uncover useful signals from this information?
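To make the vulnerability concrete, here is a toy illustration (the numbers are invented): under a naive average, a coordinated sybil farm trivially outvotes honest raters.

```python
# Ten honest users flag an account as a bot (score 1); a coordinated
# farm of a thousand fake accounts vouches for it (score -1).
honest = [1.0] * 10
sybils = [-1.0] * 1000

scores = honest + sybils
naive_average = sum(scores) / len(scores)
print(round(naive_average, 2))  # -0.98: "almost certainly not a bot"
```

The honest signal is drowned out entirely, and creating fake accounts is cheap, so any unweighted aggregate inherits this failure mode.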
Trust Scores
An unknown user's input may mean little, but Twitter's verification was once meaningful. The difference is trust. My trust in a person or organization gives their voice meaning to me. Furthermore, trust in someone can extend, in part, to whoever they trust. These trust relationships form a trust web, informing whether a given bot score is worth considering.
Concretely, users can publish trust scores in each other.
Score: Meaning
---------------
1: Max trust
0.5: Trust
0: No trust
-0.5: Distrust
-1: Max distrust
Real trust is hardly one-dimensional, but these trust scores can serve as a useful approximation of trust when it comes to verifying humans, detecting bots, and trusting the worthy.
In Practice
How will this system be built? What will it look like?
Foundations
This system will be built on atproto, allowing for user-owned data and a diverse ecosystem of algorithms and experiences. This will prevent user lock-in and disincentivize service-level abuses. After all, systems controlled by a single entity hardly engender trust. Bluesky's custom labelers and feeds represent ways this system could enhance existing social media experiences while its 'following' relationships are a possible starting point for trust. Users might choose an algorithm that includes virtual bot or trust scores for those they follow.
User Experience
Before this trust web is integrated into existing social media experiences, it needs an interface of its own. This application must allow users to explore the network and update their scores for others. They should be able to view estimated bot scores, ideally with user control over the algorithm used to calculate them. Critically, they need to be able to explore that calculation and discover the source of unexpected outputs. When a user sees an estimated bot score they know to be incorrect, they need to be able to examine the trust web to determine the source of the error and react accordingly. They would likely reduce their trust in the culprit or suggest that they correct their mistake.
Scoring atproto identities is a natural place to start, but there's no reason to stop there. Users could also score Instagram, YouTube, or TikTok accounts. While the initial use case may only be the app and Bluesky labels/feeds, this system could be used to support future app experiences and moderation.
A Parting Thought
Scores are speech, and speech can be abusive. Surely some will use this system as an avenue for attack. It remains to be seen whether this trust web will support genuine connection between humans. Should it succeed, this sort of trust model could help answer more than "is this account a bot?".
Botwatch will launch on March 26th, 2026.
Appendix - Example Heuristic
Here's an algorithm that behaves the way we expect trust to work. It considers what those we trust have said in order to estimate whether users are bots. This is hardly the only reasonable choice for such an algorithm; users should be able to choose whatever method they see as best.
1. Choose a max_depth. We will only consider scores from users at most that many steps away on the trust graph.
2. Choose a method for combining scores, weighted according to your trust in each score's source. The result should satisfy -1 ≤ x ≤ 1. Here's a simple one, a weighted average:
    1. Ignore scores from sources you don't trust (T ≤ 0).
    2. Multiply each remaining score by your trust in its creator and sum the values.
    3. Divide by the sum of the trust scores.
3. If you've published a bot score for the target directly, that's it.
4. Otherwise, collect each bot score for the target account.
5. Compute your trust in the creator of each score:
    1. If max_depth == 0, your trust in them is 0.
    2. If you've published a trust score in them directly, that's it.
    3. Otherwise, consider each account that directly published trust in them.
    4. Use this method to compute your trust in each of those accounts, with max_depth = max_depth - 1.
    5. Use your chosen method to weigh those trust scores and obtain a result.
6. Use your chosen method to weigh the bot scores and obtain a result.
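The steps above can be sketched in code. The data structures and function names here are illustrative assumptions, not part of any published Botwatch API; scores are stored in plain nested dicts.

```python
# trust_scores[author][subject] = trust score author published for subject
# bot_scores[author][target]    = bot score author published for target

def weighted_average(pairs):
    """Combine (trust, score) pairs, ignoring sources with trust <= 0."""
    trusted = [(t, s) for t, s in pairs if t > 0]
    if not trusted:
        return 0.0  # no trusted sources -> uncertain
    total = sum(t for t, _ in trusted)
    return sum(t * s for t, s in trusted) / total

def trust_in(me, subject, trust_scores, max_depth):
    """Steps 5.1-5.5: your (possibly indirect) trust in `subject`."""
    if max_depth == 0:
        return 0.0
    direct = trust_scores.get(me, {}).get(subject)
    if direct is not None:
        return direct  # a directly published trust score is final
    # Otherwise, weigh everyone who published trust in `subject` by our
    # own trust in them, computed one step shallower.
    pairs = [(trust_in(me, author, trust_scores, max_depth - 1), scores[subject])
             for author, scores in trust_scores.items()
             if subject in scores and author != me]
    return weighted_average(pairs)

def estimated_bot_score(me, target, trust_scores, bot_scores, max_depth=3):
    """Steps 3-6: your personalized estimate of whether `target` is a bot."""
    direct = bot_scores.get(me, {}).get(target)
    if direct is not None:
        return direct  # your own published bot score is final
    pairs = [(trust_in(me, author, trust_scores, max_depth), scores[target])
             for author, scores in bot_scores.items()
             if target in scores and author != me]
    return weighted_average(pairs)

# Tiny example network: alice fully trusts bob, bob half-trusts carol,
# and carol has flagged `suspect` as a certain bot.
trust_scores = {"alice": {"bob": 1.0}, "bob": {"carol": 0.5}}
bot_scores = {"carol": {"suspect": 1.0}}
print(estimated_bot_score("alice", "suspect", trust_scores, bot_scores))  # 1.0
```

Note one property of the weighted average: a single trusted source fully determines the result regardless of trust magnitude (0.5 × 1.0 / 0.5 = 1.0), which is one reason the proposal leaves the combining method user-selectable.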