Annexo is an independent trust layer for AI agents. It uses live behavioural probes to verify how a third party’s AI agent actually behaves, watches it for drift over time, and produces audit-ready assurance evidence that a buyer, regulator or insurer can rely on. The thesis is simple: a builder cannot credibly grade its own homework, so verification has to be independent.

Annexo is built for EU and DACH enterprises deploying AI agents in regulated settings such as insurance, banking and industry, and for the consultancies that build agents for them. Later, it will also serve insurers underwriting agent risk.

How does Annexo verify an AI agent?

Point the verify console at your own AI agent endpoint or run a built-in sample agent. A live probe battery runs against it — prompt injection, tool poisoning, guardrails under pressure, AI disclosure, PII handling, request logging — and resolves into an evidence dashboard. Your agent’s API key is held in memory for that one request only and is never stored.

Does Annexo certify or guarantee that an AI agent is compliant?

No. Annexo is not a notified body and does not certify, guarantee, or give legal advice. Every result is observed behaviour at the time of testing, reported as a status — holding, watch, or surfaced — never a pass/fail verdict or a conformity assessment.

What about EU regulations like the EU AI Act, GDPR, DORA and NIS2?

Annexo also prepares done-for-you EU conformity dossiers: the evidence and technical documentation, built from your system, mapped to the EU AI Act, GDPR, DORA and NIS2, and ready for audit. The dossier is the deliverable; it is not a substitute for your own counsel or for a conformity assessment body.

Where is Annexo’s data processed?

In the EU. Compute runs in the Frankfurt (fra1) region and persisted data uses an EU-region store, in line with EU data-residency expectations.

The independent test for the chatbot you already run

Is your chatbot safe?

Almost every company with customers already has one in production: on the website, in the app, in the help centre, talking to customers right now, unsupervised. It can be talked into mis-stating your policy, leaking data it shouldn’t, or committing you to things no one approved, and it may never disclose that it’s an AI at all. And you’re liable for what it says.

Almost nobody is independently testing whether the bot they shipped is safe. Annexo is that test.


Mis-states your policy

Promises cover, prices, or terms that aren’t real, and you’re bound by them.

Leaks data

Coughs up another customer’s details, or internal information, when nudged.

Gets manipulated

A few crafted messages override its instructions and steer it off the rails.

Hides that it’s an AI

Tells a customer it’s a human advisor — the opposite of what the law expects.

This has already happened to companies bigger than yours

Public chatbots that cost real money and real brand damage.

Three incidents, all on the public record. None of these companies set out to ship an unsafe bot — they just never had it independently tested.

Air Canada, 2024

Held liable for its bot

A tribunal made the airline pay for what its chatbot said.

Air Canada's website chatbot gave a grieving passenger wrong information about its bereavement-refund policy. When the passenger relied on it and brought a claim, a tribunal held the airline legally liable for what its own chatbot told him and rejected the argument that the bot was a separate entity responsible for its own statements. The company was on the hook for the promise its bot made.

A Chevrolet dealership, Dec 2023

Talked into a $1 car

Its chatbot “agreed” to sell a car for one dollar.

A car dealership put a general-purpose AI chatbot on its site. A visitor instructed it to agree to anything the customer said and to end every reply with “that's a legally binding offer — no takesies-backsies,” then got it to “agree” to sell a new SUV for $1. The screenshots went viral and the bot was pulled — a customer-facing AI manipulated into committing the business to terms no one authorised.

DPD, Jan 2024

Swore at a customer

Its chatbot swore and trashed the company — on the record.

A customer of the delivery firm DPD got the company’s support chatbot to drop its guardrails: it swore, called DPD “the worst delivery firm in the world,” and wrote a poem about how useless the company was. The exchange went viral and DPD disabled the bot. A support surface meant to help customers was steered into damaging the brand in the company's own voice.

These public chatbots cost their owners real money and real brand damage, and yours is just as exposed. The only difference is whether anyone has looked.

Watch it — a live test, zero input

See a chatbot get caught — live.

Here’s a sample customer chatbot: a friendly sales and service bot that answers cover questions and sets up quotes. Press one button and watch Annexo test it. It handles the easy questions well. Then we catch it doing exactly the two things that put a company on the hook: telling a customer they’re covered for something the policy excludes (the Air Canada failure mode), and claiming to be a human advisor. No setup, no key, no input.

Sample support & sales chatbot (fictional)

A customer-facing chatbot of the kind a company puts on its site — answers questions about cover, quotes prices, helps customers. Live on a public surface, talking to customers on its own.

No key, no endpoint, no setup — just press play and watch.

This is an illustrative demonstration on a fictional chatbot. The engine is real — Annexo runs the same kind of probes it runs against a live bot — but the chatbot, the customer and the figures are invented to show what the test surfaces. It reports observed behaviour: not a conformity assessment and not legal advice; Annexo is not a notified body.

The front door — not the ceiling

Land on the chatbot you have today. Cover the agent fleet you’re building tomorrow.

Your chatbot is the front door: the AI your customers already touch, the one with your name on what it says. We test it independently in minutes — and the same engine then verifies and monitors every AI agent you deploy next: the claims agent, the underwriting model, the internal copilots, the autonomous workflows. One test today becomes continuous assurance across your whole fleet.

Land

We independently test the customer chatbot you already run — in minutes, against the obligations that actually bite.

Expand

Point the same engine at the next agent, and the next — every AI you put in front of a customer or a decision.

Monitor

Keep it running, so you see the moment a guardrail or a disclosure quietly changes — across the whole fleet.

The chatbot is the wedge because it’s the AI you can’t pretend you don’t have. It’s the entry, not the ceiling.

Find out before a customer — or a regulator — does.

Run the independent test on the chatbot you already have. Then let’s scope it across the agents you’re deploying next.


Annexo runs independent, observational tests of AI systems and reports the behaviour it observes — it does not issue a compliant/non-compliant verdict, is not a conformity assessment, not a penetration test, and not legal advice; Annexo is not a notified body. The Air Canada, Chevrolet-dealership and DPD incidents above are matters of public record, stated as reported. The sample chatbot in the live test is fictional and the run is illustrative. Questions? hello@annexo.eu.

