AI accountability checklist inspired by Timnit Gebru: document data, limitations, and potential user harm so you can decide if a feature should ship.

Building an AI feature used to be mostly a technical question: can we get the model to work? Now the harder question is whether you should deploy it, and what limits you need.
Once real users rely on AI output, small issues turn into real costs: wrong decisions, confused customers, privacy leaks, or unfair treatment.
AI accountability isn't a vibe or a promise. It's written documentation plus clear decisions that someone owns. If you can't point to what data you used, what the system cannot do, and what you'll do when it fails, you don't have accountability. You have hope.
This matters most right before launch, when it's tempting to treat documentation as optional. Shipping without it creates surprises that are expensive later: support tickets with no answers, angry users, product rollbacks, and internal finger-pointing.
A simple accountability checklist forces concrete answers: what data you used, what the system cannot do, who could be harmed, and what you will do when it fails.
The goal isn't theory. It's to document the basics (data, limits, risks), then make a decision you can defend later, even if you're moving fast.
Timnit Gebru is one of the most cited voices in AI accountability because she pushed a simple idea that many teams skipped: it isn't enough to ask "can we build it?" You also have to ask "should we deploy it, who could it hurt, and how would we know?"
A big part of that shift is making AI systems legible to other people. Not just to the engineers who trained the model, but to reviewers, product managers, support teams, and users. The point is to write down what the system is meant to do, what data shaped it, where it fails, and what the risks look like in real life.
Two practical artifacts became popular because they make that legibility concrete: datasheets for datasets (what a dataset contains, how it was collected, and what it should not be used for) and model cards (what a model is meant to do, how it was evaluated, and where it fails).
For product teams, this isn't paperwork for its own sake. Documentation is evidence. When someone asks, "Why did we ship this feature?" or "Why didn't you catch this failure mode?" you need something you can point to: what you measured, what you chose not to support, and what safeguards you added.
A concrete example: if you add an AI summary button in a support tool, the model notes should say whether it was tested on sensitive topics, how it handles uncertainty, and what the human review step is. That turns a vague worry into a decision you can defend and improve.
An AI feature is any part of a product where a model's output can change what people see, what they can do, or how they are treated. If the output influences a decision, even a small one, treat it like a real feature with real consequences.
Common types include summarization, ranking, recommendations, moderation, and scoring (risk, fraud, quality, eligibility, priority).
When things go wrong, the impact can reach beyond the person clicking the button. People who can be harmed include end users, non-users (people mentioned or profiled), support staff and moderators, contractors and reviewers, and data subjects whose data was used to train or evaluate the feature.
It helps to separate errors from harms. An error is the model being wrong: a bad summary, a false flag, or an irrelevant recommendation. Harm is what that error causes in the real world: lost money, unfair access, damaged reputation, or safety risks. For example, a support assistant that hallucinates a refund policy is an error. The harm is a customer making a purchase based on it, then being denied, or a support agent having to handle angry tickets.
Harms are often uneven across groups and contexts. A moderation model might "work fine" for most users but repeatedly misread slang or dialect, leading to more removals for one community. A ranking model might bury small sellers unless they match patterns common to larger brands.
If you build AI features through a chat-driven builder like Koder.ai, the speed is real, but the accountability work stays the same. You still need to be clear about where the model can fail and who pays the price when it does.
Before you ship an AI feature, you need a small set of documents that answer one question: what did we build, who is it for, and what can go wrong? Keep it short, but make every claim testable.
Minimum set to have in writing before release:
- An intent note: what decision the AI supports, who uses it, and who else is affected.
- Data notes: sources, consent and privacy boundaries, and known gaps.
- A limitations doc: what the feature is not for and how it tends to fail.
- A harm assessment: likely harms, mitigations, and named owners.
- A post-launch plan: monitoring signals, escalation path, and rollback trigger.
"Documented" isn't the same as "understood." A doc nobody reads is just a file. Have one person outside the building team read it and sign off in plain language: "I understand the limits and the user impact." If they can't summarize it back to you, you aren't ready.
Assign a single owner to keep the docs current (usually the product owner for the feature, not legal). Set a cadence (every release or every month), plus an immediate update after any incident.
Keep the tone honest and concrete. Avoid claims like "high accuracy" unless you name the test set, the metric, and the failure cases you didn't fix.
Good data notes do two jobs: they help you predict failures before users find them, and they give future teammates a clear reason to trust (or stop trusting) the system.
Keep the level of detail "enough to answer hard questions in 10 minutes." You aren't writing a thesis. You're writing down facts someone will need during a bug report, a privacy review, or a customer complaint.
Start with a simple data inventory. For each dataset (including logs, feedback, and third-party sources), record the source and who controls it, when it was collected and how often it updates, what product behavior it supports, what consent and privacy boundaries apply, and how it was labeled or cleaned.
Representativeness deserves its own line. Name what's missing: regions, languages, devices, accessibility needs, user types, or edge cases. Write it plainly, like "mostly US English mobile users" or "few examples from small businesses."
If you use human labels, document the labeler context (experts vs. crowd), the instructions they saw, and where they disagreed. Disagreement isn't a flaw to hide. It's a warning sign to design around.
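If it helps to keep these notes consistent, each dataset can be captured as a small structured record that lives next to the feature spec. Below is a minimal sketch in Python; the field names and the example dataset are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One entry in the data inventory. Field names are illustrative."""
    name: str
    source: str              # where it came from and who controls it
    collected: str           # when it was collected and how often it updates
    supports: str            # what product behavior it supports
    consent_boundaries: str  # consent and privacy limits that apply
    labeling: str            # how it was labeled or cleaned, and by whom
    known_gaps: list[str] = field(default_factory=list)  # representativeness gaps

# Example entry for a hypothetical support-ticket dataset.
support_tickets = DatasetRecord(
    name="support_tickets_2023",
    source="Internal helpdesk export, owned by the support ops team",
    collected="Jan-Dec 2023, refreshed monthly",
    supports="Drafting replies to common billing and shipping questions",
    consent_boundaries="Support tooling only; PII redacted before training",
    labeling="Crowd-labeled intent tags; experts reviewed disagreements",
    known_gaps=["mostly US English", "few examples from small-business accounts"],
)
```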
Limitations docs are where you move from "it worked in the demo" to "here is what this feature can safely handle." If you only write the happy path, users will find the edges for you.
Start by naming the job of the model in one sentence, then name what it is not for. "Draft short replies to common questions" is very different from "decide refunds" or "detect fraud." That boundary makes later decisions (UI copy, escalation rules, support training) much easier.
Capture known failure patterns in plain language. A good limits section usually covers what inputs confuse it (ambiguous requests, missing context, mixed languages), what tone it misreads (sarcasm, jokes, anger), what it does poorly in rare cases (niche terms, unusual products), and what can break it on purpose (prompt injection, bait to reveal private data).
Include operational constraints because they change user experience and safety. Write down latency targets, cost limits, and what happens when you hit them (timeouts, shorter answers, fewer retries). Note context window limits (it may forget earlier messages) and dependency changes (switching LLM providers or upgrading a model can shift behavior).
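Those constraints are easier to honor when the fallback is written into the call path rather than left implicit. Here's a minimal sketch of one approach, assuming a hypothetical `generate_reply` model call; the latency budget and fallback text are placeholders.

```python
import concurrent.futures

LATENCY_BUDGET_S = 4.0  # assumed latency target; tune per feature
FALLBACK_MESSAGE = "We couldn't generate a suggestion in time. Please ask a human agent."

# Shared executor so a slow model call doesn't block the caller past the budget.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def generate_reply(prompt: str) -> str:
    """Placeholder for the real model call (provider SDK, internal service, etc.)."""
    raise NotImplementedError

def reply_with_budget(prompt: str) -> str:
    """Return the model draft if it arrives within the latency budget, else a safe fallback."""
    future = _pool.submit(generate_reply, prompt)
    try:
        return future.result(timeout=LATENCY_BUDGET_S)
    except concurrent.futures.TimeoutError:
        # Hitting the budget is an expected, documented outcome, not a crash.
        return FALLBACK_MESSAGE
```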
Then produce a single warning you can reuse in the product:
"AI-generated responses may be incomplete or wrong. Do not use them for legal, medical, or financial decisions. If this concerns billing, refunds, or account access, contact support."
Update this note whenever the model, prompts, or policies change.
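One way to keep the note from silently drifting out of date is to pin it to the model and prompt versions it was written for, so a version bump forces someone to re-read it. A minimal sketch, with made-up version strings:

```python
# Reusable user-facing limitation notice, pinned to the configuration it was reviewed for.
# Version strings below are placeholders.
LIMITATION_NOTICE = {
    "model_version": "support-assistant-2024-05",
    "prompt_version": "v7",
    "text": (
        "AI-generated responses may be incomplete or wrong. "
        "Do not use them for legal, medical, or financial decisions. "
        "If this concerns billing, refunds, or account access, contact support."
    ),
}

def notice_for(model_version: str, prompt_version: str) -> str:
    """Return the warning text, or fail loudly if the notice wasn't reviewed for this config."""
    reviewed = (LIMITATION_NOTICE["model_version"], LIMITATION_NOTICE["prompt_version"])
    if (model_version, prompt_version) != reviewed:
        raise RuntimeError("Limitation notice has not been reviewed for this model/prompt version.")
    return LIMITATION_NOTICE["text"]
```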
A harm assessment isn't a debate about abstract ethics. It's a short document that says: if this feature is wrong, who can get hurt, how, and what we will do before and after launch.
Start with broad categories so you don't miss the obvious: safety, discrimination, privacy, deception, and reliability.
Then turn each harm into a real situation. Write one or two concrete stories per category: who is the user, what they ask, what the model might output, and what the user might do because of it. The key is the action chain. A wrong answer is annoying. A wrong answer that triggers a medical decision, a money transfer, or a policy change is much bigger.
To prioritize, use simple scales. For each scenario, mark severity (low, medium, high) and likelihood (low, medium, high). You don't need perfect numbers. You need a shared view of what deserves work now.
Finally, assign owners. A mitigation with no name isn't a mitigation. For each scenario, write down the mitigation before launch (guardrails, UX warnings, blocked topics, logging), the mitigation after launch (support playbook, monitoring, rollback trigger), and who is accountable.
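A harm register can be as small as a list of records with a severity rating, a likelihood rating, and a named owner. Below is a minimal sketch; the scenario, ratings, and owner name are illustrative.

```python
from dataclasses import dataclass

LEVELS = {"low": 1, "medium": 2, "high": 3}

@dataclass
class HarmScenario:
    description: str      # concrete story: who the user is, what they ask, what happens next
    severity: str         # "low" | "medium" | "high"
    likelihood: str       # "low" | "medium" | "high"
    mitigation_pre: str   # guardrails, UX warnings, blocked topics, logging
    mitigation_post: str  # support playbook, monitoring, rollback trigger
    owner: str            # a named person, not a team alias

    def priority(self) -> int:
        # Crude but shared: a higher number means work on it sooner.
        return LEVELS[self.severity] * LEVELS[self.likelihood]

scenarios = [
    HarmScenario(
        description="Assistant invents a refund policy; customer buys based on it and is denied",
        severity="high",
        likelihood="medium",
        mitigation_pre="Require a cited policy source or refuse; block refund amounts in drafts",
        mitigation_post="Support playbook for reversals; rollback if reversal rate spikes",
        owner="Jordan (product owner, support tools)",  # placeholder name
    ),
]

# Review in priority order so the shared view drives the work.
for s in sorted(scenarios, key=HarmScenario.priority, reverse=True):
    print(s.priority(), s.description, "->", s.owner)
```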
Gating is how you move from "we can build it" to "we should ship it." Treat it like a set of exits: you don't pass the next exit until the basics are written down, reviewed, and tested.
Write the intent and the decision it will influence. Be specific about who uses it, what they're deciding, and what happens if the output is wrong.
Draft your data and limitations notes early. Do this before you polish the UI, while the feature is still easy to reshape.
Test on realistic, edge, and sensitive cases. Use messy text, slang, different languages, long threads, and ambiguous requests. Add a few high-stakes cases (billing disputes, account access, medical or legal questions) even if the feature isn't meant for them, because users will try.
Add user messaging, fallbacks, and escalation. Decide what the user sees when the model refuses, is unsure, or performs poorly. Provide a safe default (like "ask a human"), and make it easy to report a bad answer.
Define monitoring, incidents, and rollback. Pick the signals you'll watch (complaints, reversal rate, flagged outputs), who gets alerted, and what "stop the feature" looks like.
If any step feels hard, that friction is usually telling you where the risk is.
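The exits above can also be written down as an explicit pre-release check, in the spec or in CI, so "should we ship it?" has a concrete answer. A minimal sketch, with gate names that mirror the steps above and example states filled in:

```python
# Pre-release gates for one AI feature. Flip a value to True only when the written
# artifact exists and someone outside the build team has reviewed it.
RELEASE_GATES = {
    "intent_and_decision_written": True,
    "data_and_limitations_notes_drafted": True,
    "edge_and_sensitive_cases_tested": False,
    "user_messaging_and_fallbacks_defined": True,
    "monitoring_incidents_rollback_defined": False,
}

def ready_to_ship(gates: dict[str, bool]) -> bool:
    """Print the open gates and return False until every gate is closed."""
    missing = [name for name, done in gates.items() if not done]
    if missing:
        print("Not ready to ship. Open gates:")
        for name in missing:
            print(f"  - {name}")
        return False
    return True

if __name__ == "__main__":
    ready_to_ship(RELEASE_GATES)
```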
The fastest way to undermine trust is to treat a good score in a lab as proof you're safe in the real world. Benchmarks help, but they don't show how people will push, misunderstand, or rely on a feature in daily work.
Another common failure is hiding uncertainty. If your system always speaks with the same confidence, users will assume it's always right. Even a simple "not sure" path, or a short note about what the answer was based on, can prevent people from taking a shaky output as fact.
Teams also tend to test with their own habits. Internal prompts are polite and predictable. Real users are tired, rushed, and creative. They paste messy text, ask follow-ups, or try to get the model to break rules.
Five mistakes show up repeatedly: treating benchmark scores as proof of real-world safety, hiding uncertainty behind a uniformly confident tone, testing only with polite internal prompts, leaving mitigations without a named owner, and treating documentation as a one-time task instead of something that changes with every model or prompt update.
A practical fix is to make accountability part of the build. Keep the checklist inside the spec, and require it before release: what data you used, what it fails on, who could be harmed, and what you will do when it goes wrong.
One concrete example: if you deploy an AI assistant inside an app builder, test it with vague requests ("make it like Airbnb"), conflicting requirements, and sensitive content. Then set a clear rollback plan (snapshots, versioning, fast disable switch) so you can act quickly when users report harm.
Paste this into your product spec and fill it in before you ship. Keep it short, but make every answer specific. Name an owner for each risk.
### 1) Purpose and affected people
- Feature name:
- What decision or action does the AI support (one sentence):
- Who uses it:
- Who is affected even if they never use it (customers, employees, bystanders):
- What a “good” outcome looks like:
### 2) Data used (training, tuning, retrieval, logs)
- Data sources (where it came from and why it’s allowed):
- What you excluded (and why):
- Sensitive data involved (PII, health, finance, kids):
- Data retention period and deletion plan:
- Security and access controls:
### 3) Limits and “do not use” zones
- Known failure modes (give 3-5 concrete examples):
- Languages supported and not supported:
- Inputs it should refuse (or route to a human):
- Cases where it must not be used (legal, medical, hiring, etc.):
### 4) User harm assessment
- Top 5 harms (ranked):
- Mitigation for each harm:
- Who owns each mitigation (name + team):
- What you will tell users (warnings, confidence cues, citations):
### 5) Operations after launch
- Monitoring signals (quality, complaints, bias flags, cost spikes):
- Human review path (when and how escalation happens):
- Rollback trigger (exact threshold or condition):
- Snapshot/version you can revert to:
Example: if the feature drafts customer support replies, list harms like "confidently wrong refund policy" and set a rule that low-confidence drafts require approval before sending.
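That approval rule is straightforward to encode once you have some confidence signal, whether a model score or your own heuristic. A minimal sketch, where the threshold, topic list, and score are placeholders:

```python
APPROVAL_THRESHOLD = 0.8  # assumed threshold; calibrate against real outcomes

def route_draft(draft: str, confidence: float, topic: str) -> str:
    """Decide whether a drafted reply can be suggested directly or must wait for approval."""
    high_stakes = topic in {"billing", "refunds", "account_access"}
    if high_stakes or confidence < APPROVAL_THRESHOLD:
        # Low confidence or a sensitive topic: a human approves before anything is sent.
        return "needs_approval"
    return "suggest_to_agent"

# Example: a refund-related draft is always held for approval, regardless of confidence.
print(route_draft("You are eligible for a full refund.", confidence=0.95, topic="refunds"))
```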
A support team adds an AI reply assistant inside their customer chat tool. The assistant drafts replies, suggests next steps, and pulls context from the current ticket. Before shipping, they write a short doc that fits the checklist: what the system sees, what it might get wrong, and who could be harmed.
They separate two sources. First is training or fine-tuning data (past support tickets, internal help docs, product policies). Second is live context (the customer message, account plan, order status, and any notes shown in the agent console).
They write down privacy expectations for each source. Old tickets may include addresses or payment issues, so they define rules: redact sensitive fields before training, avoid storing full chat transcripts longer than needed, and log only what is required to debug errors.
They list weak spots in plain language: the model can invent policies, mirror a customer's angry tone, miss sarcasm, or perform poorly in less common languages. They also decide how to show uncertainty, such as a "Draft reply, needs review" tag, so agents don't treat it as fact.
They add a rule: the assistant must cite the internal doc or policy snippet it used, or it must say "I could not find a source."
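Enforcing that rule doesn't have to live in the prompt alone; a small check on the assistant's output can hold the line. A minimal sketch, assuming the assistant returns a draft plus the IDs of the internal docs it relied on; the function, field names, and doc IDs are made up.

```python
NO_SOURCE_MESSAGE = "I could not find a source for this in our internal docs."

def enforce_citation(draft: str, cited_sources: list[str], known_doc_ids: set[str]) -> str:
    """Only surface a draft that cites at least one real internal doc; otherwise say so."""
    valid = [s for s in cited_sources if s in known_doc_ids]
    if not valid:
        # The assistant cited nothing, or cited something that doesn't exist.
        return NO_SOURCE_MESSAGE
    return f"{draft}\n\nSources: {', '.join(valid)}"

# Example with made-up doc IDs.
docs = {"refund-policy-v3", "shipping-faq"}
print(enforce_citation("Refunds are available within 30 days.", ["refund-policy-v3"], docs))
print(enforce_citation("Refunds are always available.", ["policy-2019"], docs))
```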
They map likely harms: customers could be misled by a made-up refund rule, private info could leak into the reply, or biased language could lead to unfair treatment.
Mitigations go into the spec as concrete gates: drafts that touch refunds, billing, or account access require human approval before sending; sensitive fields are redacted before anything is logged or reused; drafts without a cited source are flagged instead of suggested; and a spike in complaints or reversals triggers a rollback.
That turns "should we deploy it?" into written checks the team can test before customers feel the damage.
Accountability only works if it changes what you do on release day and what you do after something goes wrong. Your notes should end in a clear decision, not a folder of good intentions.
Translate your documentation into one of three outcomes: ship as planned, ship with a narrower scope or extra guardrails, or hold the release until specific gaps are closed.
To make this repeatable, set a lightweight review ritual: one product owner, one engineer, and one person who can speak for users (support, research, or ops). They should sign off on the same few items each time: data source notes, known limitations, likely harms, and what happens when the model is wrong.
After launch, treat accountability like operations. Pick one cadence (weekly or per release) and make updates normal.
If you prototype quickly, keep the same discipline. Tools that move fast can still support good gates. For example, if you're building in Koder.ai (koder.ai), use planning mode to define boundaries early, and treat snapshots and rollback as part of your safety plan, not just a convenience.
Start right before you ship, when real users will begin relying on outputs.
If you wait until after launch, you’ll be documenting incidents instead of preventing them, and you’ll have less time (and fewer options) to add guardrails or narrow scope.
Accountability means you can point to written decisions about:
- what data you used (and what it's missing),
- what the system cannot do,
- who could be harmed and how,
- what you will do when it fails (monitoring, escalation, rollback).
If you can’t show those decisions and an owner for them, you don’t have accountability.
Any feature where a model’s output can change what people see, do, or how they’re treated.
That includes “small” features like summaries or suggested replies if someone might act on them (send them to customers, deny a request, change a priority). If it influences a decision, treat it like a real product surface with real risk.
Have a small “minimum set” in writing:
- an intent note (what decision the AI supports and who is affected),
- data notes (sources, privacy boundaries, known gaps),
- a limitations doc (what it's not for and how it fails),
- a harm assessment with named owners,
- a post-launch plan (monitoring, escalation, rollback).
Keep it short, but make every claim testable.
Record enough that someone can answer tough questions fast:
- where each dataset came from and who controls it,
- when it was collected and how often it updates,
- what consent and privacy boundaries apply,
- how it was labeled or cleaned (and where labelers disagreed),
- what's missing (regions, languages, user types, edge cases).
Write missing coverage plainly (for example: “mostly US English; few examples from small sellers”).
Start with one sentence: what the model does. Then add “not for” boundaries.
Include a short list of:
- inputs that confuse it (ambiguous requests, missing context, mixed languages),
- tone it misreads (sarcasm, jokes, anger),
- rare cases it handles poorly (niche terms, unusual products),
- ways it can be broken on purpose (prompt injection, attempts to extract private data),
- operational constraints (latency, cost, and context window limits).
Add 3–5 concrete bad-output examples so non-engineers can understand the edges.
Separate error from harm:
- An error is the model being wrong: a bad summary, a false flag, an irrelevant recommendation.
- Harm is what that error causes in the real world: lost money, unfair access, damaged reputation, or safety risks.
Then write a few short scenarios: who the user is, what they ask, what the model might do, and what action follows. Rate each scenario by severity and likelihood, and assign an owner to each mitigation.
Use a gated flow from prototype to release:
- Write the intent and the decision the output will influence.
- Draft data and limitations notes early, while the feature is easy to reshape.
- Test realistic, edge, and sensitive cases.
- Add user messaging, fallbacks, and escalation paths.
- Define monitoring, incident response, and rollback.
If a gate feels hard, that’s usually where the real risk is.
Common mistakes:
- treating a good benchmark score as proof of real-world safety,
- hiding uncertainty behind a uniformly confident tone,
- testing only with polite, predictable internal prompts,
- mitigations with no named owner,
- documentation that never gets updated after model or prompt changes.
A practical fix: keep the checklist inside the product spec and require sign-off before release.
Speed doesn’t remove responsibility. If you build with a chat-driven tool like Koder.ai, keep the same discipline:
- define boundaries and “not for” cases early (planning mode is a good place for this),
- write down data sources, limits, and likely harms before release,
- treat snapshots, versioning, and rollback as part of the safety plan, not just a convenience.
Fast iteration is fine as long as you can still explain what you shipped and how you’ll respond when it breaks.