How to Start an AI Pilot Properly
Most AI pilots fail before they have a chance to prove anything. The failure is rarely technical. It is structural: the wrong scope, no baseline measurement, no executive authority to remove blockers, and staff who were not involved in the design and do not trust the output. Eight weeks with the right structure is enough to prevent all of those failures.
Phase 1: Design (Weeks 1–2)
Get Executive Sponsorship First
An AI pilot without an executive sponsor is a project waiting to stall. When a blocker appears — budget approval, a scope dispute, a systems access issue — someone needs the authority to resolve it in days, not months. That person must be a C-suite or SVP-level owner who meets with the pilot team at least monthly, has budget authority, and will report results to the board.
Without this, the pilot will lose momentum when things get hard. And things always get hard.
Define One Specific Use Case
Good pilot use cases are narrow and well-defined: a single process, a clear input and output, a measurable outcome. Bad pilots try to explore what AI can do across a broad domain. You do not learn from exploration — you learn from a specific question with a specific answer.
A well-defined pilot: “Automatically categorise incoming customer support tickets as urgent, high, medium, or low to reduce manual sorting time from two hours to fifteen minutes daily.” Everything else is out of scope for this pilot.
Measure the Baseline Before Any AI Is Involved
You cannot prove ROI without a baseline. Before touching any AI tool, document the current state of the process you are improving: volume per day or week, time per item, error rate, cost per transaction, and where the bottlenecks actually are. This is your before measurement. Without it, the after measurement means nothing.
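If you want to capture the baseline in something more structured than a spreadsheet, a minimal sketch like the one below is enough. The field names and figures are illustrative, not a prescribed schema.

```python
# Illustrative only: a minimal baseline record for the ticket-triage example.
# Field names and figures are hypothetical placeholders, not recommendations.
from dataclasses import dataclass, asdict
import json

@dataclass
class ProcessBaseline:
    process_name: str
    volume_per_day: int        # items handled per working day
    minutes_per_item: float    # average manual handling time
    error_rate: float          # fraction of items handled incorrectly
    cost_per_item_gbp: float   # fully loaded staff cost per item
    bottlenecks: list[str]     # where work queues up today

baseline = ProcessBaseline(
    process_name="support ticket triage",
    volume_per_day=120,
    minutes_per_item=1.0,
    error_rate=0.05,
    cost_per_item_gbp=0.45,
    bottlenecks=["morning backlog", "ambiguous categories"],
)

# Persist the 'before' snapshot so the week 7-8 comparison has something to compare against.
print(json.dumps(asdict(baseline), indent=2))
```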
Phase 2: Execute (Weeks 3–6)
Run in Shadow Mode First
Do not replace the human process on day one. Run the AI in parallel for the first two weeks. Staff perform the process as normal. The AI also processes the same inputs. Compare the outputs side by side.
This does three things: it measures real accuracy on your actual data before anything is at risk, it builds staff confidence as they see the AI perform accurately rather than just being told it is accurate, and it surfaces edge cases and failure modes before they affect live work.
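A minimal sketch of how the side-by-side comparison could be scored, using the ticket-triage example. The labels and sample records are invented for illustration; human decisions stay authoritative throughout shadow mode.

```python
# Sketch of a shadow-mode comparison: the AI's output is only scored against
# the human decision, never acted on. Sample data is invented.
from collections import Counter

# (ticket_id, human_label, ai_label) pairs collected during shadow mode
shadow_log = [
    ("T-1001", "urgent", "urgent"),
    ("T-1002", "low", "medium"),
    ("T-1003", "high", "high"),
    ("T-1004", "medium", "medium"),
]

agreements = sum(1 for _, human, ai in shadow_log if human == ai)
accuracy = agreements / len(shadow_log)

# Disagreements are the edge cases worth reviewing before going live.
disagreement_patterns = Counter(
    (human, ai) for _, human, ai in shadow_log if human != ai
)

print(f"Shadow-mode agreement: {accuracy:.0%}")
print("Disagreement patterns (human -> AI):", dict(disagreement_patterns))
```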
Ramp Gradually
Once shadow mode validates accuracy, increase AI-handled volume in stages: 20% in week three, 50% in week four, 80% in week five, 100% in week six with a manual fallback still available. Monitor accuracy and gather staff feedback at each stage. Do not accelerate the ramp if accuracy is below target or staff confidence is low.
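If it helps to make the gating explicit, here is a minimal sketch of the ramp plan with accuracy and confidence gates. The thresholds are examples drawn from the criteria later in this piece, not fixed requirements.

```python
# Illustrative ramp plan with simple gates. Set your own accuracy target
# and confidence floor before week three; these values are examples.
RAMP_PLAN = {3: 0.20, 4: 0.50, 5: 0.80, 6: 1.00}  # week -> share of volume handled by AI
ACCURACY_TARGET = 0.80
CONFIDENCE_FLOOR = 7  # staff confidence, 0-10

def next_ramp_share(week: int, current_share: float,
                    accuracy: float, staff_confidence: float) -> float:
    """Hold the current share if either gate fails; otherwise follow the plan."""
    if accuracy < ACCURACY_TARGET or staff_confidence < CONFIDENCE_FLOOR:
        return current_share  # do not accelerate the ramp
    return RAMP_PLAN.get(week, current_share)

# Example: accuracy dipped in week four, so the AI's share stays at 20%.
print(next_ramp_share(week=4, current_share=0.20, accuracy=0.76, staff_confidence=8))
```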
Track Daily
A daily dashboard does not need to be sophisticated. A shared spreadsheet updated each morning showing volume processed, AI accuracy, manual overrides, time saved, and any error patterns is enough. The point is that you see trends as they develop, not after they have become problems.
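A shared spreadsheet is genuinely enough, but if you would rather script the log, appending one row per day to a CSV does the same job. The column names and values below are illustrative.

```python
# A daily tracking row appended to a CSV: the whole 'dashboard'.
# Column names mirror the metrics above; values here are placeholders.
import csv
from datetime import date
from pathlib import Path

LOG = Path("pilot_daily_log.csv")
FIELDS = ["date", "volume_processed", "ai_accuracy",
          "manual_overrides", "minutes_saved", "error_notes"]

def log_day(row: dict) -> None:
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

log_day({
    "date": date.today().isoformat(),
    "volume_processed": 118,
    "ai_accuracy": 0.84,
    "manual_overrides": 9,
    "minutes_saved": 95,
    "error_notes": "mislabels refund requests as low priority",
})
```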
Phase 3: Evaluate (Weeks 7–8)
Compare the post-pilot state to your baseline on every metric you measured at the start. Calculate actual ROI: hours saved per week multiplied by the fully loaded hourly staff cost, annualised, minus the annual tool cost. Get staff to complete a short satisfaction survey covering confidence in the AI, what still needs improvement, and whether they would recommend extending it.
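As a worked example of that calculation, using illustrative figures from the ticket-triage scenario (the hourly cost, working weeks, and tool cost are assumptions, not real data):

```python
# Worked ROI example with illustrative numbers: swap in your own figures.
hours_saved_per_week = 8.75      # ~1h45m saved per day, five days a week
loaded_hourly_cost = 30.0        # fully loaded staff cost, GBP per hour
working_weeks_per_year = 46      # allow for leave and bank holidays
annual_tool_cost = 6_000.0       # licences plus integration upkeep, GBP

annual_saving = hours_saved_per_week * loaded_hourly_cost * working_weeks_per_year
net_benefit = annual_saving - annual_tool_cost
roi = net_benefit / annual_tool_cost

print(f"Annual saving: £{annual_saving:,.0f}")   # £12,075
print(f"Net benefit:   £{net_benefit:,.0f}")     # £6,075
print(f"ROI:           {roi:.0%}")               # 101%
```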
Go / No-Go Decision
The criteria for a Go decision: accuracy at or above 80%, measurable time or cost savings, staff confidence at 7 out of 10 or higher, no unresolved governance concerns, and the sponsor wants to scale. If any of those criteria are not met, the right answer is either to fix the specific problem and rerun the pilot, or to wind it down and apply the learning to a different use case.
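Expressed as a simple checklist, the decision logic might look like the sketch below. The example inputs are invented; only the thresholds come from the criteria above.

```python
# A Go/No-Go checklist expressed as code, using the criteria above.
# The example figures (0.84 accuracy, 7.8 confidence) are invented.
criteria = {
    "accuracy_at_or_above_80pct": 0.84 >= 0.80,
    "measurable_savings": True,               # baseline vs pilot comparison shows savings
    "staff_confidence_7_or_higher": 7.8 >= 7,
    "no_unresolved_governance_concerns": True,
    "sponsor_wants_to_scale": True,
}

decision = "Go" if all(criteria.values()) else "No-Go"
failed = [name for name, met in criteria.items() if not met]

print(decision)
if failed:
    print("Fix and rerun, or wind down and reuse the learning:", failed)
```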
Eight weeks is enough. You will know whether AI works for this use case in your organisation within that window. A pilot that drags on for six months is not a pilot — it is an unmanaged project.
Common Pilot Pitfalls
| Pitfall | Impact | Prevention |
|---|---|---|
| No baseline measurement | Cannot prove ROI | Measure current state before AI |
| Multiple use cases at once | Diluted resources, no clear learning | One process per pilot |
| No executive sponsor | Pilot stalls when blocked | Confirm sponsor commitment before starting |
| End users not involved | Low adoption and trust | 20% of pilot team are operational staff |
| No fallback procedure | One failure breaks everything | Manual backup always ready |
| Wrong success metrics | Cannot demonstrate value | Define metrics before pilot, not during |
Running an AI pilot this quarter?
Simon Steggles provides AI pilot design, governance, and advisory support for UK organisations.
