How to Start an AI Pilot Properly
Most AI pilots fail before they have a chance to prove anything. The failure is rarely technical. It is structural: the wrong scope, no baseline measurement, no executive authority to remove blockers, and staff who were not involved in the design and do not trust the output. Eight weeks with the right structure is enough to prevent all of those failures.
Phase 1: Design (Weeks 1–2)
Get Executive Sponsorship First
An AI pilot without an executive sponsor is a project waiting to stall. When a blocker appears — budget approval, a scope dispute, a systems access issue — someone needs the authority to resolve it in days, not months. That person must be a C-suite or SVP-level owner who meets with the pilot team at least monthly, has budget authority, and will report results to the board.
Without this, the pilot will lose momentum when things get hard. And things always get hard.
Define One Specific Use Case
Good pilot use cases are narrow and well-defined: a single process, a clear input and output, a measurable outcome. Bad pilots try to explore what AI can do across a broad domain. You do not learn from exploration — you learn from a specific question with a specific answer.
A well-defined pilot: “Automatically categorise incoming customer support tickets as urgent, high, medium, or low to reduce manual sorting time from two hours to fifteen minutes daily.” Everything else is out of scope for this pilot.
Measure the Baseline Before Any AI Is Involved
You cannot prove ROI without a baseline. Before touching any AI tool, document the current state of the process you are improving: volume per day or week, time per item, error rate, cost per transaction, and where the bottlenecks actually are. This is your before measurement. Without it, the after measurement means nothing.
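If you want to capture the baseline in something more structured than a spreadsheet, a minimal sketch like the one below is enough. The field names and figures are illustrative, not a prescribed schema.

```python
# Illustrative only: a minimal baseline record for the ticket-triage example.
# Field names and figures are hypothetical placeholders, not recommendations.
from dataclasses import dataclass, asdict
import json

@dataclass
class ProcessBaseline:
    process_name: str
    volume_per_day: int        # items handled per working day
    minutes_per_item: float    # average manual handling time
    error_rate: float          # fraction of items handled incorrectly
    cost_per_item_gbp: float   # fully loaded staff cost per item
    bottlenecks: list[str]     # where work queues up today

baseline = ProcessBaseline(
    process_name="support ticket triage",
    volume_per_day=120,
    minutes_per_item=1.0,
    error_rate=0.05,
    cost_per_item_gbp=0.45,
    bottlenecks=["morning backlog", "ambiguous categories"],
)

# Persist the 'before' snapshot so the week 7-8 comparison has something to compare against.
print(json.dumps(asdict(baseline), indent=2))
```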
Phase 2: Execute (Weeks 3–6)
Run in Shadow Mode First
Do not replace the human process on day one. Run the AI in parallel for the first two weeks. Staff perform the process as normal. The AI also processes the same inputs. Compare the outputs side by side.
This does three things: it measures real accuracy on your actual data before anything is at risk, it builds staff confidence as they see the AI perform accurately rather than just being told it is accurate, and it surfaces edge cases and failure modes before they affect live work.
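A minimal sketch of how the side-by-side comparison could be scored, using the ticket-triage example. The labels and sample records are invented for illustration; human decisions stay authoritative throughout shadow mode.

```python
# Sketch of a shadow-mode comparison: the AI's output is only scored against
# the human decision, never acted on. Sample data is invented.
from collections import Counter

# (ticket_id, human_label, ai_label) pairs collected during shadow mode
shadow_log = [
    ("T-1001", "urgent", "urgent"),
    ("T-1002", "low", "medium"),
    ("T-1003", "high", "high"),
    ("T-1004", "medium", "medium"),
]

agreements = sum(1 for _, human, ai in shadow_log if human == ai)
accuracy = agreements / len(shadow_log)

# Disagreements are the edge cases worth reviewing before going live.
disagreement_patterns = Counter(
    (human, ai) for _, human, ai in shadow_log if human != ai
)

print(f"Shadow-mode agreement: {accuracy:.0%}")
print("Disagreement patterns (human -> AI):", dict(disagreement_patterns))
```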
Ramp Gradually
Once shadow mode validates accuracy, increase AI-handled volume in stages: 20% in week three, 50% in week four, 80% in week five, 100% in week six with a manual fallback still available. Monitor accuracy and gather staff feedback at each stage. Do not accelerate the ramp if accuracy is below target or staff confidence is low.
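If it helps to make the gating explicit, here is a minimal sketch of the ramp plan with accuracy and confidence gates. The thresholds are examples drawn from the criteria later in this piece, not fixed requirements.

```python
# Illustrative ramp plan with simple gates. Set your own accuracy target
# and confidence floor before week three; these values are examples.
RAMP_PLAN = {3: 0.20, 4: 0.50, 5: 0.80, 6: 1.00}  # week -> share of volume handled by AI
ACCURACY_TARGET = 0.80
CONFIDENCE_FLOOR = 7  # staff confidence, 0-10

def next_ramp_share(week: int, current_share: float,
                    accuracy: float, staff_confidence: float) -> float:
    """Hold the current share if either gate fails; otherwise follow the plan."""
    if accuracy < ACCURACY_TARGET or staff_confidence < CONFIDENCE_FLOOR:
        return current_share  # do not accelerate the ramp
    return RAMP_PLAN.get(week, current_share)

# Example: accuracy dipped in week four, so the AI's share stays at 20%.
print(next_ramp_share(week=4, current_share=0.20, accuracy=0.76, staff_confidence=8))
```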
Track Daily
A daily dashboard does not need to be sophisticated. A shared spreadsheet updated each morning showing volume processed, AI accuracy, manual overrides, time saved, and any error patterns is enough. The point is that you see trends as they develop, not after they have become problems.
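A shared spreadsheet is genuinely enough, but if you would rather script the log, appending one row per day to a CSV does the same job. The column names and values below are illustrative.

```python
# A daily tracking row appended to a CSV: the whole 'dashboard'.
# Column names mirror the metrics above; values here are placeholders.
import csv
from datetime import date
from pathlib import Path

LOG = Path("pilot_daily_log.csv")
FIELDS = ["date", "volume_processed", "ai_accuracy",
          "manual_overrides", "minutes_saved", "error_notes"]

def log_day(row: dict) -> None:
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

log_day({
    "date": date.today().isoformat(),
    "volume_processed": 118,
    "ai_accuracy": 0.84,
    "manual_overrides": 9,
    "minutes_saved": 95,
    "error_notes": "mislabels refund requests as low priority",
})
```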
Phase 3: Evaluate (Weeks 7–8)
Compare the post-pilot state to your baseline on every metric you measured at the start. Calculate actual ROI: hours saved per week multiplied by the fully loaded hourly staff cost, annualised, minus the annual tool cost. Get staff to complete a short satisfaction survey covering confidence in the AI, what still needs improvement, and whether they would recommend extending it.
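As a worked example of that calculation, using illustrative figures from the ticket-triage scenario (the hourly cost, working weeks, and tool cost are assumptions, not real data):

```python
# Worked ROI example with illustrative numbers: swap in your own figures.
hours_saved_per_week = 8.75      # ~1h45m saved per day, five days a week
loaded_hourly_cost = 30.0        # fully loaded staff cost, GBP per hour
working_weeks_per_year = 46      # allow for leave and bank holidays
annual_tool_cost = 6_000.0       # licences plus integration upkeep, GBP

annual_saving = hours_saved_per_week * loaded_hourly_cost * working_weeks_per_year
net_benefit = annual_saving - annual_tool_cost
roi = net_benefit / annual_tool_cost

print(f"Annual saving: £{annual_saving:,.0f}")   # £12,075
print(f"Net benefit:   £{net_benefit:,.0f}")     # £6,075
print(f"ROI:           {roi:.0%}")               # 101%
```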
Go / No-Go Decision
The criteria for a Go decision: accuracy at or above 80%, measurable time or cost savings, staff confidence at 7 out of 10 or higher, no unresolved governance concerns, and the sponsor wants to scale. If any of those criteria are not met, the right answer is either to fix the specific problem and rerun the pilot, or to wind it down and apply the learning to a different use case.
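Expressed as a simple checklist, the decision logic might look like the sketch below. The example inputs are invented; only the thresholds come from the criteria above.

```python
# A Go/No-Go checklist expressed as code, using the criteria above.
# The example figures (0.84 accuracy, 7.8 confidence) are invented.
criteria = {
    "accuracy_at_or_above_80pct": 0.84 >= 0.80,
    "measurable_savings": True,               # baseline vs pilot comparison shows savings
    "staff_confidence_7_or_higher": 7.8 >= 7,
    "no_unresolved_governance_concerns": True,
    "sponsor_wants_to_scale": True,
}

decision = "Go" if all(criteria.values()) else "No-Go"
failed = [name for name, met in criteria.items() if not met]

print(decision)
if failed:
    print("Fix and rerun, or wind down and reuse the learning:", failed)
```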
Eight weeks is enough. You will know whether AI works for this use case in your organisation within that window. A pilot that drags on for six months is not a pilot — it is an unmanaged project.
Common Pilot Pitfalls
| Pitfall | Impact | Prevention |
|---|---|---|
| No baseline measurement | Cannot prove ROI | Measure current state before AI |
| Multiple use cases at once | Diluted resources, no clear learning | One process per pilot |
| No executive sponsor | Pilot stalls when blocked | Confirm sponsor commitment before starting |
| End users not involved | Low adoption and trust | 20% of pilot team are operational staff |
| No fallback procedure | One failure breaks everything | Manual backup always ready |
| Wrong success metrics | Cannot demonstrate value | Define metrics before pilot, not during |
Running an AI pilot this quarter?
Simon Steggles provides AI pilot design, governance, and advisory support for UK organisations.
