AI-Si.com

Executive Resources · for UK SME leaders

How to Start an AI Pilot Properly

Most AI pilots fail before they have a chance to prove anything, and the failure is rarely technical. It is structural: wrong scope, no baseline measurement, no executive authority to remove blockers, and staff who were not involved in the design and do not trust the output. An eight-week pilot in three phases (design, execute, evaluate), with one executive sponsor, one specific use case, a measured baseline, shadow-mode validation, a graduated ramp and explicit go/no-go criteria, is built to prevent each of those failures.

Why most AI pilots fail before they begin

The post-mortem on a failed AI pilot almost never lands on the model. It lands on scope that was too broad, a measurement baseline that was never taken, a sponsor who was not senior enough to clear blockers, and an end-user team that found out about the project the week before go-live.

Eight weeks of structure prevents all of that. The design phase commits the organisation to a single use case with a measured baseline and a real executive sponsor. The execute phase de-risks the AI by running it in shadow mode before it touches live work. The evaluate phase produces a defensible go or no-go decision, not a vibes-based extension into month four.

Phase 1: Design (weeks 1–2) — get executive sponsorship first

An AI pilot without an executive sponsor is a project waiting to stall. When a blocker appears — a budget approval, a scope dispute, a systems-access issue — someone needs the authority to resolve it in days, not months. That person must be a C-suite or SVP-level owner who meets with the pilot team at least monthly, has budget authority and will report results to the board.

Without this, the pilot will lose momentum the moment things get hard. And things always get hard.

Phase 1: Define one specific use case and measure the baseline

Good pilot use cases are narrow and well-defined: a single process, a clear input and output, a measurable outcome. Bad pilots try to explore what AI can do across a broad domain. You do not learn from exploration — you learn from a specific question with a specific answer.

A well-defined pilot reads like this: 'Automatically categorise incoming customer support tickets as urgent, high, medium or low to reduce manual sorting time from two hours to fifteen minutes daily.' Everything else is out of scope.

Before touching any AI tool, document the current state of the process: volume per day or week, time per item, error rate, cost per transaction and where the bottlenecks actually are. That is your before measurement. Without it, the after measurement means nothing.
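As a rough illustration of what a recorded baseline can look like, the sketch below stores the measurements listed above in one record and compares a before and an after snapshot. The class name, field names and all figures are hypothetical; the baseline numbers are chosen only so that they match the two-hours-a-day ticket-sorting example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSnapshot:
    """One measurement of the process (hypothetical structure)."""
    items_per_day: int
    minutes_per_item: float
    error_rate: float          # fraction of items handled incorrectly
    cost_per_item_gbp: float

    @property
    def hours_per_day(self) -> float:
        return self.items_per_day * self.minutes_per_item / 60


def compare(before: ProcessSnapshot, after: ProcessSnapshot) -> dict:
    """Return the change on each metric; only meaningful if 'before' was measured."""
    return {
        "hours_saved_per_day": before.hours_per_day - after.hours_per_day,
        "error_rate_change": after.error_rate - before.error_rate,
        "cost_change_per_item_gbp": after.cost_per_item_gbp - before.cost_per_item_gbp,
    }


# Illustrative figures only: 120 tickets a day at one minute each is two hours.
baseline = ProcessSnapshot(items_per_day=120, minutes_per_item=1.0,
                           error_rate=0.06, cost_per_item_gbp=0.50)
after_pilot = ProcessSnapshot(items_per_day=120, minutes_per_item=0.125,
                              error_rate=0.04, cost_per_item_gbp=0.20)

change = compare(baseline, after_pilot)
print(change["hours_saved_per_day"])
```

A spreadsheet row with the same columns does the same job; what matters is that the baseline row is filled in before the pilot starts.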

Phase 2: Execute (weeks 3–6) — run in shadow mode first

Do not replace the human process on day one. Run the AI in parallel for the first two weeks: staff perform the process as normal, the AI processes the same inputs, and you compare the outputs side by side. Running in shadow mode does three things:

  • It measures real accuracy on your actual data before anything is at risk.
  • It builds staff confidence as they see the AI perform accurately rather than just being told it is accurate.
  • It surfaces edge cases and failure modes before they affect live work.
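The side-by-side comparison can be as simple as counting how often the AI output matches the human one. The sketch below is one minimal way to do that, assuming both outputs are recorded as labels in the same order; the function name and the sample labels are hypothetical, and treating the human label as the correct answer is itself an assumption.

```python
from collections import Counter


def shadow_mode_report(human_labels, ai_labels):
    """Compare human and AI outputs for the same inputs, item by item."""
    if len(human_labels) != len(ai_labels):
        raise ValueError("Both lists must cover the same items in the same order")
    if not human_labels:
        raise ValueError("No items to compare")

    # Count each (human, ai) pair where the two disagree; these are the
    # edge cases worth reviewing before the AI touches live work.
    disagreements = Counter(
        (human, ai) for human, ai in zip(human_labels, ai_labels) if human != ai
    )
    matches = len(human_labels) - sum(disagreements.values())
    return {
        "items": len(human_labels),
        "agreement_rate": matches / len(human_labels),
        "disagreements": disagreements,
    }


# Illustrative data only: five tickets sorted by a person and by the AI.
human = ["urgent", "low", "medium", "high", "low"]
ai = ["urgent", "low", "high", "high", "low"]

report = shadow_mode_report(human, ai)
print(f"Agreement: {report['agreement_rate']:.0%}")
print(report["disagreements"].most_common())
```

The disagreement counts are often more useful than the headline rate, because they show which categories the AI confuses.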

Phase 2: Ramp gradually and track daily

Once shadow mode validates accuracy, increase AI-handled volume in stages: 20% in week three, 50% in week four, 80% in week five and 100% in week six with a manual fallback still available. Monitor accuracy and gather staff feedback at each stage. Do not accelerate the ramp if accuracy is below target or staff confidence is low.
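The ramp rule above can be written down as a simple gate so that nobody is tempted to skip a stage. This is a sketch under stated assumptions: the stage percentages come from the schedule above, while the function name and the default thresholds (80% accuracy, confidence of 7 out of 10) are assumptions that should be replaced with your own pilot targets.

```python
RAMP_STAGES = [20, 50, 80, 100]  # % of volume handled by the AI


def next_ramp_share(current_share, accuracy, staff_confidence,
                    accuracy_target=0.80, confidence_target=7):
    """Return the AI share for the next stage, holding steady if a gate fails.

    current_share of 0 means the pilot is still in shadow mode.
    The default targets are assumptions, not fixed rules.
    """
    gates_pass = (accuracy >= accuracy_target
                  and staff_confidence >= confidence_target)
    if not gates_pass:
        return current_share  # do not accelerate the ramp

    higher_stages = [stage for stage in RAMP_STAGES if stage > current_share]
    return higher_stages[0] if higher_stages else current_share


print(next_ramp_share(20, accuracy=0.91, staff_confidence=8))  # moves up a stage
print(next_ramp_share(50, accuracy=0.75, staff_confidence=8))  # holds: accuracy below target
```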

A daily dashboard does not need to be sophisticated. A shared spreadsheet updated each morning showing volume processed, AI accuracy, manual overrides, time saved and any error patterns is enough. The point is that you see trends as they develop, not after they have become problems.

Phase 3: Evaluate (weeks 7–8) and run the go/no-go

Compare the post-pilot state to your baseline on every metric you measured at the start. Calculate actual ROI on a like-for-like annual basis: hours saved per week multiplied by hourly staff cost and by the number of working weeks in the year, minus the annual tool cost. Ask staff to complete a short satisfaction survey covering confidence in the AI, what still needs improvement and whether they would recommend extending it.
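A worked version of the ROI arithmetic, as a sketch: it annualises the weekly time saving so that it can be set against a yearly tool cost. The function name, the 46 working weeks and every figure in the example are assumptions for illustration, not benchmarks.

```python
def annual_net_saving(hours_saved_per_week, hourly_staff_cost,
                      annual_tool_cost, working_weeks_per_year=46):
    """Annual value of staff time saved, minus the annual tool cost (all in GBP)."""
    gross_saving = hours_saved_per_week * hourly_staff_cost * working_weeks_per_year
    return gross_saving - annual_tool_cost


# Illustrative figures only: 1.75 hours saved a day over a five-day week,
# a fully loaded staff cost of 20 GBP an hour and a 3,000 GBP a year tool.
hours_saved_per_week = 1.75 * 5
net = annual_net_saving(hours_saved_per_week, hourly_staff_cost=20,
                        annual_tool_cost=3000)
print(f"Net annual saving: {net:,.0f} GBP")
```

Use a fully loaded staff cost (salary plus on-costs) if you have one; using bare salary understates the saving.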

The criteria for a Go decision are explicit: accuracy at or above 80%, measurable time or cost savings, staff confidence at 7 out of 10 or higher, no unresolved governance concerns, and the sponsor wants to scale. If any of those criteria are not met, the right answer is either to fix the specific problem and rerun the pilot, or to wind it down and apply the learning to a different use case.
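Writing the criteria down as a checklist before the pilot ends makes the decision harder to fudge. The sketch below encodes the five criteria above; the class and field names are hypothetical, and reading "measurable time or cost savings" as a net saving above zero is an assumption you may want to tighten.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    """End-of-pilot measurements (hypothetical structure)."""
    accuracy: float                      # fraction, e.g. 0.86
    net_saving_gbp: float                # annual saving minus annual tool cost
    staff_confidence: float              # survey score out of 10
    unresolved_governance_concerns: int
    sponsor_wants_to_scale: bool


def go_no_go(result: PilotResult):
    """Return ('go' or 'no-go', list of criteria that were not met)."""
    checks = {
        "accuracy at or above 80%": result.accuracy >= 0.80,
        "measurable time or cost savings": result.net_saving_gbp > 0,
        "staff confidence at 7 out of 10 or higher": result.staff_confidence >= 7,
        "no unresolved governance concerns": result.unresolved_governance_concerns == 0,
        "sponsor wants to scale": result.sponsor_wants_to_scale,
    }
    failed = [name for name, passed in checks.items() if not passed]
    return ("go" if not failed else "no-go", failed)


# Illustrative result only: everything passes except staff confidence.
decision, failed = go_no_go(PilotResult(
    accuracy=0.86, net_saving_gbp=5050, staff_confidence=6.5,
    unresolved_governance_concerns=0, sponsor_wants_to_scale=True,
))
print(decision, failed)
```

The list of failed criteria is the useful part: it tells you which specific problem to fix before a rerun.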

Eight weeks is enough. A pilot that drags on for six months is not a pilot — it is an unmanaged project.

Common pilot pitfalls

Six failure patterns recur across the pilots that do not deliver. Each has a single, cheap prevention. None of the preventions are clever; all of them are skipped under time pressure.

| Pitfall | Impact | Prevention |
| --- | --- | --- |
| No baseline measurement | Cannot prove ROI | Measure current state before AI |
| Multiple use cases at once | Diluted resources, no clear learning | One process per pilot |
| No executive sponsor | Pilot stalls when blocked | Confirm sponsor commitment before starting |
| End users not involved | Low adoption and trust | 20% of pilot team are operational staff |
| No fallback procedure | One failure breaks everything | Manual backup always ready |
| Wrong success metrics | Cannot demonstrate value | Define metrics before pilot, not during |
