AI Tool Evaluation Criteria

A structured scoring framework for evaluating AI tools before procurement. Use this to compare options consistently, identify risks, and justify your selection to stakeholders.

THE REAL COST OF A WRONG CHOICE

Choosing the wrong AI tool costs more than money. It costs time, trust, and momentum.

The average UK organisation wastes 4-6 months and £15,000-£50,000 on an AI tool that doesn’t deliver before it realises the mistake. This framework is designed to stop that from happening.

Not sure where to start? Our fractional AI director service includes independent tool selection — no vendor ties, no hidden incentives.

Document Type: Evaluation Framework
Version: 2.0
Issued By: AI-Si Consultancy
Last Reviewed: February 2026
Use Case: AI Tool Selection & Procurement

1. Evaluation Methodology

This framework provides a structured scoring approach for comparing AI tools. Score each tool against the criteria below on a 1–5 scale. Multiply each score by the weighting factor. Sum the weighted scores to produce a total evaluation score.
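
Written as a formula, and assuming the weights are expressed as fractions that sum to 1 (as the percentages in section 2 do), the total is a weighted average, so it stays on the same 1–5 scale as the individual scores. The symbols s_i (score for criterion i), w_i (its weight), and n (number of criteria) are notation introduced here for illustration only.

```latex
S_{\text{total}} = \sum_{i=1}^{n} w_i \, s_i,
\qquad \sum_{i=1}^{n} w_i = 1,
\qquad 1 \le s_i \le 5 \;\Rightarrow\; 1 \le S_{\text{total}} \le 5
```

For example, a criterion weighted at 10% and scored 4 contributes 0.10 × 4 = 0.4 to the total.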

No AI tool should be procured or approved for use without completing this evaluation. For tools that will process personal data, a Data Protection Impact Assessment (DPIA) must also be completed before approval.

Score 1

Does not meet requirement. Significant gap with no credible roadmap to resolution.

Score 3

Partially meets requirement. Some gaps but manageable with compensating controls.

Score 5

Fully meets or exceeds requirement. No gaps. Market-leading capability in this area.

2. Evaluation Scoring Framework

Criterion | What to Assess | Weight | Score (1–5)

CAPABILITY (30% of total)
Core Functionality | Does the tool do what you need it to do, reliably and accurately? | 10% | ____
Output Quality | Is the quality of AI outputs sufficient for your use case without extensive human correction? | 10% | ____
Customisation | Can the tool be configured, fine-tuned, or prompted to align with your specific context, terminology, and requirements? | 5% | ____
Reliability & Uptime | What is the vendor’s documented uptime SLA? Is there evidence of production reliability at scale? | 5% | ____

SECURITY (25% of total)
Data Encryption | Is data encrypted in transit and at rest? What encryption standards are used? | 8% | ____
Data Residency | Where is data stored and processed? Is UK/EEA data residency guaranteed? Are there standard contractual clauses for non-UK transfers? | 8% | ____
Access Controls | Is role-based access control available? Is SSO/MFA supported? Can audit logs be exported? | 5% | ____
Security Certifications | Does the vendor hold ISO 27001, SOC 2 Type II, Cyber Essentials, or equivalent certifications? | 4% | ____

COMPLIANCE (20% of total)
UK GDPR Compliance | Is the vendor compliant with UK GDPR? Is a Data Processing Agreement available? Has a DPIA been completed? | 8% | ____
Training Data Transparency | Can the vendor confirm what data was used to train the model? Is there a risk of the model reproducing proprietary or personal data? | 6% | ____
Output Ownership | Does the vendor claim any ownership or usage rights over content you create using the tool? Is IP ownership clearly assigned to you? | 6% | ____

INTEGRATION (15% of total)
API Availability | Is a well-documented API available for integration with your existing systems? | 6% | ____
Existing System Compatibility | Does the tool integrate with your current Microsoft 365, Google Workspace, CRM, or ERP environment? | 6% | ____
Implementation Complexity | What level of technical resource is required to deploy and maintain the integration? | 3% | ____

COMMERCIAL (10% of total)
Total Cost of Ownership | What is the full cost including licences, implementation, training, and ongoing support over 3 years? | 5% | ____
Vendor Viability | Is the vendor financially stable? How long have they been trading? What is their client retention rate? | 3% | ____
Exit Provisions | Can you export your data? Are there reasonable contract termination provisions? What is the data deletion commitment on exit? | 2% | ____
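
The scorecard arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of the framework itself: the weights are copied from the table above, while the names `WEIGHTS`, `weighted_total`, and `section_score` are hypothetical. The section score (a weighted average within one section) is an assumption about how the section thresholds in section 3 are meant to be calculated.

```python
# Weights copied from the scoring table, as fractions of the total (sum = 1.0).
WEIGHTS = {
    "Capability": {
        "Core Functionality": 0.10,
        "Output Quality": 0.10,
        "Customisation": 0.05,
        "Reliability & Uptime": 0.05,
    },
    "Security": {
        "Data Encryption": 0.08,
        "Data Residency": 0.08,
        "Access Controls": 0.05,
        "Security Certifications": 0.04,
    },
    "Compliance": {
        "UK GDPR Compliance": 0.08,
        "Training Data Transparency": 0.06,
        "Output Ownership": 0.06,
    },
    "Integration": {
        "API Availability": 0.06,
        "Existing System Compatibility": 0.06,
        "Implementation Complexity": 0.03,
    },
    "Commercial": {
        "Total Cost of Ownership": 0.05,
        "Vendor Viability": 0.03,
        "Exit Provisions": 0.02,
    },
}


def weighted_total(scores):
    """Sum of score x weight over all criteria; stays on the 1-5 scale."""
    return sum(
        scores[criterion] * weight
        for section in WEIGHTS.values()
        for criterion, weight in section.items()
    )


def section_score(scores, section):
    """Assumed definition: weighted average of the scores within one section."""
    weights = WEIGHTS[section]
    return sum(scores[c] * w for c, w in weights.items()) / sum(weights.values())


# Hypothetical tool: 4 on every criterion except a 2 for Data Residency.
scores = {c: 4 for section in WEIGHTS.values() for c in section}
scores["Data Residency"] = 2

print(round(weighted_total(scores), 2))             # 3.84
print(round(section_score(scores, "Security"), 2))  # 3.36
```

In this hypothetical case the overall score looks healthy, but under the assumed section-score definition the Security section lands below 3.5, which is why the section thresholds are worth checking separately from the total.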

3. Non-Negotiable Requirements

However well a tool scores overall, it must clear both lists below before it can be approved for use: none of the automatic disqualifiers may apply, and every minimum standard must be met. Any single failure rules the tool out:

Automatic Disqualifiers

  • No Data Processing Agreement available (if processing personal data)
  • Data is processed or stored in countries without adequate data protection
  • Vendor claims ownership of outputs created using the tool
  • Neither documented security certifications nor a completed security questionnaire
  • Tool uses your data to train or improve public AI models without explicit opt-out

Minimum Acceptable Standards

  • Overall weighted score of 3.0 or above
  • Security section score of 3.5 or above
  • Compliance section score of 3.5 or above
  • Reference customers in your sector available for verification
  • Vendor can provide a completed security questionnaire within 10 working days
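
The two lists above amount to a pass/fail gate that sits on top of the scorecard. A minimal sketch of that gate follows, assuming the scores have already been calculated; the `Evaluation` class, its field names, and the `approval_blockers` function are hypothetical names chosen for illustration, not part of the framework.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    overall: float                    # total weighted score (1-5)
    security: float                   # Security section score (1-5)
    compliance: float                 # Compliance section score (1-5)
    processes_personal_data: bool
    dpa_available: bool
    data_location_adequate: bool
    vendor_claims_outputs: bool
    security_evidence: bool           # certifications or completed questionnaire
    trains_public_models_no_opt_out: bool
    sector_references: bool
    questionnaire_in_10_days: bool


def approval_blockers(e):
    """Return every reason the tool fails; an empty list means it is eligible."""
    checks = [
        # Automatic disqualifiers
        (e.processes_personal_data and not e.dpa_available,
         "No Data Processing Agreement for personal data"),
        (not e.data_location_adequate,
         "Data held in a country without adequate data protection"),
        (e.vendor_claims_outputs, "Vendor claims ownership of outputs"),
        (not e.security_evidence,
         "No security certifications or completed questionnaire"),
        (e.trains_public_models_no_opt_out,
         "Data used to train public models without explicit opt-out"),
        # Minimum acceptable standards
        (e.overall < 3.0, "Overall weighted score below 3.0"),
        (e.security < 3.5, "Security section score below 3.5"),
        (e.compliance < 3.5, "Compliance section score below 3.5"),
        (not e.sector_references, "No sector reference customers"),
        (not e.questionnaire_in_10_days,
         "Security questionnaire not returned within 10 working days"),
    ]
    return [reason for failed, reason in checks if failed]


# Hypothetical tool: strong overall, but the Security section falls short.
tool = Evaluation(
    overall=3.84, security=3.36, compliance=4.0,
    processes_personal_data=True, dpa_available=True,
    data_location_adequate=True, vendor_claims_outputs=False,
    security_evidence=True, trains_public_models_no_opt_out=False,
    sector_references=True, questionnaire_in_10_days=True,
)
print(approval_blockers(tool))  # ['Security section score below 3.5']
```

Returning the full list of reasons, rather than stopping at the first failure, gives stakeholders a complete picture of what a vendor would need to fix.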

4. Vendor Due Diligence Checklist

Document / Action Required | Responsible | Completed
Complete evaluation scoring framework above | IT / Operations | [ ]
Request and review vendor security questionnaire | IT / Information Security | [ ]
Review vendor Data Processing Agreement (DPA) | DPO / Legal | [ ]
Complete Data Protection Impact Assessment (if processing personal data) | DPO | [ ]
Obtain and verify security certifications (ISO 27001, SOC 2, etc.) | IT | [ ]
Check data residency and international transfer provisions | DPO | [ ]
Review training data usage and opt-out provisions | IT / Legal | [ ]
Verify IP ownership terms for AI-generated outputs | Legal | [ ]
Obtain two sector-relevant reference contacts | Operations | [ ]
Complete total cost of ownership modelling (3-year) | Finance | [ ]
Obtain approval from AI Steering Committee | Operations Director | [ ]
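
For the total cost of ownership item in the checklist, a rough starting point might look like the sketch below. It assumes licences and support recur annually while implementation and training are one-off costs. Real contracts may differ (for example, recurring training or price escalators), and the function name and all figures are hypothetical.

```python
def three_year_tco(licence_per_year, implementation, training,
                   support_per_year, years=3):
    """Recurring costs multiplied by the term, plus the one-off costs."""
    recurring = (licence_per_year + support_per_year) * years
    one_off = implementation + training
    return recurring + one_off


# Hypothetical figures in pounds, for illustration only.
print(three_year_tco(licence_per_year=12_000, implementation=8_000,
                     training=3_000, support_per_year=2_000))  # 53000
```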

How Popular AI Tools Score Against This Framework

To illustrate how the evaluation criteria work in practice, here is how three widely used AI tools scored when assessed for a typical UK professional services organisation. The summaries use a simplified 10-point scale across five dimensions rather than the full weighted scorecard in section 2. Scores are indicative; your specific context will affect results.

ChatGPT (OpenAI)

Security & GDPR: 6/10
Data Residency (UK): 4/10
Integration Ease: 8/10
Capability: 9/10
Vendor Stability: 8/10
Overall: 7/10
Best for: General productivity

Microsoft Copilot 365

Security & GDPR: 9/10
Data Residency (UK): 9/10
Integration Ease: 9/10
Capability: 7/10
Vendor Stability: 9/10
Overall: 8.6/10
Best for: MS365 environments

Claude (Anthropic)

Security & GDPR: 8/10
Data Residency (UK): 6/10
Integration Ease: 7/10
Capability: 9/10
Vendor Stability: 8/10
Overall: 7.6/10
Best for: Complex analysis
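
The overall figures above match a plain, unweighted mean of the five dimension scores. A quick check, using only the numbers listed in the three summaries (the name `SUMMARY_SCORES` is hypothetical):

```python
from statistics import mean

# Dimension scores copied from the three summaries above (out of 10), in order:
# Security & GDPR, Data Residency (UK), Integration Ease, Capability, Vendor Stability.
SUMMARY_SCORES = {
    "ChatGPT (OpenAI)": [6, 4, 8, 9, 8],
    "Microsoft Copilot 365": [9, 9, 9, 7, 9],
    "Claude (Anthropic)": [8, 6, 7, 9, 8],
}

for tool, dims in SUMMARY_SCORES.items():
    print(f"{tool}: {mean(dims):.1f}/10")
# ChatGPT (OpenAI): 7.0/10
# Microsoft Copilot 365: 8.6/10
# Claude (Anthropic): 7.6/10
```

Because every dimension counts equally here, an organisation that weights security or data residency more heavily could reach a different ranking from the same dimension scores.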

These scores are indicative and context-dependent. Your organisation’s existing infrastructure, compliance requirements, and use case will significantly affect the correct choice. Need independent evaluation? Book a free call.

Related resources: Once you’ve selected a tool, you’ll need governance to deploy it safely. See our AI Governance & Risk framework and AI Director services for implementation support.

FREE DOWNLOAD

AI Tool Scoring Spreadsheet

A pre-built spreadsheet for evaluating AI tools against the criteria in this framework. Weight criteria by importance, score two to five tools side by side, and generate a recommendation with justification for board sign-off.

  • Pre-loaded with security, GDPR, and contract red-flag scoring
  • Worked example with 3 popular AI tools included
  • Auto-calculates weighted scores and recommendation
  • Includes governance and services internal links for context

Need Help Evaluating AI Tools?

AI-Si provides independent AI vendor evaluation as part of our fractional AI director services — helping you select the right tools without vendor bias.

BOOK YOUR FREE AI STRATEGY DISCUSSION NOW