AI Tool Evaluation Criteria
A structured scoring framework for evaluating AI tools before procurement. Use this to compare options consistently, identify risks, and justify your selection to stakeholders.
THE REAL COST OF A WRONG CHOICE
Choosing the wrong AI tool costs more than money. It costs time, trust, and momentum.
The average UK organisation wastes 4–6 months and £15,000–£50,000 on an AI tool that doesn't deliver before realising the mistake. This framework is designed to stop that happening.
Not sure where to start? Our fractional AI director service includes independent tool selection — no vendor ties, no hidden incentives.
1. Evaluation Methodology
This framework provides a structured scoring approach for comparing AI tools. Score each tool against the criteria below on a 1–5 scale. Multiply each score by the weighting factor. Sum the weighted scores to produce a total evaluation score.
No AI tool should be procured or approved for use without completing this evaluation. For tools that will process personal data, a Data Protection Impact Assessment (DPIA) must also be completed before approval.
| Score | Definition |
|---|---|
| 1 | Does not meet requirement. Significant gap with no credible roadmap to resolution. |
| 3 | Partially meets requirement. Some gaps, but manageable with compensating controls. |
| 5 | Fully meets or exceeds requirement. No gaps. Market-leading capability in this area. |
2. Evaluation Scoring Framework
| Criterion | What to Assess | Weight | Score (1–5) |
|---|---|---|---|
| CAPABILITY (30% of total) | | | |
| Core Functionality | Does the tool do what you need it to do, reliably and accurately? | 10% | ____ |
| Output Quality | Is the quality of AI outputs sufficient for your use case without extensive human correction? | 10% | ____ |
| Customisation | Can the tool be configured, fine-tuned, or prompted to align with your specific context, terminology, and requirements? | 5% | ____ |
| Reliability & Uptime | What is the vendor’s documented uptime SLA? Is there evidence of production reliability at scale? | 5% | ____ |
| SECURITY (25% of total) | | | |
| Data Encryption | Is data encrypted in transit and at rest? What encryption standards are used? | 8% | ____ |
| Data Residency | Where is data stored and processed? Is UK/EEA data residency guaranteed? Are there standard contractual clauses for non-UK transfers? | 8% | ____ |
| Access Controls | Is role-based access control available? Is SSO/MFA supported? Can audit logs be exported? | 5% | ____ |
| Security Certifications | Does the vendor hold ISO 27001, SOC 2 Type II, Cyber Essentials, or equivalent certifications? | 4% | ____ |
| COMPLIANCE (20% of total) | | | |
| UK GDPR Compliance | Is the vendor compliant with UK GDPR? Is a Data Processing Agreement available? Has a DPIA been completed? | 8% | ____ |
| Training Data Transparency | Can the vendor confirm what data was used to train the model? Is there a risk of the model reproducing proprietary or personal data? | 6% | ____ |
| Output Ownership | Does the vendor claim any ownership or usage rights over content you create using the tool? Is IP ownership clearly assigned to you? | 6% | ____ |
| INTEGRATION (15% of total) | | | |
| API Availability | Is a well-documented API available for integration with your existing systems? | 6% | ____ |
| Existing System Compatibility | Does the tool integrate with your current Microsoft 365, Google Workspace, CRM, or ERP environment? | 6% | ____ |
| Implementation Complexity | What level of technical resource is required to deploy and maintain the integration? | 3% | ____ |
| COMMERCIAL (10% of total) | | | |
| Total Cost of Ownership | What is the full cost including licences, implementation, training, and ongoing support over 3 years? | 5% | ____ |
| Vendor Viability | Is the vendor financially stable? How long have they been trading? What is their client retention rate? | 3% | ____ |
| Exit Provisions | Can you export your data? Are there reasonable contract termination provisions? What is the data deletion commitment on exit? | 2% | ____ |
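The weighted-total calculation described in Section 1 can be sketched in Python. The criterion names and weights below are taken directly from the table above; the function name and example scores are illustrative, not part of the framework.

```python
# Weights copied from the evaluation scoring framework table (they sum to 1.0).
WEIGHTS = {
    "Core Functionality": 0.10, "Output Quality": 0.10,
    "Customisation": 0.05, "Reliability & Uptime": 0.05,
    "Data Encryption": 0.08, "Data Residency": 0.08,
    "Access Controls": 0.05, "Security Certifications": 0.04,
    "UK GDPR Compliance": 0.08, "Training Data Transparency": 0.06,
    "Output Ownership": 0.06,
    "API Availability": 0.06, "Existing System Compatibility": 0.06,
    "Implementation Complexity": 0.03,
    "Total Cost of Ownership": 0.05, "Vendor Viability": 0.03,
    "Exit Provisions": 0.02,
}

def weighted_total(scores: dict[str, int]) -> float:
    """Multiply each 1-5 score by its weight and sum; the result is on a 1-5 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # sanity-check the weights
    return sum(scores[criterion] * weight for criterion, weight in WEIGHTS.items())
```

Because the weights sum to 100%, a tool scored 3 on every criterion produces a weighted total of exactly 3.0, which makes the thresholds in Section 3 easy to interpret.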
3. Non-Negotiable Requirements
Regardless of overall evaluation score, a tool must meet all of the following mandatory requirements before it can be approved for use. Any single failure is a disqualifying condition:
Automatic Disqualifiers
- No Data Processing Agreement available (if processing personal data)
- Data is processed or stored in countries without a UK adequacy decision or equivalent transfer safeguards
- Vendor claims ownership of outputs created using the tool
- No documented security certifications or completed security questionnaire
- Tool uses your data to train or improve public AI models without explicit opt-out
Minimum Acceptable Standards
- Overall weighted score of 3.0 or above
- Security section score of 3.5 or above
- Compliance section score of 3.5 or above
- Reference customers in your sector available for verification
- Vendor can provide a completed security questionnaire within 10 working days
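A minimal sketch of this pass/fail gate. The thresholds come from the list above; the framework does not specify exactly how section scores are computed, so this sketch assumes they are weight-averaged criterion scores within each section.

```python
# Section weights copied from the scoring framework table.
SECTIONS = {
    "Security": {"Data Encryption": 0.08, "Data Residency": 0.08,
                 "Access Controls": 0.05, "Security Certifications": 0.04},
    "Compliance": {"UK GDPR Compliance": 0.08,
                   "Training Data Transparency": 0.06,
                   "Output Ownership": 0.06},
}

def section_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weight-averaged score for one section, back on the 1-5 scale."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight

def meets_minimums(overall: float, scores: dict[str, int],
                   disqualified: bool = False) -> bool:
    """Apply the minimum acceptable standards; any automatic disqualifier fails outright."""
    if disqualified:
        return False
    return (overall >= 3.0
            and section_score(scores, SECTIONS["Security"]) >= 3.5
            and section_score(scores, SECTIONS["Compliance"]) >= 3.5)
```

Note the asymmetry this encodes: a tool can clear the overall 3.0 bar while still failing on Security or Compliance alone, which is exactly the behaviour the mandatory requirements intend.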
4. Vendor Due Diligence Checklist
| Document / Action Required | Responsible | Completed |
|---|---|---|
| Complete evaluation scoring framework above | IT / Operations | [ ] |
| Request and review vendor security questionnaire | IT / Information Security | [ ] |
| Review vendor Data Processing Agreement (DPA) | DPO / Legal | [ ] |
| Complete Data Protection Impact Assessment (if processing personal data) | DPO | [ ] |
| Obtain and verify security certifications (ISO 27001, SOC 2, etc.) | IT | [ ] |
| Check data residency and international transfer provisions | DPO | [ ] |
| Review training data usage and opt-out provisions | IT / Legal | [ ] |
| Verify IP ownership terms for AI-generated outputs | Legal | [ ] |
| Obtain two sector-relevant reference contacts | Operations | [ ] |
| Complete total cost of ownership modelling (3-year) | Finance | [ ] |
| Obtain approval from AI Steering Committee | Operations Director | [ ] |
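The 3-year total-cost-of-ownership row above can be modelled as a simple sum of one-off and recurring costs. The cost categories follow the Commercial criteria in Section 2; all figures below are hypothetical placeholders, not vendor quotes.

```python
def three_year_tco(annual_licence: int, implementation: int,
                   training: int, annual_support: int) -> int:
    """One-off costs counted once; recurring costs counted for three years."""
    return implementation + training + 3 * (annual_licence + annual_support)

# Example with illustrative figures: £12,000/yr licences, £8,000 implementation,
# £3,000 training, £2,000/yr support:
# three_year_tco(12_000, 8_000, 3_000, 2_000) -> 53_000 (£53,000 over 3 years)
```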
How Popular AI Tools Score Against This Framework
To illustrate how to use the evaluation criteria, here is how three widely used AI tools compared when assessed against this framework for a typical UK professional services organisation.
| Tool | Best for |
|---|---|
| ChatGPT (OpenAI) | General productivity |
| Microsoft Copilot 365 | MS365 environments |
| Claude (Anthropic) | Complex analysis |
These scores are indicative and context-dependent. Your organisation’s existing infrastructure, compliance requirements, and use case will significantly affect the correct choice. Need independent evaluation? Book a free call.
Related resources: Once you’ve selected a tool, you’ll need governance to deploy it safely. See our AI Governance & Risk framework and AI Director services for implementation support.
FREE DOWNLOAD
AI Tool Scoring Spreadsheet
A pre-built spreadsheet for evaluating AI tools against the weighted criteria in this framework. Weight criteria by importance, score 2–5 tools side by side, and generate a recommendation with justification for board sign-off.
- Pre-loaded with security, GDPR, and contract red-flag scoring
- Worked example with 3 popular AI tools included
- Auto-calculates weighted scores and recommendation
- Includes governance and services internal links for context
Get the scoring spreadsheet:
UK GDPR compliant. No spam. Unsubscribe at any time.
Need Help Evaluating AI Tools?
AI-Si provides independent AI vendor evaluation as part of our fractional AI director services — helping you select the right tools without vendor bias.
BOOK YOUR FREE AI STRATEGY DISCUSSION NOW