Legal AI Tool Benchmarks
Objective Performance Scores
Below is the latest test data, obtained through fair and objective testing: an analysis of 65 simulated documents spanning a broad variety of document types.
GenieAI vs Claude
A rigorous 15-metric evaluation of AI-generated legal risk assessments across 65 source documents in a simulated scenario.
Overall Scores
Across 15 legal quality metrics, each scored 1-10
Largest Performance Gaps
GenieAI
Produces a structured legal risk assessment grounded in cited sources and cross-document analysis. Risks are evaluated in relation to each other rather than in isolation, enabling identification of secondary and compounding issues (e.g. counterparty, regulatory, and structural risk). Its primary weaknesses are presentation density and slightly weaker recommendation structuring.
Detailed legal assessment
Claude
Produces a polished, executive-readable report with good structure and actionable recommendations. However, figures are sometimes inaccurate or unverifiable, critical evidence (whistleblower, counterparty insolvency) is missing entirely, and risks are treated in isolation rather than as interconnected. One area where Claude outperforms: actionability of recommendations.
High-level briefing only
Method
Scenario
Task
Prompt
Expected Key Points
- Board authorized 3 strategic partnerships for European expansion
- NexGen: solid-state battery supply, EUR 2.5B+ annual commitment by 2028
- AutonomX: autonomous driving for EU market, EUR 250M+ total investment
- NordischEM: contract manufacturing, 100,000+ vehicles/year capacity
- Key risks: single-source dependency, quality issues, regulatory compliance
- Board considering QuantumFlux acquisition to reduce NexGen dependency
- Type Approval issues could impact EUR 189M-567M in revenue
- Strategic objective: 20M vehicles annually by 2030 (Master Plan Part 3)