Model training

Eight years training legal AI - without ever training on your contracts.

Built and refined since 2017 by machine-learning specialists with master's degrees from Oxford, UCL, and Imperial. Improved with a proprietary 1M+ legal knowledge base, in-context learning, and a panel of qualified legal advisors. Lawyer-rated at 92% accuracy, and outperforms Claude in review tasks by 140%.

Your contracts never become anyone else's training data.

The four principles

What sits behind every Genie answer

We never train on your contracts

Customer documents are never used to train Genie. Your contracts stay yours. Genie improves instead through a proprietary legal knowledge base, in-context retrieval, and lawyer-rated benchmarking, never by ingesting customer data.

300K+ contracts. 10M+ clause revisions.

A proprietary legal corpus built since 2017: 300,000+ contracts, 10 million+ clause revisions, plus open-source templates, legislation, and case law spanning 150 legal jurisdictions. Relevant material from the corpus is filtered into every request, so the model has the right legal context before it generates anything.

In-context learning over fine-tuning

Fine-tuned models can be tied to outdated foundation models and need expensive retraining as the law changes. We feed legal context at inference time instead, so the foundation model stays current and the legal layer keeps up with the law.

Lawyer-rated accuracy

Every release is benchmarked by qualified lawyers. The latest published benchmark shows Genie substantially outperforming Claude and ChatGPT on complex legal tasks.

How it works

The five layers of Genie's legal model.

01

A proprietary knowledge base, built since 2017.

Eight years of compounding legal data: 300,000+ contracts, 10 million+ clause revisions, plus templates, legislation, and case law spanning the US and UK legal systems. The corpus has grown through real legal work, including early pilots from 2018 with Clifford Chance, Pinsent Masons, and Withers, and continuous additions from the 200,000+ businesses that use Genie today.

02

Research-grade ML at the foundation.

Genie's founders, Rafie Faruq and Nitish Mutha, hold master's degrees in machine learning and were taught by researchers at Google DeepMind. Genie's research approach was developed in partnership with Oxford University, including a paper published at EMNLP, a top NLP conference, in 2019.

03

In-context learning.

Genie's architecture passes the right legal context at inference time, so the legal layer keeps up with the law and we can swap in newer foundation models as they land.
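
As an illustration of the general technique only (Genie's own retrieval logic is not described here), the following is a minimal sketch of in-context retrieval: filter a knowledge base by jurisdiction, rank what remains against the question, and place the best matches in the prompt at inference time. The `Source` type, the `retrieve` and `build_prompt` helpers, the word-overlap scoring, and the two simplified legal entries are all hypothetical stand-ins; a production system would typically use a proper search index or embeddings.

```python
from dataclasses import dataclass

# Characters stripped from word edges before comparing words.
PUNCT = ".,;:()?!\"'"


@dataclass(frozen=True)
class Source:
    """Hypothetical knowledge-base entry: a legal source tagged by jurisdiction."""
    title: str
    jurisdiction: str  # e.g. "UK" or "US"
    text: str


def tokenize(text: str) -> set[str]:
    """Lower-cased set of words; a crude stand-in for a real search index."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, sources: list[Source], jurisdiction: str, k: int = 2) -> list[Source]:
    """Keep only sources for the requested jurisdiction, then return up to k
    of them ranked by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(s.text)), s)
        for s in sources
        if s.jurisdiction == jurisdiction
    ]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in relevant[:k]]


def build_prompt(query: str, context: list[Source]) -> str:
    """Put the retrieved legal context ahead of the question at inference time."""
    lines = [f"[{s.jurisdiction}] {s.title}: {s.text}" for s in context]
    return "Legal context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"


if __name__ == "__main__":
    # Simplified toy entries for illustration, not legal advice.
    knowledge_base = [
        Source("Limitation Act 1980", "UK",
               "Simple contract claims are time-barred six years after the breach."),
        Source("UCC 2-725", "US",
               "Sale-of-goods contract claims are time-barred four years after the breach."),
    ]
    question = "When are contract claims time-barred?"
    print(build_prompt(question, retrieve(question, knowledge_base, "UK")))
```

Because the legal context arrives with each request, updating the knowledge base or swapping in a newer foundation model needs no retraining in this design; only the prompt contents change.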

04

Lawyer review, not just engineering review.

Training data, prompt design, and retrieval logic are reviewed by qualified lawyers. They check what the model is exposed to, how it reasons, and where it should defer. The result: legal judgement is built into the system before benchmarking starts.

05

Continuous improvement. Customer-data boundary intact.

Genie improves independently of customer data. Every release is measured against a curated benchmark of legal tasks (drafting, review, redlining, comparison), and new legislation and case law are folded in as they are published. The customer-data boundary stays explicit at every layer.

Frequently asked

Common questions about how Genie trains its AI.

Does GenieAI train on customer contracts?

No. Customer documents are never used to train Genie. Instead, Genie improves through a proprietary legal corpus, in-context retrieval over up-to-date legal sources, and feedback from a panel of qualified lawyers who benchmark every release.

Does GenieAI use Large Language Models?

Yes. Genie pairs best-in-class foundation models with a proprietary legal knowledge base, in-context retrieval, and lawyer-validated guardrails. Genie's patent-pending Eidetic Intelligence architecture is what differentiates the legal layer from a generic LLM.

How is accuracy measured?

Every release is benchmarked by qualified lawyers across a curated set of legal tasks (drafting, review, redlining, comparison). In the most recent published study (65 documents, graded by lawyers), Genie achieved 90% accuracy, compared with 79.3% for Claude Cowork and 37.3% for ChatGPT. Genie's agents outperform GPT and Claude by 140% on aggregate.
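
As a minimal sketch of how lawyer-graded results can be turned into accuracy figures, assume each benchmark item receives a simple pass/fail grade from a lawyer. The `Grade` record, the task labels, and the `passes_gate` regression check below are hypothetical illustrations, not Genie's published methodology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Grade:
    """Hypothetical record: one lawyer's pass/fail verdict on one benchmark item."""
    task: str       # e.g. "drafting", "review", "redlining", "comparison"
    correct: bool


def overall_accuracy(grades: list[Grade]) -> float:
    """Share of graded items the lawyers marked correct."""
    if not grades:
        raise ValueError("no graded items")
    return sum(g.correct for g in grades) / len(grades)


def accuracy_by_task(grades: list[Grade]) -> dict[str, float]:
    """Accuracy computed separately for each task type."""
    by_task: dict[str, list[Grade]] = defaultdict(list)
    for g in grades:
        by_task[g.task].append(g)
    return {task: overall_accuracy(items) for task, items in by_task.items()}


def passes_gate(candidate: list[Grade], previous: list[Grade], tolerance: float = 0.0) -> bool:
    """Hypothetical release check: reject a candidate release whose overall
    accuracy falls more than `tolerance` below the previous release's."""
    return overall_accuracy(candidate) >= overall_accuracy(previous) - tolerance
```

Accuracy here is simply correct items divided by graded items, reported per task and overall; the gate is one possible way to make "rated before it ships" a concrete check.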

Why prefer in-context learning over fine-tuning?

Fine-tuning bakes a snapshot of the legal world into a model that's hard to update - and the legal world keeps moving. In-context learning passes fresh, jurisdiction-correct legal context at inference time, so the model stays current as legislation and case law evolve, and we can swap in newer foundation models without retraining.

Where does the legal knowledge base come from?

Genie has been building it since 2017. The corpus includes 300,000+ contracts, 10 million+ clause revisions, plus open-source templates, legislation, and case law spanning the US and UK legal systems. Early pilots with Clifford Chance, Pinsent Masons, and Withers (from 2018) seeded the corpus, and 200,000+ businesses now contribute new legal context through normal use.

Who reviews the model's legal output?

Qualified lawyers. They contribute training data, set prompt-engineering standards, supervise the in-context retrieval logic, and rate every release against the panel benchmark before it ships.

How does GenieAI keep customer data secure?

Genie is ISO 27001 certified, with all customer data protected by 256-bit encryption at rest and in transit. Customer contracts are never used to train models that other customers see. The full security posture, sub-processor list, and DPA can be found in our Security Centre.

Who founded GenieAI and what is their background?

Genie was founded in July 2017 by Rafie Faruq and Nitish Mutha, who both hold master's degrees in machine learning and were taught by researchers at Google DeepMind. Genie was the first generative-AI company in the legal space, and Genie's research was published at EMNLP, a top NLP conference, in 2019, in partnership with Oxford University.
