18th December 2024

Available Legal AI Datasets

Note: This article is just one of 60+ sections from our full report titled: The 2024 Legal AI Retrospective - Key Lessons from the Past Year. Please download the full report to check any citations.

Available datasets

The Contract Understanding Atticus Dataset (CUAD) is a corpus of more than 13,000 labels across 510 commercial legal contracts. The contracts were manually labeled under the supervision of experienced lawyers to identify 41 types of legal clauses that are considered important in contract review.

The contracts are collected from the Electronic Data Gathering, Analysis, and Retrieval ("EDGAR") system, which is maintained by the U.S. Securities and Exchange Commission (SEC) (https://www.sec.gov/search-filings).

ContractNLI is a dataset for document-level natural language inference (NLI) on contracts, containing 607 non-disclosure agreements (NDAs). Although it contains more contracts than CUAD, the individual contracts are considerably shorter, so the corpus as a whole is smaller. It also covers no contract type other than NDAs. More extensive knowledge of the context for this data would improve the performance of models fine-tuned on it.
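To make the comparison concrete, the figures quoted above can be put side by side. The sketch below is only illustrative: the `DatasetSummary` class, its field names, and the helper function are hypothetical, and the CUAD label count is treated as a lower bound because the text gives it as "13,000+". No label count is given here for ContractNLI, so none is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetSummary:
    """Hypothetical container for the figures quoted in this article."""
    name: str
    contracts: int
    contract_types: str
    min_labels: Optional[int] = None    # None when the article gives no figure
    clause_types: Optional[int] = None  # None when the article gives no figure


# Figures taken from the text above; 13,000 is a lower bound ("13,000+").
CUAD = DatasetSummary("CUAD", 510, "commercial contracts",
                      min_labels=13_000, clause_types=41)
CONTRACT_NLI = DatasetSummary("ContractNLI", 607, "NDAs only")


def min_labels_per_contract(ds: DatasetSummary) -> Optional[float]:
    """Lower bound on the average number of labels per contract, if known."""
    if ds.min_labels is None:
        return None
    return ds.min_labels / ds.contracts


for ds in (CUAD, CONTRACT_NLI):
    density = min_labels_per_contract(ds)
    density_text = f"at least {density:.1f}" if density is not None else "not stated"
    print(f"{ds.name}: {ds.contracts} contracts ({ds.contract_types}), "
          f"labels per contract: {density_text}")
```

On these figures, CUAD averages at least about 25 labels per contract, while ContractNLI has 97 more contracts but only one contract type.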

Alex Denne
Advisor
Alex Denne, Head of Growth (Open Source Law) at Genie AI, is a legal tech leader and serial founder with over a decade of experience driving innovation and making legal services more accessible. Since joining in 2021, he has scaled the platform from 200 to over 120,000 users, combining deep contract law expertise with a data-driven, open-source approach. He is passionate about democratizing legal knowledge through AI, backed by strong academic credentials and experience leading major product and innovation initiatives.
