All Papers

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

Document Type

Article

Publication Date

9-26-2023

Keywords

legal practice, law and technology, large language models, artificial intelligence, empirical legal methods, machine learning

Abstract

The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning—which distinguish between its many forms—correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.

Repository Citation

Guha, Neel; Nyarko, Julian; Ho, Daniel E.; Ré, Christopher; Chilton, Adam; Narayana, Aditya; Chohlas-Wood, Alex; Peters, Austin; Waldon, Brandon; Rockmore, Daniel; Zambrano, Diego A.; Talisman, Dmitry; Hoque, Enam; Surani, Faiz; Fagan, Frank; Sarfaty, Galit; Dickinson, Gregory M.; Porat, Haggai; Hegland, Jason; Wu, Jessica; Nudell, Joe; Niklaus, Joel; Nay, John; Choi, Jonathan H.; Tobia, Kevin; Hagan, Margaret; Ma, Megan; Livermore, Michael A.; Rasumov-Rahe, Nikon; Holzenberger, Nils; Kolt, Noam; Henderson, Peter; Rehaag, Sean; Goel, Sharad; Gao, Shang; Williams, Spencer; Gandhi, Sunny; Zur, Tom; Iyer, Varun; and Li, Zehua, "LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models" (2023). All Papers. 380.
https://digitalcommons.osgoode.yorku.ca/all_papers/380

Included in

Legal Profession Commons, Legal Writing and Research Commons, Science and Technology Law Commons
