Neel Guha, Stanford University
Julian Nyarko, Stanford Law School
Daniel E. Ho, Stanford Law School
Christopher Ré, Stanford University
Adam Chilton, University of Chicago Law School
Aditya Narayana, Maxime Tools
Alex Chohlas-Wood, Stanford University - Department of Management Science & Engineering
Austin Peters, Stanford University
Brandon Waldon, Stanford University
Daniel Rockmore, Dartmouth College
Diego A. Zambrano, Stanford University
Dmitry Talisman, Maxime Tools
Enam Hoque, LawBeta
Faiz Surani, University of California, Santa Barbara
Frank Fagan, South Texas College of Law Houston
Galit Sarfaty, University of Toronto, Faculty of Law
Gregory M. Dickinson, St. Thomas University, School of Law
Haggai Porat, Harvard Law School
Jason Hegland, Stanford University
Jessica Wu, Stanford University
Joe Nudell, Stanford University
Joel Niklaus, University of Bern
John Nay, Stanford University
Jonathan H. Choi, University of Minnesota Law School
Kevin Tobia, Georgetown University Law Center
Margaret Hagan, Stanford Law School
Megan Ma, Stanford University
Michael A. Livermore, University of Virginia School of Law
Nikon Rasumov-Rahe, Maxime Tools
Nils Holzenberger, Institut Polytechnique de Paris
Noam Kolt, University of Toronto
Peter Henderson, Stanford University
Sean Rehaag, Osgoode Hall Law School, York University
Sharad Goel, Harvard University
Shang Gao, Casetext
Spencer Williams, Golden Gate University School of Law
Sunny Gandhi, Indiana University Bloomington
Tom Zur, Harvard Law School
Varun Iyer
Zehua Li, Stanford University

Author ORCID Identifier

Sean Rehaag: 0000-0002-4432-9217


Keywords

legal practice, law and technology, large language models, artificial intelligence, empirical legal methods, machine learning


Abstract

The advent of large language models (LLMs) and their adoption by the legal community have given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful or capture reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning—which distinguish between its many forms—correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.