Legal Systems and Artificial Intelligence (CBR project)

Project team

  • Project leader: Simon Deakin (CBR), Mihoko Sumida (Hitotsubashi University, Tokyo)
  • Co-Investigators: Jennifer Cobbe, Jon Crowcroft, Jat Singh (Computer Laboratory, University of Cambridge); Felix Steffek (Faculty of Law, University of Cambridge); Christopher Markou, Linda Shuku, Helena Xie (CBR); Yuishi Washida, Kazuhiko Yamamoto, Keisuke Takeshita, Mikiharu Noma, Wataru Uehara (Hitotsubashi University); Nanami Furue (Tokyo University of Science); Motoyuki Matsunaga (Institute for International Socio-Economic Studies, Tokyo)
  • Researchers: Bhumika Billa, Anca Cojocaru, Narine Lalafaryan, Chris Pang, Joana Ribeiro De Faria, Holli Sargeant, Lucy Thomas


Funders

ESRC and Japan Science and Technology Agency



The aim of this project is to assess the implications of the introduction of Artificial Intelligence (AI) into legal systems in Japan and the United Kingdom. The project is jointly funded by the UK’s Economic and Social Research Council (ESRC), part of UKRI, and the Japan Science and Technology Agency (JST), and involves collaboration between the University of Cambridge (the CBR, Computer Laboratory and Faculty of Law) and Hitotsubashi University, Tokyo (the Graduate Schools of Law and Business Administration).

The use of machine learning (ML) to replicate aspects of legal decision making is already well advanced. A number of ‘Legal Tech’ applications have been developed by law firms and commercial suppliers and are being used, among other things, to model litigation risk. Data analytics are informing decisions on legally consequential matters including probation, predictive policing and credit evaluation. The next step will be to use ML to replicate core functions of legal systems, including adjudication. At the same time, there are already signs of push-back against the use of ML in the legal sphere. Critics point to biases in current algorithmic decision-making processes which systematically disadvantage the poor and minority groups. Concerns over the constitutionality of automating judicial processes prompted the passage of Art. 33 of French Law 2019-222, which bars the use of personally identifiable data of judges and other court officials with a view to ‘evaluating, analyzing, comparing or predicting their professional performance, real or supposed’.

Aims and objectives

In this context there is an urgent need for informed debate over the uses of AI in the legal sphere. The project will advance this debate by:

  1. exploring stakeholders’ perceptions of the acceptability of AI-related technologies in the legal domain
  2. identifying and addressing legal and ethical risks associated with algorithmic decision making
  3. understanding the potential of, and limits to, the computational techniques underlying law-related AI.  


The project is organised through 3 work packages which will deploy, respectively, the methods of Horizon Scanning (WP1), and machine learning, deep learning, natural language processing, and computational linguistics (WPs 2 and 3).

WP1: Constructing Future Scenarios for the Uses of AI in Law: A Horizon Scanning Approach

Project leaders: Washida, Sumida, Deakin

The Horizon Scanning Method was developed principally by the Stanford Research Institute in the late 1960s. Rather than assuming that the future will follow a linear extension of current circumstances, the method attempts to develop more realistic predictions of the future by focusing on the collection and analysis of information that does not lie on the path of this linear extension. In implementing the Horizon Scanning approach we will first produce a database containing a range of information sources on the uses of AI in law, drawn from press reports, commentary and secondary academic literatures. The database will be used as the basis for discussion at a series of workshops, to which we will invite experts, researchers, corporate professionals and users from a broad range of fields of activity and age ranges. Emergent scenarios will describe different possible combinations of advantages and risks stemming from the use of AI.

WP2: Computation of Complex Knowledge Systems: Law and Accounting

Project leaders: Deakin, Markou, Crowcroft, Singh, Cobbe, Shuku, Noma

This WP will consider whether the juridical reasoning underpinning employment status decisions can be visually represented using historical data from decided cases, and whether the outcomes of cases can be accurately predicted using a decision tree whose nodes correspond to relevant legal indicators. We will use Deep Learning and NLP to analyse legal decisions for latent or hidden variables that can help inform and refine the model. We will then explore how far the same techniques can be applied to the digitisation of knowledge systems used in accounting.
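The decision-tree idea can be sketched in a few lines. The indicator names and outcome labels below are illustrative assumptions for exposition only, not the project's actual model or data:

```python
# Minimal sketch of a decision tree over hypothetical legal indicators
# used in employment status cases. Indicators and labels are invented.

def classify(case: dict) -> str:
    """Walk a hand-built tree of legal indicators to a predicted outcome."""
    if case["personal_service"]:          # must the work be done personally?
        if case["employer_control"]:      # does the hirer control how work is done?
            return "employee"
        return "worker" if case["mutual_obligation"] else "self-employed"
    return "self-employed"

cases = [
    {"personal_service": True, "employer_control": True, "mutual_obligation": True},
    {"personal_service": True, "employer_control": False, "mutual_obligation": False},
    {"personal_service": False, "employer_control": True, "mutual_obligation": True},
]
print([classify(c) for c in cases])
# -> ['employee', 'self-employed', 'self-employed']
```

In the project's setting the tree would be learned from, and validated against, historical case data rather than written by hand; the sketch only shows how nodes map onto legal indicators.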

WP3: Predicting the Outcome of Dispute Resolution: Feasibility, Factors and Ethical Implications

Project leaders: Steffek, Xie, Yamamoto

This WP deals with the prediction of dispute outcomes and aims more generally to advance understanding of the use of artificial intelligence in case outcome prediction. Analysis will be carried out on a large dataset of English court cases, which will be used to test different ML approaches to predicting dispute outcomes. The possibility of carrying out a parallel study using Japanese court data will be explored. In addition, this WP will develop ethical guidelines for regulating Artificial Intelligence in dispute resolution. The development of the guidelines will be supported by roundtable meetings with the project's partners, including the UK Ministry of Justice, the OECD Department on Access to Justice, leading representatives of the UK judiciary and LawTech firms.
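As a toy illustration of outcome prediction from judgment text, the sketch below pairs a bag-of-words representation with a nearest-centroid classifier. The case texts and outcome labels are invented placeholders; the project's actual experiments use far larger datasets and more sophisticated ML and NLP models:

```python
# Toy baseline: represent judgment text as word counts, then assign the
# outcome label whose training centroid best matches the text.
from collections import Counter

def bow(text):
    """Bag-of-words: lowercase word counts."""
    return Counter(text.lower().split())

def centroid(texts):
    """Average word frequencies over a list of training texts."""
    total = Counter()
    for t in texts:
        total.update(bow(t))
    return {w: c / len(texts) for w, c in total.items()}

# Invented training snippets, grouped by outcome label.
train = {
    "claim allowed": ["appeal allowed damages awarded", "claim succeeds allowed"],
    "claim dismissed": ["appeal dismissed costs to defendant", "claim fails dismissed"],
}
cents = {label: centroid(texts) for label, texts in train.items()}

def predict(text):
    """Pick the label whose centroid overlaps the text most."""
    words = bow(text)
    return max(cents, key=lambda label:
               sum(n * cents[label].get(w, 0.0) for w, n in words.items()))

print(predict("the appeal is dismissed"))
# -> claim dismissed
```

Even this crude baseline makes the structure of the task visible: a labelled corpus of decisions, a text representation, and a classifier evaluated on held-out cases.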


The project began in January 2020, and a planning meeting and workshop were held in Cambridge in early March with the participation of the Japanese team. Shortly afterwards lockdowns were initiated in both Cambridge and Tokyo, and work on the project was formally paused for a 3-month period. Research resumed in the summer of 2020, and progress has since been made with respect to each of the WPs.

In WP1, the collection of abstracts for use in the Horizon Scanning Method began in August 2020. A horizon scanning workshop, originally planned to take place in Cambridge in December 2020, was postponed because of COVID-19. The workshop was rescheduled to take place once COVID-related restrictions on travel had come to an end, and was successfully completed in March 2023, with the joint participation of the Cambridge and Hitotsubashi teams from WP1.

The workshop took as its theme the impact of artificial intelligence (AI) on the future of work. Just over 30 experts, ranging from human resources (HR) professionals and lawyers to trade unionists and academics, took part. The participants were divided into 3 break-out groups, each of which brainstormed future scenarios based on a dataset of summaries of around 100 media opinions constructed by the Cambridge project team before the workshop. The dataset included news, blog posts and op-eds published in English across the world in the previous 4 years, curated via Google searches for recent writing on AI and work. Initial results from the deliberations have been published in the form of a blog post written by Bhumika Billa and Simon Deakin, and further analysis of the workshop findings, applying the horizon scanning methodology, will be published in the course of 2023 and 2024.

In WP2, progress has been made in developing the conceptual framework for the work, resulting in a series of publications including an edited collection, ‘Is Law Computable? Critical Reflections on Law and Artificial Intelligence’, published by Hart/Bloomsbury in November 2020, and papers published in the Journal of Cross-Disciplinary Research in Computational Law and the Northern Ireland Legal Quarterly. In addition, substantial progress has been made on constructing a dataset of historical employment cases which is being used to test hypotheses concerning the long-run dynamics of legal change and the coevolution of law with social and economic development.

In WP3, work has been carried out on the dataset of English cases, and the possibility of creating similar datasets of Japanese cases has been explored with relevant stakeholders. Progress has also been made in developing the ML and NLP methods which will be used to analyse the judicial data. The dataset of English court cases was completed by the summer of 2023 and first results from it will be published by the autumn.

Both WP2 and WP3 organised multiple meetings between the British and Japanese sides, via Zoom, to coordinate progress and ensure continuing cooperation notwithstanding the impossibility of meeting in person during the COVID-19 emergency. The final conference of the project has been scheduled to take place in Tokyo in December 2023.

GDPR notice

The Cambridge Law Corpus: A corpus for legal AI research

The Cambridge Law Corpus (CLC) is a dataset of more than 250,000 court cases for legal AI research. The CLC has been developed as part of the Legal Systems and Artificial Intelligence research project funded by UKRI (UK Research and Innovation) and JST (Japan Science and Technology Agency). Most cases in the CLC are from the 21st century, but the corpus includes cases as old as the 16th century. The CLC only contains decisions of UK courts and tribunals (together referred to as courts in this notice) that have been made available by the relevant courts for publication. All decisions in the CLC have already been published before either by the courts themselves or by other information providers.

In the UK, court cases are not anonymised. Parties, judges, barristers and other persons involved in court proceedings should, therefore, expect to be named in judgments because courts uphold the principle of open justice, promote the rule of law and ensure public confidence in the legal system. However, courts will anonymise a party if the non-disclosure is necessary to secure the proper administration of justice and to protect the interests of that party. The CLC contains the texts of decisions as they were made available by the courts themselves. As a result, the CLC contains the names of persons involved in court decisions and other personal data as reported by the official court decision.

The CLC is the basis of research experiments conducted as part of this project. These experiments include the identification of the sentences in judgments that contain the outcome of the case, the determination of the topics that courts deal with, and the prediction of case outcomes.
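The first of these experiments, finding outcome-bearing sentences, can be illustrated with a simple rule-based pass. The cue phrases below are assumptions made for this sketch; the project's actual experiments rely on learned models rather than keyword matching:

```python
# Illustrative rule-based pass for spotting outcome-bearing sentences in
# a judgment. Cue phrases and the sample text are invented for the sketch.
import re

OUTCOME_CUES = ("appeal is allowed", "appeal is dismissed",
                "claim succeeds", "claim is dismissed")

def outcome_sentences(judgment: str):
    """Split on sentence boundaries, keep sentences containing a cue."""
    sentences = re.split(r"(?<=[.!?])\s+", judgment)
    return [s for s in sentences if any(cue in s.lower() for cue in OUTCOME_CUES)]

text = ("The facts are not in dispute. For the reasons given above, "
        "the appeal is dismissed. Costs follow the event.")
print(outcome_sentences(text))
# -> ['For the reasons given above, the appeal is dismissed.']
```

A learned sentence classifier replaces the fixed cue list with features estimated from annotated judgments, but the input/output shape of the task is the same.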

Ethical approval has been granted by the Research Ethics Committee of the Centre of Business Research at the University of Cambridge.

GDPR requirements: research exemptions

Compliance with the Data Protection Act 2018 (DPA) and the UK General Data Protection Regulation (GDPR) is the basis of the legality of the CLC and its use for research. The personal data in this corpus was not collected directly from data subjects, and its processing is undertaken only for research purposes in the public interest. Both of these circumstances give rise to exemptions from obligations under the GDPR.

Given that it would be practically impossible and disproportionate to inform all individuals mentioned in this corpus, and that these cases are publicly available and processed exclusively for research purposes, the CLC is exempt from notification requirements. Further, research in the public interest is privileged as regards the restrictions on secondary processing and on the processing of sensitive categories of data. In particular, this helps protect the integrity of research datasets.


We apply safeguards in compliance with legal and ethical requirements and ensure that:

  • Appropriate technical and organisational safeguards exist to protect personal data.
  • Processing will not result in measures being taken in respect of individuals and no automated decision-making takes place.
  • There is no likelihood of substantial damage or distress to individuals from the processing.
  • Users who access the corpus must agree to comply with the DPA and the GDPR in addition to any local jurisdiction.
  • Any individual may request the removal of a case or of certain information, which will then be removed immediately.
  • The corpus will not pose any risks to people’s rights, freedoms or legitimate interests.
  • Access to the corpus will be restricted to researchers based at a university or other research institution whose Faculty Dean (or equivalent authority) confirms, inter alia, that ethical clearance for the research envisaged has been received.
  • Researchers using the corpus must agree not to undertake research that identifies natural persons, legal persons or similar entities. They must also guarantee that they will remove any data if requested to do so.

Further information

Further information on the legality and ethical administration of the CLC can be found in the related research paper “The Cambridge Law Corpus: a dataset for legal AI research”.

Further information on the terms and conditions that researchers applying for access to the CLC must comply with is available on the Department of Computer Science and Technology’s CLC project page. The project page also contains information and a link for those applying for the removal of a case from the CLC.



Publications and datasets

Deakin, S. and Markou, C. (2021) “Evolutionary law and economics: theory and method.” Northern Ireland Legal Quarterly, 72: 682-712.

Deakin, S. and Markou, C. (2022) “Evolutionary interpretation: law and machine learning.” Journal of Cross-Disciplinary Research in Computational Law, 1(2)


Deakin, S. and Shuku, L. (2023) English poor law cases dataset. In progress, expected completion autumn 2023.

Deakin, S. and Shuku, L. (2023) English workmen’s compensation cases dataset. In progress, expected completion autumn 2023.

Östling, A., Sargeant, H., Xie, H., Bull, L., Terenin, A., Jonsson, L., Magnusson, M., and Steffek, F. (2023) Cambridge Law Corpus dataset (to be published later in 2023).