Legal Document Management System

The objective of the DATABASE Legal Document Management System (LDMS) is to develop computerised tools for the analysis of case law and legal texts in order to:

  1. Allow immediate access to automatically annotated and semantically enriched case law precedents, creating innovative case law archives that identify the types of sentences within a decision and place them in their context of reference, in order to support the analysis and construction of sound legal arguments and to identify the determining factors in decisions
  2. Create user interfaces suited to consultation by the various professionals and stakeholders involved
  3. Generalise the results obtained, ideally to any legal topic.


To achieve the above objectives, LDMS is divided into the following tasks:

  1. Exploratory Data Analysis (EDA)
  2. Extract, Transform, Load (ETL)
  3. Data Lake
  4. Data Labeling UI
  5. Legal Semantic Search Engine


Exploratory Data Analysis

Exploratory data analysis (EDA) is used to analyze and investigate data sets and summarize their key characteristics. It uses data visualization methods to discover patterns, spot anomalies (missing values, outliers, duplications), test hypotheses, and check assumptions. It provides a better understanding of the variables in the datasets, supports the analysis of distributions and trends in national case law, and offers an effective tool for legal comparisons.
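
The anomaly checks mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names `id` and `word_count`, and the 1.5 x IQR outlier rule are hypothetical choices, not part of LDMS.

```python
from collections import Counter
from statistics import quantiles


def summarize(records, key_field, numeric_field):
    """Count missing values, duplicated keys and IQR outliers in a list of dicts."""
    # Missing values: records where the numeric field is absent or None.
    missing = sum(1 for r in records if r.get(numeric_field) is None)

    # Duplications: keys that occur more than once.
    key_counts = Counter(r[key_field] for r in records)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)

    # Outliers: values outside 1.5 * IQR from the quartiles (assumed rule).
    values = [r[numeric_field] for r in records if r.get(numeric_field) is not None]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


# Hypothetical judgments: one missing length, one repeated id, one extreme length.
judgments = [
    {"id": "a", "word_count": 100},
    {"id": "b", "word_count": 110},
    {"id": "c", "word_count": 120},
    {"id": "d", "word_count": 130},
    {"id": "e", "word_count": 140},
    {"id": "f", "word_count": None},
    {"id": "a", "word_count": 5000},
]
report = summarize(judgments, "id", "word_count")
```

A report of this kind is usually the starting point for the visual inspection described above, since it shows which fields deserve a closer look.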


Extract, Transform, Load

ETL automatically converts PDF files and legal document metadata into the flexible LS-JSON data format. LS-JSON can capture all aspects of a legal document and is used to attach entity labels and relationships to different portions of text.
Machine learning models have difficulty capturing context in malformed text. The text preprocessing phase of the ETL module will apply a variety of techniques to convert raw text data into clean, standardized sequences, which can improve the performance of predictive methods. The results of predictive models trained on clean data will be compared with those obtained from raw data in order to assess the impact of the cleaning process on model quality.
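
As a minimal sketch of what such a preprocessing step might look like, the function below normalizes Unicode, rejoins words hyphenated across line breaks and collapses whitespace. The function name `clean_text` and the specific rules are assumptions chosen for illustration; the actual ETL techniques are not specified here.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Turn raw text extracted from a PDF into a single clean line."""
    # Normalize compatibility characters (e.g. non-breaking spaces, ligatures).
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse every run of whitespace, including newlines, into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw_page = "The  court\u00a0held that the appel-\nlant  was\nright."
cleaned = clean_text(raw_page)
```

Keeping both `raw_page` and `cleaned` makes it possible to train on each version and compare the results, as described above.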


Data Lake

The Data Lake is based on a flexible data format and schema-free NoSQL storage with data versioning and ML reproducibility. It can handle large datasets and track ML experiments, model parameters, metrics and artifacts. Thanks to the integrated tracking system, the Data Lake guarantees the scientific integrity of the experiments, which can be reproduced for validation and for future improvements of the models.
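
The link between data versioning and reproducibility can be illustrated with a small sketch. It assumes that a dataset version is a content hash and that a run is stored as a plain document; `dataset_version` and `record_run` are hypothetical helpers, not the Data Lake's actual tracking system.

```python
import hashlib
import json


def dataset_version(records):
    """Derive a version id from the dataset content (assumed scheme: SHA-256)."""
    # Sorting keys makes the hash independent of dict key order.
    canonical = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_run(records, params, metrics):
    """Build a schema-free run document tying results to an exact dataset version."""
    return {
        "dataset_version": dataset_version(records),
        "params": dict(params),
        "metrics": dict(metrics),
    }


data_v1 = [{"id": "a", "text": "The appeal is dismissed."}]
data_v2 = [{"id": "a", "text": "The appeal is allowed."}]

# Hypothetical parameter and metric names, for illustration only.
run = record_run(data_v1, {"learning_rate": 0.01}, {"f1": 0.8})
```

Because the version id changes whenever the data changes, a stored run can later be matched to exactly the data it was trained on.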


Data Labeling

Supervised learning methods assume that the annotations provided for model training are correct. We introduce a hierarchical scheme of annotators and a user interface tool that guides experts through the annotation process to obtain a high-quality dataset. The team of domain experts who will label the judgments used for algorithm training is composed of law students led by attorneys specialized in the different subjects of jurisprudence, who will evaluate the quality of the annotations by taking into account the inter-label correlation and their professional experience. An efficient text annotation tool will be used to quickly annotate and normalize entities while meeting quality requirements.
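
One common way to quantify agreement between two annotators is Cohen's kappa, sketched below. It is shown here as an assumed stand-in for the inter-label correlation mentioned above, which the text does not define further; the label names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a.keys() | freq_b.keys()) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


student = ["fact", "fact", "ruling", "ruling"]
attorney = ["fact", "ruling", "ruling", "ruling"]
kappa = cohen_kappa(student, attorney)  # 0.5 for this toy example
```

A low score on a batch could be used to route those judgments back to the supervising attorney for review.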


Legal Semantic Search Engine

To date, legal databases are queried by users using keywords. The innovation that the Legal Semantic Search Engine aims at is radical: transforming common case law collections into databases that are 1) automatically semantically annotated, 2) queryable not only by keywords but also by types of phrases and semantic similarity, 3) searchable in the context of the individual decision, and 4) able to provide a “value” for the quality of each detected phrase. Such results are an innovative aid not only for all legal practitioners but also for policy makers. Little effort has so far been devoted to exploiting case law data for regulatory policy development and to inform potential reforms. The Legal Semantic Search Engine is important because it develops these needed tools for research in legal documents, which can also better inform policy makers.
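
The idea of querying by phrase type and by similarity, with a score attached to each hit, can be sketched as follows. This is a toy illustration only: word-count vectors with cosine similarity stand in for the semantic representation, which is not specified here, and the sentence types and the `search` function are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(sentences, query, sentence_type=None):
    """Rank annotated sentences by similarity, optionally filtered by type."""
    q = bag_of_words(query)
    hits = []
    for s in sentences:
        if sentence_type is not None and s["type"] != sentence_type:
            continue
        score = cosine(q, bag_of_words(s["text"]))
        if score > 0:
            hits.append((score, s))
    return sorted(hits, key=lambda h: h[0], reverse=True)


annotated = [
    {"type": "ruling", "text": "The appeal is dismissed"},
    {"type": "fact", "text": "The appellant filed the appeal in March"},
    {"type": "ruling", "text": "Costs are awarded to the respondent"},
]
rulings = search(annotated, "appeal dismissed", sentence_type="ruling")
everything = search(annotated, "appeal dismissed")
```

Each hit carries its similarity score, which plays the role of the “value” attached to a detected phrase in this simplified setting.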

Through the analysis of the judgments contained in our legal database, we wondered about the possibility of isolating different topics even within a …

Another important aspect of the Exploratory Data Analysis (EDA) phase conducted on text data is the statistics, which help to understand the …

The exploration of our legal database led us to consider a specific, gender-based analysis of the judges’ composition. How many of the …

Legal data comes in different formats (and may be structured, semistructured, or unstructured) and is too heterogeneous to be directly usable by scientists, …