Text-based Offense Classification (TOC) is the result of a collaboration between Measures for Justice (MFJ) and the University of Michigan’s Criminal Justice Administrative Records System (CJARS) program, both of which use court records to assess the performance and impact of criminal case processes. The project combined hundreds of thousands of hand-coded records produced by MFJ with the machine learning expertise of the CJARS team, following an offense scheme that the team adapted from prior work of the National Corrections Reporting Program (NCRP) and others. Together, we designed the Uniform Crime Classification Standard (UCCS) schema and an accompanying machine learning algorithm that predicts the appropriate standard charge classification from the text of an offense description. TOC is intended to make the painstaking task of hand-coding offense descriptions obsolete and to usher in a new era of machine-learning-assisted classification.

Publications and research reports based on the outputs from TOC should be cited as follows:

Choi, J., Kilmer, D., Mueller-Smith, M., & Taheri, S. (2023). Hierarchical Approaches to Text-based Offense Classification. Science Advances, 9(9), 1-15.


The Criminal Justice Administrative Records System (CJARS) is a cutting-edge data platform seeking to fundamentally transform research and statistical reporting on the U.S. criminal justice system. The ultimate aim is to improve public administration through next-generation evidence-based policymaking. CJARS is the first integrated national data repository that follows individual offenses from arrest to charge, and from conviction to sanction. Data come from different types of agencies and numerous jurisdictions, and are harmonized into a common schema at the University of Michigan. In partnership with the U.S. Census Bureau, CJARS data are linked at the person level to confidential social, economic, and demographic data held by the federal government to produce novel empirical analysis of criminal justice caseloads and policy outcomes. CJARS was founded in 2016 and has received funding from the National Science Foundation, the Bill and Melinda Gates Foundation, and the Laura and John Arnold Foundation.


Measures for Justice Institute (MFJ) is a non-partisan non-profit with a mission to make reliable data available to spur dialogue and reform. MFJ has developed a set of data-driven performance metrics to assess the criminal justice process from arrest to post-conviction on a county-by-county basis. The organization collects, standardizes, and makes publicly available criminal justice data at the county level across the United States. These data are point-in-time collections, shared with MFJ under data use agreements between MFJ and the county- or state-level agency, and in two instances between MFJ and a third party that compiled case information from online docket records. The collected data make possible the free online Data Portal, which populates metrics for jurisdictions across twenty states, spanning more than 1,000 county justice systems and nine years. Additionally, the organization is making near-live data available each month, at the level of the individual county prosecutor’s office, in a new platform that uses the same methodology.

MFJ works with policymakers and criminal justice practitioners across the US to support robust criminal justice data collection, policy, and research. As part of this effort, MFJ also identifies gaps in data collection and infrastructure at the state level, informing statewide data legislation that furthers the goal of improved data transparency. The group’s work has shown that data transparency is essential for seeing trends and patterns across jurisdictions and over time, and it aims to ground policy in reliable information.