Detailed Instructions

Please see here for detailed instructions.


TOC Instructions

  1. Register a user account and verify registration
  2. Navigate to the “TOC Jobs” page after logging in as a user
  3. Click “Add New Job”
  4. Click “Choose File” and upload the relevant file
    • At the moment, only *.CSV files are accepted
    • The first row of the file must contain the column/variable name(s)
      • The main input variable should not be labelled as “desc” since this is a temporary variable name created during TOC preprocessing
    • File size cannot exceed 100 MB
    • Support for additional input file types will be implemented in the future
  5. After uploading the file, click “Add Job to Processing Queue”
  6. On the next page, wait for file validation to finish, then use the “Select Column with Data to Process:” dropdown menu to choose the variable that contains the charge descriptions
  7. Click “Set Data Column” to add the job to the processing queue, then click “Review Jobs” to go back to the “TOC Jobs” page
  8. Once the job status changes from “Not Completed” to “Completed” on the “TOC Jobs” page, an email notification with a link to the “TOC Jobs” page will be sent to the registered email address. On the “TOC Jobs” page, click the “Results” button to download the file

Classification results will be available for 30 calendar days after the date of submission. After that period, the classification results will no longer be available in your user account.
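
The file requirements in step 4 can be checked locally before uploading. The sketch below is only an illustration under stated assumptions: the function name `precheck_csv` is hypothetical, it is not part of TOC, and it does not reproduce the validation TOC itself runs after upload. It assumes UTF-8 text and compares the reserved name “desc” case-insensitively as a conservative choice.

```python
import csv
import io

MAX_BYTES = 100 * 1024 * 1024  # 100 MB limit from the instructions
RESERVED_NAME = "desc"         # temporary variable name used in TOC preprocessing


def precheck_csv(filename: str, data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper for local use; not TOC's own validation.
    """
    problems = []
    if not filename.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 100 MB")

    # Only the first row is inspected; it should hold the column names.
    text = data.decode("utf-8-sig", errors="replace")  # assumes UTF-8 input
    header = next(csv.reader(io.StringIO(text)), [])
    names = [name.strip() for name in header]
    if not names or any(name == "" for name in names):
        problems.append("first row has a missing column name")
    if any(name.lower() == RESERVED_NAME for name in names):
        problems.append('a column is named "desc"')
    return problems
```

This check cannot tell whether the first row is a real column name or the first data value, so a file whose header row is missing entirely still needs to be inspected by eye.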


Input Data Specification

  • TOC is a text classification tool, so the offense descriptions used as input need to contain at least some English vocabulary
  • Examples of acceptable inputs:
    • “Capital murder”
    • “35 42 1 1 2 murder”
    • “poss of methamph”
    • “Operating a motor vehicle expired regis less than 6 mos”
    • “DUI w/ one prior and breath alcohol of 15 or greater”
  • Types of data to omit from the input:
    • Case/Citation/Ticket/Warrant number
    • Statute number
      • A statute number can be included if the entry also contains the description of the statute (e.g. “2C:35-10 CDS/Possession” rather than “2C:35-10” alone). However, the result may not be as reliable.
    • Non-criminal offenses
      • TOC is primarily intended for classifying criminal offenses. As such, predicted charge codes for descriptions related to immigration law or civil cases will not be reliable, and such descriptions should be omitted from user submissions.
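
One way to spot entries that contain only a statute or case number is to look for rows with no English-like word at all. The sketch below is a rough heuristic, not a rule used by TOC: it assumes that a usable description contains at least one run of three or more letters, and the function names are hypothetical.

```python
import re

# Heuristic (an assumption, not TOC's own rule): a description is usable
# if it contains at least one run of three or more letters.
WORD = re.compile(r"[A-Za-z]{3,}")


def has_vocabulary(description: str) -> bool:
    """True if the text contains at least one word-like run of letters."""
    return WORD.search(description) is not None


def split_rows(descriptions):
    """Separate descriptions into (keep, drop) lists using the heuristic."""
    keep = [d for d in descriptions if has_vocabulary(d)]
    drop = [d for d in descriptions if not has_vocabulary(d)]
    return keep, drop


keep, drop = split_rows(["Capital murder", "2C:35-10", "2C:35-10 CDS/Possession", "CF00001"])
# "2C:35-10" and "CF00001" land in drop; the other two entries are kept.
```

The heuristic only flags missing vocabulary. It cannot recognize non-criminal offenses, which still have to be removed by hand.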

Sample Data

Valid


Description
Capital murdr
Operate mv expired regis
2C:35-10 CDS/Possession
public order offense
Single-column data with column name in the first row. Subsequent rows contain text descriptions
Statute          Description
35 42 1 1 2      Murder
2C:35-10         CDS/Possession
35 43 10 3 3     legend drug deception
162205           bail jump i
Multi-column data with column names in the first row. The column that contains the text description of the offense should be selected as the main input variable

Invalid


?
Capital murdr
Operate mv expired regis
2C:35-10 CDS/Possession
public order offense
Single-column data without column name can cause errors
Statute          Case_Number
35 42 1 1 2      CF00001
2C:35-10         CF00002
35 43 10 3 3     CF00003
162205           CF00004
Multi-column data without text descriptions. Although TOC can process such data, the results are unreliable because the inputs lack English vocabulary