RTIInternational/rota: 2021.05.18.15

Peter Baumgartner; Emily Hadley; Anna Godwin

doi:10.5281/zenodo.4770493

Published May 18, 2021 | Version 2021.05.18.15

Software Open

ROTA: Rapid Offense Text Autocoder

Criminal justice research often requires converting free-text offense descriptions into overall charge categories to aid analysis. For example, the free-text offense "eluding a police vehicle" would be coded to the charge category "Obstruction - Law Enforcement". Because free-text offense descriptions are not standardized and often must be categorized in large volumes, coding them by hand is slow and labor-intensive for researchers. ROTA is a machine learning model that converts offense text into offense codes.

Currently, ROTA predicts the Charge Category of a given offense text. A charge category is one of the headings for offense codes in the 2009 NCRP Codebook: Appendix F.

The model was trained on publicly available data from a crosswalk containing offenses from all 50 states, combined with three additional hand-labeled offense text datasets.

The input text is standardized through a series of preprocessing steps. The text is first passed through a sequence of 500+ case-insensitive regular expressions that identify common misspellings and abbreviations and expand them into fuller, correctly spelled English. Some data-specific prefixes and suffixes are then removed from the text; for example, some states included a statute as part of the text. Finally, punctuation (excluding dollar signs) is removed from the input, repeated spaces between words are collapsed, and the text is lowercased.
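The preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration, not the code shipped with ROTA. The three expansion rules, the statute-prefix pattern, and the function name `preprocess` are hypothetical stand-ins for the 500+ patterns and data-specific rules that the description mentions but does not list. The sketch also assumes punctuation is replaced with a space rather than deleted outright, so that adjacent words do not merge.

```python
import re
import string

# Hypothetical expansion rules (assumption): the real model applies 500+
# case-insensitive patterns that are not listed in this record.
EXPANSIONS = [
    (re.compile(r"\bposs\b", re.IGNORECASE), "possession"),
    (re.compile(r"\bveh\b", re.IGNORECASE), "vehicle"),
    (re.compile(r"\bmarij\b", re.IGNORECASE), "marijuana"),
]

# Hypothetical data-specific rule (assumption): a leading statute number
# such as "14-223 " in front of the offense text.
STATUTE_PREFIX = re.compile(r"^\s*\d+(?:[-.]\d+)*\s+")

# All ASCII punctuation except the dollar sign, mapped to a space
# (assumption: spaces instead of deletion, to keep words separate).
PUNCT_TABLE = str.maketrans(
    {ch: " " for ch in string.punctuation.replace("$", "")}
)


def preprocess(text: str) -> str:
    """Standardize one free-text offense description."""
    # 1. Expand abbreviations and fix misspellings.
    for pattern, replacement in EXPANSIONS:
        text = pattern.sub(replacement, text)
    # 2. Remove data-specific prefixes (here: a leading statute number).
    text = STATUTE_PREFIX.sub("", text)
    # 3. Remove punctuation except "$", collapse whitespace, lowercase.
    text = text.translate(PUNCT_TABLE)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


print(preprocess("14-223 ELUDING  POLICE VEH."))
print(preprocess("Poss. stolen property > $500"))
```

Keeping the dollar sign preserves monetary thresholds such as "$500", which can distinguish otherwise similar offense descriptions.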

Files

Files (561.5 kB)

Name	Size
RTIInternational/rota-2021.05.18.15.zip (md5:5c890d2683c5188d4aabf2df7976b0f6)	561.5 kB

Additional details

Is supplement to: https://github.com/RTIInternational/rota/tree/2021.05.18.15 (URL)

	All versions	This version
Views	175	172
Downloads	8	8
Data volume	4.5 MB	4.5 MB
