Published May 18, 2021 | Version 2021.05.18.15
Software Open

RTIInternational/rota: 2021.05.18.15

Description

ROTA: Rapid Offense Text Autocoder

Criminal justice research often requires converting free-text offense descriptions into overall charge categories to aid analysis. For example, the free-text offense "eluding a police vehicle" would be coded to the charge category "Obstruction - Law Enforcement". Because free-text offense descriptions aren't standardized and often need to be categorized in large volumes, this coding can be a manual and time-intensive process for researchers. ROTA is a machine learning model for converting offense text into offense codes.

Currently, ROTA predicts the Charge Category of a given offense text. A charge category is one of the headings for offense codes in the 2009 NCRP Codebook: Appendix F.

The model was trained on publicly available data from a crosswalk containing offenses from all 50 states combined with three additional hand-labeled offense text datasets.

The input text is standardized through a series of preprocessing steps. The text is first passed through a sequence of 500+ case-insensitive regular expressions that identify common misspellings and abbreviations and expand them into fuller, correct English text. Some data-specific prefixes and suffixes are then removed from the text; for example, some states included a statute as part of the text. Finally, punctuation (excluding dollar signs) is removed from the input, repeated spaces between words are collapsed, and the text is lowercased.
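
The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the three expansion rules, the statute-prefix pattern, and the function name `preprocess` are hypothetical stand-ins (the real pipeline uses 500+ patterns that are not listed here), and deleting punctuation outright rather than replacing it with a space is an assumption.

```python
import re
import string

# Hypothetical expansion rules standing in for the 500+ real patterns.
EXPANSIONS = [
    (re.compile(r"\bposs\b", re.IGNORECASE), "possession"),
    (re.compile(r"\bveh\b", re.IGNORECASE), "vehicle"),
    (re.compile(r"\bmarij\b", re.IGNORECASE), "marijuana"),
]

# Hypothetical data-specific prefix: a leading statute number such as "18-2-101".
STATUTE_PREFIX = re.compile(r"^\s*\d+(?:[.-]\d+)+\s+")

# Translation table that deletes all ASCII punctuation except the dollar sign.
PUNCT_TABLE = str.maketrans("", "", string.punctuation.replace("$", ""))


def preprocess(text: str) -> str:
    """Standardize an offense description in the order given in the text."""
    # 1. Expand abbreviations and misspellings (case-insensitive).
    for pattern, replacement in EXPANSIONS:
        text = pattern.sub(replacement, text)
    # 2. Strip a data-specific prefix (here, a leading statute number).
    text = STATUTE_PREFIX.sub("", text)
    # 3. Remove punctuation, keeping dollar signs.
    text = text.translate(PUNCT_TABLE)
    # 4. Collapse repeated whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # 5. Lowercase the result.
    return text.lower()
```

Under these assumed rules, `preprocess("18-2-101  POSS. of Marij,  under $500")` returns `"possession of marijuana under $500"`.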

Files

Files (561.5 kB)

Name: RTIInternational/rota-2021.05.18.15.zip
Size: 561.5 kB
md5:5c890d2683c5188d4aabf2df7976b0f6

Additional details