RTIInternational/rota: 2021.05.18.15
Description
ROTA: Rapid Offense Text Autocoder
Criminal justice research often requires converting free-text offense descriptions into overall charge categories to aid analysis. For example, the free-text offense "eluding a police vehicle" would be coded to the charge category "Obstruction - Law Enforcement". Because free-text offense descriptions aren't standardized and often need to be categorized in large volumes, coding them is a manual, time-intensive process for researchers. ROTA is a machine learning model for converting offense text into offense codes.
Currently, ROTA predicts the Charge Category of a given offense text. A charge category is one of the headings for offense codes in the 2009 NCRP Codebook: Appendix F.
The model was trained on publicly available data from a crosswalk containing offenses from all 50 states combined with three additional hand-labeled offense text datasets.
The input text is standardized through a series of preprocessing steps. The text is first passed through a sequence of 500+ case-insensitive regular expressions that identify common misspellings and abbreviations and expand them into fuller, correct English text. Some data-specific prefixes and suffixes are then removed from the text; for example, some states included a statute as part of the text. Finally, punctuation (excluding dollar signs) is removed from the input, repeated spaces between words are collapsed, and the text is lowercased.
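The preprocessing order described above can be illustrated with a minimal Python sketch. This is not ROTA's actual code: the three expansion rules, the statute-prefix pattern, and the `preprocess` function name are hypothetical stand-ins chosen only to show the sequence of steps (expand, strip prefixes, remove punctuation, collapse spaces, lowercase).

```python
import re

# Hypothetical stand-ins for the 500+ case-insensitive expansion rules;
# these are illustrative examples, not the patterns ROTA ships with.
EXPANSIONS = [
    (re.compile(r"\bposs\b", re.IGNORECASE), "possession"),
    (re.compile(r"\bw/o\b", re.IGNORECASE), "without"),
    (re.compile(r"\bveh\b", re.IGNORECASE), "vehicle"),
]

# Hypothetical data-specific prefix: a leading statute number such as "14-223: ".
STATUTE_PREFIX = re.compile(r"^\s*\d+(?:[.\-]\d+)*\s*[:\-]\s*")

# Any character that is not a word character, whitespace, or a dollar sign,
# plus the underscore (which \w would otherwise keep).
PUNCTUATION = re.compile(r"[^\w\s$]|_")
MULTISPACE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Standardize an offense description in the order the text describes."""
    # 1. Expand abbreviations and misspellings.
    for pattern, replacement in EXPANSIONS:
        text = pattern.sub(replacement, text)
    # 2. Remove a data-specific prefix (here: an assumed statute number).
    text = STATUTE_PREFIX.sub("", text)
    # 3. Remove punctuation except dollar signs.
    text = PUNCTUATION.sub("", text)
    # 4. Collapse repeated whitespace and trim the ends.
    text = MULTISPACE.sub(" ", text).strip()
    # 5. Lowercase.
    return text.lower()


if __name__ == "__main__":
    print(preprocess("14-223: POSS.  of Stolen VEH over $500"))
    print(preprocess("Driving W/O License!!"))
```

Expansion runs before punctuation removal in this sketch, so an abbreviation that contains punctuation (such as "w/o") can still be matched by its rule.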
Files
Name | Size |
---|---|
RTIInternational/rota-2021.05.18.15.zip (md5:5c890d2683c5188d4aabf2df7976b0f6) | 561.5 kB |
Additional details
Related works
- Is supplement to: https://github.com/RTIInternational/rota/tree/2021.05.18.15 (URL)