Published April 4, 2018 | Version 1.0.1
Software | Open access

Spiral: splitters for identifiers in source code files

Affiliation: California Institute of Technology

Description

Spiral is a Python module that provides several functions for splitting identifiers found in source code files. Identifier splitting (also known as identifier name tokenization) is the task of breaking program identifier strings such as getInt or readUTF8stream into their component tokens: [get, int] and [read, utf8, stream], respectively. The need for identifier splitting arises in a variety of contexts, including natural language processing (NLP) methods applied to source code analysis and program comprehension. Spiral provides some basic naive splitting algorithms, such as a straightforward camel-case splitter, as well as more elaborate heuristic splitters, such as an algorithm we call Ronin.
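To illustrate what a naive camel-case splitter does, and where it falls short, here is a minimal regex-based sketch. It is not Spiral's implementation or API. The function name `camel_case_split` is hypothetical, and the choices to lowercase tokens and also split on underscores are assumptions made for the example.

```python
import re


def camel_case_split(identifier):
    """Hypothetical naive splitter (not Spiral's API): split on case changes and underscores."""
    # Insert a space at a lowercase/digit -> uppercase transition: "getInt" -> "get Int".
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
    # Insert a space between an acronym and a following capitalized word: "HTMLParser" -> "HTML Parser".
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", s)
    # Split on whitespace/underscores, drop empty pieces, and lowercase each token.
    return [token.lower() for token in re.split(r"[\s_]+", s) if token]


print(camel_case_split("getInt"))          # ['get', 'int']
print(camel_case_split("readUTF8stream"))  # ['read', 'utf8stream']
```

The splitter handles getInt, but it returns ['read', 'utf8stream'] for readUTF8stream. There is no case change between "utf8" and "stream" for it to detect, so it cannot produce [read, utf8, stream]. Cases like this motivate more elaborate heuristic splitters such as Ronin.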

The name Spiral is a loose acronym based on "SPlitters for IdentifieRs: A Library".

Notes

This material is based upon work supported by the National Science Foundation under Grant Number 1533792 (Principal Investigator: Michael Hucka). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Files

spiral-1.0.1.zip

Files (38.5 MB total)

  • 19.2 MB, md5:cc85d6816ddf5e6cf082794ef03997a3
  • 19.3 MB, md5:22a9956e1df11ebae36cf63822953c52
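Each file in this record is listed with an MD5 checksum. The following is a minimal sketch of checking a downloaded archive against its listed checksum using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_md5` are hypothetical and are not part of Spiral. The file is read in chunks, so large archives do not need to fit in memory. MD5 is adequate for catching corrupted downloads, but it does not protect against deliberate tampering.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Hypothetical helper: compare a file against a checksum such as "md5:<hex>"."""
    # Strip an optional "md5:" prefix and normalize case before comparing.
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

To use it, pass the path of the downloaded file together with the checksum shown next to that file in the listing. The function returns True only if the digests match.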

Additional details

Funding

EAGER: Cataloging Software Using a Semantic-Based Approach for Software Discovery and Characterization (award 1533792), National Science Foundation