There is a newer version of the record available.

Published November 21, 2015 | Version 1.76
Software Open

corpkit: New interrogation options

  • 1. Gitter

Description

This release reflects a move away from purpose-built interrogator() functions toward the search and show arguments, which are much more powerful. Users can construct a dict of one or more dependency criteria and choose, with searchmode = 'all' or 'any', whether a token must match every criterion or just one of them.

criteria = {'lemma': ['think', 'feel', 'want'], 'pos': r'^V', 'function': 'root'}
r = interrogator(corpus, search=criteria, show=['word'], searchmode='all')
list(r.results.columns)[:5]

might return:

['think', 'thinking', 'want', 'wants', 'feel']
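The difference between the two search modes can be sketched without corpkit itself. The following is a minimal illustration, not corpkit's implementation: the helper names value_matches and token_matches, the token dicts, and the rule that a list means "one of these values" while a string is a regular expression are all assumptions made for the example.

```python
import re

def value_matches(pattern, value):
    """Assumed rule: a list means 'one of these values', a string is a regex."""
    if isinstance(pattern, (list, tuple, set)):
        return value in pattern
    return re.search(pattern, value) is not None

def token_matches(token, criteria, searchmode='all'):
    """Check a token (hypothetical dict layout) against every criterion."""
    checks = (value_matches(pat, token.get(key, ''))
              for key, pat in criteria.items())
    return all(checks) if searchmode == 'all' else any(checks)

criteria = {'lemma': ['think', 'feel', 'want'], 'pos': r'^V', 'function': 'root'}

# Made-up tokens, only for illustration
tokens = [
    {'word': 'thinking', 'lemma': 'think', 'pos': 'VBG', 'function': 'root'},
    {'word': 'wants', 'lemma': 'want', 'pos': 'VBZ', 'function': 'ccomp'},
    {'word': 'idea', 'lemma': 'idea', 'pos': 'NN', 'function': 'dobj'},
]

all_hits = [t['word'] for t in tokens if token_matches(t, criteria, 'all')]
any_hits = [t['word'] for t in tokens if token_matches(t, criteria, 'any')]
print(all_hits)  # only tokens satisfying every criterion
print(any_hits)  # tokens satisfying at least one criterion
```

With 'all', only 'thinking' survives, because 'wants' is not a root; with 'any', 'wants' is kept as well, since its lemma and part of speech both match.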

Passing in a longer list for the show argument will set what is given in the output, as well as its order:

r = interrogator(corpus, search=criteria, show=['f', 'p', 'l'], searchmode='all')
list(r.results.columns)[:3]

will produce column names with concatenated function, pos and lemma:

['root/vbp/think', 'root/vbg/thinking', 'root/vb/want']
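How the show list shapes a column name can be sketched as follows. This is an illustration only: column_name is a hypothetical helper, the token dict is invented, and the lowercasing and '/' separator are inferred from the sample output above rather than taken from corpkit's source.

```python
def column_name(token, show, sep='/'):
    """Join the requested token attributes, in the order given by show."""
    return sep.join(str(token[key]).lower() for key in show)

# Hypothetical token: f = function, p = part of speech, l = lemma
token = {'f': 'root', 'p': 'VBP', 'l': 'think'}

full_name = column_name(token, ['f', 'p', 'l'])
reordered = column_name(token, ['l', 'p'])
print(full_name)
print(reordered)
```

Changing the order or length of show changes the name, so the same token yields 'root/vbp/think' in one case and 'think/vbp' in the other.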

Another improvement is the exclude argument, which takes the place of blacklist, function_filter and pos_filter. Alongside excludemode = 'any'/'all', it operates just like search, allowing the user to exclude results matching one or more criteria:

r = interrogator(corpus, search=criteria, show=['f', 'p', 'l'], searchmode='all',
                 exclude={'pos': r'^V', 'word': r'ing$'}, excludemode='all')

would remove any verbal token ending in ing. Changing excludemode to 'any' would remove all verbs and all words ending in ing.
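The effect of excludemode can be checked on a handful of made-up tokens. This is a sketch under stated assumptions, not corpkit's own code: is_excluded is a hypothetical helper, each exclude value is treated as a regular expression, and the 'pos' and 'word' keys simply mirror the example above.

```python
import re

def is_excluded(token, exclude, excludemode='any'):
    """True if the token matches all (or any) of the exclusion patterns."""
    checks = (re.search(pat, token[key]) is not None
              for key, pat in exclude.items())
    return all(checks) if excludemode == 'all' else any(checks)

exclude = {'pos': r'^V', 'word': r'ing$'}

# Invented tokens, for illustration only
tokens = [
    {'word': 'running', 'pos': 'VBG'},
    {'word': 'run', 'pos': 'VB'},
    {'word': 'building', 'pos': 'NN'},
    {'word': 'idea', 'pos': 'NN'},
]

kept_all = [t['word'] for t in tokens if not is_excluded(t, exclude, 'all')]
kept_any = [t['word'] for t in tokens if not is_excluded(t, exclude, 'any')]
print(kept_all)  # drops only verbs that also end in 'ing'
print(kept_any)  # drops every verb and every word ending in 'ing'
```

Under 'all', only 'running' is removed; under 'any', 'run' and 'building' go too, leaving just 'idea'.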

The release also includes various bugfixes, code cleanup, and some miscellaneous additions, such as a function for turning results into pandas MultiIndex DataFrames. Full API documentation is forthcoming.
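As a rough idea of what a MultiIndex version of the results might look like, slash-joined column names can be split into levels with plain pandas. The function name to_multiindex, the level names, and the counts below are assumptions made for illustration; this is not the function shipped in the release.

```python
import pandas as pd

# Invented results table: rows are subcorpora, columns are slash-joined names
flat = pd.DataFrame(
    [[3, 1, 2], [4, 0, 5]],
    index=['2014', '2015'],
    columns=['root/vbp/think', 'root/vbg/thinking', 'root/vb/want'],
)

def to_multiindex(df, names=('function', 'pos', 'lemma'), sep='/'):
    """Return a copy of df whose columns are split into MultiIndex levels."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(col.split(sep)) for col in df.columns],
        names=list(names),
    )
    return out

multi = to_multiindex(flat)
print(multi)
```

Once the columns are split this way, a single level can be selected or aggregated on its own, for example with multi.xs('vbp', axis=1, level='pos').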

Files

corpkit-1.76.zip (64.1 MB)
md5:fad88dc6c3595a64cd6879076c761634