CLiC for developers
CLiC Concordance
CLiC Concordance based on cheshire3 indexes.
- class concordance.Concordance
This concordance takes terms, index names, book selections, and a search type as input and returns JSON containing the search term, ten words to the left and ten to the right, and location information.
It can be used in an AJAX API.
- build_and_run_query(terms, idxName, Materials, selectWords)
Builds a cheshire3 query and runs it.
Its output is a tuple: the first element is a result set and the second is the number of search terms in the query.
- create_concordance(terms, idxName, Materials, selectWords)
Main concordance method. It creates a list of lists in which each entry holds three contexts (left, node, right), each of them a list of words, followed by two separate lists of metadata:
[ [left context: word 1, word 2, etc.], [node: word 1, word 2, etc.], [right context: word 1, etc.], [chapter metadata], [book metadata] ], etc.
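The nested structure described above can be illustrated with a small sketch. The sample words and metadata values are invented placeholders, and format_line is a hypothetical helper, not part of CLiC.

```python
# Hypothetical sample in the shape described above: each entry holds
# [left context], [node], [right context], [chapter metadata], [book metadata].
# All values are invented for illustration only.
concordance = [
    [
        ["it", "was", "the"],   # left context, one word per item
        ["best"],               # node (the search term)
        ["of", "times"],        # right context
        ["chapter-1"],          # chapter metadata (placeholder)
        ["BOOK"],               # book metadata (placeholder)
    ],
    [
        ["it", "was", "the"],
        ["worst"],
        ["of", "times"],
        ["chapter-1"],
        ["BOOK"],
    ],
]


def format_line(entry):
    """Join the three contexts of one entry into a single display line."""
    left, node, right, chapter_meta, book_meta = entry
    return "{} [{}] {}".format(" ".join(left), " ".join(node), " ".join(right))


lines = [format_line(entry) for entry in concordance]
```

Unpacking each entry into five names makes the position of every sub-list explicit, which helps when the structure is consumed on the client side.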
-
CLiC Clusters
Tool to create wordlists based on the entries in an index.
- class clusters.Clusters
Class that does all the heavy lifting. It connects to cheshire3 and uses the input parameters (index name and subcorpus/Materials) to return a list of words and their total number of occurrences.
For instance,
the 98021
to 78465
...
- or
he said 8937
she said 6732
...
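The kind of output shown above can be approximated in plain Python. This is only a sketch of the counting idea: it works on a raw string rather than on a cheshire3 index, and count_clusters is a hypothetical name, not the Clusters API.

```python
from collections import Counter


def count_clusters(text, n=1):
    """Count n-word clusters in a text, most frequent first.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace, which is far cruder than an index-based approach.
    """
    tokens = text.lower().split()
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams).most_common()


sample = "he said yes and she said no and he said maybe"
words = count_clusters(sample, n=1)   # single words with their counts
pairs = count_clusters(sample, n=2)   # two-word clusters with their counts
```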
CLiC Keywords
Module to compute keywords (words that are used significantly more frequently in one corpus than they are in a reference corpus).
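The text above does not say which statistic the module uses. As an assumption, the sketch below uses log-likelihood, a measure commonly used for keyword analysis; the function name and parameters are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word (an assumed choice of statistic).

    freq_*: occurrences of the word in each corpus.
    size_*: total number of words in each corpus.
    """
    total = float(size_target + size_ref)
    combined = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


same_rate = log_likelihood(10, 1000, 10, 1000)    # equal relative frequency
overused = log_likelihood(50, 1000, 10, 1000)     # more frequent in target
```

A word with the same relative frequency in both corpora scores zero, and the score grows as the two frequencies diverge.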
CLiC Chapter Repository
Display the texts available in the cheshire3 database. Also highlight specific items that were previously retrieved with a concordance.
- class chapter_repository.ChapterRepository
Responsible for providing access to chapter resources within cheshire3.
- get_book_title(book)
Gets the title of a book from the JSON file booklist.json.
book – string - the book id/acronym, e.g. BH
- get_chapter(chapter_number, book)
Returns the transformed XML for the given chapter and book.
chapter_number – integer
book – string - the book id/acronym, e.g. BH
- get_chapter_with_highlighted_search_term(chapter_number, book, wid, search_term)
Returns the transformed XML for the given chapter and book with the search term highlighted.
The transformer is created directly so that extra parameters, in this case the search term, can be passed to it at runtime.
chapter_number – integer
book – string - the book id/acronym, e.g. BH
wid – integer - word index
search_term – string - term to highlight
-
CLiC KWICgrouper
A module to look for patterns in concordances.
- class kwicgrouper.Concordance(term, text, word_boundaries=True, length=50, keep_punctuation=True, keep_line_breaks=False)
This is a simple concordance for a text file. The input text should be a string that has been cleaned, for instance:
    text.replace("\n", " ").replace("  ", " ")
The class takes two required arguments: the search term and the text to be searched.
The length should be an integer.
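A minimal sketch of how such a character-based concordance might work follows. It is not the kwicgrouper.Concordance implementation: simple_concordance is a hypothetical function that borrows only the term, text, length, and word_boundaries parameter names from the signature above.

```python
import re


def simple_concordance(term, text, length=50, word_boundaries=True):
    """Return [left, node, right] lists for every match of term in text.

    length is the number of characters of context kept on each side.
    """
    pattern = re.escape(term)
    if word_boundaries:
        # Only match the term as a whole word.
        pattern = r"\b" + pattern + r"\b"
    lines = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - length):match.start()]
        right = text[match.end():match.end() + length]
        lines.append([left, match.group(), right])
    return lines


text = "a voice in the dark and a voice in the light"
lines = simple_concordance("voice", text, length=10)
```

Because the context is cut by character count, words at the outer edges of a line can be truncated.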
- class kwicgrouper.KWICgrouper(concordance)
This starts from a concordance and transforms it into a pandas dataframe (here called textframe) that has the five words to the left and right of the search term in separate columns. These columns can then be searched and sorted.
Input: a nested list of lists that looks like:
- [
- ['sessed of that very useful appendage a ', 'voice', ' for a much longer space of time than t' ],
...
Each pattern needs its own instantiation of the KWICgrouper object because the self.textframe variable is changed in the filter method.
- args_to_dict(L5=None, L4=None, L3=None, L2=None, L1=None, R1=None, R2=None, R3=None, R4=None, R5=None)
Helper function that makes it possible to call functions with keyword arguments such as L1="a".
- conc_to_df()
Turns a list of dictionaries with L1-R5 values into a dataframe that can be used as a kwicgrouper.
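The positional columns described above can be sketched in plain Python, stopping at the list of dictionaries and leaving out the pandas step. split_context and filter_rows are hypothetical helpers, and the second sample line is invented; only the first comes from the example input.

```python
def split_context(line, span=5):
    """Turn one [left, node, right] line into a dict with L5..L1 and R1..R5 keys."""
    left, node, right = line
    left_words = left.split()[-span:]    # the words closest to the node
    right_words = right.split()[:span]
    row = {"node": node}
    for i in range(1, span + 1):
        # L1 is the word immediately to the left, R1 immediately to the right.
        row["L{}".format(i)] = left_words[-i] if i <= len(left_words) else None
        row["R{}".format(i)] = right_words[i - 1] if i <= len(right_words) else None
    return row


def filter_rows(rows, **positions):
    """Keep the rows whose positional words match, e.g. filter_rows(rows, L1="a")."""
    return [row for row in rows
            if all(row.get(key) == value for key, value in positions.items())]


concordance = [
    ["sessed of that very useful appendage a ", "voice",
     " for a much longer space of time than t"],
    ["she spoke in the softest ", "voice", " and then fell silent"],  # invented
]
rows = [split_context(line) for line in concordance]
matches = filter_rows(rows, L1="a")
```

Unlike the filter method described above, this filter returns a new list and leaves rows unchanged, so several patterns can be tried on the same data.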
- kwicgrouper.clean_punkt(text)
Deletes punctuation from a text.
Known problem: turns CAN’T into CA NT
- kwicgrouper.clean_text(text)
Cleans a text so that it can be used in a concordance. This includes:
- converting all text to lowercase
- deleting line breaks
- tokenizing and detokenizing
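The three steps listed above can be sketched as follows. This is an assumption-laden stand-in, not the clean_text implementation: clean_text_sketch is a hypothetical name, and whitespace splitting replaces whatever tokenizer the module actually uses.

```python
def clean_text_sketch(text):
    """Lowercase a text, drop line breaks, and normalise spacing."""
    lowered = text.lower()
    # Replace line breaks with spaces so words on adjacent lines stay apart.
    no_breaks = lowered.replace("\r", " ").replace("\n", " ")
    # Tokenize on whitespace, then detokenize with single spaces.
    tokens = no_breaks.split()
    return " ".join(tokens)


cleaned = clean_text_sketch("It was the BEST\nof   times.\r\n")
```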
CLiC Normalizer
Defines normalizers that can be used in the cheshire3 indexing workflow.
CLiC Query Builder
Future module to handle the construction of cheshire3 CQL queries.
CLiC Web app
Index
This is the most important file for the web app. It contains the various routes that end users can use.
For instance:
    @app.route('/about/', methods=['GET'])
    def about():
        return render_template("info/about.html")
Here /about/ is the URL path that the route serves.
API
This file is an extension of index.py. It generates the raw JSON API that the keywords, cluster, and concordance tools use(d).
It needs to be refactored.
Models
models.py defines the SQL tables that CLiC uses. These classes provide a Python interface to the SQL database, so that you can write Python code that queries the database automatically.
This depends heavily on Flask-SQLAlchemy and SQLAlchemy.