Database retrieval and population methods

Python3 script for retrieving data from the MCSQ database.

Before running the script, install the requirements: pandas, numpy, SQLAlchemy, psycopg2.

Author: Danielly Sorato
Author contact: danielly.sorato@gmail.com

retrieve_from_tables.build_id_dicts_per_language(language)

Gets all text segments and their IDs and builds a dictionary by item type.

Parameters

language (param1) – target language.

Returns

Four different dictionaries (one for each item type). The IDs are the keys and the text segments are the values.
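
The shape of the return value can be sketched as follows. This is a minimal illustration, not the module's implementation: the `rows` sample data, the `group_segments_by_item_type` helper, and the order in which the four dictionaries are returned are assumptions made for the example. The real function reads its segments from the MCSQ database.

```python
from collections import defaultdict

# Item types named elsewhere in this documentation; their ordering is an assumption.
ITEM_TYPES = ("introduction", "instruction", "request", "response")


def group_segments_by_item_type(rows):
    """Build one {ID: text} dictionary per item type.

    `rows` is a hypothetical iterable of (segment_id, item_type, text) tuples
    standing in for the result of a database query.
    """
    grouped = defaultdict(dict)
    for segment_id, item_type, text in rows:
        grouped[item_type][segment_id] = text
    # Return the four dictionaries in a fixed order (empty if a type is absent).
    return tuple(dict(grouped[item_type]) for item_type in ITEM_TYPES)


# Made-up sample rows, for illustration only.
rows = [
    (1, "introduction", "Now some questions about television."),
    (2, "instruction", "Please use this card."),
    (3, "request", "How much time do you spend watching television?"),
    (4, "response", "No time at all"),
    (5, "response", "Less than half an hour"),
]

introductions, instructions, requests, responses = group_segments_by_item_type(rows)
```

Each returned dictionary can then be looked up by segment ID, for example `responses[4]`.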

retrieve_from_tables.create_tagged_text_dict(id_list)

Gets the survey_itemid and the POS tagged text from the survey_item table and creates a dictionary.

Parameters

id_list (param1) – a language specific list of the target segment IDs in the alignment table.

Returns

A dictionary with target survey_itemids as keys and POS tagged text as values.
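
A query of this kind might look like the sketch below. It is a hedged illustration only: it uses an in-memory SQLite database instead of the PostgreSQL database the script targets, and the column name `pos_tagged_text`, the `word/TAG` text format, the sample IDs, and the function name `create_tagged_text_dict_sketch` are all assumptions.

```python
import sqlite3


def create_tagged_text_dict_sketch(connection, id_list):
    """Return {survey_itemid: POS tagged text} for the IDs in id_list."""
    if not id_list:
        return {}
    # One bound placeholder per ID keeps the values out of the SQL string.
    placeholders = ", ".join("?" for _ in id_list)
    query = (
        "SELECT survey_itemid, pos_tagged_text FROM survey_item "
        f"WHERE survey_itemid IN ({placeholders})"
    )
    return dict(connection.execute(query, list(id_list)).fetchall())


# Hypothetical stand-in for the survey_item table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE survey_item (survey_itemid TEXT PRIMARY KEY, pos_tagged_text TEXT)"
)
connection.executemany(
    "INSERT INTO survey_item VALUES (?, ?)",
    [
        ("item_1", "How/ADV often/ADV"),
        ("item_2", "Every/DET day/NOUN"),
        ("item_3", "Never/ADV"),
    ],
)

tagged = create_tagged_text_dict_sketch(connection, ["item_1", "item_3"])
```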

retrieve_from_tables.get_ids_from_alignment_table(survey_itemid)

Gets all IDs (either source or target) from the alignment table.

Parameters

survey_itemid (param1) – name of the column indicating whether the IDs to be retrieved are source or target IDs.

Returns

A list of survey_itemids.
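
Because the argument is a column name rather than a value, it cannot be passed as a bound query parameter. One way to handle this, sketched here under assumptions, is to check the name against a fixed set before building the query. The table name `alignment`, the function name `get_ids_sketch`, and the use of SQLite are assumptions for the example; the two column names are the ones mentioned in the populate_tables documentation.

```python
import sqlite3

# Column names taken from this documentation; the table name is an assumption.
ALLOWED_ID_COLUMNS = {"source_survey_itemid", "target_survey_itemid"}


def get_ids_sketch(connection, survey_itemid):
    """Return all values of the chosen ID column as a list."""
    if survey_itemid not in ALLOWED_ID_COLUMNS:
        raise ValueError(f"unexpected ID column: {survey_itemid!r}")
    # The column name is safe to interpolate only because it was validated above.
    cursor = connection.execute(f"SELECT {survey_itemid} FROM alignment")
    return [row[0] for row in cursor]


# Hypothetical stand-in for the alignment table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE alignment (source_survey_itemid TEXT, target_survey_itemid TEXT)"
)
connection.executemany(
    "INSERT INTO alignment VALUES (?, ?)",
    [("src_1", "tgt_1"), ("src_2", "tgt_2")],
)

target_ids = get_ids_sketch(connection, "target_survey_itemid")
source_ids = get_ids_sketch(connection, "source_survey_itemid")
```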

retrieve_from_tables.get_ids_from_alignment_table_per_language(language)

Gets all target IDs from the alignment table based on the language.

Parameters

language (param1) – target language.

Returns

A list of all target_survey_itemids in the alignment table.

retrieve_from_tables.get_instruction_id(text)

Gets an instruction segment ID based on its text.

Parameters

text (param1) – the instruction segment text.

Returns

instruction segment ID (int).
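
A text-based ID lookup of this kind can be sketched as below. The table name `instruction`, the column names `instructionid` and `text`, the function name `get_instruction_id_sketch`, and the use of SQLite are assumptions for illustration. Returning `None` when no segment matches is a choice made for this sketch; the documentation above does not say how the real function behaves in that case.

```python
import sqlite3


def get_instruction_id_sketch(connection, text):
    """Return the ID of the instruction segment whose text matches exactly."""
    row = connection.execute(
        "SELECT instructionid FROM instruction WHERE text = ?", (text,)
    ).fetchone()
    # fetchone() yields None when no row matches.
    return row[0] if row is not None else None


# Hypothetical stand-in for the instruction table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE instruction (instructionid INTEGER PRIMARY KEY, text TEXT)"
)
connection.executemany(
    "INSERT INTO instruction VALUES (?, ?)",
    [(1, "Please use this card."), (2, "Choose one answer.")],
)

found = get_instruction_id_sketch(connection, "Choose one answer.")
missing = get_instruction_id_sketch(connection, "Not in the table")
```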

retrieve_from_tables.get_introduction_id(text)

Gets an introduction segment ID based on its text.

Parameters

text (param1) – the introduction segment text.

Returns

introduction segment ID (int).

retrieve_from_tables.get_module_id(module_name)

Gets a module ID based on its name.

Parameters

module_name (param1) – the name of the module.

Returns

module ID (int).

retrieve_from_tables.get_request_id(text)

Gets a request segment ID based on its text.

Parameters

text (param1) – the request segment text.

Returns

request segment ID (int).

retrieve_from_tables.get_response_id(text, item_value)

Gets a response segment ID based on its text.

Parameters

text (param1) – the response segment text.

Returns

response segment ID (int).

retrieve_from_tables.get_tagged_text_from_survey_item_table()

Gets the survey_itemid and the POS tagged text from the survey_item table and creates a dictionary.

Returns

A dictionary with survey_itemids as keys and POS tagged text as values.

Python3 script for including the ESS dataset in the MCSQ database.

Before running the script, install the requirements: pandas, numpy, SQLAlchemy, psycopg2.

Author: Danielly Sorato
Author contact: danielly.sorato@gmail.com

populate_tables.tag_alignment_table(dictionary, id_list, column_name, source_or_target_id)

Inserts the POS alignment annotation into either the target or the source text column.

Parameters
  • dictionary (param1) – a dictionary where the keys are the survey_itemids and the values are the POS tagged text segments.

  • id_list (param2) – list of the IDs that refer to the text to be annotated.

  • column_name (param3) – defines whether the column to be tagged is the source or the target.

  • source_or_target_id (param4) – name of the ID column (either target_survey_itemid or source_survey_itemid).
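
How the four parameters fit together can be sketched as follows. This is an illustration under assumptions, not the script's code: it runs against an in-memory SQLite table, and the table name `alignment`, the text column names `source_pos_tagged_text` and `target_pos_tagged_text`, and the function name `tag_alignment_table_sketch` are hypothetical.

```python
import sqlite3

# Hypothetical names of the columns that receive the POS tagged text.
TEXT_COLUMNS = {"source_pos_tagged_text", "target_pos_tagged_text"}
# ID column names as given in the parameter description above.
ID_COLUMNS = {"source_survey_itemid", "target_survey_itemid"}


def tag_alignment_table_sketch(connection, dictionary, id_list,
                               column_name, source_or_target_id):
    """Write the tagged text for each ID in id_list into column_name."""
    if column_name not in TEXT_COLUMNS or source_or_target_id not in ID_COLUMNS:
        raise ValueError("unexpected column name")
    statement = (
        f"UPDATE alignment SET {column_name} = ? WHERE {source_or_target_id} = ?"
    )
    # Skip IDs that have no tagged text in the dictionary.
    params = [(dictionary[i], i) for i in id_list if i in dictionary]
    with connection:  # commits on success, rolls back on error
        connection.executemany(statement, params)


# Hypothetical stand-in for the alignment table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE alignment ("
    "source_survey_itemid TEXT, target_survey_itemid TEXT, "
    "source_pos_tagged_text TEXT, target_pos_tagged_text TEXT)"
)
connection.executemany(
    "INSERT INTO alignment (source_survey_itemid, target_survey_itemid) VALUES (?, ?)",
    [("src_1", "tgt_1"), ("src_2", "tgt_2")],
)

tagged_targets = {"tgt_1": "Nunca/ADV", "tgt_2": "Siempre/ADV"}
tag_alignment_table_sketch(
    connection, tagged_targets, ["tgt_1", "tgt_2"],
    "target_pos_tagged_text", "target_survey_itemid",
)
result = connection.execute(
    "SELECT target_survey_itemid, target_pos_tagged_text, source_pos_tagged_text "
    "FROM alignment ORDER BY target_survey_itemid"
).fetchall()
```

In this sketch only the chosen text column is updated; the source column stays empty until the function is called again with the source column and source ID names.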

populate_tables.tag_item_type_table(dictionary, table_name, table_id_name)

Inserts the POS alignment annotation in an item type specific table.

Parameters
  • dictionary (param1) – a dictionary where the keys are the survey_itemids and the values are the POS tagged text segments.

  • table_name (param2) – name of the table to be tagged (introduction, instruction, request or response).

  • table_id_name (param3) – name of the ID of the table.

populate_tables.tag_survey_item(dictionary, table_id_name)

Inserts the POS alignment annotation in the survey_item table.

Parameters
  • dictionary (param1) – an item type specific dictionary where the keys are the IDs and the values are the POS tagged text segments.

  • table_id_name (param2) – name of the ID of the item type specific table.

populate_tables.tag_target_alignment_table(dictionary)

Inserts the POS alignment annotation on the target text column.

Parameters

dictionary (param1) – a dictionary where the keys are the target_survey_itemids and the values are the POS tagged text segments.