Alignment based on item structure

alignment_based_on_item_structure.align_introduction_instruction_request(df, df_source, df_target, item_type)[source]

Aligns introduction, instruction and request segments. Unlike response segments, these item types can’t be merged. There are five distinct cases to consider: 1) only source segments (df_target is empty), 2) only target segments (df_source is empty), 3) df_source has more segments than df_target, 4) df_target has more segments than df_source, and 5) df_source and df_target have the same number of segments.

Parameters
  • df (param1) – dataframe to store the questionnaire alignment.

  • df_source (param2) – dataframe containing the data of the source questionnaire (always English).

  • df_target (param3) – dataframe containing the data of the target questionnaire

  • item_type (param4) – metadata that indicates if the dataframes contain introductions, instructions or requests.

Returns

df (pandas dataframe) with newly aligned survey item segments.
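The five cases listed above can be sketched as a small classifier. This is only an illustration, not the module's code: the function name classify_alignment_case and the returned labels are hypothetical, and the sketch relies on nothing but len(), so it works with lists as well as pandas dataframes.

```python
def classify_alignment_case(df_source, df_target):
    """Label which of the five documented cases applies.

    Hypothetical helper: the name and the labels are illustrative only.
    Both arguments just need to support len().
    """
    n_source, n_target = len(df_source), len(df_target)
    # Note: the documentation does not describe the "both empty" situation;
    # in this sketch it falls into the first branch.
    if n_target == 0:
        return "only_source"
    if n_source == 0:
        return "only_target"
    if n_source > n_target:
        return "more_segments_in_source"
    if n_target > n_source:
        return "more_segments_in_target"
    return "same_number_of_segments"
```

A caller could then route each label to the matching alignment routine.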

alignment_based_on_item_structure.align_more_segments_in_source(list_source, list_target, sorted_aligments, source_segments_without_pair, df)[source]

Calls the appropriate method for alignment when the source dataframe has more segments, depending on the number of pairless segments. This method is called from a broader set of cases contained in prepare_alignment_with_more_segments_in_source. If there is only one pairless source segment, it calls treat_a_single_pairless_source(); otherwise it calls treat_multiple_pairless_source_segments().

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • sorted_aligments (param3) – sorted list of segments aligned via best match strategy.

  • source_segments_without_pair (param4) – indexes of pairless source segments

  • df (param5) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.
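The dispatch described above can be sketched as follows. This is a minimal sketch under assumptions: it assumes each entry of sorted_aligments is a [target_index, source_index] pair (the layout documented for find_best_match), the helper names are hypothetical, and the handler is returned by name instead of being called.

```python
def find_pairless_source_indexes(list_source, sorted_aligments):
    """Return the indexes of source segments that appear in no alignment pair.

    Assumes each pair is [target_index, source_index].
    """
    paired = {source_index for _target_index, source_index in sorted_aligments}
    return [i for i in range(len(list_source)) if i not in paired]


def choose_source_handler(source_segments_without_pair):
    """Name the documented method that would handle the pairless segments."""
    if len(source_segments_without_pair) == 1:
        return "treat_a_single_pairless_source"
    return "treat_multiple_pairless_source_segments"
```

The same pattern would apply on the target side, with the target index of each pair collected instead.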

alignment_based_on_item_structure.align_more_segments_in_target(list_source, list_target, sorted_aligments, target_segments_without_pair, df)[source]

Calls the appropriate method for alignment when the target dataframe has more segments, depending on the number of pairless segments. This method is called from a broader set of cases contained in prepare_alignment_with_more_segments_in_target. If there is only one pairless target segment, it calls treat_a_single_pairless_target(); otherwise it calls treat_multiple_pairless_target_segments().

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • sorted_aligments (param3) – sorted list of segments aligned via best match strategy.

  • target_segments_without_pair (param4) – indexes of pairless target segments

  • df (param5) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.align_on_metadata(df, df_source, df_target, process_responses)[source]

Calls the appropriate method for alignment based on metadata. Responses are aligned separately from other item types because answers are merged using the item value.

Parameters
  • df (param1) – dataframe to store the questionnaire alignment

  • df_source (param2) – dataframe containing the data of the source questionnaire (always English).

  • df_target (param3) – dataframe containing the data of the target questionnaire

  • process_responses (param4) – indicates if the response segments should be processed. Country-specific answers are excluded by design.

Returns

df (pandas dataframe) with newly aligned survey items.

alignment_based_on_item_structure.align_responses(df, df_source, df_target)[source]

Aligns response segments by merging them on item_value metadata.

Parameters
  • df (param1) – pandas dataframe to store the questionnaire alignment.

  • df_source (param2) – pandas dataframe containing the data of the source questionnaire (always English).

  • df_target (param3) – pandas dataframe containing the data of the target questionnaire.

Returns

df (pandas dataframe) with newly aligned response segments.
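Merging on item_value can be illustrated with pandas.merge. This is a sketch, not the module's implementation: the text column, the outer join and the suffixes are assumptions chosen so that a response present on only one side is still kept.

```python
import pandas as pd


def merge_responses_on_item_value(df_source, df_target):
    """Pair source and target responses that share the same item_value.

    Assumptions: both frames have an 'item_value' column; an outer join
    keeps responses that exist on one side only (the other side is NaN).
    """
    return pd.merge(
        df_source,
        df_target,
        on="item_value",
        how="outer",
        suffixes=("_source", "_target"),
    )
```

With this layout, a response value that exists only in the source ends up in a row whose target text is missing, which makes unmatched answers easy to spot.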

alignment_based_on_item_structure.filter_by_module(df_source, df_target, module)[source]

Filters the source and target dataframes by the module that is being currently analyzed.

Parameters
  • df_source (param1) – dataframe containing the data of the source questionnaire (always English).

  • df_target (param2) – dataframe containing the data of the target questionnaire

  • module (param3) – questionnaire module being currently analyzed in outer loop.

Returns

df_source (pandas dataframe) and df_target (pandas dataframe). Source and target dataframes filtered by the specified module.
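Filtering both dataframes by module amounts to applying the same boolean mask on each side. The sketch below assumes a column named module holds the module label; that column name and the function name are hypothetical.

```python
import pandas as pd


def filter_both_by_module(df_source, df_target, module, column="module"):
    """Keep only the rows of each dataframe that belong to the given module.

    The 'module' column name is an assumption made for this example.
    """
    filtered_source = df_source[df_source[column] == module]
    filtered_target = df_target[df_target[column] == module]
    return filtered_source, filtered_target
```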

alignment_based_on_item_structure.find_best_match(list_source, list_target, item_type)[source]

Finds the best match for source and target segments (same item_type) based on the length of the segments.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • item_type (param3) – the item type of the segments. Can be introduction, instruction or request.

Returns

alignment (list). Alignment pair represented by the index of target and source segments being aligned (index 0 = target, index 1 = source), selected with the length of the segments strategy.
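A length-based match can be sketched as below. The documentation only says that the match is based on segment length, so the scoring used here (smallest absolute difference in character count, first pair wins on ties) is an assumption, as is the function name. The returned pair follows the documented layout, with the target index first.

```python
def best_match_by_length(list_source, list_target):
    """Return [target_index, source_index] for the pair closest in length.

    Assumption: 'closest' means the smallest absolute difference in
    character count; ties keep the first pair encountered.
    """
    best_pair = None
    best_diff = None
    for target_index, target_segment in enumerate(list_target):
        for source_index, source_segment in enumerate(list_source):
            diff = abs(len(target_segment) - len(source_segment))
            if best_diff is None or diff < best_diff:
                best_diff = diff
                best_pair = [target_index, source_index]
    return best_pair
```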

alignment_based_on_item_structure.get_original_index(list_source, list_target, source_segment_index, target_segment_index, aux_source, aux_target)[source]

Gets the original index of aligned segments, as the auxiliary lists are being modified and their indexes do not correspond to the original ones.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type).

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type).

  • source_segment_index (param3) – source segment aligned in aux_source list.

  • target_segment_index (param4) – target segment aligned in aux_target list.

  • aux_source (param5) – auxiliary list of source segments being modified in outer loop (contains segments of same item_name and item_type).

  • aux_target (param6) – auxiliary list of target segments being modified in outer loop (contains segments of same item_name and item_type).

Returns

original_index_target (int), original_index_source (int). Original indexes (in list_source, list_target) of source/target segments aligned.
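The index recovery can be sketched as a lookup of each aligned segment in its original list. This sketch assumes that source_segment_index and target_segment_index are positions in the auxiliary lists and that segment texts are unique within each list; with duplicates, list.index() would return the first occurrence.

```python
def original_indexes(list_source, list_target, source_segment_index,
                     target_segment_index, aux_source, aux_target):
    """Map positions in the shrinking auxiliary lists back to the originals.

    Returns (original_index_target, original_index_source), mirroring the
    documented return order. Assumes segments are unique within each list.
    """
    source_segment = aux_source[source_segment_index]
    target_segment = aux_target[target_segment_index]
    return list_target.index(target_segment), list_source.index(source_segment)
```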

alignment_based_on_item_structure.get_study_metadata(filename)[source]

Get study metadata embedded in filename. It can be retrieved either from the source or the target file.

Parameters

filename (param1) – name of either source or target file.

Returns

study (string). Metadata that identifies the study of the questionnaires that are being aligned.

alignment_based_on_item_structure.get_target_language_country_metadata(filename)[source]

Get target language/country metadata embedded in filename, to name the output aligned file.

Parameters

filename (param1) – name of target file.

Returns

target_language_country (string). Metadata that identifies the language/country of the target questionnaire being aligned.

alignment_based_on_item_structure.identify_showc_segment(list_source, list_target, item_type)[source]

Searches list_source and list_target for instructions that seem to be show card segments that should be aligned together. This method was implemented as an additional strategy to correctly align instruction segments.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • item_type (param3) – item_type metadata being analyzed. In this method we want to consider only instruction segments.

Returns

str indicating if instructions that follow the show card segments were found.

alignment_based_on_item_structure.only_one_segment_in_source_align(alignment, source_segment, target_segment, list_target, aux_target, df)[source]

Fills the dataframe with remaining target segments that do not have source correspondences. This method is called when the dataframe contains one source segment and two or more target segments. The alignment pair is defined in the find_best_match() method and the remaining target segments are included in this method, respecting the structure order.

Parameters
  • alignment (param1) – alignment pair represented by the index of target and source segments being aligned (index 0 = target, index 1 = source), selected with the length of the segments strategy.

  • source_segment (param2) – source segment that has a match according to find_best_match().

  • target_segment (param3) – target segment that has a match according to find_best_match().

  • list_target (param4) – list of target segments (contains segments of same item_name and item_type)

  • aux_target (param5) – list of target segments, excluding the target_segment (contains segments of same item_name and item_type)

  • df (param6) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.
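The one-source, many-targets case can be pictured as building one row per target segment, in the original order, with the source text attached only to the matched target. The row layout (source_text and target_text keys) and the function name are hypothetical; the real method writes into the alignment dataframe instead.

```python
def rows_for_single_source(alignment, source_segment, list_target):
    """Build alignment rows for one source segment and several targets.

    alignment is [target_index, source_index]; only the matched target
    row receives the source text, the others are left without a pair.
    """
    matched_target_index = alignment[0]
    rows = []
    for index, target_segment in enumerate(list_target):
        rows.append({
            "source_text": source_segment if index == matched_target_index else None,
            "target_text": target_segment,
        })
    return rows
```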

alignment_based_on_item_structure.only_one_segment_in_target_align(alignment, source_segment, target_segment, list_source, aux_source, df)[source]

Fills the dataframe with remaining source segments that do not have target correspondences. This method is called when the dataframe contains one target segment and two or more source segments. The alignment pair is defined in the find_best_match() method and the remaining source segments are included in this method, respecting the structure order.

Parameters
  • alignment (param1) – alignment pair represented by the index of target and source segments being aligned (index 0 = target, index 1 = source), selected with the length of the segments strategy.

  • source_segment (param2) – source segment that has a match according to find_best_match().

  • target_segment (param3) – target segment that has a match according to find_best_match().

  • list_source (param4) – list of source segments (contains segments of same item_name and item_type)

  • aux_source (param5) – list of source segments, excluding the source_segment (contains segments of same item_name and item_type)

  • df (param6) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.prepare_alignment_with_more_segments_in_source(df, list_source, list_target, item_type)[source]

Calls the appropriate method for alignment when the source dataframe has more segments, depending on the number of pairless segments.

Parameters
  • df (param1) – dataframe to store the questionnaire alignment

  • list_source (param2) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param3) – list of target segments (contains segments of same item_name and item_type)

  • item_type (param4) – item_type metadata, can be REQUEST, INTRODUCTION or INSTRUCTION

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.prepare_alignment_with_more_segments_in_target(df, list_source, list_target, item_type)[source]

Calls the appropriate method for alignment when the target dataframe has more segments, depending on the number of pairless segments.

Parameters
  • df (param1) – dataframe to store the questionnaire alignment

  • list_source (param2) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param3) – list of target segments (contains segments of same item_name and item_type)

  • item_type (param4) – item_type metadata, can be REQUEST, INTRODUCTION or INSTRUCTION

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.preprocessing_alignment_candidates(text)[source]

Preprocesses the text segment by tokenizing it and removing punctuation.

Parameters

text (param1) – the text segment to be preprocessed (string).

Returns

The preprocessed tokens (a list of strings).
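A minimal version of this preprocessing can be written with the standard library alone. The documentation does not say which tokenizer the module uses, so whitespace splitting and the ASCII punctuation table from the string module are assumptions; non-ASCII punctuation such as curly quotes would survive this sketch.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_segment(text):
    """Strip ASCII punctuation and split the segment on whitespace.

    Hypothetical stand-in for the documented preprocessing step.
    """
    return text.translate(_PUNCTUATION_TABLE).split()
```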

alignment_based_on_item_structure.treat_a_single_pairless_source(list_source, list_target, sorted_aligments, source_segments_without_pair, df)[source]

Aligns source and target segments when only one source segment is without a pair.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • sorted_aligments (param3) – sorted list of segments aligned via best match strategy.

  • source_segments_without_pair (param4) – index of pairless source segment

  • df (param5) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.treat_a_single_pairless_target(list_source, list_target, sorted_aligments, target_segments_without_pair, df)[source]

Aligns source and target segments when only one target segment is without a pair.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • sorted_aligments (param3) – sorted list of segments aligned via best match strategy.

  • target_segments_without_pair (param4) – index of pairless target segment

  • df (param5) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.treat_multiple_pairless_source_segments(list_source, list_target, sorted_aligments, source_segments_without_pair, df)[source]

Aligns source and target segments when multiple source segments are without a pair.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • sorted_aligments (param3) – sorted list of segments aligned via best match strategy.

  • source_segments_without_pair (param4) – indexes of pairless source segments

  • df (param5) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_based_on_item_structure.treat_multiple_pairless_target_segments(list_source, list_target, sorted_aligments, target_segments_without_pair, df)[source]

Aligns source and target segments when multiple target segments are without a pair.

Parameters
  • list_source (param1) – list of source segments (contains segments of same item_name and item_type)

  • list_target (param2) – list of target segments (contains segments of same item_name and item_type)

  • sorted_aligments (param3) – sorted list of segments aligned via best match strategy.

  • target_segments_without_pair (param4) – indexes of pairless target segments

  • df (param5) – dataframe to store the questionnaire alignment

Returns

df (pandas dataframe) with newly aligned survey item segments.

alignment_utils.instantiate_country_specific_request_object(study)[source]

Instantiates the appropriate set of country-specific requests according to the study. Country-specific requests are deleted from alignment by design because the answer categories frequently change from country to country.

Parameters

study (param1) – study metadata, embedded in filenames.

Returns

country_specific_requests (Python object). Instance of python object that encapsulates the item names of the country specific questions.

alignment_utils.instantiate_language_stopwords_set(language)[source]

Instantiates the appropriate list of language-specific stopwords. These lists were taken from https://github.com/stopwords-iso.

Parameters

language (param1) – language of the questionnaire whose stopwords are needed.

Returns

Language-specific stopwords for the given language.

class countryspecificrequest.ESSCountrySpecificR01[source]

Item names of country specific questions in ESS round 1.

class countryspecificrequest.ESSCountrySpecificR02[source]

Item names of country specific questions in ESS round 2.

class countryspecificrequest.ESSCountrySpecificR03[source]

Item names of country specific questions in ESS round 3.

class countryspecificrequest.ESSCountrySpecificR04[source]

Item names of country specific questions in ESS round 4.

class countryspecificrequest.ESSCountrySpecificR05[source]

Item names of country specific questions in ESS round 5.

class countryspecificrequest.ESSCountrySpecificR06[source]

Item names of country specific questions in ESS round 6.

class countryspecificrequest.ESSCountrySpecificR07[source]

Item names of country specific questions in ESS round 7.

class countryspecificrequest.ESSCountrySpecificR08[source]

Item names of country specific questions in ESS round 8.

class countryspecificrequest.ESSCountrySpecificR09[source]

Item names of country specific questions in ESS round 9.

class countryspecificrequest.EVSCountrySpecificR02[source]

Item names of country specific questions in EVS wave 2.

class countryspecificrequest.EVSCountrySpecificR03[source]

Item names of country specific questions in EVS wave 3.

class countryspecificrequest.EVSCountrySpecificR04[source]

Item names of country specific questions in EVS wave 4.

class countryspecificrequest.EVSCountrySpecificR05[source]

Item names of country specific questions in EVS wave 5.