Wage Indicator Survey (WIS) data extraction and preprocessing methods
-
preprocessing.wis_data_extraction.
extract_wis_data
(df, df_questionnaire, study) Extracts and preprocesses WIS data from df, attributing MCSQ metadata and, where necessary, harmonizing metadata such as item names and item types.
- Parameters
df (param1) – the input data in a dataframe representation.
df_questionnaire (param2) – a dataframe to hold the processed questionnaire data.
study (param3) – the name of the study, embedded in the WIS export filename.
- Returns
the df_questionnaire (pandas dataframe) with the preprocessed data.
-
preprocessing.wis_data_extraction.
harmonize_item_type
(wis_item_type) Translates the item type indicated in the WIS exports into the item types present in the MCSQ.
- Parameters
wis_item_type (param1) – the item type value in a given row of the WIS export (string).
- Returns
an item type (string) that corresponds to the MCSQ item types: one of INTRODUCTION, RESPONSE, REQUEST or INSTRUCTION.
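The translation described for harmonize_item_type can be pictured as a lookup table. The sketch below is only an illustration: the WIS labels on the left-hand side ("intro", "question", "answer", "instruction") and the function name harmonize_item_type_sketch are assumptions, since the actual values used in the WIS exports are not listed in this documentation. Only the four MCSQ item types come from the text above.

```python
# Hypothetical WIS item type labels (left) mapped to the four MCSQ item types (right).
# The real labels in the WIS export may differ; adjust the keys accordingly.
WIS_TO_MCSQ_ITEM_TYPE = {
    "intro": "INTRODUCTION",
    "question": "REQUEST",
    "answer": "RESPONSE",
    "instruction": "INSTRUCTION",
}


def harmonize_item_type_sketch(wis_item_type: str) -> str:
    """Map a WIS item type label to an MCSQ item type (illustrative only)."""
    # Normalize whitespace and case so that ' Question ' and 'question' match.
    key = wis_item_type.strip().lower()
    if key not in WIS_TO_MCSQ_ITEM_TYPE:
        # Fail loudly instead of silently guessing an item type.
        raise ValueError(f"Unknown WIS item type: {wis_item_type!r}")
    return WIS_TO_MCSQ_ITEM_TYPE[key]
```

Raising an error for unknown labels is a design choice of this sketch; the documented function may handle unexpected values differently.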
-
preprocessing.wis_data_extraction.
instantiate_survey_item_prefix
(study, column_name) Maps the text column names of the WIS data export to the ISO standards used in MCSQ, then defines the prefix of the survey item according to the MCSQ standard nomenclature. The prefix of an MCSQ survey item is study+'_'+language+'_'+country+'_'.
- Parameters
study (param1) – the name of the study, embedded in the WIS export filename.
column_name (param2) – the name of the text column, from the WIS export.
- Returns
a language-specific survey item ID prefix (string).
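The prefix rule stated for instantiate_survey_item_prefix (study+'_'+language+'_'+country+'_') can be sketched as follows. The column names ("en_GB", "de_DE", "es_ES"), the choice of three-letter language codes with two-letter country codes, the study value "WIS_2020" and the function name survey_item_prefix_sketch are all assumptions made for illustration; the real WIS column names and the exact ISO code sets are not given in this documentation.

```python
# Hypothetical mapping from WIS text column names to (language, country) codes.
# Three-letter language codes and two-letter country codes are assumed here.
COLUMN_TO_ISO = {
    "en_GB": ("ENG", "GB"),
    "de_DE": ("GER", "DE"),
    "es_ES": ("SPA", "ES"),
}


def survey_item_prefix_sketch(study: str, column_name: str) -> str:
    """Build the prefix study + '_' + language + '_' + country + '_'."""
    language, country = COLUMN_TO_ISO[column_name]
    return study + "_" + language + "_" + country + "_"


# Example with a hypothetical study name.
print(survey_item_prefix_sketch("WIS_2020", "de_DE"))  # WIS_2020_GER_DE_
```

A survey item ID would then presumably be formed by appending an item counter to this prefix, although that step is not described in this section.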
-
preprocessing.wis_data_extraction.
main
(folder_path) Main method of the Wage Indicator data extraction and processing script. The data is extracted, preprocessed and given the appropriate metadata.
The script outputs a TSV representation of df_questionnaire, the pandas dataframe used to store the questionnaire data.
- Parameters
folder_path (param1) – path to the folder where the WIS master file is.
-
preprocessing.wis_data_extraction.
post_process_questionnaire
(df_questionnaire) Loops through the language/country pairs to export questionnaire and alignment data separately (one questionnaire file and one alignment file per language/country pair).
- Parameters
df_questionnaire (param1) – the preprocessed questionnaire data, containing the text for all language/country pairs.
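The per-pair export performed by post_process_questionnaire can be illustrated with a small stand-in that uses plain Python rows instead of a pandas dataframe. Everything specific here is an assumption: the column name survey_item_ID, the ID layout (study, language, country, counter separated by underscores), and the helper names split_by_language_country and to_tsv. The sketch covers only the questionnaire split, not the alignment files.

```python
import csv
import io
from collections import defaultdict


def split_by_language_country(rows):
    """Group rows by the (language, country) pair read from the item ID.

    Assumes a hypothetical ID layout such as 'WIS_2020_ENG_GB_1', where the
    last three underscore-separated parts are language, country and counter.
    """
    groups = defaultdict(list)
    for row in rows:
        parts = row["survey_item_ID"].split("_")
        language, country = parts[-3], parts[-2]
        groups[(language, country)].append(row)
    return dict(groups)


def to_tsv(rows):
    """Serialize a non-empty list of row dicts as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical questionnaire rows covering two language/country pairs.
example_rows = [
    {"survey_item_ID": "WIS_2020_ENG_GB_1", "text": "Hello"},
    {"survey_item_ID": "WIS_2020_GER_DE_1", "text": "Hallo"},
    {"survey_item_ID": "WIS_2020_ENG_GB_2", "text": "Goodbye"},
]
for pair, pair_rows in split_by_language_country(example_rows).items():
    # In a real script, each TSV string would be written to its own file.
    print(pair, to_tsv(pair_rows), sep="\n")
```

With a pandas dataframe, the same split would more naturally be expressed with a group-by over language and country columns; the plain-Python version is used here only to keep the sketch dependency-free.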
-
preprocessing.wis_data_extraction.
prepare_df_for_data_extraction
(df, df_questionnaire, study) Makes preliminary edits to the input data to ease data extraction, such as harmonizing the item names of matrix survey item segments. Then, calls the data extraction method.
- Parameters
df (param1) – the input data in a dataframe representation.
df_questionnaire (param2) – a dataframe to hold the processed questionnaire data.
study (param3) – the name of the study, embedded in the WIS export filename.
- Returns
the df_questionnaire (pandas dataframe) with the preprocessed data.
-
preprocessing.wis_data_extraction.
set_initial_structures
(filename) Sets the initial structures that are necessary for the extraction of each questionnaire.
- Parameters
filename (param1) – name of the input file.
- Returns
df_questionnaire to store questionnaire data (pandas dataframe) and the study (string), which is embedded in the file name.
-
preprocessing.wis_data_extraction.
simplify_item_name
(wis_item_name) Simplifies the unique item names in the WIS exports to attribute a single item name to each survey item. This makes it easier to find survey items inside the MCSQ, as you do not need to search for each unique variable name used in the WIS export.
- Parameters
wis_item_name (param1) – the item name value in a given row of the WIS export (string).
- Returns
an item name (string) that will be used for all segments concerning a given WIS survey item.
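As an illustration of the idea behind simplify_item_name, the sketch below collapses segment-level variable names into one shared item name. The naming scheme is purely hypothetical: it assumes that segments of one item share a stem followed by an underscore and a short suffix (for example Q12_a, Q12_b, Q12_intro). The real WIS variable names, and the rule the documented function applies, are not described in this section.

```python
import re


def simplify_item_name_sketch(wis_item_name: str) -> str:
    """Drop a trailing '_suffix' so all segments of an item share one name.

    Assumes a hypothetical scheme such as 'Q12_a' or 'Q12_intro' -> 'Q12'.
    Names without an underscore are returned unchanged.
    """
    return re.sub(r"_[A-Za-z0-9]+$", "", wis_item_name.strip())


# Three hypothetical segment names that belong to the same survey item.
names = ["Q12_a", "Q12_b", "Q12_intro"]
print({simplify_item_name_sketch(name) for name in names})  # {'Q12'}
```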