#  Analysis of survey data

Author: Thorsten Arendt, Philipps-Universität Marburg

### Import required modules

In [None]:
import pandas as pd

### Load raw survey data 

Read the cleaned survey data file in open CSV format from the GitLab repository...

In [None]:
# If you use the data from the publication site you have to adapt the url to the path on your computer.
url = 'https://vhrz1125.hrz.uni-marburg.de/arendt/tonic-survey/-/raw/master/data/tonic_survey_cleaned_data.csv'
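# Skip malformed rows. `on_bad_lines` (pandas >= 1.3) replaces `error_bad_lines`, which was removed in pandas 2.0.
records_all = pd.read_csv(url, on_bad_lines='skip')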

... and show the first rows.

In [None]:
records_all.head()

How many records does the survey have? Result: (number of rows/respondents, number of columns/questions/options). 

In [None]:
records_all.shape

### Filter those answers from CRC 135

How are the responses distributed across the participating CRCs? This is checked by evaluating the answers to the question 'To which CRC do you belong?' in the column named 'CRC' ...

In [None]:
crc_s = 'CRC'
records_all[crc_s].value_counts()

... and plot it.

In [None]:
records_all[crc_s].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
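
The counts above can also be expressed as shares. The following is a minimal sketch on toy data (the label `Other CRC` and all values are made up for illustration and are not survey results); it assumes only that the column is named `CRC`, as in the notebook.

```python
import pandas as pd

# Toy data, not survey results: four hypothetical respondents.
toy = pd.DataFrame({"CRC": ["CRC 135", "CRC 135", "CRC 135", "Other CRC"]})

counts = toy["CRC"].value_counts()                # absolute counts per CRC
shares = toy["CRC"].value_counts(normalize=True)  # relative frequencies, summing to 1

print(counts)
print(shares.round(2))
```

Calling `.plot(kind="bar")` on `shares` instead of `counts` would put the relative frequencies on the y-axis.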

Filter data according to CRC 135 and show the first rows as well as the number of records.

In [None]:
crc135 = 'CRC 135'
records_crc135 = records_all[records_all[crc_s] == crc135]
records_crc135.head()

In [None]:
records_crc135['CRC'].size
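
The filter step can be pictured as a boolean mask applied to the rows. This is a small sketch on toy data (all values are hypothetical); taking a `.copy()` of the subset is an optional precaution, assumed here to be useful if the filtered frame is modified later.

```python
import pandas as pd

# Toy data, not survey results.
toy = pd.DataFrame({
    "CRC": ["CRC 135", "Other CRC", "CRC 135"],
    "Career Stage": ["PI", "Postdoc", "PhD Student"],
})

mask = toy["CRC"] == "CRC 135"  # boolean Series, True for matching rows
subset = toy[mask].copy()       # independent copy, so later edits do not touch `toy`

print(len(subset))              # number of matching respondents
```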

### Basic information on CRC 135 participants

How are the CRC 135 answers distributed across career stages? This is checked by evaluating the answers to the question 'In which career stage would you currently situate yourself?' in the column 'Career Stage' ... and plotting them.

In [None]:
career_stage = 'Career Stage'
records_crc135[career_stage].value_counts().plot(kind="pie", figsize=(15,7))

How are the CRC 135 answers distributed across research fields? This is checked by evaluating the answers to the question 'What is/are your main field(s) of research?' in the corresponding columns ... and plotting them.

In [None]:
# research_fields = ['Neuroimaging', 'Behavioural Neuroscience', 'Neurophysiology', 'Computational Neuroscience', 'Theoretical Neuroscience', 'Molecular and Cellular Neuroscience', 'Clinical Neuroscience', 'Systems Neuroscience', 'Electrophysiology', 'Psychology', 'Cognitive Science', 'Computational Modelling']
research_fields = ['Neuroimaging', 'Behavioural Neuroscience', 'Neurophysiology', 'Computational Neuroscience', 'Theoretical Neuroscience', 'Clinical Neuroscience', 'Psychology', 'Cognitive Science', 'Computational Modelling']
quantities = [0] * len(research_fields)
for field in research_fields:
    records_tmp = records_crc135[records_crc135[field] == 1]
    quantities[research_fields.index(field)] = records_tmp.shape[0]
s = pd.Series(quantities, index=research_fields)
s.plot(kind="barh", figsize=(15,7), color="#61d199")
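
The loop above counts, for each field, the rows whose value is 1. A vectorised equivalent might look like the sketch below; it uses toy data (hypothetical values) and assumes, as the loop does, that a ticked option is coded as 1.

```python
import pandas as pd

# Toy data, not survey results: 1 means the respondent ticked the field.
toy = pd.DataFrame({
    "Neuroimaging": [1, 0, 1],
    "Psychology": [0, 1, 1],
    "Cognitive Science": [0, 0, 1],
})
fields = ["Neuroimaging", "Psychology", "Cognitive Science"]

# Compare every cell with 1, then count the True values per column.
per_field = toy[fields].eq(1).sum()
print(per_field)
```

Because several fields can be selected by one respondent, the counts may add up to more than the number of respondents.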

How are the CRC 135 answers distributed across the levels of computer affinity? This is checked by evaluating the answers to the question 'How computer-affine is your research workflow?' in the column 'Computer Affinity' ... and plotting them.

In [None]:
computer_affinity = 'Computer Affinity'
records_crc135[computer_affinity].value_counts().plot(kind="pie", figsize=(15,7))

### Basic answers

First, we asked the researchers which template looks more similar to the current project structure on their computer. The results show that for approximately one third of the participants the structure does not fit either template, while two thirds use a similar structure, with a small preference for the more hierarchical `5-top` structure.

In [None]:
similarity = 'Similarity'
records_crc135[similarity].value_counts().plot(kind="pie", figsize=(15,7))

Next, we asked the participants which of the presented templates looks easier to use. The researchers were nearly undecided, since both templates received a similar share of the votes, again with a small preference for the more hierarchical `5-top` structure.

In [None]:
Usefulness = 'Usefulness'
records_crc135[Usefulness].value_counts().plot(kind="pie", figsize=(15,7))

As a more hypothetical question, we asked what kind of support the researchers would like to have in order to become familiar with such a structure if it were provided (e.g., in the form of a tool). Multiple responses were possible. The vast majority of the participants would prefer documentation on a FAQ website, example-based support, or both.

In [None]:
support = ['No Help', 'Example', 'FAQ Docu', 'Training']
quantities = [0] * len(support)
for entry in support:
    records_tmp = records_crc135[records_crc135[entry] == 1]
    quantities[support.index(entry)] = records_tmp.shape[0]
s = pd.Series(quantities, index=support)
s.plot(kind="pie", figsize=(15,7))

The most important question we asked was whether the researchers would use such a structure. It turned out that only a few participants do not see any advantage in using such a homogeneous structure, while nearly half of the researchers would use it straight away. The number of undecided participants equals the number of proponents.

In [None]:
Willingness = 'Willingness'
records_crc135[Willingness].value_counts().plot(kind="pie", figsize=(15,7))

### Specific analysis

In order to analyse whether the willingness to use a homogeneous research folder structure depends on the participants' properties, such as career stage or computer affinity, we performed the following checks.

We started with the question of whether the willingness depends on which template the researchers consider easier to use. It turned out that the flatter `9-top` structure is the favourite one.

In [None]:
usefulness_entries = ['5-top', '9-top']
willingness_entries = ['Yes', 'No', 'Maybe']
data_set = [[]] * len(usefulness_entries)
for usefulness_entry in usefulness_entries:
    data_set_index = usefulness_entries.index(usefulness_entry)
    data = [usefulness_entry, 0, 0, 0]
    records_tmp = records_crc135[records_crc135[Usefulness] == usefulness_entry]
    for willingness_entry in willingness_entries:
        data_index = willingness_entries.index(willingness_entry) + 1
        records_inner_tmp = records_tmp[records_tmp[Willingness] == willingness_entry]
        data[data_index] = records_inner_tmp.shape[0]
    data_set[data_set_index] = data
df = pd.DataFrame(data_set,columns=[Usefulness]+willingness_entries)
print(df)
df.plot(x=Usefulness, y=willingness_entries, kind="bar",figsize=(9,8))
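
The nested loops above build a contingency table by hand. As an alternative, `pd.crosstab` produces the same kind of table in one call. The sketch below uses toy data (hypothetical values, not survey results) and the column names `Usefulness` and `Willingness` from the notebook.

```python
import pandas as pd

# Toy data, not survey results.
toy = pd.DataFrame({
    "Usefulness": ["5-top", "5-top", "9-top", "9-top", "9-top"],
    "Willingness": ["Yes", "Maybe", "Yes", "Yes", "No"],
})

# Rows: preferred template; columns: willingness; cells: number of respondents.
table = pd.crosstab(toy["Usefulness"], toy["Willingness"])

# Fix the column order; an answer that never occurs would become a zero column.
table = table.reindex(columns=["Yes", "No", "Maybe"], fill_value=0)
print(table)
```

The resulting frame can be plotted directly with `table.plot(kind="bar")`, since the row labels already sit in the index.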

The next analysis showed that PIs are more sceptical about using such a structure, whereas PhD students and postdocs are not.

In [None]:
career_entries = ['PI', 'Postdoc', 'PhD Student']
data_set = [[]] * len(career_entries)
for career_entry in career_entries:
    data_set_index = career_entries.index(career_entry)
    data = [career_entry, 0, 0, 0]
    records_tmp = records_crc135[records_crc135[career_stage] == career_entry]
    for willingness_entry in willingness_entries:
        data_index = willingness_entries.index(willingness_entry) + 1
        records_inner_tmp = records_tmp[records_tmp[Willingness] == willingness_entry]
        data[data_index] = records_inner_tmp.shape[0]
    data_set[data_set_index] = data
df = pd.DataFrame(data_set,columns=[career_stage]+willingness_entries)
print(df)
df.plot(x=career_stage, y=willingness_entries, kind="bar",figsize=(9,8))
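
If the career groups differ in size, raw counts are hard to compare between groups. One option is to look at the share of each answer within a group. The sketch below uses toy data (hypothetical values, not survey results) and `pd.crosstab` with `normalize="index"`; it is an illustration, not the method used for the plot above.

```python
import pandas as pd

# Toy data, not survey results: two PIs and four PhD students.
toy = pd.DataFrame({
    "Career Stage": ["PI", "PI", "PhD Student", "PhD Student",
                     "PhD Student", "PhD Student"],
    "Willingness": ["No", "Maybe", "Yes", "Yes", "Yes", "Maybe"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
row_shares = pd.crosstab(toy["Career Stage"], toy["Willingness"],
                         normalize="index")
print(row_shares.round(2))
```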

In the next analysis we did not find any correlation between the willingness to use such a template and the research field of the corresponding researcher.

In [None]:
research_field_title = 'Research Field'
data_set = [[]] * len(research_fields)
for research_field in research_fields:
    data_set_index = research_fields.index(research_field)
    data = [research_field, 0, 0, 0]
    records_tmp = records_crc135[records_crc135[research_field] == 1]
    for willingness_entry in willingness_entries:
        data_index = willingness_entries.index(willingness_entry) + 1
        records_inner_tmp = records_tmp[records_tmp[Willingness] == willingness_entry]
        data[data_index] = records_inner_tmp.shape[0]
    data_set[data_set_index] = data
df = pd.DataFrame(data_set,columns=[research_field_title]+willingness_entries)
print(df)
df.plot(x=research_field_title, y=willingness_entries, kind="bar",figsize=(9,8))
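
Research fields are stored as one 0/1 column per field, so a plain cross-tabulation of two columns does not apply directly. A compact way to build the same table might be a dictionary comprehension over the field columns. The sketch below uses toy data (hypothetical values, not survey results) and assumes that a ticked field is coded as 1.

```python
import pandas as pd

# Toy data, not survey results: 1 means the respondent ticked the field.
toy = pd.DataFrame({
    "Neuroimaging": [1, 1, 0, 0],
    "Psychology": [0, 1, 1, 1],
    "Willingness": ["Yes", "Maybe", "Yes", "No"],
})
fields = ["Neuroimaging", "Psychology"]
answers = ["Yes", "No", "Maybe"]

# For each field, count the willingness answers of respondents who ticked it.
rows = {
    field: toy.loc[toy[field].eq(1), "Willingness"]
              .value_counts()
              .reindex(answers, fill_value=0)
    for field in fields
}
table = pd.DataFrame(rows).T  # one row per field, one column per answer
print(table)
```

A respondent who ticked several fields is counted once in each of the corresponding rows.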

Computer affinity also does not seem to be correlated with the willingness to use such a structure.

In [None]:
affinity_entries = ['specialised software usage', 'self coding', 'in silico research', 'paper and spreadsheets usage']
data_set = [[]] * len(affinity_entries)
for affinity_entry in affinity_entries:
    data_set_index = affinity_entries.index(affinity_entry)
    data = [affinity_entry, 0, 0, 0]
    records_tmp = records_crc135[records_crc135[computer_affinity] == affinity_entry]
    for willingness_entry in willingness_entries:
        data_index = willingness_entries.index(willingness_entry) + 1
        records_inner_tmp = records_tmp[records_tmp[Willingness] == willingness_entry]
        data[data_index] = records_inner_tmp.shape[0]
    data_set[data_set_index] = data
df = pd.DataFrame(data_set,columns=[computer_affinity]+willingness_entries)
print(df)
df.plot(x=computer_affinity, y=willingness_entries, kind="bar",figsize=(9,8))

Finally, we related the structure already in use to the willingness to use such a structure. It turned out that researchers who already use a similar, flatter structure (`9-top`) tend to be more willing to use such a structure, while researchers who already use a more hierarchical structure (`5-top`) are more undecided.

In [None]:
similarity_entries = ['5-top', '9-top', 'no structure']
data_set = [[]] * len(similarity_entries)
for similarity_entry in similarity_entries:
    data_set_index = similarity_entries.index(similarity_entry)
    data = [similarity_entry, 0, 0, 0]
    records_tmp = records_crc135[records_crc135[similarity] == similarity_entry]
    for willingness_entry in willingness_entries:
        data_index = willingness_entries.index(willingness_entry) + 1
        records_inner_tmp = records_tmp[records_tmp[Willingness] == willingness_entry]
        data[data_index] = records_inner_tmp.shape[0]
    data_set[data_set_index] = data
df = pd.DataFrame(data_set,columns=[similarity]+willingness_entries)
print(df)
df.plot(x=similarity, y=willingness_entries, kind="bar",figsize=(9,8))

This project has been partially funded by Deutsche Forschungsgemeinschaft (DFG), project number 22641018 CRC/TRR 135 TP INF.