Introduction
The gediiccs package contains the result datasets of the GEDII cross-country survey carried out among Europe-based research teams during 2017. The dataset contains teams from Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Italy, Lithuania, the Netherlands, Norway, Poland, Portugal, Spain, Sweden, Switzerland and the UK. Overall, the Consortium recruited 159 research groups, which submitted a total of 1501 online questionnaires, of which 1357 are complete.
A complete account of the methodological framework, including the questionnaire(s) used, the recruitment procedure and the duration of the field phase, is available in “D4.3 Survey Analysis And Performance Indicator Research Report” (Callerstig et al. 2018). The overall conceptual framework that informed the development of the questionnaire(s) is available in “D1.1. Gender Diversity and Team Science. A Conceptual Framework” (Müller et al. 2016).
Key issues are briefly described in the following paragraphs.
The dataset is unique in that it combines individual level responses from research group members regarding team climate, communication patterns, or leadership style amongst others with bibliometric performance data for each particular group retrieved from Web of Science (Clarivate Analytics).
The original gediiccs package contains two datasets:
gedii_ccs contains the individual level responses from team members, a total of 1501 entries. Detailed documentation of the available variables is provided in the help page for this dataset.
gedii_tld contains the aggregated team level data only. For each research team, the individual responses have been aggregated by various procedures. The individual level data needs to be consulted in order to understand the underlying measurement scales for team level data. Team level data variables are in general prefixed with “TL_”. Detailed documentation of the available variables is provided in the help page for this dataset.
NOTE: The open access version of gediiccs contains the team level data (gedii_tld) only. This restriction is due to confidentiality concerns: given the large quantity of variables in relation to the relatively small research groups, it would be easy for group members to “reverse engineer” the identity of their own group as well as the responses from individual group members.
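Because team level variables are in general prefixed with “TL_”, they can be picked out by name. The following is a minimal sketch on a toy data frame that merely stands in for gedii_tld; the column names (team_id, TL_climate_mean, TL_n_members, founding_year) are hypothetical and are not taken from the package documentation.

```r
# Toy stand-in for the team level data; every column name here is hypothetical.
tld <- data.frame(
  team_id = c("T01", "T02", "T03"),
  TL_climate_mean = c(3.8, 4.1, 3.5),
  TL_n_members = c(6, 9, 4),
  founding_year = c(2005, 2012, 1998),
  stringsAsFactors = FALSE
)

# Names of all columns that start with the "TL_" prefix.
tl_vars <- grep("^TL_", names(tld), value = TRUE)

# Keep the team identifier together with the TL_ variables.
tld_tl <- tld[, c("team_id", tl_vars)]
print(names(tld_tl))
```

With the real package installed, the same grep() call could presumably be applied to names(gedii_tld) instead of the toy object.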
Recruitment
Recruitment of research teams targeted research group leaders in a first step. Participating teams agreed to provide a list of the full names of team members. The team member questionnaire was sent to each team member, while the names were used to retrieve and compile the corresponding bibliometric profile of the entire group. Pre-processing scripts merged all available data sources into a single matrix.
A team was understood as a group of people working together and bound by the same organizational context (e.g. through a formal labor contract), including administrative staff, technicians, MA, PhD, Postdoc and Senior positions as well as the team leader.
Questionnaires
The GEDII survey involved two separate questionnaires. The first, the team member questionnaire, addressed each of the individual research group members. The second was directed to our team contact (often the team leader) and targeted global information about the team, such as its founding year, the working methodologies used, or the availability of shared office space. Team member and team contact information has been merged into the gedii_ccs data matrix; the team contact variables are prefixed with TC_. Not all research teams participating in the survey returned the team contact questionnaire. This explains why for some teams we have individual responses but no data for team contact variables.
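The effect of a missing team contact questionnaire can be illustrated with a left join. This is only a sketch on toy data, not the package's actual merging code; the identifiers team_id, respondent and TC_founding_year are hypothetical.

```r
# Toy team member responses (hypothetical columns).
members <- data.frame(
  team_id = c("T01", "T01", "T02", "T03"),
  respondent = c("r1", "r2", "r3", "r4"),
  stringsAsFactors = FALSE
)

# Toy team contact data: team T02 did not return the questionnaire.
contacts <- data.frame(
  team_id = c("T01", "T03"),
  TC_founding_year = c(2005, 1998),
  stringsAsFactors = FALSE
)

# Left join: keep every individual response, fill TC_ variables with NA
# where no team contact questionnaire exists.
merged <- merge(members, contacts, by = "team_id", all.x = TRUE)

# Teams with individual responses but no team contact data.
missing_tc <- unique(merged$team_id[is.na(merged$TC_founding_year)])
print(missing_tc)
```

Using all.x = TRUE keeps the individual responses of a team even when its TC_ variables are missing, which matches the pattern described above.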
Preprocessing
In an effort to make the research process transparent, we have included in the gediiccs package the pre-processing scripts used for generating the available datasets. The pre-processing scripts cover error fixing, recoding of variables, merging of distributed raw survey matrices, incorporation of bibliometric performance data as well as the Gender Diversity Index, and the aggregation of individual-level to team-level data. The scripts also document several rounds of data checking and fixing of bibliometric data.
The file build_ccs.R contains the pre-processing of the individual level data and the merging of the bibliometric performance data as well as the Gender Diversity Index score.
The file build_tld.R contains the code for aggregating the individual level data to the team level.
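As an illustration of what aggregating individual level data to the team level can look like, the sketch below takes the team mean of a single item. It is not the code in build_tld.R: the mean is only one assumed aggregation procedure, and the names team_id, climate and TL_climate_mean are hypothetical.

```r
# Toy individual level responses (hypothetical columns); one answer is missing.
ccs <- data.frame(
  team_id = c("T01", "T01", "T02", "T02", "T02"),
  climate = c(4, 5, 3, NA, 4),
  stringsAsFactors = FALSE
)

# Team mean per team_id. The formula interface of aggregate() drops rows
# with missing values by default (na.action = na.omit).
tld <- aggregate(climate ~ team_id, data = ccs, FUN = mean)

# Mark the aggregated variable as team level with the "TL_" prefix.
names(tld)[names(tld) == "climate"] <- "TL_climate_mean"
print(tld)
```

Other aggregation procedures (for example a standard deviation or a count) would follow the same pattern with a different FUN.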
Overall, three main data sources are combined:
questionnaire data (team member and team contact). The Gender Diversity Index scores are calculated based upon questionnaire data.
bibliometric performance data for each group, retrieved from Web of Science
desktop research data concerning the gender of team members, based upon first name gender assignment in combination with web-based lookups.
Main variable blocks
Socio-demographics
Includes variables such as age, gender, highest qualification, scientific discipline, marital status, ethnic minority, and chronic illness.
You and your team
Role in the team, team tenure, time dedicated to team, influence on others in team, team climate, working climate, leadership style, and mentoring relations.
Working conditions
Type of contract, hours contracted, hours working, years of experience in the field, external funding raised, time dedicated to publications/patents, editor to journals, member of professional association, RRI activities, care responsibilities.
Gender
Self-reported gender among respondents and manually assigned gender based upon the first names in the complete team member listing. Gender variables report absolute numbers, the percentage in relation to the entire team, and gender balance (100% at 50/50). Gender has been crossed with bibliometric information to derive, for example, the gender distribution of bibliometric seniors.
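The text only states that gender balance reaches 100% at a 50/50 split. One simple measure with that property is twice the share of the smaller group; the sketch below uses this formula purely as an assumption, and it may differ from the computation used in the package. The function name gender_balance and its arguments are hypothetical.

```r
# Assumed balance measure: 200 * min(p, 1 - p), where p is the share of women.
# It equals 100 at parity and 0 for a single-gender team.
gender_balance <- function(n_women, n_men) {
  share_women <- n_women / (n_women + n_men)
  200 * pmin(share_women, 1 - share_women)
}

# Three toy teams: 5 women/5 men, 2 women/6 men, 0 women/4 men.
balance <- gender_balance(c(5, 2, 0), c(5, 6, 4))
print(balance)
```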
The Gender Diversity Index is a composite indicator which combines data on seven grounds into one figure. It is bound between 0 and 1, and measures both parity in representation in the most desirable categories (e.g. senior roles) or more inclusive categories (e.g. care responsibilities), as well as equal chances for women and men to access these categories (attrition) for each of the seven grounds. In other words, women and men should be equally represented in the more desirable or inclusive categories, and the ‘pipeline’ there should not be leaking. The dataset includes the Gender Diversity Index scores as well as the scores for the individual pillars. For a detailed discussion of this composite indicator see Humbert and Günther (2018).
Citing
Please do cite this dataset when publishing any derived analysis:
Müller, Jörg, Ulrike Busolt, Anne-Charlott Callerstig, Elisabeth Anna Guenther, Anne Laure Humbert, Sandra Klatt, Ulf Sandström. (2019). GEDII Survey on Research Teams Dataset. https://doi.org/10.5281/zenodo.2545196