GEDII Survey on Research Teams Dataset

Introduction

The gediiccs package contains the result datasets of the GEDII cross-country survey carried out among EU-based research teams during 2017. The dataset contains teams from Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Italy, Lithuania, the Netherlands, Norway, Poland, Portugal, Spain, Sweden, Switzerland and the UK. Overall, the Consortium recruited 159 research groups, which submitted a total of 1501 online questionnaires, of which 1357 are complete.

A complete account of the methodological framework, including the questionnaire(s) used, the recruitment procedure, and the duration of the field phase, is available in “D4.3 Survey Analysis And Performance Indicator Research Report” (Callerstig et al. 2018). The overall conceptual framework that informed the development of the questionnaire(s) is available in “D1.1. Gender Diversity and Team Science. A Conceptual Framework” (Müller et al. 2016).

Key issues are briefly described in the following paragraphs.

The dataset is unique in that it combines individual level responses from research group members (regarding team climate, communication patterns, and leadership style, among others) with bibliometric performance data for each group, retrieved from Web of Science (Clarivate Analytics).

The original gediiccs package contains two datasets:

gedii_ccs contains the individual level responses from team members, a total of 1501 entries. Detailed documentation regarding the available variables is available by typing

?gedii_ccs

gedii_tld contains the aggregated team level data only. For each research team, the individual responses have been aggregated by various procedures. The individual level data needs to be consulted in order to understand the underlying measurement scales for team level data. Team level data variables are in general prefixed with “TL_”. Detailed documentation regarding the available variables is available by typing

?gedii_tld
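
As a minimal sketch of working with the team level data, the snippet below picks out the columns that carry the “TL_” prefix. The toy data frame and its column names (TeamID, TL_Climate, TL_Leadership, NrMembers) are invented for illustration and are not variables of the real dataset; with the package installed, gedii_tld would take the place of toy.

```r
# Toy stand-in for gedii_tld; all column names here are hypothetical.
toy <- data.frame(
  TeamID        = c("T01", "T02", "T03"),
  TL_Climate    = c(3.8, 4.1, 3.2),
  TL_Leadership = c(4.0, 3.5, 3.9),
  NrMembers     = c(8, 12, 5),
  stringsAsFactors = FALSE
)

# Names of all columns that start with the "TL_" prefix.
tl_vars <- grep("^TL_", names(toy), value = TRUE)

# Keep the team identifier plus the team level variables.
tl_data <- toy[, c("TeamID", tl_vars)]
print(tl_vars)
```

The anchored pattern "^TL_" matches the prefix only at the start of a name, so a variable that merely contains “TL_” somewhere in the middle is not selected.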

NOTE: The open access version of gediiccs contains the team level data (gedii_tld) only. This restriction is due to confidentiality concerns: given the large quantity of variables in relation to the relatively small research groups, it would be easy for group members to “reverse engineer” the identity of their own group as well as the responses from individual group members.

Recruitment

Recruitment of research teams targeted research group leaders in a first step. Participating teams agreed to provide a list of full names of team members. The team member questionnaire was sent to each team member, while the plain names were used to retrieve and compile the corresponding bibliometric profile of the entire group. Pre-processing scripts merged all available data sources into a single matrix.

A team was understood as a group of people working together and bound by the same organizational context (e.g. through a formal labor contract), including administrative staff, technicians, MA, PhD, Postdoc and Senior positions as well as the team leader.

Questionnaires

The GEDII survey involved two separate questionnaires. The first, the team member questionnaire, addressed each of the individual research group members. The second was directed to our team contact (often the team leader) and targeted global information regarding the team, such as its founding year, the working methodologies used, or the availability of shared office space. Team member and team contact information have been merged into the gedii_ccs data matrix; the team contact variables are prefixed with TC_. Not all research teams participating in the survey returned the team contact questionnaire. This explains why for some teams we have individual responses but no data for team contact variables.
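
The effect of a missing team contact questionnaire can be illustrated with a small left join. This is a sketch only: the data frames, the TeamID key and the variable TC_FoundedYear are hypothetical and do not reproduce the actual merge performed in the pre-processing scripts.

```r
# Hypothetical individual responses (team member questionnaire).
members <- data.frame(
  TeamID = c("T01", "T01", "T02", "T03"),
  Age    = c(34, 41, 29, 52),
  stringsAsFactors = FALSE
)

# Hypothetical team contact data; team T03 did not return the questionnaire.
contact <- data.frame(
  TeamID         = c("T01", "T02"),
  TC_FoundedYear = c(2009, 2014),
  stringsAsFactors = FALSE
)

# Left join: keep every individual response and attach the TC_ variables.
merged <- merge(members, contact, by = "TeamID", all.x = TRUE)

# Teams with individual responses but no team contact data (NA in TC_ columns).
missing_tc <- unique(merged$TeamID[is.na(merged$TC_FoundedYear)])
print(missing_tc)
```

With all.x = TRUE, rows from the individual level table are never dropped; teams without a team contact record simply carry NA in the TC_ columns.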

Preprocessing

In an effort to make the research process transparent, we have included in the gediiccs package the pre-processing scripts used for generating the available datasets. The pre-processing scripts cover error fixing, recoding of variables, merging of distributed raw survey matrices, incorporation of bibliometric performance data as well as the Gender Diversity Index, and the aggregation of individual-level to team-level data. They also document several rounds of data checking and fixing of bibliometric data.

The file build_ccs.R contains the pre-processing of the individual level data and the merging of the bibliometric performance data as well as the Gender Diversity Index score.

The file build_tld.R contains the code for aggregating the individual level data to the team level.
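
The general idea of aggregating individual responses to the team level can be sketched as follows. This is not the code from build_tld.R: the toy data, the TeamID key, the Climate item and the resulting names TL_Climate_mean and TL_N are all assumptions made for illustration, and the mean is only one of several possible aggregation procedures.

```r
# Hypothetical individual level responses on a 1-5 scale.
individual <- data.frame(
  TeamID  = c("T01", "T01", "T01", "T02", "T02"),
  Climate = c(4, 5, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Team mean of the individual scores, one row per team.
team_level <- aggregate(Climate ~ TeamID, data = individual, FUN = mean)
names(team_level)[names(team_level) == "Climate"] <- "TL_Climate_mean"

# Number of respondents per team, matched to the teams by name.
n_by_team <- table(individual$TeamID)
team_level$TL_N <- as.vector(n_by_team[as.character(team_level$TeamID)])

print(team_level)
```

Keeping the number of respondents next to each aggregated score makes it easier to judge how much weight a team level value can bear, since some teams contribute only a few responses.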

Overall, three main data sources are combined:

Versioning

The gedii_ccs dataset, and consequently the gedii_tld team level dataset, mainly track changes to the bibliometrically derived variables. The variable “FAP1000”, for example, holds the initial scores of bibliometric performance indicators. Repeated checks, as well as considerations concerning the time window to be examined, led to increasingly refined data. Thus, a major update of the bibliometric data was performed in November 2018, and the resulting variables were given the suffix “_Nov18”. The different versions of the variables are documented.
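
A short sketch of how the version suffix can be used to separate the updated variables from earlier ones. The variable names are taken from this document, but the vector below is only a hand-written sample; on the real data one would pass names(gedii_tld) instead.

```r
# Sample of variable names mentioned in this document (TeamID is hypothetical).
vars <- c("TeamID", "FAP1000", "FAP5Y1000_Nov18",
          "PMM5Y1000_Nov18", "NrSeniorBib_Nov18")

# Variables from the November 2018 update carry the "_Nov18" suffix.
nov18 <- grep("_Nov18$", vars, value = TRUE)

# Strip the suffix to obtain the underlying indicator names.
base_names <- sub("_Nov18$", "", nov18)

print(nov18)
print(base_names)
```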

Main variable blocks

Socio-demographics

Includes variables such as age, gender, highest qualification, scientific discipline, marital status, ethnic minority, and chronic illness.

You and your team

Role in the team, team tenure, time dedicated to team, influence on others in team, team climate, working climate, leadership style, and mentoring relations.

Working conditions

Type of contract, hours contracted, hours working, years of experience in the field, external funding raised, time dedicated to publications/patents, editor to journals, member of professional association, RRI activities, care responsibilities.

Performance data

Productivity (use: FAP5Y1000_Nov18, FAP5Y1000xNrSeniorBib_Nov18) and impact (use: PMM5Y1000_Nov18, PMM5Y1000xNrSeniorBib_Nov18) have been retrieved from Web of Science. Patents have been retrieved from the Patstat database. Self-reported performance data cover the number of publications and the number of patents. Note that there are only 12 groups with patent data.

Bibliometric data has been used to assign “seniority” to team members based upon their publication record. Variables that are not based upon questionnaire items (i.e. that are not self-reported) but derived from bibliometric information usually contain a “bib” pre- or postfix. For example NrSeniorBib_Nov18 indicates the number of senior team members - where seniority is defined by their bibliometric record.

Gender

Self-reported gender among respondents and manually assigned gender based upon first names of complete team member listing. Gender variables report absolute numbers, the percentage in relation to the entire team, and gender balance (100% at 50/50). Gender has been crossed with bibliometric information to derive for example the gender distribution of bibliometric seniors.
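
The three ways of reporting gender (absolute numbers, percentage, balance) can be sketched as below. The balance formula is an assumption: this document only states that balance reaches 100% at a 50/50 split, so the linear form used here (falling to 0% for a single-gender team) is one plausible reading and not necessarily the computation used for the dataset. The counts are invented.

```r
# Hypothetical team compositions: number of women and total team size.
teams <- data.frame(
  TeamID  = c("T01", "T02", "T03"),
  NrWomen = c(5, 4, 0),
  NrTotal = c(10, 10, 6),
  stringsAsFactors = FALSE
)

# Percentage of women in relation to the entire team.
teams$PctWomen <- 100 * teams$NrWomen / teams$NrTotal

# ASSUMED balance measure: 100 at a 50/50 split, 0 for a single-gender team.
gender_balance <- function(pct) 100 * (1 - abs(pct - 50) / 50)
teams$Balance <- gender_balance(teams$PctWomen)

print(teams)
```

Unlike the percentage, a balance score of this kind is symmetric: a team with 40% women and a team with 60% women receive the same value.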

The Gender Diversity Index is a composite indicator which combines data on seven grounds into one figure. It is bounded between 0 and 1, and measures both parity in representation in the most desirable categories (e.g. senior roles) or more inclusive categories (e.g. care responsibilities), and equal chances for women and men to access these categories (attrition), for each of the seven grounds. In other words, women and men should be equally represented in the more desirable or inclusive categories, and the ‘pipeline’ leading there should not be leaking. The dataset includes the Gender Diversity Index scores as well as the scores for the individual pillars. For a detailed discussion of this composite indicator see Humbert and Günther (2018).

Citing

Please do cite this dataset when publishing any derived analysis:

Müller, Jörg, Ulrike Busolt, Anne-Charlott Callerstig, Elisabeth Anna Guenther, Anne Laure Humbert, Sandra Klatt, Ulf Sandström. (2019). GEDII Survey on Research Teams Dataset. https://doi.org/10.5281/zenodo.2545196

Funding

The collection of the sociometric data has been carried out within the framework of the H2020 GEDII project, which has received financing from the European Commission under the Grant Agreement nº 665851.

References

Callerstig, Anne-Charlott, Elisabeth Anna Guenther, Anne Laure Humbert, Sandra Klatt, Jörg Müller, and Ulf Sandström. 2018. “Survey Analysis and Performance Indicator Research Report.” GEDII Project Deliverable D4.3, March. doi:10.5281/zenodo.2546551.

Humbert, Anne Laure, and Elisabeth Günther. 2018. “Measuring Gender Diversity in Research Teams: Methodological Foundations of the Gender Diversity Index.” GEDII Project Deliverable D3.2, March. doi:10.5281/zenodo.1442705.

Müller, Jörg, Anne Laure Humbert, Elisabeth Anna Guenther, Ulf Sandström, Anne-Charlott Callerstig, and Sandra Klatt. 2016. “Gender Diversity and Team Science. Conceptual Framework.” GEDII Project Deliverable D1.1, May. doi:10.5281/zenodo.1442691.