This repository contains the author gender dataset (as a comma-delimited .csv file) associated with the paper entitled 'The Impact of Gender on Conference Authorship in Audio Engineering: Analysis Using a New Data Collection Method', published in the IEEE Transactions Special Issue on Increasing the Socio-Cultural Diversity of Electrical and Computer Engineering and Related Fields. Available at: dx.doi.org/10.1109/TE.2018.2814613. Please cite both the paper and dataset if used. Visualisation is available at: http://tibbakoi.github.io/aesgender.
The dataset was produced using a novel method based on self-identified pronouns, which allows for as many groups as necessary to describe the population.
A list of authors was generated from conference proceedings.
An email was sent to each author to acquire their pronoun.
If no email address was available or no response was received, a pronoun was acquired from a biography.
If no biography was available, a pronoun was inferred from traditional gender markers and gender presentation.
If no gender marker or photograph was available, the entry was labelled as 'Information Unavailable'. For brevity, the label 'Unknown' is used in the paper.
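The order of preference described above can be summarised as a simple fallback chain. The following Python sketch is illustrative only: the original collection was a manual process, and the function name, argument names, and source labels are assumptions made for this example rather than part of the dataset or the paper.

```python
from typing import Optional, Tuple

# Label used in the dataset when no source yields a pronoun.
UNAVAILABLE = "Information Unavailable"


def resolve_pronoun(
    email_reply: Optional[str] = None,
    biography: Optional[str] = None,
    inferred: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (pronoun, source) from the first source that has a value.

    The sources are checked in the order given in this README:
    email response, then biography, then inference from gender
    markers and presentation. The source labels are hypothetical.
    """
    sources = [
        ("email", email_reply),
        ("biography", biography),
        ("inferred", inferred),
    ]
    for source, pronoun in sources:
        if pronoun:  # None or an empty string means "not available"
            return pronoun, source
    return UNAVAILABLE, "none"


# Illustrative inputs only: no email reply, but a biography is available.
print(resolve_pronoun(None, "she", "he"))  # ('she', 'biography')
```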
The columns in the dataset are as follows:
ID: unique identifier of entry
Pronoun: pronoun of entry
Position (abs): numerical absolute position within author list for entry
Position (relative): relative position within author list for entry (either First, Last, or Middle)
Single/multi-author: whether the publication for that entry has a single author or has multiple authors (single author publications are excluded from author position analysis)
Conference: Full conference name of entry
Topic: Topic of conference of entry, taken from conference name
Year: Year of conference of entry
Type: Type of publication for that entry as listed on the online conference proceedings
Grouped Type: Grouping of publication types for that entry for easier analysis due to inconsistencies in online conference proceedings (groups are: workshop, poster, paper, panel, keynote, invited speaker, invited paper, demo)
Inc. for author pos?: True/False as to whether to include the entry for analysis over author position (included types are: paper, invited paper, poster (all with multiple authors) as these have meaningful author orders)
Inc. for single/multi-author?: True/False as to whether to include the entry for analysis over single/multi author (included types are: paper, invited paper, poster as these have meaningful author orders)
Invited paper status: Grouping of the types to allow statistical analysis over invited vs non-invited types (invited types are: invited speaker, invited paper, keynote, panel. Non-invited types are: poster, paper, demo, workshop)
NB: Some grouping of the data is required as online conference proceedings are not always consistent (Column 10). Some labelling of the data is required to determine which entries to include in certain types of analysis (Columns 11-13).
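As a rough guide to using the inclusion columns, the Python sketch below counts pronouns by relative author position, keeping only entries flagged for author position analysis. It assumes that the CSV header row matches the column names listed above exactly and that the flags are stored as the text True or False (compared here without regard to case). The sample rows and the file name in the comment are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def is_true(value):
    """Interpret a True/False cell, ignoring case and surrounding spaces."""
    return value.strip().lower() == "true"


def pronouns_by_position(rows):
    """Count (relative position, pronoun) pairs for included entries only."""
    counts = Counter()
    for row in rows:
        if is_true(row["Inc. for author pos?"]):
            counts[(row["Position (relative)"], row["Pronoun"])] += 1
    return counts


def load_counts(path):
    """Read the dataset from disk; the path is supplied by the caller."""
    with open(path, newline="", encoding="utf-8") as handle:
        return pronouns_by_position(csv.DictReader(handle))


# Made-up rows with a subset of the columns, for demonstration only.
SAMPLE = """ID,Pronoun,Position (relative),Inc. for author pos?
1,she,First,TRUE
2,he,Last,TRUE
3,they,First,FALSE
4,she,First,True
"""

sample_counts = pronouns_by_position(csv.DictReader(io.StringIO(SAMPLE)))
print(sample_counts[("First", "she")])  # 2

# With the real file (hypothetical name): load_counts("dataset.csv")
```

The same pattern applies to the 'Inc. for single/multi-author?' column when analysing single versus multiple authorship.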
This dataset is distributed under the Creative Commons Attribution 4.0 licence in the hope that it will prove useful, but with no warranty, including the implied warranties of merchantability or fitness for a particular purpose.
Dataset curated by: Kat Young and Michael Lovedee-Turner at the Audio Lab, Dept. of Electronic Engineering, University of York.
The use of 'ID' as the first column header may cause a warning popup when the file is opened in spreadsheet software such as Microsoft Excel. The file may be detected as a 'SYLK' file, but will still open.