Published August 14, 2024 | Version v2
Dataset | Open access

Data curation materials in "Daily life in the Open Biologist's second job, as a Data Curator"

1. Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, UK.

Description

This is the supplementary material accompanying the manuscript "Daily life in the Open Biologist’s second job, as a Data Curator", published in Wellcome Open Research.

It contains:

- Python_scripts.zip: Python scripts used for data cleaning and organization:

   - add_headers.py: automatically adds the specified headers to each CSV file in a list, writing the results to new output files with a "_with_headers" suffix.

   - count_NaN_values.py: counts the total number of rows containing null values in a CSV file and prints the location of each null value in (row, column) format.

   - remove_rowsNaN_file.py: removes rows containing null values from a single CSV file and saves the modified file with a "_dropNaN" suffix.

   - remove_rowsNaN_list.py: removes rows containing null values from each file in a list of CSV files and saves the modified files with a "_dropNaN" suffix.

- README_template.txt: a template for a README file that describes and accompanies a dataset.

- template_for_source_data_information.xlsx: a spreadsheet to help manuscript authors keep track of the data used for each figure (e.g., data location and links to dataset descriptions).

- Supplementary_Figure_1.tif: an example of a dataset we shared on Zenodo, with the elements that make it FAIR marked by the corresponding letters. Findability (F) is achieved through the dataset's unique and persistent identifier (DOI), the related identifiers for the publication and for the dataset on GitHub, and rich metadata (e.g., keywords). Accessibility (A) is achieved because the dataset can easily be viewed and downloaded over a standardised communications protocol (HTTPS), and because the metadata are publicly accessible and released into the public domain. Interoperability (I) is achieved through the open formats used (CSV; R) and because the metadata can be harvested via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a low-barrier mechanism for repository interoperability. Reusability (R) is achieved through the complete description of the data with metadata in README files, links to the related publication (which contains more detailed information, as well as links to protocols on protocols.io), and a clear and accessible data usage license (CC-BY 4.0).
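The operation performed by add_headers.py can be sketched as follows. This is a minimal illustration, not the script in Python_scripts.zip: it assumes headerless input files, uses only Python's standard csv module, and the function name add_headers and its parameters are hypothetical.

```python
import csv
from pathlib import Path


def add_headers(csv_paths, headers, suffix="_with_headers"):
    """Write a copy of each headerless CSV file with `headers` as its first row.

    Each output is saved next to its input as <name><suffix>.csv;
    the input files are left unchanged. Returns the output paths.
    """
    outputs = []
    for path in map(Path, csv_paths):
        out_path = path.with_name(f"{path.stem}{suffix}{path.suffix}")
        with path.open(newline="") as src, out_path.open("w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(headers)           # the new header row goes first
            writer.writerows(csv.reader(src))  # followed by the original rows
        outputs.append(out_path)
    return outputs
```

For example, add_headers(["plate1.csv", "plate2.csv"], ["time_min", "od600"]) would write plate1_with_headers.csv and plate2_with_headers.csv; the file and column names are hypothetical. Writing new files rather than overwriting the inputs keeps the raw data untouched.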
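The null-value report described for count_NaN_values.py might look like the sketch below. It is not the published script, which may rely on pandas; this version uses only the standard csv module, assumes the file has a header row, and treats empty cells and a small, assumed set of placeholder tokens as null. find_nulls and report are hypothetical names.

```python
import csv

# Assumption: a cell is null if, after stripping whitespace, it is empty or one
# of these tokens (pandas.read_csv treats all of these, and more, as missing).
NULL_TOKENS = {"", "NA", "N/A", "NaN", "nan", "null", "NULL"}


def find_nulls(csv_path):
    """Return (rows_with_nulls, locations) for a CSV file with a header row.

    `locations` lists every null cell as (row, column), where row is the
    0-based index of the data row (header excluded) and column is the header name.
    """
    locations = []
    rows_with_nulls = set()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        for r, row in enumerate(reader):
            # Short rows are padded, so missing trailing cells count as null.
            row = row + [""] * (len(header) - len(row))
            for c, cell in enumerate(row):
                if cell.strip() in NULL_TOKENS:
                    column = header[c] if c < len(header) else c
                    locations.append((r, column))
                    rows_with_nulls.add(r)
    return len(rows_with_nulls), locations


def report(csv_path):
    """Print the number of affected rows, then one (row, column) pair per null."""
    n_rows, locations = find_nulls(csv_path)
    print(f"Rows containing null values: {n_rows}")
    for location in locations:
        print(location)
```

A reported pair such as (0, "b") points to the first data row under the column headed b. Counting rows through a set means a row with several null cells is counted once, while every null cell still appears in the location list.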
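Removing incomplete rows, as remove_rowsNaN_file.py and remove_rowsNaN_list.py do for a single file and for a list of files, can be sketched in a similar way. This is an assumed standard-library version rather than the published scripts; the null definition is the assumed NULL_TOKENS set defined in the code, and drop_null_rows and drop_null_rows_in_files are hypothetical names.

```python
import csv
from pathlib import Path

# Assumption: a cell is null if, after stripping whitespace, it is empty or one of these tokens.
NULL_TOKENS = {"", "NA", "N/A", "NaN", "nan", "null", "NULL"}


def has_null(row, width):
    """True if any cell is null; cells missing relative to the header count as null."""
    padded = row + [""] * (width - len(row))
    return any(cell.strip() in NULL_TOKENS for cell in padded)


def drop_null_rows(csv_path, suffix="_dropNaN"):
    """Copy a CSV file without the data rows that contain nulls; return the new path."""
    path = Path(csv_path)
    out_path = path.with_name(f"{path.stem}{suffix}{path.suffix}")
    with path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)  # the header row is always kept
            writer.writerows(row for row in reader if not has_null(row, len(header)))
    return out_path


def drop_null_rows_in_files(csv_paths, suffix="_dropNaN"):
    """Apply drop_null_rows to every file in a list; return the new paths."""
    return [drop_null_rows(p, suffix) for p in csv_paths]
```

The list variant simply applies the single-file function to each path, which mirrors how the two scripts are described; every output keeps its header row and carries the "_dropNaN" suffix.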
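Because the metadata are exposed over OAI-PMH, a harvester can request them with a plain HTTP GET. The sketch below builds a GetRecord request in the oai_dc (Dublin Core) format, which every OAI-PMH repository must support, and reads Dublin Core fields from the response using only the Python standard library. It assumes that Zenodo's endpoint is https://zenodo.org/oai2d and that records are identified as oai:zenodo.org:<record id>; the record ID of this dataset is not given here, so record_id is a placeholder, and all function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumptions: Zenodo's OAI-PMH base URL and its oai:zenodo.org:<id> identifier scheme.
ZENODO_OAI = "https://zenodo.org/oai2d"
DC_NS = "http://purl.org/dc/elements/1.1/"


def get_record_url(record_id, base_url=ZENODO_OAI, prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urllib.parse.urlencode({
        "verb": "GetRecord",
        "identifier": f"oai:zenodo.org:{record_id}",
        "metadataPrefix": prefix,
    })
    return f"{base_url}?{query}"


def dc_values(xml_text, element):
    """Return the text of every Dublin Core <dc:element> in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter(f"{{{DC_NS}}}{element}") if node.text]


def fetch_record(record_id, timeout=30):
    """Download the raw OAI-PMH XML for one record (requires network access)."""
    with urllib.request.urlopen(get_record_url(record_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a real record ID, dc_values(fetch_record(record_id), "subject") would list the record's dc:subject entries, which is where Dublin Core records usually carry keywords.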

Files

Files (4.1 MB)

md5:88346c09b19c05ea15f9e52e36a27b46 | 3.2 kB
md5:88c660164e626564e3b333cd25508aba | 3.6 kB
md5:64459b430e55571255d814cae0d49cca | 4.0 MB
md5:0685479332a39d26fac18cab92f93541 | 18.3 kB
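The MD5 checksums above allow a downloaded copy to be checked for corruption. The following is a minimal sketch using Python's hashlib; md5sum and verify are hypothetical helper names, and the expected value should be the checksum the repository lists next to the file you downloaded.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file with a checksum written as 'md5:<hex>' (the prefix is optional)."""
    expected = expected.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected
```

Reading in fixed-size chunks keeps memory use constant, which matters little for the kilobyte-sized files here but keeps the same helper usable for the 4.0 MB file or larger datasets.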

Additional details

Related works

Is published in
Publication: https://doi.org/10.12688/wellcomeopenres.22899.1

Funding

Wellcome Trust (205008/Z/16/A): DNA repair and genetic stability: Elucidating the effects of cell physiology in Escherichia coli
Medical Research Council (MR/X009726/1): Mental Health and Circadian Science Network