Data curation materials in "Daily life in the Open Biologist's second job, as a Data Curator"

Scorza, Livia C T; Zieliński, Tomasz; Millar, Andrew J

doi:10.5281/zenodo.13321937

Published August 14, 2024 | Version v2

Dataset Open

1. Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, UK.

This is the supplementary material accompanying the manuscript "Daily life in the Open Biologist’s second job, as a Data Curator", published in Wellcome Open Research.

It contains:

- Python_scripts.zip: Python scripts used for data cleaning and organization:

  - add_headers.py: adds specified headers automatically to a list of CSV files, creating new output files with a "_with_headers" suffix.

  - count_NaN_values.py: counts the total number of rows containing null values in a CSV file and prints the location of each null value in (row, column) format.

  - remove_rowsNaN_file.py: removes rows containing null values from a single CSV file and saves the modified file with a "_dropNaN" suffix.

  - remove_rowsNaN_list.py: removes rows containing null values from a list of CSV files and saves the modified files with a "_dropNaN" suffix.

- README_template.txt: a template for a README file to be used to describe and accompany a dataset.

- template_for_source_data_information.xlsx: a spreadsheet to help manuscript authors to keep track of data used for each figure (e.g., information about data location and links to dataset description).

- Supplementary_Figure_1.tif: Example of a dataset shared by us on Zenodo. The elements that make the dataset FAIR are indicated by the respective letters. Findability (F) is achieved by the dataset's unique and persistent identifier (DOI), as well as by the related identifiers for the publication and the dataset on GitHub. Additionally, the dataset is described with rich metadata (e.g., keywords). Accessibility (A) is achieved by the ease of visualization and downloading using a standardised communications protocol (HTTPS). Also, the metadata are publicly accessible and in the public domain. Interoperability (I) is achieved by the open formats used (CSV; R), and the metadata are harvestable using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a low-barrier mechanism for repository interoperability. Reusability (R) is achieved by the complete description of the data with metadata in README files and links to the related publication (which contains more detailed information, as well as links to protocols on protocols.io). The dataset has a clear and accessible data usage license (CC-BY 4.0).
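The behaviour described for the four Python scripts above can be illustrated with a minimal sketch. This is not the code shipped in the zip archive: the function names, the use of the standard-library csv module (the original scripts may well rely on pandas), the rule that an empty cell or the text "NaN"/"NA" counts as null, and the 0-based (row, column) positions are all assumptions made for illustration.

```python
import csv
from pathlib import Path

# Assumption: a cell is "null" when it is empty or reads "NaN"/"NA".
NULL_TOKENS = {"", "nan", "na"}


def is_null(cell):
    return cell.strip().lower() in NULL_TOKENS


def read_rows(path):
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def write_rows(path, rows):
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def suffixed(path, suffix):
    """Turn e.g. data.csv into data<suffix>.csv in the same folder."""
    path = Path(path)
    return path.with_name(path.stem + suffix + path.suffix)


def add_headers(paths, headers):
    """Prepend a header row to each file; write '<name>_with_headers.csv'."""
    outputs = []
    for path in paths:
        out = suffixed(path, "_with_headers")
        write_rows(out, [list(headers)] + read_rows(path))
        outputs.append(out)
    return outputs


def find_nulls(path):
    """Return (count of data rows with a null, [(row, column), ...]).

    The first line is treated as the header and skipped; positions are
    0-based and relative to the data rows.
    """
    rows = read_rows(path)[1:]
    locations = [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if is_null(cell)
    ]
    return len({r for r, _ in locations}), locations


def drop_null_rows(paths):
    """Drop data rows containing nulls; write '<name>_dropNaN.csv' per file."""
    outputs = []
    for path in paths:
        rows = read_rows(path)
        kept = rows[:1] + [
            row for row in rows[1:] if not any(is_null(cell) for cell in row)
        ]
        out = suffixed(path, "_dropNaN")
        write_rows(out, kept)
        outputs.append(out)
    return outputs
```

Passing a one-element list to drop_null_rows covers the single-file case, so the sketch uses one function where the archive provides two scripts. The input files are never overwritten; each step writes a new file with the suffix named in the descriptions above.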

Files

Files (4.1 MB)

Name	MD5	Size
Python scripts.zip	88346c09b19c05ea15f9e52e36a27b46	3.2 kB
README_template.txt	88c660164e626564e3b333cd25508aba	3.6 kB
Supplementary_Figure_1.tif	64459b430e55571255d814cae0d49cca	4.0 MB
template_for_source_data_information.xlsx	0685479332a39d26fac18cab92f93541	18.3 kB
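The MD5 checksums listed with the files can be used to confirm that a download arrived intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the assumption that the file sits in the current working directory are illustrative only.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file against a checksum copied from the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, after downloading README_template.txt, matches_listing("README_template.txt", "88c660164e626564e3b333cd25508aba") should return True if the file is identical to the deposited copy. MD5 is adequate here as an integrity check against corrupted transfers, not as a security guarantee.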

Additional details

Related works

Is published in: Publication: https://doi.org/10.12688/wellcomeopenres.22899.1

Funding

Wellcome Trust: DNA repair and genetic stability: Elucidating the effects of cell physiology in Escherichia coli (grant 205008/Z/16/A)
Medical Research Council: Mental Health and Circadian Science Network (grant MR/X009726/1)

	All versions	This version
Views	252	191
Downloads	225	194
Data volume	163.2 MB	162.9 MB
