This R Notebook supports the electronic laboratory notebook (ELN) survey data presented in the publication: “Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment”, S.G. Higgins, A.A. Nogiwa-Valdez, M.M. Stevens (2021).
Load required packages:
library(here)
here() starts at /Users/stuart/OneDrive - Imperial College London/_Papers/ELN-essay/product_survey_revised
library(tidyverse)
── Attaching packages ──────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.0.6 ✓ dplyr 1.0.4
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(plotly)
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
library(htmlwidgets)
Load the survey data from file, recode ‘ongoing’ entries in the date_defunct column to the year 2021, calculate the total number of years each ELN has been active, and add a logical column, defunct_in_2021, recording whether each ELN is defunct (TRUE) or still active (FALSE) in 2021. (Note: this Notebook expects the file ‘ELN_Review_Higgins_2021_Survey.csv’ to be present in the directory identified by the here package.)
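data <-
  read_csv(here("ELN_Review_Higgins_2021_Survey.csv")) %>%
  mutate(
    # treat ELNs still available ("ongoing") as ending in 2021, the survey year
    date_defunct_numeric = as.numeric(replace(date_defunct, date_defunct == "ongoing", 2021)),
    # lifetime in years, from release to discontinuation (or to 2021)
    years_active = date_defunct_numeric - date_released,
    # FALSE = still active in 2021; TRUE = defunct (including ELNs discontinued during 2021)
    defunct_in_2021 =
      case_when(
        date_defunct == "ongoing" ~ FALSE,
        date_defunct == 2021 ~ TRUE,
        TRUE ~ TRUE
      ),
    # original row index, used later to order the timeline plot
    row_number = row_number()
  )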
── Column specification ─────────────────────────────────────────────────────────────────────────────────
cols(
product_name = col_character(),
manufacturer = col_character(),
date_released = col_double(),
date_defunct = col_character(),
codebase = col_character(),
notes = col_character(),
reference_1 = col_character(),
reference_2 = col_character(),
reference_3 = col_character(),
reference_4 = col_character(),
references_accessed = col_character()
)
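The recoding logic can be checked in isolation with a base-R sketch on three hypothetical rows (the product names and dates are invented for illustration and are not survey entries). It follows the same steps as the mutate() call: ‘ongoing’ is replaced by 2021 before the lifetime is computed, and every other entry is flagged as defunct.

```r
# Hypothetical toy rows: NOT taken from ELN_Review_Higgins_2021_Survey.csv
toy <- data.frame(
  product_name  = c("ELN A", "ELN B", "ELN C"),
  date_released = c(2005, 2010, 2015),
  date_defunct  = c("ongoing", "2012", "2021"),
  stringsAsFactors = FALSE
)

# Treat products still available ("ongoing") as ending in the survey year, 2021;
# replace() coerces 2021 to "2021" in the character column, then as.numeric() converts
toy$date_defunct_numeric <- as.numeric(
  replace(toy$date_defunct, toy$date_defunct == "ongoing", 2021)
)

# Lifetime in years between release and discontinuation (or 2021)
toy$years_active <- toy$date_defunct_numeric - toy$date_released

# TRUE = defunct by 2021, FALSE = still active
# (note: != gives NA for a missing date, whereas the case_when() fallback above returns TRUE)
toy$defunct_in_2021 <- toy$date_defunct != "ongoing"

print(toy)
```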
How many ELNs were surveyed?
data %>%
count()
How many of the ELNs surveyed are active (FALSE) or defunct (TRUE) in 2021?
data %>%
count(defunct_in_2021)
What are the average and spread of the lifetimes (years_active) of the ELNs surveyed? (Note: R's mad() function, which computes the median absolute deviation, applies a default scaling constant of 1.4826 so that it acts as a consistent estimator of the standard deviation for normally distributed data.)
data %>%
summarise(mean_years_active = mean(years_active),
sd_years_active = sd(years_active),
median_years_active = median(years_active),
mad_years_active = mad(years_active),
iqr_years_active = IQR(years_active),
range_years_active = max(years_active)-min(years_active))
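The scaling constant mentioned above can be checked directly in base R. mad() multiplies the raw median absolute deviation by 1.4826, which is approximately 1/qnorm(0.75), the factor that makes the MAD agree with the standard deviation for normally distributed data. The lifetimes below are hypothetical values chosen for illustration, not survey results.

```r
# Hypothetical lifetimes in years: NOT survey results
x <- c(2, 5, 7, 11, 20)

# Raw (unscaled) median absolute deviation around the median
raw_mad <- median(abs(x - median(x)))

# mad() applies its default constant of 1.4826 to the raw value
scaled_mad <- mad(x)

# The constant is approximately the reciprocal of the standard normal 75th percentile
k <- 1 / qnorm(0.75)

cat("raw MAD:", raw_mad, "\n")
cat("mad():", scaled_mad, "\n")
cat("1/qnorm(0.75):", round(k, 4), "\n")
```

Because the MAD is built from medians, a single very long-lived product shifts it much less than it shifts the standard deviation.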
What are the average and spread of the lifetimes of ELNs, sub-divided by codebase?
data %>%
group_by(codebase) %>%
summarise(mean_years_active = mean(years_active),
sd_years_active = sd(years_active),
median_years_active = median(years_active),
mad_years_active = mad(years_active),
iqr_years_active = IQR(years_active),
range_years_active = max(years_active)-min(years_active))
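The same group-wise summary can be sketched in base R with split() and sapply(), which may help readers check what each summarise() column computes. The lifetimes are hypothetical, and ‘proprietary’ is an assumed label for the non-open-source codebase value.

```r
# Hypothetical toy data: NOT survey results; "proprietary" is an assumed label
toy <- data.frame(
  codebase     = c("opensource", "opensource", "proprietary", "proprietary", "proprietary"),
  years_active = c(3, 8, 5, 12, 20),
  stringsAsFactors = FALSE
)

# One named statistic per column produced by summarise() above
summarise_lifetime <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x),
    mad = mad(x), iqr = IQR(x), range = max(x) - min(x))
}

# Split lifetimes by codebase, summarise each group, and put groups in rows
by_codebase <- t(sapply(split(toy$years_active, toy$codebase), summarise_lifetime))
print(by_codebase)
```

Note that sd() returns NA for a group containing a single ELN.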
How many of the ELNs surveyed have open-source or proprietary codebases?
data %>%
count(codebase)
Which are the longest-running proprietary and open-source ELNs in the survey data?
data %>%
group_by(codebase) %>%
slice_max(n=1, order_by=years_active) %>%
select(product_name, manufacturer, years_active, date_defunct, codebase)
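slice_max() keeps tied rows by default (with_ties = TRUE), so a codebase can return more than one longest-running ELN if several share the maximum years_active. A base-R sketch of the same per-group selection, on hypothetical rows, makes this tie behaviour explicit:

```r
# Hypothetical toy rows: NOT survey entries
toy <- data.frame(
  product_name = c("ELN A", "ELN B", "ELN C", "ELN D", "ELN E"),
  codebase     = c("opensource", "opensource", "proprietary", "proprietary", "proprietary"),
  years_active = c(6, 9, 15, 15, 4),
  stringsAsFactors = FALSE
)

# ave() returns each row's group maximum; keep every row that equals it (ties included)
is_longest <- ave(toy$years_active, toy$codebase, FUN = max) == toy$years_active
longest <- toy[is_longest, ]
print(longest)
```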
Define a theme for plotting figures:
mytheme <-
theme_bw() +
theme(
panel.background = element_rect(fill = "white", colour = "black", size = 2),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
text = element_text(size = 25, face = "plain", colour = "black"),
axis.title.x = element_text(size = 25, face = "plain"),
axis.title.y = element_text(size = 25),
line = element_line(size = 2), # thicker default line width, inherited by line elements such as axis ticks
axis.ticks.length = unit(0.15, "cm"))
Define functions for customising the appearance of plotted figures:
get_point_colour <- function(x){
ifelse(x==TRUE, "grey", "grey30")
}
get_line_colour <- function(x){
ifelse(x!="opensource", "#0072B2", "#CC79A7")
}
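The two helper functions return one colour string per input value. A quick check of their output is shown below, assuming codebase takes the values ‘opensource’ and ‘proprietary’ (only ‘opensource’ appears explicitly in the code, so ‘proprietary’ is an assumption). The two hex codes match the blue and reddish-purple entries of the Okabe-Ito colour-blind-friendly palette.

```r
get_point_colour <- function(x) {
  # defunct ELNs (TRUE) are drawn in light grey, active ELNs in dark grey
  ifelse(x == TRUE, "grey", "grey30")
}

get_line_colour <- function(x) {
  # open-source ELNs are drawn in reddish purple; any other codebase in blue
  ifelse(x != "opensource", "#0072B2", "#CC79A7")
}

point_cols <- get_point_colour(c(TRUE, FALSE))
line_cols  <- get_line_colour(c("opensource", "proprietary"))  # "proprietary" is assumed
print(point_cols)
print(line_cols)
```

Because these vectors are passed as fixed colour arguments rather than mapped inside aes(), they rely on the plotted rows staying in the same order as data; the mutate() steps in the timeline code change only the y position, not the row order.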
Produce the timeline plot featured in Figure 1 of the main manuscript:
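p_timeline <-
  data %>%
  mutate(row_number = as_factor(row_number)) %>%
  # order rows by years_active, then group them by codebase; the outer reorder keeps the inner order within each codebase
  mutate(row_number = fct_reorder(fct_reorder(row_number, years_active, .desc=FALSE), codebase, .desc=FALSE)) %>%
  # convert the ordered factor levels to numeric y positions for plotting
  mutate(row_number_new = as.numeric(row_number)) %>%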
ggplot() +
geom_segment(aes(x=date_released, xend=date_defunct_numeric,y=row_number_new, yend=row_number_new),
colour=get_line_colour(data$codebase),
linetype="solid",
size=0.5) +
geom_point(aes(x=date_released, y=row_number_new), colour=get_point_colour(data$defunct_in_2021), shape=1, size=2 ) +
geom_point(aes(x=date_defunct_numeric, y=row_number_new), colour=get_point_colour(data$defunct_in_2021), shape=16, size=0.5) +
scale_x_continuous(position="bottom", breaks=c(seq(1980,2021,5))) +
coord_cartesian(xlim=c(1980,2021)) +
theme_bw() +
theme(
plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
panel.border = element_blank(),
panel.grid.major.y = element_line(colour="grey95", size=0.25),
panel.grid.major.x = element_line(colour="grey95", size=0.25),
panel.grid.minor.x = element_line(colour="grey95", size=0.25),
axis.text.y = element_blank(),
axis.text.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "bottom"
)
print(p_timeline)
ggsave(here("ELN_Review_Higgins_2021_Timeline.pdf"), plot=p_timeline, width=18.0, height=10, device="pdf", dpi=600, units="cm")
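The y positions in the timeline come from sorting rows first by codebase and then, within each codebase, by years_active. A base-R sketch of the same ordering, on hypothetical rows, is shown below. We expect it to reproduce the level order produced by the nested fct_reorder() calls, since order() leaves ties in their original order, but this equivalence is an assumption rather than a tested replacement.

```r
# Hypothetical toy rows: NOT survey entries
toy <- data.frame(
  codebase     = c("proprietary", "opensource", "proprietary", "opensource"),
  years_active = c(12, 5, 3, 9),
  stringsAsFactors = FALSE
)

# Sort by codebase, then by lifetime within each codebase; remaining ties keep row order
plot_order <- order(toy$codebase, toy$years_active)

# Assign y = 1, 2, ... in that sorted order (y = 1 is the bottom of the plot)
toy$row_number_new <- integer(nrow(toy))
toy$row_number_new[plot_order] <- seq_len(nrow(toy))
print(toy)
```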