Getting started with Tidyverse R, RStudio, and Quarto
John R Little
January 6, 2023
Center for Data & Visualization Sciences
Duke University
Get code, data, and slides for today’s workshop
- https://github.com/libjohn/rfun_flipped
- https://github.com/libjohn/intro2r-code
R is a programming language
Reproducibility - Obtaining computational results using the same input data, computational steps, methods, code, and conditions of analysis.
Replicability - Obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
Goals of a tool-based, first-class approach
Do as much as possible with code
Integrate prose with code; visualize inline
Generate reports for target audience
Iterate efficiently; {purrr}
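As a minimal sketch of what iterating with {purrr} can look like, the example below applies one function to several columns without writing a loop. It uses the built-in mtcars data; the helper name col_mean is hypothetical and chosen only for illustration.

```r
library(purrr)

# Hypothetical helper for illustration: the mean of one numeric column.
col_mean <- function(x) mean(x, na.rm = TRUE)

# map_dbl() calls col_mean() once per column and returns a named numeric vector.
means <- map_dbl(mtcars[c("mpg", "hp", "wt")], col_mean)
means
```

The same pattern extends to lists of files or models; map() returns a list, while map_dbl() and its siblings return typed vectors.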
Tidy data
Each variable is a column
Each observation is a row
Each type of observational unit is a table
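The three tidy data rules above can be illustrated with a small reshaping sketch. The data are made up for this example (sites and yearly counts are assumptions, not workshop data); the year values start out spread across column names and are gathered into a single year variable with {tidyr}.

```r
library(tibble)
library(tidyr)

# Made-up illustration data: one row per site, one column per year.
wide <- tribble(
  ~site, ~`2021`, ~`2022`,
  "A",    10,      12,
  "B",     7,       9
)

# After pivoting, each variable (site, year, count) is a column
# and each site-year observation is a row.
tidy <- pivot_longer(
  wide,
  cols = c(`2021`, `2022`),
  names_to = "year",
  values_to = "count"
)
tidy
```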

Citation: https://doi.org/10.18637/jss.v059.i10
Preprint: https://vita.had.co.nz/papers/tidy-data.pdf
See more in R for Data Science by Wickham and Grolemund
Tidyverse
A dialect of R
Easier to learn because of consistency and documentation
Assumptions
Data have semantic meaning that can be documented grammatically
Tidy data are wrangled, visualized, and iterated easily via grammar
50-80% of any data project is data wrangling
{dplyr} & {tidyr}
Keep stuff organized in the same directory
e.g. data, analysis, scripts, documentation, and outputs
In the notebook, refer to subdirectories via relative paths (better than setwd())
Shareability, portability, legibility, and reproducibility
Use "Restart R and Run All Chunks" instead of rm(list = ls())
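A minimal sketch of the relative-path habit, assuming the working directory is the project directory (as it is when a notebook lives in an RStudio project). The data/ subdirectory and the file name cars_sample.csv are hypothetical; the example creates them so that it runs on its own.

```r
# Hypothetical layout: a data/ folder sitting next to the notebook.
dir.create("data", showWarnings = FALSE)

# Write a small sample file so the example is self-contained.
write.csv(head(mtcars), file.path("data", "cars_sample.csv"), row.names = FALSE)

# Read it back with a path relative to the project directory.
# No setwd() and no absolute, machine-specific path is needed.
cars <- read.csv(file.path("data", "cars_sample.csv"))
cars
```

Because the path is relative, the same code should work unchanged when the whole project folder is copied or cloned to another machine.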
Literate coding
Intersperse prose and code
Integrated outputs with analysis
Render reports from code
{dplyr}
library(dplyr) or library(tidyverse). Use {dplyr} to wrangle data
select: subset by column
filter: subset by row
arrange: sort
mutate: generate new variables
group_by, summarize: column totals (or subtotals with group_by)
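The verbs above can be chained into one pipeline. This is a sketch on the built-in mtcars data rather than the workshop's own dataset; the new column kpl and the weight cutoff are arbitrary choices for illustration, and the native pipe |> assumes R 4.1 or later.

```r
library(dplyr)

by_cyl <- mtcars |>
  select(mpg, cyl, wt) |>                       # subset by column
  filter(wt < 4) |>                             # subset by row
  mutate(kpl = mpg * 0.4251) |>                 # new variable: km per litre
  group_by(cyl) |>                              # subtotals per cylinder count
  summarize(n = n(), mean_kpl = mean(kpl)) |>   # one summary row per group
  arrange(desc(mean_kpl))                       # sort, highest first

by_cyl
```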
Visualization with ggplot2 (and interactive graphics) & Modeling (syntax)
Iteration and custom functions
Quarto and Observable interactivity
Best way to learn and/or consultations

John R Little
Data Science Librarian
Center for Data & Visualization Sciences
Duke University Libraries
https://JohnLittle.info ● https://Rfun.library.duke.edu ● https://Library.duke.edu/data

CC BY 4.0 John R Little