Outline for modules
Introduction to R - Refactoring the Workshop
Refactor existing workshop for smaller video segments and flipped presentation
- Download software & run locally
- R & RStudio are not the same thing
- Cloud Options
- https://rstudio.cloud
- https://cmgr.oit.duke.edu/containers
- R v Python
- R is a data-first (i.e. analysis) programming language
- Python is a general programming language
- Are there libraries/packages relevant to your work?
- Is there a supportive community?
- Rfun
- R Community
- R Ladies | Locally, R Ladies, RTP
- TidyTuesday weekly practice sessions (+ David Robinson’s recorded live feed of TidyTuesday on YouTube)
- ML leans towards Python, or does it?
- Programmer/Coder v Analyst
- An RStudio project & reproducibility
- You are your most frequent collaborator separated by time
- A simple test: Can you identify the specific computational steps from a six-month-old project?
- A simple goal: Reproduce your computation on a different computer
- Initial Reproducibility in a nutshell
- Do everything with a script
- Avoid point & click
- Use relative paths
- Write your code to run on any similar environment
- Read more: Initial steps toward reproducible research. Karl Broman
- RStudio Projects enable Reproducibility
- Relative file paths
- read_csv("data_raw/raw_data.csv")
- ProTip: use .. to move up one level in the directory structure
- Avoid absolute paths
- avoid setwd()
- e.g. setwd("d:/rfiles/myrproject")
- ProTip:
- Restart R and run all chunks
- avoid: rm(list=ls())
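A minimal sketch of the relative-path advice, assuming the working directory is the project root (which is what opening the .Rproj file gives you) and that the data_raw/raw_data.csv file from the example exists. The variable names are illustrative only.

```r
# Build paths relative to the project root instead of calling setwd().
# file.path() joins the pieces with "/" on every platform.
raw_path <- file.path("data_raw", "raw_data.csv")

# ".." moves up one level, e.g. from inside code_source/ back to the root.
path_from_code_source <- file.path("..", "data_raw", "raw_data.csv")

# Read the file only if it is present, so the sketch runs in an empty project too.
if (file.exists(raw_path)) {
  raw_data <- readr::read_csv(raw_path)
}
```

Because nothing here names a drive letter or a user folder, the same lines should work unchanged when the project folder is copied to another computer.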
- R Markdown & literate coding
- A script integrates code and natural language
- Explain and describe your analysis within your workflow
- Render reports in multiple formats
- Notebooks, slide decks, web pages, dashboards, e-books, journal articles
- File structure matters
- The Practice of Reproducible Research by Kitzes, Turek, Deniz
- Relative file paths
- You are your most frequent collaborator separated by time
EXAMPLE File Structure...
project_name (folder)
|-- project_name.Rproj
|-- README.md
|-- license.txt
|-- data_raw
|   |-- raw_data.csv
|   |-- README.txt
|-- data_clean
|-- code_source
|   |-- data_cleaning.Rmd
|   |-- analysis.Rmd
|-- images
|-- reports_results
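The example structure above could be scaffolded from a script rather than by point and click. This is a sketch only: it builds the folders inside a temporary directory (an assumption made so the example is safe to run anywhere), and it does not create the .Rproj file, which RStudio writes when you create the project.

```r
# Create the example folder skeleton under a throwaway location.
project_root <- file.path(tempdir(), "project_name")
folders <- c("data_raw", "data_clean", "code_source", "images", "reports_results")

for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Add a placeholder README at the top level.
writeLines("Describe the project here.", file.path(project_root, "README.md"))

list.files(project_root)
```

For a real project, replace the tempdir() root with the folder where the project should live.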
Get Data & Code Repository
- Access your own data file (e.g. CSV)
- Download & Expand a GitHub repository
- Click on *.Rproj
Tour of the RStudio environment
- Create a blank project
- Console | Files / Packages / Help | Environment | Script Editor
- R Markdown
- Switch to your other project (from Section 2)
- Keyboard Shortcuts
Tidyverse & other library packages
Packages extend the functionality of base R into your domain
| Practice | Frequency | Command |
|---|---|---|
| Install | once | install.packages("tidyverse") |
| Load | each time | library(tidyverse) |

- Tidyverse
- an opinionated collection of packages with consistent web-based documentation and a supportive community
- a meta-package that loads 8 helper packages and installs many consistent utilities
| Name | Purpose |
|---|---|
| readr | importing CSV data |
| dplyr | transforming data |
| ggplot2 | visualizing |
| tibble | rectangular grid / data frame |
| tidyr | pivot |
| forcats | categorical data / factors |
| stringr | string data / manipulate natural language |
| purrr | iteration |

Other package repositories
Assignment <- & pipe %>% and |>
R, RStudio IDE
- Base R, in the console
- A big calculator
- RStudio & Tidyverse
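A short base R sketch of assignment and the pipe, using the console as a calculator. It assumes R 4.1 or later for the native |> pipe; the numbers are made up for illustration.

```r
# Assignment: store a result under a name with <-
values <- c(4, 9, 16)

# Nested calls read inside-out ...
nested <- round(mean(sqrt(values)), 1)

# ... while the pipe passes the left-hand result as the first argument
# of the next function, so the steps read top to bottom.
piped <- values |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)
```

The magrittr pipe %>% (available after library(tidyverse)) can be swapped in for |> in this example and gives the same result.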
Quarto - coding notebooks and publishing outputs
dplyr package
- “A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges”
- mutate() adds new variables
- select() picks variables / columns
- filter() subsets data by row
- summarise() reduces multiple values into a summary
- count() a special case of summarise() to tally occurrences
- arrange() sorts rows
- RStudio Keyboard Shortcuts
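The dplyr verbs can be sketched on the starwars data frame that ships with dplyr. This assumes dplyr is installed; the object names and the choice of columns are illustrative, not part of the workshop materials.

```r
library(dplyr)

tall_humans <- starwars |>
  filter(species == "Human", !is.na(height)) |>  # subset rows
  select(name, height, mass, homeworld) |>       # pick columns
  mutate(height_m = height / 100) |>             # add a new variable
  arrange(desc(height_m))                        # sort rows, tallest first

# count() tallies occurrences of each value
homeworld_counts <- starwars |>
  count(homeworld, sort = TRUE)

# group_by() + summarise() reduce each group to one row
height_by_species <- starwars |>
  group_by(species) |>
  summarise(mean_height = mean(height, na.rm = TRUE), n = n())
```

Each verb takes a data frame as its first argument and returns a data frame, which is why the steps chain with the pipe.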
tidyr package
- Make messy data into tidy data
- Every variable is a column
- Every row is an observation
- Every cell is a single value
- i.e. pivots
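A small pivot sketch, assuming tidyr is installed. The wide table below is hypothetical data invented for the example: the year is spread across column names, so it is not yet one variable per column.

```r
library(tidyr)

# Hypothetical "messy" table: one column per year
wide <- data.frame(
  site  = c("A", "B"),
  y2019 = c(10, 20),
  y2020 = c(11, 21)
)

# pivot_longer() gathers the year columns into two variables:
# one holding the former column names, one holding the values
long <- pivot_longer(
  wide,
  cols      = c(y2019, y2020),
  names_to  = "year",
  values_to = "count"
)
```

After the pivot every row is one observation (a site in a year) and every cell holds a single value.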
dplyr revisited
- People who like pivot_longer() also like dplyr::left_join()
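A join sketch, assuming dplyr is installed. Both tables are hypothetical and exist only to show how left_join() matches rows on a shared key.

```r
library(dplyr)

# Hypothetical lookup example: people carry a department id,
# and a second table maps ids to department names
people <- data.frame(
  name    = c("Ana", "Ben", "Cy"),
  dept_id = c(1, 2, 3)
)
depts <- data.frame(
  dept_id = c(1, 2),
  dept    = c("Biology", "History")
)

# left_join() keeps every row of the left table; rows without a match
# in the right table get NA in the added columns
joined <- left_join(people, depts, by = "dept_id")
```

Here Cy has no matching department, so the row is kept and dept is NA.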
Exploratory Data Analysis (EDA): ggplot2 & skimr
- skimr::skim() from library(skimr)
- ggplot2: a brief overview of visualization
ggplot2: an introduction to the grammar of graphics, & interactive plots via plotly
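A grammar-of-graphics sketch, assuming ggplot2 is installed. It uses the mpg data set bundled with ggplot2; the variables mapped here are an arbitrary choice for illustration.

```r
library(ggplot2)

# A plot is built in layers: data + aesthetic mappings + geometry + labels
scatter <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class"
  )

# Printing the object draws the plot
print(scatter)
```

If plotly is installed, passing a plot object like this one to plotly::ggplotly() is the usual route to an interactive version.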
Future modules
More on Quarto
Interactivity with Quarto and Observable JS
Large Data
Version Control: git and GitHub
Quick Start - Demonstration
Make a data folder
Drag fav.csv into the data folder
Make the existing folder an RStudio project
Open an R Markdown Notebook
library(tidyverse) plus other libraries
IMPORT data
- See Also RStudio data import wizard
- ATTACH data
EDA: Visualize
- ggplot(data = starwars, aes(hair_color)) + geom_bar()
EDA: skimr::skim(starwars)
EDA: summary(fav_rating)
left_join(starwars, fivethirtyeight)
Transform data: five dplyr verbs …
- filter, select, arrange, mutate, count / group_by & summarize
Interactive visualization: ggplotly
linear regression / models (quick syntax introduction)
Reports: notebooks, slides, dashboards, word document, PDF, book, etc.
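For the linear regression step of the Quick Start, a quick syntax sketch in base R. It uses the built-in mtcars data rather than the workshop's own files, which is an assumption made so the example runs without any downloads.

```r
# Model formula syntax: response ~ predictor
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficients: intercept and slope for weight
coef(fit)

# Fuller report: standard errors, R-squared, p-values
summary(fit)

# Predict fuel economy for a hypothetical car with wt = 3 (i.e. 3,000 lbs)
predicted_mpg <- predict(fit, newdata = data.frame(wt = 3))
```

The same formula interface carries over to many other modeling functions, which is why it is worth meeting early.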
Resources
Tidy Tuesday practice
Rfun - R we having fun yet‽
- Update of Master the Tidyverse
R for Data Science by Wickham and Grolemund
Introduction to Data Science by Çetinkaya-Rundel
Hands-On Programming with R by Grolemund
ggplot2: Elegant Graphics for Data Analysis by Wickham
Interactive web-based data visualization with R, plotly, and shiny by Sievert
Text Mining with R by Silge & Robinson
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse by Ismay and Kim
Tidymodels packages for modeling and machine learning
- Tidy Modeling with R by Kuhn and Silge