Outline for modules

Author: John R Little

Affiliation: Duke University

Published: January 6, 2023

Introduction to R - Refactoring the Workshop

Refactor the existing workshop into smaller video segments and a flipped presentation

  1. Download software & run locally
  2. An RStudio project & reproducibility
    • You are your most frequent collaborator separated by time
      • A simple test: Can you identify the specific computational steps in a six-month-old project?
      • A simple goal: Reproduce your computation on a different computer
      • Initial Reproducibility in a nutshell
    • RStudio Projects enable Reproducibility
      1. Relative file paths
        • read_csv("data_raw/raw_data.csv")
          • ProTip: use .. to move up one level in the directory structure
          • Avoid absolute paths
            • avoid setwd()
            • e.g. setwd("d:/rfiles/myrproject")
      2. Restart R and run all chunks
        • avoid: rm(list=ls())
      3. R Markdown & literate coding
        • A script integrates code and natural language
        • Explain and describe your analysis within your workflow
        • Render reports in multiple formats
          • Notebooks, slide decks, web pages, dashboards, e-books, journal articles
      4. File structure matters
EXAMPLE File Structure...

project_name (folder)
|-- project_name.Rproj
|-- README.md
|-- license.txt
|-- data_raw
|   |-- raw_data.csv
|   |-- README.txt
|-- data_clean
|-- code_source
|   |-- data_cleaning.Rmd
|   |-- analysis.Rmd
|-- images
|-- reports_results
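
The example structure above could be scaffolded from the console. This is a minimal sketch in base R, not a step taken from the workshop: `scaffold_project()` is a hypothetical helper, the CSV values are made up, and the project root is a temporary directory so the example leaves nothing behind.

```r
# Hypothetical helper (not part of the workshop): create the example folders.
scaffold_project <- function(root) {
  folders <- c("data_raw", "data_clean", "code_source", "images", "reports_results")
  for (folder in folders) {
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(file.path(root, folders))
}

# Assumption: a throwaway location; in practice this is your project folder.
project_root <- file.path(tempdir(), "project_name")
created <- scaffold_project(project_root)

# Write and re-read a tiny CSV (made-up values) using a path built from
# folder names rather than a hard-coded absolute path.
raw_path <- file.path(project_root, "data_raw", "raw_data.csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), raw_path, row.names = FALSE)
raw_data <- read.csv(raw_path)
```

From the console of an open RStudio project the working directory is already the project folder, so the same file is reachable as "data_raw/raw_data.csv" with no root prefix.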
  1. Get Data & Code Repository

    • Access your own data file (e.g. CSV)
    • Download & Expand a GitHub repository
    • Click on *.Rproj
  2. Tour of the RStudio environment

    • Create a blank project
    • Console | Files / Packages / Help | Environment | Script Editor
    • R Markdown
    • Switch to your other project (from Section 2)
    • Keyboard Shortcuts
  3. Tidyverse & other library packages

    • Packages extend the functionality of base R into your domain

      Practice   Frequency   Command
      Install    once        install.packages("tidyverse")
      Load       each time   library(tidyverse)
    • Tidyverse:

      1. an opinionated collection of packages with consistent web-based documentation and a supportive community
      2. a Meta-package that loads 8 helper packages and installs many consistent utilities
      Name      Purpose
      readr     importing CSV data
      dplyr     transforming data
      ggplot2   visualizing
      tibble    rectangular grid / data frame
      tidyr     pivot
      forcats   categorical data / factors
      stringr   string data / manipulate natural language
      purrr     iteration
    • Other package repositories

  4. Assignment <- & pipe %>% and |>
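
A short illustration of the assignment operator and the two pipes named in this module. This is a sketch with made-up values; the native pipe `|>` needs R 4.1 or later, and the magrittr step is guarded so the code still runs where magrittr is not installed.

```r
# Assignment: `<-` binds a value to a name.
values <- c(1, 4, 9)

# Nested call: read from the inside out.
nested_total <- sum(sqrt(values))

# Native pipe (R >= 4.1): the left side becomes the first argument on the right.
piped_total <- values |> sqrt() |> sum()

# magrittr pipe: same idea; it is available after library(tidyverse).
# Guarded so the sketch does not fail when magrittr is missing.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  magrittr_total <- values %>% sqrt() %>% sum()
}
```

All three forms compute the same total; the piped versions read left to right in the order the steps happen.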

  5. R, RStudio IDE

    • Base R, in the console
      • A big calculator
    • RStudio & Tidyverse
  6. Quarto - coding notebooks and publishing outputs

  7. dplyr package

    • “A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges”
      • mutate() adds new variables
      • select() picks variables / columns
      • filter() subsets data by row
      • summarise() reduces multiple values into a summary
      • count() tallies occurrences; a special case of summarise()
      • arrange() sorts rows
    • RStudio Keyboard Shortcuts
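
The verbs listed above can be chained into a short pipeline. A minimal sketch, assuming dplyr is installed and using the starwars data set that ships with it; the particular columns and filters are illustrative choices, not taken from the workshop.

```r
library(dplyr)

# starwars ships with dplyr; one row per character.
tall_humans <- starwars |>
  filter(species == "Human", !is.na(height)) |>  # subset rows
  select(name, height, mass, homeworld) |>       # pick columns
  mutate(height_m = height / 100) |>             # add a new variable
  arrange(desc(height_m))                        # sort rows, tallest first

# summarise() collapses each group to one row.
height_by_species <- starwars |>
  group_by(species) |>
  summarise(mean_height = mean(height, na.rm = TRUE), n = n())

# count() tallies rows per group in a single call.
species_counts <- starwars |> count(species, sort = TRUE)
```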
  8. tidyr package

    • Make messy data into tidy data
      • Every variable is a column
      • Every row is an observation
      • Every cell is a single value
    • i.e. pivots
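
A pivot can be sketched with a tiny table. The data below are made up for illustration, and the column names `year` and `value` are arbitrary choices; the example assumes tidyr is installed.

```r
library(tidyr)

# Made-up messy data: one column per year, so the variable "year"
# is hidden in the column headers.
messy <- data.frame(
  country = c("A", "B"),
  `2021` = c(10, 20),
  `2022` = c(11, 21),
  check.names = FALSE
)

# pivot_longer() moves the year headers into a column: one row per
# country-year observation.
tidy <- pivot_longer(
  messy,
  cols = c(`2021`, `2022`),
  names_to = "year",
  values_to = "value"
)

# pivot_wider() reverses the reshape.
wide_again <- pivot_wider(tidy, names_from = year, values_from = value)
```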
  9. dplyr revisited

    • People who like pivot_longer() also like dplyr::left_join()
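
A join combines two tables on a shared key. The sketch below uses two made-up data frames (the names `people` and `ratings` are hypothetical) and assumes dplyr is installed.

```r
library(dplyr)

# Made-up tables that share a key column, `id`.
people <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
ratings <- data.frame(id = c(1, 3), rating = c(4.5, 3.0))

# left_join() keeps every row of the left table; rows with no match
# in the right table get NA in the new columns.
joined <- left_join(people, ratings, by = "id")
```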
  10. Exploratory Data Analysis (EDA): ggplot2 & skimr

    • skimr::skim() from library(skimr)
    • ggplot2: a brief overview of visualization
  11. ggplot2: an introduction to the grammar of graphics, & interactive plots via plotly
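
The EDA and visualization modules could open with something like the following. It is a sketch, assuming ggplot2 and dplyr are installed; skimr and plotly are separate installs, so those steps are guarded and simply skipped when the packages are absent.

```r
library(ggplot2)

# starwars comes from dplyr; referenced with :: so dplyr need not be attached.
hair_plot <- ggplot(data = dplyr::starwars, aes(x = hair_color)) +
  geom_bar()

# Optional: a one-call summary of every column.
if (requireNamespace("skimr", quietly = TRUE)) {
  overview <- skimr::skim(dplyr::starwars)
}

# Optional: convert the static plot into an interactive one.
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(hair_plot)
}
```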


Future modules

  1. More on Quarto

  2. Interactivity with Quarto and Observable JS

  3. Large Data

  4. Version Control: git and GitHub

Quick Start - Demonstration

  1. Make a data folder

  2. Drag fav.csv into the data folder

  3. Make the existing folder an RStudio project

  4. Open an R Markdown Notebook

  5. library(tidyverse) plus other libraries

  6. IMPORT data

  7. EDA: Visualize ggplot(data = starwars, aes(hair_color)) + geom_bar()

  8. EDA: skimr::skim(starwars)

  9. EDA: summary(fav_rating)

  10. left_join(starwars, fivethirtyeight)

  11. Transform data: five dplyr verbs …

    • filter, select, arrange, mutate
    • count / group_by & summarize
  12. Interactive visualization: ggplotly

  13. linear regression / models (quick syntax introduction)

  14. Reports: notebooks, slides, dashboards, Word document, PDF, book, etc.

Resources