Data Exploration and Wrangling

Ted Laderas

2018-09-21

Our Overall Goal

What is Exploratory Data Analysis?

Remember

“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone.” - John Tukey, Exploratory Data Analysis

Why Data Exploration?

Systolic Blood Pressure

Why Visualization?

Let’s look first

The first few rows of our data

Running the Shiny App

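From the R console in your workspace, load shiny and call runApp() to start the app. Alternatively, open the app.R file and click the Run App button at the top right of the script window.

library(shiny)  # provides runApp()
runApp()        # runs the app in the current working directory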

Map your questions to a tab:

Data Explorer

What is the Overview Tab for?

Overview Tab

Overview Scavenger Hunt

  1. How many categorical variables are there?
  2. How many values are missing for cardiovascular disease (any_cvd)?
  3. What is the mean age for the dataset?
  4. How is oahi calculated?
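
The first three overview questions can also be checked from the R console. The following is a minimal sketch, assuming a data frame that uses the column names from this deck. The `toy` data frame and all of its values are made up for illustration and are not the workshop data.

```r
# Toy data frame (hypothetical values, not the workshop dataset)
toy <- data.frame(
  race    = factor(c("A", "B", "A", "C", "B", "A")),
  any_cvd = factor(c("yes", "no", NA, "no", "yes", NA)),
  age_s1  = c(55, 62, 70, 48, 66, 59),
  bmi_s1  = c(24.1, 31.5, 28.0, 22.3, 35.2, 27.4)
)

# Count the categorical (factor) columns
n_categorical <- sum(sapply(toy, is.factor))

# Count the missing values in any_cvd
n_missing_cvd <- sum(is.na(toy$any_cvd))

# Mean age, ignoring any missing values
mean_age <- mean(toy$age_s1, na.rm = TRUE)

print(c(n_categorical = n_categorical, n_missing_cvd = n_missing_cvd))
print(mean_age)
```

Calling summary(toy) gives a similar overview in one step, including NA counts per column.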

What is the Categorical Tab for?

Categorical Tab

Categorical Scavenger Hunt

  1. How many categories are there for race?
  2. Are the proportions of cvd cases balanced across race?
  3. If you are male, are you more likely to have cardiovascular disease?
  4. Is the proportion of missing data for any_cvd balanced across race categories?
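
Questions about balance across categories come down to cross-tabulating two categorical variables and comparing proportions within each group. A minimal sketch in base R, assuming the race and any_cvd column names from this deck; the `toy` values are hypothetical and not the workshop data.

```r
# Toy data frame (hypothetical values, not the workshop dataset)
toy <- data.frame(
  race    = factor(c("A", "A", "A", "B", "B", "B", "B", "C")),
  any_cvd = factor(c("yes", "no", NA, "no", "no", "yes", NA, "no"))
)

# Number of race categories
n_race <- nlevels(toy$race)

# Counts of any_cvd by race, keeping NA as its own column
counts <- table(toy$race, toy$any_cvd, useNA = "ifany")

# Row proportions: share of each any_cvd value within each race category
props <- prop.table(counts, margin = 1)

# Proportion of missing any_cvd values within each race category
prop_missing <- tapply(is.na(toy$any_cvd), toy$race, mean)

print(n_race)
print(round(props, 2))
print(prop_missing)
```

Using row proportions rather than raw counts matters here because the race categories may differ a lot in size.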

Continuous Tab

Continuous Scavenger Hunt

  1. As you get older, are you more likely to have CVD?
  2. Is age_s1 evenly distributed across race? If not, how is it distributed?
  3. Are bmi_s1 and neck20 correlated? Why do you think that’s the case?
  4. Should we include both bmi_s1 and neck20 in our dataset?
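
The continuous questions can be approached the same way from the console: compare a continuous variable between groups, and measure the correlation between two continuous variables. A minimal sketch, assuming the column names from this deck and assuming neck20 is a neck circumference measurement; the `toy` values are hypothetical and not the workshop data.

```r
# Toy data frame (hypothetical values, not the workshop dataset)
toy <- data.frame(
  age_s1  = c(48, 55, 59, 62, 66, 70),
  bmi_s1  = c(22.0, 24.5, 27.0, 30.5, 33.0, 36.5),
  neck20  = c(34, 35, 37, 39, 41, 43),
  any_cvd = factor(c("no", "no", "no", "yes", "no", "yes"))
)

# Mean age within each any_cvd group
age_by_cvd <- tapply(toy$age_s1, toy$any_cvd, mean, na.rm = TRUE)

# Pearson correlation between bmi_s1 and neck20, using complete pairs only
r <- cor(toy$bmi_s1, toy$neck20, use = "complete.obs")

print(age_by_cvd)
print(r)
```

A correlation close to 1 suggests the two variables carry much of the same information, which is the concern behind the last question.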

Congratulations

You are now a full-fledged data explorer!

https://waynepelletier.com/work/tasty-icons

Overall