Loading Libraries

Loading the tidyverse library

library(tidyverse)

Importing User Created Functions

I created functions in the file ‘average.R’. You load them with the source() function. Note that the path is relative: the dot (.) stands for the working directory, and ‘average.R’ sits in its functions folder.

source('./functions/average.R')
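As a minimal sketch of how source() behaves, the code below writes a small helper script to a temporary folder and sources it. The file name helpers.R and the function to_celsius() are hypothetical stand-ins for ./functions/average.R and its functions, whose contents are not shown here.

```r
# Write a hypothetical helper script to a temporary "functions" folder,
# standing in for ./functions/average.R
helper_dir <- file.path(tempdir(), "functions")
dir.create(helper_dir, recursive = TRUE, showWarnings = FALSE)
helper_file <- file.path(helper_dir, "helpers.R")
writeLines(c(
  "# Hypothetical helper: convert Fahrenheit to Celsius",
  "to_celsius <- function(temp_f) (temp_f - 32) * 5 / 9"
), helper_file)

# source() runs every line of the file, so to_celsius() now exists
source(helper_file)
to_celsius(c(32, 212))

# A relative path such as './functions/average.R' is resolved against getwd()
getwd()
```

If source('./functions/average.R') fails because the file cannot be found, checking getwd() usually shows that the working directory is not the project folder.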

Importing CSV Data

The readr function for reading CSV files is read_csv(). Base R also has read.csv(), but it is slower, especially on large files.

Here we load our CSV dataset, which has one row per month and one column per week, each holding an average temperature.

We load it from the data folder using the relative marker (.), which refers to the working directory. The working directory should be set to the project folder, so ./data/ points to the data folder inside it.

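The readxl package provides read_excel(), which works much the same way but adds Excel-specific options such as choosing a sheet. readxl is installed with the tidyverse but is not loaded by library(tidyverse), so call library(readxl) first.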

See here for more information: https://readxl.tidyverse.org/

data <- read_csv('./data/calandar_example.csv')

data
## # A tibble: 12 × 6
##    Month       `1`   `2`   `3`   `4`   `5`
##    <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 January      40    42    40    41    40
##  2 February     35    37    38    39    NA
##  3 March        50    51    55    57    60
##  4 April        60    61    62    64    65
##  5 May          74    78    79    81    84
##  6 June         85    86    85    88    89
##  7 July         90    92    94    97    99
##  8 August       99   102   101    98    95
##  9 September    90    88    85    84    82
## 10 October      82    70    65    65    64
## 11 November     55    54    58    52    50
## 12 December     50    48    50    48    47
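A minimal sketch of read_csv() on a tiny stand-in file, using the week 4 and 5 values for January and February from the table above. The temporary file is an assumption for illustration; spelling out the optional col_types argument documents the expected column types and stops read_csv() from guessing them.

```r
library(readr)

# Tiny stand-in for calandar_example.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Month,4,5",
  "January,41,40",
  "February,39,NA"
), csv_path)

# Month is text, every other column is numeric; "NA" is read as missing
small <- read_csv(
  csv_path,
  col_types = cols(Month = col_character(), .default = col_double())
)
small
```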

Tidying Up the Data

Now we tidy the data with the pivot_longer() function from tidyr, reshaping it from wide (one column per week) to long (one row per month and week).

More information here https://tidyr.tidyverse.org/reference/pivot_longer.html

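# Reshape the week columns (2 to 6) into two columns: week and avg_temp
data_tidy <- data %>% pivot_longer(
  cols = 2:6,               # the week columns `1` to `5`
  names_to = 'week',        # old column names become values of week
  values_to = 'avg_temp',   # cell values become avg_temp
  values_drop_na = TRUE     # drop the missing 5th week of February
)

# Use a lowercase column name to match the other variables
data_tidy <- data_tidy %>% rename(month = Month)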

data_tidy
## # A tibble: 59 × 3
##    month    week  avg_temp
##    <chr>    <chr>    <dbl>
##  1 January  1           40
##  2 January  2           42
##  3 January  3           40
##  4 January  4           41
##  5 January  5           40
##  6 February 1           35
##  7 February 2           37
##  8 February 3           38
##  9 February 4           39
## 10 March    1           50
## # … with 49 more rows

Note that the missing value for the 5th week of February was dropped by the values_drop_na argument, which is why the result has 59 rows instead of 12 × 5 = 60. The names of the 2nd to 6th columns became the week variable, and their values went into the avg_temp column. The week variable is a character, which we could convert to a number, but as a character it is actually handy for the chart we are going to make.
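The same reshaping can be tried on a two-month excerpt of the table. This sketch also shows names_transform, an optional pivot_longer() argument that converts week to an integer if a character is not wanted.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Two-month excerpt of the wide table (weeks 4 and 5 only)
wide <- tribble(
  ~Month,     ~`4`, ~`5`,
  "January",    41,   40,
  "February",   39,   NA
)

# Week columns become rows; the NA cell is dropped
long <- wide %>%
  pivot_longer(cols = -Month, names_to = "week", values_to = "avg_temp",
               values_drop_na = TRUE) %>%
  rename(month = Month)
long

# Same reshape, but with week stored as an integer instead of a character
long_int <- wide %>%
  pivot_longer(cols = -Month, names_to = "week", values_to = "avg_temp",
               values_drop_na = TRUE,
               names_transform = list(week = as.integer))
long_int
```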

Plotting with Tidy Data

ggplot() works best with tidy data, where each aesthetic (x, y, color, shape) maps to a single column. Now that our data is tidy, let's see what we can plot.

Note this graph also uses labs(), which sets the labels for the axes, title, subtitle, tag, and legends. More here: https://ggplot2.tidyverse.org/reference/labs.html

This graph also uses a theme, theme_classic(). Other built-in themes include theme_minimal() and theme_dark(). More here: https://ggplot2-book.org/polishing.html

Scales can be adjusted as well; here scale_y_continuous() sets the y-axis limits and breaks. More here: https://ggplot2-book.org/scale-position.html

data_tidy %>%
  ggplot(aes(x = month, y = avg_temp, group = week, shape = week, color = week)) +
  geom_point() +
  geom_line() +
  labs(title = 'Average Temperature by Month',
       subtitle = 'Not Ordered',
       tag = 'A',
       x = 'Month',
       y = 'Average Temperature (F)',
       color = 'Week',
       shape = 'Week') +
  scale_y_continuous(limits = c(0, 110), breaks = seq(0, 110, by = 10)) +
  theme_classic()

Ordering Character Type Data with the Factor Function

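Note that this graph puts the months out of order and the lines for each week zigzag. Because month is a character variable, ggplot orders it alphabetically (April, August, December, ...), and each week's line connects the points in that order. We can fix this by converting month from character to factor and specifying the level order; month.name is a built-in R vector of the month names in calendar order.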

More here: https://r4ds.had.co.nz/factors.html

data_tidy %>% mutate(month = factor(month, levels = month.name))
## # A tibble: 59 × 3
##    month    week  avg_temp
##    <fct>    <chr>    <dbl>
##  1 January  1           40
##  2 January  2           42
##  3 January  3           40
##  4 January  4           41
##  5 January  5           40
##  6 February 1           35
##  7 February 2           37
##  8 February 3           38
##  9 February 4           39
## 10 March    1           50
## # … with 49 more rows
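A small base R check of why the factor matters: sorting a character vector is alphabetical, while sorting a factor follows its levels. Any spelling that does not match month.name exactly becomes NA, which is worth checking after the conversion.

```r
months <- c("March", "January", "February")

# Character vectors sort alphabetically
sort(months)

# Factors sort by level order; month.name holds the months in calendar order
month_fct <- factor(months, levels = month.name)
sort(month_fct)

# A value that is not an exact match for a level becomes NA
factor(c("January", "Febuary"), levels = month.name)
```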

Calculating Averages with Ordered and Tidy Data

Here we use the group_by() and summarize() functions to calculate the average temperature by month and by week. With more variables, we could group by several at once by passing more than one column to group_by(). We save the results as the tibbles df1 and df2.

More here: https://dplyr.tidyverse.org/reference/summarise.html

df1 <- data_tidy %>% 
  mutate(month = factor(month, levels = month.name)) %>%
  group_by(month) %>% 
  summarize(avg_temp_month = mean(avg_temp))

df2 <- data_tidy %>%
  mutate(month = factor(month, levels = month.name)) %>%
  group_by(week) %>%
  summarize(avg_temp_week = mean(avg_temp))

df1
## # A tibble: 12 × 2
##    month     avg_temp_month
##    <fct>              <dbl>
##  1 January             40.6
##  2 February            37.2
##  3 March               54.6
##  4 April               62.4
##  5 May                 79.2
##  6 June                86.6
##  7 July                94.4
##  8 August              99  
##  9 September           85.8
## 10 October             69.2
## 11 November            53.8
## 12 December            48.6
df2
## # A tibble: 5 × 2
##   week  avg_temp_week
##   <chr>         <dbl>
## 1 1              67.5
## 2 2              67.4
## 3 3              67.7
## 4 4              67.8
## 5 5              70.5
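Grouping by more than one variable works the same way. The year column below is hypothetical (the real dataset has none), and the rows are illustrative only. The sketch also shows na.rm = TRUE, which matters only when missing values are still present; in data_tidy they were already dropped by pivot_longer().

```r
library(dplyr)
library(tibble)

# Hypothetical data with an extra year column; values are illustrative only
temps <- tribble(
  ~year, ~month,     ~avg_temp,
  2021,  "January",  40,
  2021,  "January",  42,
  2022,  "January",  41,
  2022,  "February", NA,
  2022,  "February", 37
)

# One row per year-month combination; .groups = "drop" returns an ungrouped tibble
by_year_month <- temps %>%
  group_by(year, month) %>%
  summarize(avg_temp = mean(avg_temp, na.rm = TRUE), n = n(), .groups = "drop")
by_year_month
```

Here n() counts rows in each group, including the row whose temperature is missing.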

Using Function to Calculate Mean

Above we used the source() function to import the functions created in the ‘average.R’ script. Those functions, monthly_mean() and weekly_mean(), do the same thing as the code chunk above: you pass in the tidy data frame and they return the same tibbles.

More here: https://towardsdatascience.com/5-minute-guide-to-calling-functions-from-r-scripts-41c4a09db1eb https://www.earthdatascience.org/courses/earth-analytics/multispectral-remote-sensing-data/source-function-in-R/ and a Python example https://www.geeksforgeeks.org/python-call-function-from-another-file/

monthly_mean(data_tidy)
## # A tibble: 12 × 2
##    month     avg_temp_month
##    <fct>              <dbl>
##  1 January             40.6
##  2 February            37.2
##  3 March               54.6
##  4 April               62.4
##  5 May                 79.2
##  6 June                86.6
##  7 July                94.4
##  8 August              99  
##  9 September           85.8
## 10 October             69.2
## 11 November            53.8
## 12 December            48.6
weekly_mean(data_tidy)
## # A tibble: 5 × 2
##   week  avg_temp_week
##   <chr>         <dbl>
## 1 1              67.5
## 2 2              67.4
## 3 3              67.7
## 4 4              67.8
## 5 5              70.5
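The contents of average.R are not shown in this document, so the following is only a hypothetical sketch of how monthly_mean() and weekly_mean() could be written by wrapping the pipelines from the previous section in functions. The real file may differ.

```r
library(dplyr)
library(tibble)

# Hypothetical versions of the functions in average.R
monthly_mean <- function(df) {
  df %>%
    mutate(month = factor(month, levels = month.name)) %>%  # calendar order
    group_by(month) %>%
    summarize(avg_temp_month = mean(avg_temp))
}

weekly_mean <- function(df) {
  df %>%
    group_by(week) %>%
    summarize(avg_temp_week = mean(avg_temp))
}

# Small tidy input with the same columns as data_tidy
toy <- tibble(
  month = c("February", "January", "January"),
  week = c("1", "1", "2"),
  avg_temp = c(35, 40, 42)
)
monthly_mean(toy)
weekly_mean(toy)
```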

Plotting with Ordered and Tidy Data

Now that we know how to order the months, we can plot again in calendar order. Note that in this plot I expanded the theme options for a more customized plot. We also save this plot as f1.

More here: https://ggplot2-book.org/polishing.html

f1 <- data_tidy %>% mutate(month = factor(month, levels = month.name)) %>%
  ggplot(aes(x = month, y = avg_temp, group = week, shape = week, color = week)) +
  geom_point() +
  geom_line() +
  labs(title = 'Average Temperature by Month',
       x = 'Month',
       y = 'Average Temperature (F)') +
  scale_y_continuous(limits = c(0, 110), breaks = seq(0, 110, by = 10)) +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, face = 'bold'), 
        legend.position = 'right', 
        legend.text = element_text(size = 12), 
        #legend.title = element_blank(), 
        axis.title = element_text(size = 12), 
        axis.text = element_text(size = 12), 
        axis.text.x = element_text(angle = 15), 
        #axis.text.y = element_text(angle = 15), 
        #axis.title.y = element_blank(), 
        #strip.text.x = element_text(size = 8, face = 'bold'),
        #legend.key.width = unit(0.1, 'in'), 
        #legend.key.height = unit(0.1,'in'), 
        #legend.margin = margin(t = 0, unit = 'cm'), 
        #plot.margin = unit(x = c(0.5,0.5,0.5,0.5), units = 'mm'), 
        #axis.ticks.y = element_blank()
        )

f1
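To confirm that the factor is what fixes the month order, this sketch builds the same minimal plot twice, with month as a character and as a factor, and reads back the x positions ggplot assigned using layer_data(). The three months and their values are taken from df1 above.

```r
library(ggplot2)
library(dplyr)
library(tibble)

toy <- tibble(
  month = c("January", "February", "March"),
  avg_temp = c(40.6, 37.2, 54.6)
)

# Character x: positions follow alphabetical order (February, January, March)
p_chr <- ggplot(toy, aes(x = month, y = avg_temp)) + geom_point()

# Factor x: positions follow the levels, i.e. calendar order
p_fct <- toy %>%
  mutate(month = factor(month, levels = month.name)) %>%
  ggplot(aes(x = month, y = avg_temp)) +
  geom_point()

# x positions assigned to the rows January, February, March
x_chr <- as.numeric(unclass(layer_data(p_chr)$x))
x_fct <- as.numeric(unclass(layer_data(p_fct)$x))
x_chr
x_fct
```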

Exporting Data Frames as CSVs

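The new tables for the monthly and weekly averages can be exported as CSVs with readr's write_csv() function. The destination folder (here ./results/csvs) must already exist. They can also be exported as Excel files, for example with write_xlsx() from the writexl package.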

More here: https://www.statology.org/export-data-frame-to-csv-in-r/

write_csv(df1, file ='./results/csvs/monthly_averages.csv')

write_csv(df2, file ='./results/csvs/weekly_averages.csv')
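A minimal round-trip sketch, writing to a temporary results/csvs folder instead of the project's. It creates the folder first, since write_csv() does not create missing directories, and reads the file back to check it. Note that a factor column such as month is written as plain text and comes back as a character.

```r
library(readr)
library(tibble)

monthly <- tibble(
  month = factor(c("January", "February"), levels = month.name),
  avg_temp_month = c(40.6, 37.2)
)

# Create the output folder (and any missing parents) before writing
out_dir <- file.path(tempdir(), "results", "csvs")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "monthly_averages.csv")
write_csv(monthly, file = out_file)

# Read it back; cols() with no arguments guesses types without printing a message
round_trip <- read_csv(out_file, col_types = cols())
round_trip
```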

Exporting Graphs

We can also export figures. There are two main approaches: ggsave(), which works for ggplot graphs, and base R graphics devices such as pdf(), png(), and jpeg(), which work with any plot.

Typically we want to export a PDF, a vector format that can be resized without loss of quality, and a PNG or JPEG for quick sharing.

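ggsave() can adjust the width, height, and dpi of the image. If width and height are not supplied, it uses the size of the current graphics device; when knitting an R Markdown HTML document that is 7 x 5 inches by default. The dpi only affects raster formats like PNG and JPEG, and the common standards are retina (320 dpi), print (300 dpi), and screen (72 dpi); ggsave() also accepts these names directly, as in dpi = 'retina'.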

More information here: https://sscc.wisc.edu/sscc/pubs/using-r-plots/saving-plots.html#other-options-for-saving-plots

ggsave(plot = f1, filename = './results/figures/plot_vector.pdf', width = 7, height = 5, device = 'pdf', units = 'in')

ggsave(plot = f1, filename = './results/figures/plot_nonvector.png', dpi = 320)

# Open a PNG device, draw the plot, then close the device to write the file
png(filename = './results/figures/plot_basesave.png', width = 6, height = 4, units = 'in', res = 300)
print(f1)
dev.off()
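A minimal sketch of both export routes, saving into a temporary figures folder rather than ./results/figures. It uses PDF for both so it runs without a bitmap graphics backend; changing the extension to .png in ggsave(), or using png() instead of pdf(), gives a raster file, where dpi (ggsave) or res (png) applies. The small plot is only a placeholder for f1.

```r
library(ggplot2)

# Placeholder plot standing in for f1 (January's weekly temperatures)
p <- ggplot(data.frame(week = 1:5, avg_temp = c(40, 42, 40, 41, 40)),
            aes(x = week, y = avg_temp)) +
  geom_line()

fig_dir <- file.path(tempdir(), "figures")
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)

# ggsave(): explicit size, so the result does not depend on the open device
pdf_ggsave <- file.path(fig_dir, "plot_vector.pdf")
ggsave(filename = pdf_ggsave, plot = p, width = 7, height = 5, units = "in")

# Base device: open, draw, close (works for ggplot and base R plots alike)
pdf_base <- file.path(fig_dir, "plot_base.pdf")
pdf(pdf_base, width = 7, height = 5)
print(p)
invisible(dev.off())
```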
## png 
##   2