Loading the tidyverse library
library(tidyverse)
The file ‘average.R’ holds functions I wrote for this tutorial. The source() function runs that file and makes its functions available. Note the path is relative: . is the working directory, and functions is a folder inside it.
source('./functions/average.R')
The readr function for reading CSVs is read_csv(). Note base R also has its own function, read.csv(), but it is usually slower on large files.
Here we load our CSV dataset, which has one row per month and one column of numbers for each week of the month.
The path starts at the relative marker ., which is the working directory (this should be set to the project folder), and then goes into the data folder inside it.
There is also a read_excel() function in the readxl package, which works more or less the same way but has extra options such as choosing which sheet to read.
See here for more information: https://readxl.tidyverse.org/
data <- read_csv('./data/calandar_example.csv')
data
## # A tibble: 12 × 6
## Month `1` `2` `3` `4` `5`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 January 40 42 40 41 40
## 2 February 35 37 38 39 NA
## 3 March 50 51 55 57 60
## 4 April 60 61 62 64 65
## 5 May 74 78 79 81 84
## 6 June 85 86 85 88 89
## 7 July 90 92 94 97 99
## 8 August 99 102 101 98 95
## 9 September 90 88 85 84 82
## 10 October 82 70 65 65 64
## 11 November 55 54 58 52 50
## 12 December 50 48 50 48 47
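For the Excel route mentioned above, a minimal sketch might look like the following. It assumes the readxl package is installed; readxl is installed alongside the tidyverse but is not attached by library(tidyverse), so it needs its own library() call. The workbook used here is the demo file that ships with readxl, not a file from this project.

```r
library(readxl)

# Path to a demo workbook bundled with readxl (not part of this project)
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheets <- excel_sheets(path)
sheets

# Read one sheet by name; sheet also accepts a position, such as sheet = 2
cars <- read_excel(path, sheet = "mtcars")
head(cars)
```

For a project file the call would have the same shape as the read_csv() call above, with the path pointing at an .xlsx file in the data folder.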
Now we take the data and tidy it up using the pivot_longer() function from tidyr.
More information here https://tidyr.tidyverse.org/reference/pivot_longer.html
data_tidy <- data %>% pivot_longer(
cols = c(2:6),
names_to = 'week',
values_to = 'avg_temp',
values_drop_na = TRUE
)
data_tidy <- data_tidy %>% rename(month = Month)
data_tidy
## # A tibble: 59 × 3
## month week avg_temp
## <chr> <chr> <dbl>
## 1 January 1 40
## 2 January 2 42
## 3 January 3 40
## 4 January 4 41
## 5 January 5 40
## 6 February 1 35
## 7 February 2 37
## 8 February 3 38
## 9 February 4 39
## 10 March 1 50
## # … with 49 more rows
Note that the NA value for the 5th week of February has been dropped by the values_drop_na argument. Columns 2 to 6 were gathered into the week variable, and their values went into the avg_temp column. Another note is that the week variable is a character, which we could convert, but character is actually handy for the chart we are going to make.
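If a numeric week were needed, one option is the names_transform argument of pivot_longer(), which converts the new column while pivoting. This is only a sketch on a made-up two-month table in the same wide layout; the tutorial itself keeps week as a character.

```r
library(tidyr)
library(tibble)

# Two months in the same wide layout as the calendar data (made-up subset)
wide <- tibble(
  Month = c("January", "February"),
  `1` = c(40, 35),
  `5` = c(40, NA)
)

long <- pivot_longer(
  wide,
  cols = -Month,                              # every column except Month
  names_to = "week",
  values_to = "avg_temp",
  names_transform = list(week = as.integer),  # "1" becomes 1L, "5" becomes 5L
  values_drop_na = TRUE                       # drops February, week 5
)
long
```

The same conversion could be done afterwards with mutate(week = as.integer(week)).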
To use the ggplot() function the data needs to be tidy. Now that our data is tidy, let's see what we can plot.
Note this graph also uses the labs() function, which sets the labels for the axes, title, subtitle, and legends. More here: https://ggplot2.tidyverse.org/reference/labs.html
This graph also has a theme, theme_classic(). There are other themes, such as theme_minimal() and theme_dark(). More here: https://ggplot2-book.org/polishing.html
There are also different kinds of scales. More here: https://ggplot2-book.org/scale-position.html
data_tidy %>%
ggplot(aes(x = month, y = avg_temp, group = week, shape = week, color = week))+
geom_point()+
geom_line()+
#geom_point(aes(x = 2014, y = 440000), color = 'gold', fill = 'gold', shape = 23, size = 5)+
#geom_text(aes(x = 2000, y = 447000, label = '2505 Bird'))+
labs(title = 'Average Temperature by Month',
subtitle = 'Not Ordered',
tag = 'A',
x = 'Month',
y = 'Average Temperature (F)',
color = 'Week',
shape = 'Week')+
scale_y_continuous(limits = c(0,110), breaks = seq(0,110, by = 10))+
#scale_x_continuous(limits = c(1250,2500), breaks = seq(1250, 2500, by = 250))+
theme_classic()
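Note this graph has the months out of order, because character values are placed on the axis alphabetically, and that is also why the lines for each week zigzag. We can change the order of the months by changing the data type of month from character to factor and specifying the level order. Here the levels come from month.name, a built-in vector of month names in calendar order.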
More here: https://r4ds.had.co.nz/factors.html
data_tidy %>% mutate(month = factor(month, levels = month.name))
## # A tibble: 59 × 3
## month week avg_temp
## <fct> <chr> <dbl>
## 1 January 1 40
## 2 January 2 42
## 3 January 3 40
## 4 January 4 41
## 5 January 5 40
## 6 February 1 35
## 7 February 2 37
## 8 February 3 38
## 9 February 4 39
## 10 March 1 50
## # … with 49 more rows
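The effect of the factor levels can be seen on a small vector. This sketch uses only base R; the vector m is made up for illustration.

```r
# month.name is a built-in character vector: "January", ..., "December"
m <- c("March", "January", "February")

# Character vectors sort alphabetically
alphabetical <- sort(m)
alphabetical

# Factors sort by level order, here calendar order
calendar <- sort(factor(m, levels = month.name))
calendar

# Values that are not in `levels` become NA, so spelling must match exactly
misspelled <- factor(c("January", "Febuary"), levels = month.name)
misspelled
```

ggplot() uses the same level order when it lays out a factor on an axis.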
Here we use the group_by() and summarize() functions to calculate the average temperature by month and by week. If we had more variables, we could group by several of them at the same time. We then save the results to the data frames df1 and df2.
More here: https://dplyr.tidyverse.org/reference/summarise.html
df1 <- data_tidy %>%
mutate(month = factor(month, levels = month.name)) %>%
group_by(month) %>%
summarize(avg_temp_month = mean(avg_temp))
df2 <- data_tidy %>%
mutate(month = factor(month, levels = month.name)) %>%
group_by(week) %>%
summarize(avg_temp_week = mean(avg_temp))
df1
## # A tibble: 12 × 2
## month avg_temp_month
## <fct> <dbl>
## 1 January 40.6
## 2 February 37.2
## 3 March 54.6
## 4 April 62.4
## 5 May 79.2
## 6 June 86.6
## 7 July 94.4
## 8 August 99
## 9 September 85.8
## 10 October 69.2
## 11 November 53.8
## 12 December 48.6
df2
## # A tibble: 5 × 2
## week avg_temp_week
## <chr> <dbl>
## 1 1 67.5
## 2 2 67.4
## 3 3 67.7
## 4 4 67.8
## 5 5 70.5
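Grouping by more than one variable looks like this. The city column is hypothetical and does not exist in the calendar CSV; the numbers are made up for illustration.

```r
library(dplyr)

# Toy data: `city` is a hypothetical extra variable, not in the calendar CSV
temps <- tibble(
  city     = c("A", "A", "A", "B", "B", "B"),
  month    = c("January", "January", "February", "January", "January", "February"),
  avg_temp = c(40, 42, 35, 50, 54, 45)
)

# One row per city and month combination
by_city_month <- temps %>%
  mutate(month = factor(month, levels = month.name)) %>%
  group_by(city, month) %>%
  summarize(avg_temp_month = mean(avg_temp), .groups = "drop")
by_city_month
```

The .groups = "drop" argument returns an ungrouped tibble, which avoids carrying the grouping into later steps.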
Above we used the source() function to import the functions defined in the ‘average.R’ script. Those functions, monthly_mean() and weekly_mean(), do exactly the same thing as the code chunk above: you pass in the data frame and they return the same tibbles.
More here: https://towardsdatascience.com/5-minute-guide-to-calling-functions-from-r-scripts-41c4a09db1eb and https://www.earthdatascience.org/courses/earth-analytics/multispectral-remote-sensing-data/source-function-in-R/. For a Python example, see https://www.geeksforgeeks.org/python-call-function-from-another-file/
monthly_mean(data_tidy)
## # A tibble: 12 × 2
## month avg_temp_month
## <fct> <dbl>
## 1 January 40.6
## 2 February 37.2
## 3 March 54.6
## 4 April 62.4
## 5 May 79.2
## 6 June 86.6
## 7 July 94.4
## 8 August 99
## 9 September 85.8
## 10 October 69.2
## 11 November 53.8
## 12 December 48.6
weekly_mean(data_tidy)
## # A tibble: 5 × 2
## week avg_temp_week
## <chr> <dbl>
## 1 1 67.5
## 2 2 67.4
## 3 3 67.7
## 4 4 67.8
## 5 5 70.5
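The contents of ‘average.R’ are not shown in this tutorial. As a rough sketch, assuming the two functions simply wrap the group_by() and summarize() pipelines from earlier, the file might look something like this. The toy tibble at the end is made up to check the functions.

```r
library(dplyr)

# Hypothetical contents of functions/average.R (the real file is not shown)

# Mean temperature per month, with months in calendar order
monthly_mean <- function(df) {
  df %>%
    mutate(month = factor(month, levels = month.name)) %>%
    group_by(month) %>%
    summarize(avg_temp_month = mean(avg_temp))
}

# Mean temperature per week number
weekly_mean <- function(df) {
  df %>%
    group_by(week) %>%
    summarize(avg_temp_week = mean(avg_temp))
}

# Quick check on a small tibble shaped like data_tidy
toy <- tibble(
  month    = c("February", "February", "January", "January"),
  week     = c("1", "2", "1", "2"),
  avg_temp = c(35, 37, 40, 42)
)
monthly_mean(toy)
weekly_mean(toy)
```

Keeping such helpers in their own file means every script that calls source() on it gets the same definitions.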
Now that we know the months need to be ordered, we can plot again with the right month order. Note that in this plot I expanded the theme options for a more customized plot. We also save this plot as f1.
More here: https://ggplot2-book.org/polishing.html
f1 <- data_tidy %>% mutate(month = factor(month, levels = month.name)) %>%
ggplot(aes(x = month, y = avg_temp, group = week, shape = week, color = week))+
geom_point()+
geom_line()+
#geom_point(aes(x = 2014, y = 440000), color = 'gold', fill = 'gold', shape = 23, size = 5)+
#geom_text(aes(x = 2000, y = 447000, label = '2505 Bird'))+
labs(title = 'Average Temperature by Month',
x = 'Month',
y = 'Average Temperature (F)')+
scale_y_continuous(limits = c(0,110), breaks = seq(0,110, by = 10))+
#scale_x_continuous(limits = c(1250,2500), breaks = seq(1250, 2500, by = 250))+
theme_minimal()+
theme(plot.title = element_text(size = 12, face = 'bold'),
legend.position = 'right',
legend.text = element_text(size = 12),
#legend.title = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 12),
axis.text.x = element_text(angle = 15),
#axis.text.y = element_text(angle = 15),
#axis.title.y = element_blank(),
#strip.text.x = element_text(size = 8, face = 'bold'),
#legend.key.width = unit(0.1, 'in'),
#legend.key.height = unit(0.1,'in'),
#legend.margin = margin(t = 0, unit = 'cm'),
#plot.margin = unit(x = c(0.5,0.5,0.5,0.5), units = 'mm'),
#axis.ticks.y = element_blank()
)
f1
## Exporting Data Frames as CSVs
The new tables we made for the monthly and weekly averages can be exported as CSV files using the write_csv() function. They can also be exported as Excel files.
More here: https://www.statology.org/export-data-frame-to-csv-in-r/
write_csv(df1, file ='./results/csvs/monthly_averages.csv')
write_csv(df2, file ='./results/csvs/weekly_averages.csv')
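One thing to watch for: write_csv() does not create missing folders, so the output folder has to exist first. A minimal sketch of creating it and checking the export is below. It writes to a temporary folder and uses a made-up two-row table, so it does not touch the project's results folder.

```r
library(readr)

# Use a temporary folder here; in the project this would be './results/csvs'
out_dir <- file.path(tempdir(), "results", "csvs")

# Create the folder (and any missing parent folders) if it is not there yet
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up stand-in for df1
monthly <- data.frame(
  month = c("January", "February"),
  avg_temp_month = c(40.6, 37.25)
)
out_file <- file.path(out_dir, "monthly_averages.csv")
write_csv(monthly, file = out_file)

# Read the file back to confirm the export worked
check <- read_csv(out_file, show_col_types = FALSE)
check
```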
We can also export figures. There are two main approaches: ggsave(), which works for ggplot graphs, and the base R graphics devices such as pdf(), png(), and jpeg(), which work with all plots and are closed with dev.off().
Typically we want to export a PDF for a vector figure that can be resized without loss of quality, and a PNG or JPEG for quick sharing.
ggsave() can adjust the width, height, and dpi of the image. If width and height are not given, ggsave() uses the size of the current graphics device. dpi only matters for raster (non-vector) image types like PNG and JPEG, and the named presets are retina (320 dpi), print (300 dpi), and screen (72 dpi).
More information here: https://sscc.wisc.edu/sscc/pubs/using-r-plots/saving-plots.html#other-options-for-saving-plots
ggsave(plot = f1, filename = './results/figures/plot_vector.pdf', width = 7, height = 5, device = 'pdf', units = 'in')
ggsave(plot = f1, filename = './results/figures/plot_nonvector.png', dpi = 320)
png(filename = './results/figures/plot_basesave.png', width = 6, height = 4, units = 'in', res = 300)
f1
dev.off()
## png
## 2
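To make the size and dpi options of ggsave() concrete, here is a small sketch. The plot p is a made-up stand-in for f1, and the files go to a temporary folder rather than the project's results folder.

```r
library(ggplot2)

# Small stand-in plot; in the tutorial this would be f1
p <- ggplot(data.frame(x = 1:3, y = c(40, 35, 50)), aes(x = x, y = y)) +
  geom_point() +
  geom_line()

png_file <- file.path(tempdir(), "plot_print.png")
pdf_file <- file.path(tempdir(), "plot_vector.pdf")

# Raster output: size in inches times dpi gives the pixel size,
# so 6 x 4 in at 300 dpi ('print') is an 1800 x 1200 pixel image
ggsave(filename = png_file, plot = p, width = 6, height = 4, units = "in", dpi = "print")

# Vector output: the device is picked from the .pdf file extension
ggsave(filename = pdf_file, plot = p, width = 6, height = 4, units = "in")
```

Setting width and height explicitly keeps the output the same no matter how large the plot window happens to be.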