---
title: "Summary statistics on material, brushing, dirting and before/after"
author: "Ivan Calandra"
date: "`r Sys.time()`"
output:
  html_document:
    toc: true
    toc_depth: 5
    toc_float: true
    theme: cerulean
    highlight: pygments
params:
  data: 
    value: "Data/brushing_v2.Rbin"
---

<!-- Define the Rbin-file to use on line 14 --> 
<!-- This Rbin-file must be in a 'Data' folder within the directory of the present Rmd file --> 

<!-- Define the numeric variables on which summary stats should be calculated in section 4 --> 
<!-- Define the grouping variables for the summary stats in section 5.2 --> 

---

Rbin-file used for this script: **`r paste0("~/", params$data)`**


```{r Knitr Options, include=FALSE}
	knitr::opts_chunk$set(comment=NA, message=FALSE, indent="", error=TRUE)
```

---


### 1. Goal of the script
This script computes standard descriptive statistics for each group.  
The groups are defined by the following variables, alone or in combination:

* Brushing  
* Dirt  
* Before/After  
  
It computes the following statistics:  

* n (sample size = `length`): number of measurements  
* smallest value (`min`)  
* largest value (`max`)
* mean  
* median  
* standard deviation (`sd`)

```{r}
dir.stats <- "Summary-stats"
dir.create(dir.stats, showWarnings=FALSE)
```

The results will be written to an XLSX-file (one sheet per grouping) in a subfolder "`r dir.stats`" within the directory of the present Rmd file, i.e. "`r paste("~",dir.stats,sep='/')`".


---


### 2. Load packages
```{r}
library(openxlsx) # write.xlsx()
library(R.utils)  # loadObject()
library(tools)    # file_path_sans_ext()
library(doBy)     # summaryBy()
```


---


### 3. Load data into R object
```{r}
imp.info <- file.info(params$data)
imp.data <- loadObject(params$data)
```

The imported file is: "`r paste0("~/", params$data)`"  
Its modification, 'last status change' (= 'creation' on Windows) and last access times are, respectively: "`r imp.info$mtime`", "`r imp.info$ctime`" and "`r imp.info$atime`".
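
As a rough illustration of what `file.info()` returns, the sketch below applies it to a throwaway file. The temporary file and the object names `tmp` and `tmp.info` are hypothetical and only serve the example.

```r
# Create a throwaway file so that the block runs on its own
tmp <- tempfile(fileext=".txt")
writeLines("demo", tmp)

# file.info() returns a data frame with one row per file, named by its path
tmp.info <- file.info(tmp)

# The three timestamps reported above, as POSIXct date-times
print(tmp.info[c("mtime", "ctime", "atime")])

# Clean up the throwaway file
unlink(tmp)
```

Because the row name is the path that was passed in, section 6.1 can recover the input file name from `row.names(imp.info)`.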


---


### 4. Define numeric variables
```{r}
# Column indices of the numeric variables: column 21 up to the last column
num.var <- 21:length(imp.data)
```

The following variables will be used: 

```{r, echo=FALSE}
for (i in num.var) cat("[",i,"] ", names(imp.data)[i], "\n", sep="")
```
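
Selecting columns by position assumes that every column from the 21st onward is numeric. A minimal sketch of how this could be checked, using a hypothetical data frame `demo.data` and a start index of 3 in place of `imp.data` and 21 (all names and values below are made up):

```r
# Hypothetical stand-in for imp.data: two grouping columns, then measurements
demo.data <- data.frame(Brush=c("a", "b"), Dirt=c("yes", "no"),
                        Sq=c(1.2, 3.4), Sa=c(0.5, 0.7))

# Same construction as num.var, but starting at column 3 instead of 21
demo.num.var <- 3:length(demo.data)

# TRUE for each selected column that is numeric
is.num <- vapply(demo.data[demo.num.var], is.numeric, logical(1))
print(is.num)

# Stop early if a non-numeric column slipped into the selection
stopifnot(all(is.num))
```

A check of this kind would fail at this step, rather than inside the summary functions, if the column layout of the Rbin-file changed.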

---


### 5. Compute summary statistics
#### 5.1. Create function to compute the statistics at once
```{r}
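# Compute n, min, max, mean, median and sd of a numeric vector, ignoring NAs
nminmaxmeanmedsd <- function(x){
	# Drop missing values so that they affect neither n nor the statistics
	y <- x[!is.na(x)]
	n_test <- length(y)
	min_test <- min(y)
	max_test <- max(y)
	mean_test <- mean(y)
	med_test <- median(y)
	sd_test <- sd(y)
	# Return a named vector; the names are used to label the output columns
	out <- c(n_test, min_test, max_test, mean_test, med_test, sd_test)
	names(out) <- c("n", "min", "max", "mean", "median", "sd")
	return(out)
}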
```
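
As a quick check of what the function returns, the sketch below applies a condensed copy of it to a small made-up vector containing one missing value. The vector `x_demo` is hypothetical and not part of the data set; the function is repeated only so that the block runs on its own.

```r
# Condensed copy of the function from section 5.1
nminmaxmeanmedsd <- function(x){
	y <- x[!is.na(x)]
	out <- c(length(y), min(y), max(y), mean(y), median(y), sd(y))
	names(out) <- c("n", "min", "max", "mean", "median", "sd")
	return(out)
}

# Hypothetical measurements (not from the data set), with one missing value
x_demo <- c(2, 4, NA, 6, 8)
res_demo <- nminmaxmeanmedsd(x_demo)
print(res_demo)
```

The missing value is dropped before counting, so `n` is 4 rather than 5, and the remaining statistics are computed on the four observed values only.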

#### 5.2. Compute the summary statistics in groups
##### 5.2.1. Before.after
```{r}
ba <- summaryBy(.~Before.after, data=imp.data[c("Before.after",names(imp.data)[num.var])], FUN=nminmaxmeanmedsd)
str(ba)
```

##### 5.2.2. Brush+Before.after
```{r}
brush.ba <- summaryBy(.~Brush+Before.after, data=imp.data[c("Brush","Before.after",names(imp.data)[num.var])], FUN=nminmaxmeanmedsd)
str(brush.ba)
```

##### 5.2.3. Dirt+Before.after
```{r}
dirt.ba <- summaryBy(.~Dirt+Before.after, data=imp.data[c("Dirt","Before.after",names(imp.data)[num.var])], FUN=nminmaxmeanmedsd)
str(dirt.ba)
```

##### 5.2.4. Brush+Dirt+Before.after
```{r}
brush.dirt.ba <- summaryBy(.~Brush+Dirt+Before.after, data=imp.data[c("Brush","Dirt","Before.after",names(imp.data)[num.var])], FUN=nminmaxmeanmedsd)
str(brush.dirt.ba)
```
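
To illustrate the shape of such grouped tables without the real data, the sketch below computes per-group statistics on a hypothetical data frame. It uses base R's `aggregate()` as a stand-in for `summaryBy()`, and only three of the six statistics, so that it runs without extra packages; the column names and values are made up.

```r
# Hypothetical data: one grouping variable and one numeric variable
demo.data <- data.frame(Before.after=c("Before", "Before", "After", "After"),
                        Sq=c(1, 3, 10, 20))

# Three of the six statistics, returned as a named vector
nminmax <- function(x) c(n=length(x), min=min(x), max=max(x))

# aggregate() stores the statistics in a matrix column;
# do.call(data.frame, ...) flattens it into columns Sq.n, Sq.min and Sq.max
demo.agg <- aggregate(Sq ~ Before.after, data=demo.data, FUN=nminmax)
demo.ba <- do.call(data.frame, demo.agg)
print(demo.ba)
```

The result has one row per group and one column per combination of variable and statistic. The tables produced by `summaryBy()` should follow a similar layout, with more grouping columns as more grouping variables are added.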

---


### 6. Write results to XLSX
#### 6.1. Format file output name
```{r}
file.out <- paste0(dir.stats, "/", basename(file_path_sans_ext(row.names(imp.info))), "_summary-stats.xlsx")
```

The results will be written to the file: "`r file.out`"
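
The construction of the output name can be traced step by step on the default input path from the YAML header. This is only a sketch: the object names `path.in`, `stem` and `file.demo` are hypothetical, and the folder name is written out instead of taken from `dir.stats`.

```r
# Input path as set in the YAML header of this document
path.in <- "Data/brushing_v2.Rbin"

# Drop the extension, then the directory part
stem <- basename(tools::file_path_sans_ext(path.in))

# Prepend the output folder and append the suffix
file.demo <- paste0("Summary-stats", "/", stem, "_summary-stats.xlsx")
print(file.demo)
```

With the default input this gives "Summary-stats/brushing_v2_summary-stats.xlsx", so the output file keeps the stem of the input file and can be matched to it later.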


#### 6.2. Write to XLSX
```{r}
write.xlsx(list(B.A=ba, Brush_B.A=brush.ba, Dirt_B.A=dirt.ba, Brush_Dirt_B.A=brush.dirt.ba), file=file.out)
xlsx.info <- file.info(file.out)
```

The exported XLSX-file is: "`r basename(row.names(xlsx.info))`"  
It is saved in "`r paste0('~/',dir.stats)`"  
Its modification, 'last status change' (= 'creation' on Windows) and last access times are, respectively: "`r xlsx.info$mtime`", "`r xlsx.info$ctime`" and "`r xlsx.info$atime`".

---

```{r}
sessionInfo()
```

RStudio Version `r readLines("RStudioVersion.txt")`

---

END OF SCRIPT
