class: center, middle, inverse, title-slide

# Welcome to the R<sup>3</sup> course!

---
layout: true

---

# Please do these as we get ready:

- ✅ Go to your assigned table and group (see list)
- ✅ Introduce yourself to your group members
- ✅ Re-install r3 `remotes::install_gitlab("rostools/r3")`
- ✅ Accept the GitHub Organization invite
- ✅ Accept the Slack channel invite

???

Introduce instructors and helpers after welcoming everyone and getting them to do this.

---
class: middle

# Question:

## Before this course... How many knew or had heard about <u>reproducibility</u>?

???

Raise your hands.

---
class: middle

# Question:

## Before this course... How many knew or had heard about <u>open science</u>?

---
class: middle

# Question:

## Before this course... or even <u>open access, open data, open methods/protocols, or open source</u>?

---
class: middle

# Question:

## How many have read a method in a paper and wondered how they <u>actually</u> did it?

???

Because you are trying to do the same or similar? And you've probably realized by now that far more is done than is shown in the "Methods".

---
class: middle

# Question:

## Have you ever received confusing code? Or maybe written your own confusing code?

???

I definitely have in my research career. We want to change the culture around code by encouraging and teaching how to share code and to write better code in general.

---

## Code sharing: From the scientific principle of "reproducibility"

... often confused with "replicability" <a name=cite-Plesser2018a></a>[[1](https://doi.org/10.3389/fninf.2017.00076)]<sup>1</sup>

???

How many know the difference between replicating and reproducing?

--

.pull-left[
### Replicability

- Repeating a study by *independently* performing another identical study
- Difficult, usually needs funding
- Linked to the "irreproducibility crisis"<sup>2</sup>
]

--

.pull-right[
### Reproducibility

- Generating the exact same results when using the same data and code
- Should be easy, right?
  Wrong, often just as hard
- *Question*: If we can't even *reproduce* a study's results, how can we expect to replicate it?
]

.footnote[
1. Also from an American Statistical Association [statement](https://www.amstat.org/asa/files/pdfs/POL-ReproducibleResearchRecommendations.pdf).
2. Or rather "irreplicability crisis".
]

---
class: middle

# Problem: Biomedical studies almost never publish code with the published paper <a name=cite-Leek2017a></a><a name=cite-Considine2017a></a>[[2](https://doi.org/10.1146/annurev-statistics-060116-054104); [3](https://doi.org/10.1007/s11306-017-1299-3)]

???

The vast majority of papers *still* don't provide code, except maybe in bioinformatics, where a bit more than half of studies do. There are lots of reasons for this, which I'll talk more about tomorrow.

---
class: middle

# These issues can be fixed by creating and nurturing a culture of openness

???

All of this is because of a problem with our culture in research. We aren't open, we don't really share, and we don't often follow basic principles of science. To fix this, we need to start creating and nurturing a better and healthier culture. We can all be involved in that; we all have the power to do something, even if it's a small thing.

---
class: middle

# Goal of this course? Start changing the culture by providing the training

---

## Often asked: So why are we using R and why learn it?

.pull-left[
- Open source, free
- Very large online community
    - Learning resources, support, help
- Massive selection of packages
    - Latest statistical methods
    - Productivity tools
    - Report writing
    - Visualization
]

.pull-right[
- Recent push to improve teaching, usability
    - e.g. with tidyverse, RStudio
- One of the best visualization tools available
- Powerful capabilities
    - Big Data
    - Programming
    - Reproducibility
]

???

Regarding it being free, that means that you can take the knowledge and skills for using R anywhere you go.
---

## Often asked: Why are we spending time on learning things other than R?

Because the course is about doing *open and reproducible research*, while using R.

???

At least a few times, we've gotten feedback in our survey that we didn't spend enough time learning R or that participants expected more R. That's because we're teaching open and reproducible research, and that involves more than just R.

--

| Session | Reason |
|--|--|
| Management of R projects | Reproducibility starts at the file level |
| Version control | Openness and reproducibility are about transparency and inspection |
| Data management and wrangling | This one is about R 😸 |
| Creating reproducible documents | Hopefully obvious 😛 |
| Data visualization | Also about R! |

---

## Course setup and layout

- Course is a mix of:
    - "Code-alongs" (we type and explain, you type along)
    - Hands-on exercises
    - Final group work for [assignment](../assignment.html) (quickly go over it)
- Groups were made with a range of skills and knowledge
- True to our mission, material is publicly accessible and [openly licensed](https://r-cubed.rostools.org/license.html)
    - <https://r-cubed.rostools.org/>

???

With the final group project, you'll be in the same group for the course, working together on it and on the final exercises. As a team, you'll help each other out with learning and overcoming any struggles, with of course our help too! I've tried to organize the groups to include a range of skills and experiences, so there is a mix of novice and more experienced users.

---

## Getting or asking for help 🙋

.pull-left[
- Put the sticky on your laptop to get help
- There are lots of helpers
- Team members, try to help out too
]

.pull-right[
- We're all learning here!
- This is a supportive and safe environment
- Remember our [Code of Conduct](conduct.html)
- We have a [cheatsheet](resources/cheatsheet.pdf)!
]

---
class: middle

# Practice using stickies: Have you re-installed r3 and joined the GitHub Organization as well as the Slack group?
---
class: middle

# Activity:

## Stand and arrange 🚶 or raise hand 🙋 based on question

???

We're going to do a "stand and re-arrange yourself" activity based on some questions I ask.

---
class: middle

## Who has not yet used R?

???

Go into different corners for "yes" and "no".

---
class: middle

## Those who've used R, how do you perceive your skill in R? 🚶

???

Arrange yourselves along the wall: one side is "novice/basic" and the other side is "advanced".

---
class: middle

## Those who've used R or another coding tool (like Stata), have you had formal training in "coding" in it? 🙋

???

Raise hands.

---
class: middle

## How do you perceive your general skill in data analysis? 🚶

???

Along the wall, arrange from "novice/basic" to "advanced".

---
class: middle

## Get back into your groups and get to know each other a bit more

???

So, as we prepare for the next session, introduce yourselves to the group and get to know each other more. You'll be relying on them for help, so find out who is the "more experienced" R user.

---

# References

<a name=bib-Plesser2018a></a>[[1]](#cite-Plesser2018a) H. E. Plesser. "Reproducibility Vs. Replicability: A Brief History of a Confused Terminology". In: _Frontiers in Neuroinformatics_ 11 (Jan. 2018). DOI: [10.3389/fninf.2017.00076](https://doi.org/10.3389%2Ffninf.2017.00076).

<a name=bib-Leek2017a></a>[[2]](#cite-Leek2017a) J. T. Leek and L. R. Jager. "Is Most Published Research Really False?" In: _Annual Review of Statistics and Its Application_ 4.1 (Mar. 2017), pp. 109-122. DOI: [10.1146/annurev-statistics-060116-054104](https://doi.org/10.1146%2Fannurev-statistics-060116-054104).

<a name=bib-Considine2017a></a>[[3]](#cite-Considine2017a) E. C. Considine, G. Thomas, et al. "Critical Review of Reporting of the Data Analysis Step in Metabolomics". In: _Metabolomics_ 14.1 (Dec. 2017). DOI: [10.1007/s11306-017-1299-3](https://doi.org/10.1007%2Fs11306-017-1299-3).