Good day everybody and welcome to the IPBES data management tutorials. Today we will be covering an example based on the literature assessment in chapter 4 of the global assessment, and this session is part of the chapter titled "Examples of implementing the data management policy". My name is Rainer Krug, I'm a member of the IPBES task force on knowledge and data, and I'm working at the University of Zurich. Each literature review can be roughly separated into two steps: first, the identification of the literature, and second, the review and analysis of the literature itself. As these are separate steps, I will cover them separately; they also result in two deposit packages with different content. The first one, the identification of the literature, was for the literature review in chapter 4 of the global assessment, where we started with 7,000 identified articles, which were then narrowed down to about 750 articles that were finally reviewed. That information is available on Zenodo under the DOI mentioned there. The second part, the review and analysis of the actual results, was based on a very extensive review in which we worked with 20 experts, 20 reviewers, in our expert group. We used an Excel workbook as a template for the review itself, with questions that needed to be addressed based on each paper. This workbook had about four worksheets, each with up to 26 different columns, different questions, and sometimes there were even multiple rows per paper, so it was quite an extensive review with a lot of work put in by the experts themselves. I'm going to start off with the identification of the literature. As mentioned, this one is available on Zenodo, and the submission looks like this.
The document which you see is the central document: it is the data management plan, which contains all the information needed to understand the data and to make it reproducible, to re-create the data. How does it look in detail? What did we follow and how did we document it? Well, let's start with the first step of the literature review, where we identified literature in a search on Web of Science and Scopus based on specific search terms. These two sets of references were merged and duplicates were removed. You can already see the blue links in this flowchart: each of these links refers to one bibliography. Each step is documented and still available in the data deposit package. The next step is the screening of the bibliography. Obviously, if you do a search you end up with a huge number of references, and you have two choices: if you make the search broad enough to get all the references you need, you will end up with a huge number of references which are not relevant; if you make it too narrow, you will probably lose a large number of references. So you always have to filter and screen the results of your search, and that's what we did. We used a method where we first screened by keywords: we had quite a few references dealing exclusively with cancer research, which could be excluded easily. The main filtering, however, was again done by the experts, who filtered based on the abstracts and titles of the references. This process is obviously not reproducible, but the approach we used is detailed in the data management report, so you can understand what we did and how we did it, although you will not be able to reproduce the final results, as you will have different experts. During this process, we, or rather the experts, realized that certain important papers were missing from that list, so we had two more approaches for adding references to the dataset.
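The merging and de-duplication step described above can be sketched as follows. This is only an illustrative sketch in Python (the actual workflow worked with bibliography files and R), and the record structure and field names ("doi", "title") are assumptions for the example, not the real export format.

```python
# Illustrative sketch: merge two reference exports (e.g. Web of Science and
# Scopus) and drop duplicates. Field names "doi" and "title" are assumptions.

def normalise(title):
    """Lower-case and strip punctuation/whitespace so near-identical titles match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def merge_and_deduplicate(wos_records, scopus_records):
    """Merge two lists of reference dicts, keeping the first occurrence of each."""
    seen_dois = set()
    seen_titles = set()
    merged = []
    for record in wos_records + scopus_records:
        doi = (record.get("doi") or "").lower()
        title_key = normalise(record.get("title", ""))
        if (doi and doi in seen_dois) or (title_key and title_key in seen_titles):
            continue  # duplicate already present in the merged set
        if doi:
            seen_dois.add(doi)
        if title_key:
            seen_titles.add(title_key)
        merged.append(record)
    return merged

wos = [{"doi": "10.1/a", "title": "Drivers of Change"},
       {"doi": "", "title": "Global Trends"}]
scopus = [{"doi": "10.1/A", "title": "Drivers of change"},  # duplicate of the first
          {"doi": "10.1/b", "title": "Another Paper"}]

combined = merge_and_deduplicate(wos, scopus)
print(len(combined))  # → 3
```

Matching on both DOI and a normalised title matters because the two databases do not always agree on capitalisation or punctuation, and not every record carries a DOI.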
Again, these approaches are subjective: they are based on the experience of the individual authors, the individual experts, so they are not reproducible. But the method that was used, for example using review articles to identify papers, can be described in detail, as is done in the data management report. The next step was then the actual review itself, where we had a certain number of articles from each of these groups and tried to review all of them. There are certain reasons why some were not reviewed, mainly time constraints at the end. So we ended up with one dataset, one bibliography, which consists of all the reviewed articles. Looking at the individual bibliographies, you can really trace one step to the next; you can see which references were excluded and how many. To sum up to this point: there is the search step, and there is the screening and filtering step, and all of these steps are documented in the data management report; when code was used, that code is included as well. Another important point is to describe the non-reproducible steps: describe what you did and how you did it. As experts and their expert decisions are involved, the results themselves are not reproducible, but the process is. So you end up with bibliographies from all the different stages of this filtering and selection process. Okay, now you have your bibliography which you want to review, so let's come to the second part, which is to review and analyze the results. That one is also on Zenodo, although it will only be accessible at the beginning of 2021. It was an extensive process with many reviews, as I mentioned already. So how did we do it? We used an Excel workbook as the review template, essentially the questionnaire, but during the process of really working with the data, of combining the different reviews, it was realized that other tools would have been much more suitable for the job than an Excel spreadsheet.
Excel spreadsheets have huge limitations, and tools like SurveyMonkey or Google Forms are much better suited for the job. Okay, it might take longer to develop and design these forms, but on the other hand you save a lot of time when collecting and editing your final data, and the quality of the data will be higher, so look into the possibility of using online forms like SurveyMonkey or Google Forms. An essential point is that the raw data, the reviews done by the reviewers, are sacrosanct: they are not to be edited, they stay as they are. There is only one person who can change something in these individual reviews, and that is the person who did the review; only they know what they meant by certain statements, so they can clarify or change them, not the person doing the analysis. I was the one who always contacted the authors and reviewers and asked them to fix things. The whole process from raw data to final results should be open and reproducible. That is in contrast to the first step, the selection of the bibliography. To make this possible, we used a combination of R Markdown, R functions, and the R package structure to combine the reviews, to check for possible errors, to create preliminary reports about the review, to conduct the analysis, and finally to produce the final graphs. So whenever new reviews came in, they were stored in a folder, three or four different R Markdown documents were regenerated, and all that information was updated. You don't have to use R; you can use whatever you want, but you should aim at the process being completely reproducible. To sum up this step: the first big task in a systematic review is really to design the review template. Then you have the quality control and the combination of the individual reviews, and then the analysis and the visualization.
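The combination step just mentioned can be sketched in a few lines. This is a minimal sketch in Python, assuming the individual reviews were exported as CSV files in one folder; the actual workflow used R Markdown and Excel workbooks, so the format, folder layout, and the `reviewer_file` provenance column here are hypothetical.

```python
# Illustrative sketch: combine individual review files dropped into a folder
# into one dataset, tagging each row with the file it came from so every
# answer stays traceable to its reviewer's raw data.
import csv
from pathlib import Path

def combine_reviews(folder):
    """Read every CSV review in `folder` into one list of row dicts."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                row["reviewer_file"] = path.name  # provenance of each row
                combined.append(row)
    return combined
```

The point of the design is that the raw files are only ever read, never written: whenever a new review lands in the folder, rerunning the script (or, in the original workflow, regenerating the R Markdown documents) rebuilds the combined dataset from scratch, which is what keeps the pipeline reproducible.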
The review template is filled in, and you end up with the individual reviews: that is your raw data, not to be touched. This raw data is then combined into one dataset, and this dataset is then analyzed with code and workflows for analysis and graphing, which finally results in your final graphs. All of this is again documented in the data management report, and the individual code and datasets become part of the data deposit package. To wrap up, we have these two different steps, and both of them are available, so you can look at them as examples. You don't have to follow them, but they provide a nice example of how things can be done in a transparent, reproducible way. I want to thank you for listening, and I hope you enjoyed this session on the literature review from the global assessment chapter 4. Feel free to contact us at tsu.data@ipbes.net if you have any further questions about these or other sessions. Finally, I would like to thank the other contributors to this session, Aidin Niamir and Joy Kumagai, who helped prepare the content, as well as the editors of the different chapters of the whole tutorial. Last, but not least, I would like to thank everybody who was involved in the global assessment chapter 4 review, which was a huge chunk of work. Thank you.