The Demographic and Health Surveys (DHS) Program has collected and disseminated population survey data from over 90 countries for over 30 years. In many countries, DHS surveys provide the key data used to track progress towards targets such as the Sustainable Development Goals (SDGs) and to inform health policy, for example by detailing trends in child mortality (Silva 2012) and by characterising the distribution of malaria control interventions in Africa in order to map the burden of malaria since the year 2000 (Bhatt et al. 2015). Although standard health indicators are routinely published in survey final reports, much of the value of DHS derives from the ability to download and analyse standardized microdata datasets for subgroup analysis, pooled multi-country analysis, and extended research studies.
Analysing the microdata datasets, however, requires a ‘clean’ dataset that contains all the desired information. One of the main challenges when working with the raw DHS datasets is isolating the required variables across different countries. Since the DHS Program started, 7 ‘phases’ of questionnaires have been used between 1984 and 2018. The data from each phase are then recoded to ensure consistency and comparability across surveys. However, questions are often added or amended between phases of the DHS Program, so variable names sometimes change from one phase to the next. In addition, there are a number of country-specific records that are not part of the model questionnaires. As a result, it can become increasingly difficult to identify which variables to use in the final ‘clean’ dataset.
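The renaming problem described above can be made concrete with a minimal, language-agnostic sketch, written here in Python purely for illustration. It assumes a hand-written lookup from a harmonised name to the variable name used in each phase. The variable names (`fev_a`, `net_new`, and so on), the phase assignments, and the helper functions `harmonise` and `pool` are all hypothetical: they are not real DHS recode variables and do not describe how any particular package implements this step.

```python
# Hypothetical mapping: harmonised name -> variable name used in each phase.
# All names here are invented for illustration; they are not real DHS variables.
VARIABLE_MAP = {
    "child_had_fever": {5: "fev_a", 6: "fev_b", 7: "fev_b"},
    "slept_under_net": {5: "net_old", 6: "net_old", 7: "net_new"},
}


def harmonise(records, phase):
    """Rename phase-specific variables to harmonised names.

    Variables that a phase did not collect are filled with None so that
    surveys from different phases end up with the same set of columns.
    """
    cleaned = []
    for record in records:
        row = {"phase": phase}
        for target, by_phase in VARIABLE_MAP.items():
            source = by_phase.get(phase)
            row[target] = record.get(source) if source is not None else None
        cleaned.append(row)
    return cleaned


def pool(surveys):
    """Combine harmonised records from several (phase, records) pairs."""
    pooled = []
    for phase, records in surveys:
        pooled.extend(harmonise(records, phase))
    return pooled


# Two toy surveys from different phases, each with one respondent record.
surveys = [
    (5, [{"fev_a": 1, "net_old": 0}]),
    (7, [{"fev_b": 0, "net_new": 1}]),
]
pooled = pool(surveys)
```

Keeping the mapping as data rather than as scattered renaming code means that a new phase or an amended question only requires a new entry in the lookup, which is one reason a pooled multi-country analysis benefits from tooling that tracks variable names across surveys.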
The rdhs package was designed to facilitate the management and processing of DHS survey data. It does so both by functioning as an API client, giving access to all data provided through the DHS API, and by helping to download the raw datasets from the DHS website and read them into conventional R data structures. In overview, the package provides a suite of tools for the following:
The functionality provided reflects conversations with numerous research groups globally, and serves to simplify commonly required analytical pipelines. The aim is to make the raw data more accessible to end users and to create a tool that supports reproducible global health research. Furthermore, we hope the package will enable researchers in lower-middle-income countries, which constitute the majority of countries surveyed as part of the DHS Program, to analyse their data without the need for proprietary software.
Bhatt, S, D J Weiss, E Cameron, D Bisanzio, B Mappin, U Dalrymple, K E Battle, et al. 2015. “The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015.” Nature 526: 207–11. doi:10.1038/nature15535.
Silva, Romesh. 2012. “Child Mortality Estimation: Consistency of Under-Five Mortality Rate Estimates Using Full Birth Histories and Summary Birth Histories.” PLoS Medicine 9: e1001296. doi:10.1371/journal.pmed.1001296.