Towards a comprehensive simulation model of malaria epidemiology and control

SUMMARY Planning of the control of Plasmodium falciparum malaria leads to a need for models of malaria epidemiology that provide realistic quantitative prediction of likely epidemiological outcomes of a wide range of control strategies. Predictions of the effects of control often ignore medium- and long-term dynamics. The complexities of the Plasmodium life-cycle, and of within-host dynamics, limit the applicability of conventional deterministic malaria models. We use individual-based stochastic simulations of malaria epidemiology to predict the impacts of interventions on infection, morbidity, mortality, health services use and costs. Individual infections are simulated by stochastic series of parasite densities, and naturally acquired immunity acts by reducing densities. Morbidity and mortality risks, and infectiousness to vectors, depend on parasite densities. The simulated infections are nested within simulations of individuals in human populations, and linked to models of interventions and health systems. We use numerous field datasets to optimise parameter estimates. By using a volunteer computing system we obtain the enormous computational power required for model fitting, sensitivity analysis, and exploration of many different intervention strategies. The project thus provides a general platform for comparing, fitting, and evaluating different model structures, and for quantitative prediction of effects of different interventions and integrated control programmes.


I N T R O D U C T I O N
'' Hubris seems to be the chief vice of the model builder without, and even with, field experience. It induces justified scepticism and is counter-productive. It serves only to convince the determinedly non-numerate public health worker that models are a seductive alternative to understanding '' (Bradley, 1982).
'' … and perhaps none will fully understand the behaviour of the whole system as it operates within the economic, social and political constraints of a given society. This may explain some of the difficulties of predicting with sufficient accuracy the likely consequences of any chosen strategy of intervention. And if one cannot predict the consequences of a given strategy, one had no rational basis for choosing between the available alternatives. '' (Bailey, 1982).
Mathematical modelling of malaria transmission dynamics goes back to Ross, who described the population dynamics of malaria and identified entomological thresholds for elimination (Ross, 1911). In the 1950s, as the emergence of indoor residual spraying made the elimination of malaria over wide areas conceivable, Ross' work was further developed by Macdonald and others, who provided the mathematical explanation of the effect of reduction of mosquito longevity on malaria transmission, especially through the concept of the basic reproduction number (Macdonald, 1957). Later models incorporated superinfection and immunity (Dietz, Molineaux and Thomas, 1974), linked to a field investigation of the feasibility of malaria elimination in the West African savannah (Molineaux and Gramiccia, 1980). The focus of malaria modelling up until then (reviewed in detail by Bailey (1982)) was on transmission dynamics, and in particular strategies for eliminating the parasite.
Models of malaria dynamics have continued to be developed (e.g. Aron, 1988 ;Struchiner, Halloran and Spielman, 1989 ;McKenzie and Bossert, 2005) but a very different kind of modelling became prominent in the last few decades of the 20th century when control programmes, especially in Africa, focused on reducing morbidity and mortality rather than suppressing transmission. Elimination had proven unrealistic in tropical Africa and had been given up in most tropical countries in Asia and the Americas following severe setbacks. New interventions considered suitable for long-term control were validated through large-scale field trials. This motivated the development of predictive models that use empirical estimates of the effectiveness of interventions, studies on disease burden, and unit costs to quantify morbidity and mortality and likely cost-effectiveness of interventions. Examples include the work of Goodman, Coleman and Mills (2000) and Rowe et al. (2007). In such models, epidemiological impact is generally inferred from field trial results, which assess only short-term effects using wellcontrolled delivery systems, and there is no explicit consideration of the dynamics of transmission and immunity, in particular the loss of population immunity that results from decreased transmission.
With a yearly toll of the order of one million lives, malaria remains the main cause of death in young children in Africa (Roll Back Malaria, 2005). The disease has therefore become one of the top priorities on the international health agenda. There are a number of efficacious interventions such as insecticide treated nets (ITNs), indoor residual spraying (IRS) and artemisinin-based combination therapy (ACT) but it is not obvious what will be their impact, taking into consideration effects on transmission and health system constraints, and so it is not obvious when and how best to deploy and combine them. The resource allocation issues are considerable, with current estimates of US$ 3 . 8 to 4 . 5 billion per year for the total costs of controlling malaria worldwide in accordance with internationally agreed targets (Kiszewski et al. 2007) and the actual annual international transfers for malaria control increasing year by year, reaching USD 600 million in (Roll Back Malaria, 2005. Ongoing field research as well as programme monitoring and evaluation is essential to decide on the best strategies in different contexts, but it is not realistic to field-test a large number of combinations of different interventions in different epidemiological and operational settings. The planning and prioritization of integrated control strategies thus need models that can quantify both short-and long-term effects on transmission, morbidity, mortality, immunity and resistance to medicines and insecticides, at the same time as incorporating field-based data about control, including health system information. Unfortunately there have been very few examples of disease modelling spanning transmission dynamics as well as economic factors, and these have been constrained by the application of limited sets of field data to linear models and limited combinations of interventions (Flessa, 2002). This paper is an overview of the project we are undertaking to develop such comprehensive simulation models of malaria epidemiology and control and to fit them to field data.
This project started when we undertook to develop a model to simulate the potential effects of the introduction of pre-erythrocytic malaria vaccines. We chose to address this problem using an individualbased stochastic model, or micro-simulation approach. Micro-simulation is an increasingly popular modelling technique (see for instance Stolk et al. in this special issue) because it offers almost total flexibility in the description both of inputs and outputs, making it relatively easy to extend the framework indefinitely to mimic very complex systems, so that both short-and longer-term dynamics can be considered at the same time. Moreover, stochastic approaches lend themselves to modelling rare events (such as deaths), and stochastic models can be readily extended to model heterogeneities as functions of distributions. Micro-simulations of infectious disease control can reproduce arbitrary aspects of interventions such as delivery to selected groups of individuals, and also look at the implications of monitoring selected population subgroups, such as specific age groups or only those with clinical symptoms. Despite this potential, micro-simulations of human parasitic diseases have so far had little impact on public health policy, with the notable exception of the ONCHOSIM model (Plaisier et al. 1990 ;Habbema et al. 1992), which has proved an invaluable decision support tool in the Onchocerciasis Control Programme in West Africa.
The results of our initial simulations of preerythrocytic malaria vaccines are already published (for an overview see Smith et al. (2006 a)). The approach lends itself to prediction of the potential impact of any malaria intervention strategy under consideration and we are now adapting it for forecasting the epidemiological and economic impact and cost-effectiveness of the whole range of malaria control interventions, singly or combined, applied across a broad range of epidemiological and health system scenarios in malaria endemic areas. In the next section of the paper we outline the major challenges that we face in meeting this ambitious goal. We then describe how we have addressed these challenges to develop a model that can be of real value for malaria control decision-makers and planners.
C H A L L E N G E S A general predictive model for malaria epidemiology and control needs a parsimonious structure that mimics malaria biology well enough to capture the mechanisms of action of all the different interventions, and at the same time needs parameter values that reproduce the observed quantitative relationships between epidemiological measures. The simultaneous achievement of both these objectives is a major challenge.
The biology and epidemiology of malaria comprises many different dynamic processes with (positive and negative) feedback loops. Furthermore, drug treatments, vector control, personal protection and vaccines may all have effects on outcomes that are more complex than the effects on primary infections in the non-immune host. Even if the effect of a vaccine is simply to reduce the force of infection, the short-term consequences in terms of morbidity and mortality are not simply proportional to the reduction in infection rate. There are also longer term dynamic effects due to changes in the immune status of the population, benefits due to herd immunity (in the case of vaccines), community effects of vector control, and long-term evolutionary consequences (see, for instance, Koella and Zaghloul (2008)). Predictions of long-term effectiveness of interventions in programmes also need to consider the health system realities that result in sub-optimal access, targeting accuracy, provider compliance, and consumer adherence. Longer term effects also include the ramifications with regard to health service demands.
The core of a model that can capture all of these is the specification of the properties of an infection. An individual malaria infection can last for many months, during which densities of both asexual parasites and gametocytes vary irregularly as consequences mainly of the developmental cycle of the parasite, of host immunity, and of antigenic variation (in P. falciparum). Disease and death are associated with high parasite densities, and the most obvious impact of acquired immunity is in reducing parasite densities. The irregularities in parasite densities lend themselves to treatment as statistical fluctuations. Since the course of each infection is different, and the average behaviour is of less importance than the extent of variation, convincing models for individual infections (e.g. those of Molineaux et al. (2001) and Gatton and Cheng (2004)) have adopted stochastic simulation approaches, rather than treating the development of an individual malaria infection as a deterministic process.
A population model for predicting the impact of interventions that modify parasite densities, such as asexual blood stage vaccination, or treatment with incompletely effective drugs, must explicitly capture relationships with parasite densities, and thus needs to nest within it such a stochastic model for individual infections. This constrains the whole modelling process into the individual-based stochastic framework. In addition to its advantages, the stochastic approach also leads to challenges : stochastic models are computationally demanding, and are more difficult to fit to data than deterministic equivalents. The fitting of a complex stochastic model to a wide range of different epidemiological outcomes pushes limits both in terms of algorithms of multi-objective optimization and computational resources. In particular, because death is a relatively infrequent event, many tens of thousands of lifehistories must be simulated in order to generate stable estimates of mortality rates that can be compared with field data.
Fitting of models to data aims to minimize the uncertainty in our predictions, but a further challenge is to identify and quantify the many uncertainties that will remain. For instance, it is very difficult to quantify natural variation in exposure to mosquito bites, because it is influenced profoundly by a number of factors including, for example, microenvironment, climate and human behaviour. Statistical imprecision can be reduced by acquiring more data, but doubts will remain in any attempt to predict the future. We do not know what new interventions may be discovered, and the future costs of malaria control commodities may develop almost unpredictably in reaction to demand, raw material prices, industrial innovation and other factors. Other uncertainties, especially those that relate to the human immune response to the parasite, arise because we have not been able to measure the relevant quantities, or do not know what they are. The lack of a proxy measure for protective immunity is particularly a problem for vaccine development but it also represents a challenge for the specification of models of disease dynamics. Finally, there are especially large uncertainties with regard to both the performance of health systems and the healthseeking behaviours of populations. This is partly because of the limited extent of comprehensive health systems research in malaria endemic areas. Validation of parameters for models of interventions and health systems is constrained to relatively few sites with thorough costing and effectiveness studies of interventions in real life health systems.
The first phase of the project (from 2003-2005) focused on developing a single model for malaria epidemiology that reproduces the relevant aspects of malaria biology. We used this to predict the likely impact of pre-erythrocytic vaccines, including projections of the consequences of introducing such vaccines into the Expanded Program on Immunization, for both malaria epidemiology  and vaccine cost-effectiveness . This required us to develop simulations of the malaria transmission cycle, of the course of individual infections, of the risk of morbidity and mortality as a function of transmission intensity, and of the impact of the health systems on malaria in endemic settings. The details of the models used, including their justification in terms of biology and epidemiology have been published as a supplement to the American Journal of Tropical Medicine and Hygiene, of which the first paper provides an overview, together with the mathematical description .
The simulated time-courses of parasite densities are based on a description of parasite densities in neurosyphilis patients who received therapeutic P. falciparum infections. By placing parasite densities as the centre of a causal web linking infection with its downstream effects (Fig. 1), we are able to reproduce key features of malaria biology that are difficult to capture with other approaches.
Notably, in our models as in the real world, the main effect of naturally acquired immunity to malaria is to reduce infection density, rather than persistence of the infection (Sama, Killeen and Smith, 2004 ;Sama et al. 2006). We model the probability that a mosquito is infected when feeding as a stochastic function of time-lagged asexual parasite densities in the human host. This is a considerable simplification but does capture the important density dependence demonstrated in experimental models (Sinden et al. 2007). High parasite densities are also a trigger of clinical malaria and are quantitatively related to severity of disease, and so in our simulations these events occur when parasite densities exceed given thresholds.
In the first phase we concentrated on simulating malaria epidemiology in sites where there are fieldbased measurements of the seasonal pattern in the entomological inoculation rate (EIR). We introduce infections into the simulated human population via a stochastic process dependent on the average EIR, allowing for annual periodicity (Smith et al. 2006 b). Because the presence of P. falciparum in the blood does not prevent a new infection (superinfection), the population model needs to simulate not just single infections but also recurrent and concurrent multiple infections. The subsequent parasite densities are then sampled using 5-day time steps, where the sampling distributions for the parasite densities are derived from the time-trends in malaria therapy data and from immunity parameters determined from each simulated individual's history of infection (Maire et al. 2006 b).
In the first phase of the project the requirement for field measurements of the EIR restricted us to simulating only areas of intense endemic transmission. We generated the simulated infections via Poisson processes (Smith et al. 2006 b), incorporating both seasonality and the age dependence that arises because larger people generally receive more bites by mosquitoes by virtue of their larger body surface (Port, Boreham and Bryan, 1980), but neglecting the clustering in infection that certainly occurs in malaria. Some people are more attractive to mosquitoes than others (Knols, de Jong and Takken, 1995), and some live in micro-environments where they are more exposed.
In high transmission scenarios it is reasonable to treat transmission as homogeneous, and generate mortality rates by simulating very large closed populations. This is equivalent to averaging many smaller, village size, simulations because the population average level of transmission is not much affected by stochastic variation. In low transmission scenarios it is important to consider both individual differences in risk of infection and to simulate receptivity (the risk of epidemics from imported infections) in realistically-sized human populations. Stochastic models are the natural way to achieve this, and we are currently exploring different distributional assumptions for transmission heterogeneity (Smith, 2008). We expect to confirm the effects of heterogeneity on the basic reproduction number, R 0 ), but it is unclear how important are the secondary consequences of individual differences in risk, such as the stronger development of natural immunity for high-risk persons.

M O D E L S O F I N T E R V E N T I O N S T R A T E G I E S
We plan to define and simulate a minimal essential set of interventions, intervention combinations and scale-up strategies for malaria control. Table 1 outlines an example of some among the many possible strategies, each of which will be specified in detail, and for each of which a model for action of the interventions must be incorporated into the simulations. Integrated strategies may simultaneously include many different interventions, the appropriateness of which will depend on the epidemiological setting and health system.
For case management of malaria we include the effects on morbidity, mortality and transmission. Models of health system effectiveness in intervention delivery are an important part of the simulation of any preventive and curative intervention, since the effectiveness can be affected by constraints both on the provider and the consumer side. Improvements in the health system, such as rolling-out of rapid diagnostic tests represent an important set of interventions that we aim to simulate. We also consider the reduced burden on the health system resulting from preventive interventions and case management. Quantitative modelling of the feedbacks in health systems is in its infancy, and so far we have completed only a basic model based on data from Tanzania (Tediosi et al. 2006 b). We are currently revising this to incorporate improved simulations of diagnosis, drug action, referral, the contribution of the informal sector, and of the implications of intermittent preventive treatment as well as transmission reducing interventions.
Measurements of immune responses to malaria have generally not demonstrated strong correlations with protection (pre-clinical studies of mosquito stage transmission blocking vaccines are exceptions (Saul, 2008)). For this reason we have formulated models of vaccine action in terms of the effects on clinical and parasitological outcomes, rather than modelling immune effectors. Our analyses of the results of field trials of a pre-erythrocytic vaccine were compatible with a very simple such model, namely, that such vaccines block a percentage of sporozoite inoculations from reaching the blood. This percentage may vary between human hosts, but it seems no-one is completely protected from infection by such a vaccine .
We can model mosquito-stage transmission blocking vaccines similarly by assuming a defined proportionate reduction in the probability that a mosquito becomes infected from any one feed. We have also modelled blood stage vaccines by assuming the immediate effect to be reduction in parasite densities. This assumption may be an over-simplification as vaccine-induced reduction in asexual blood stage challenge very likely has complex effects on parasite dynamics. We are developing improved within-host models based on stochastic simulations to include interaction between asexual parasites and host immune responses. We anticipate that these should provide us with models for asexual blood stage vaccination and for the action of antimalarial drugs that capture these dynamics. & This column indicates whether this objective is included in the simultaneous multi-objective fitting of the model, or whether the data are used to separately parameterise model components. $ Some scenarios are used to predict several outcomes, so the total of this column does not equal the total of 61 scenarios involved in fitting the models. * The number of data points is the sum over all scenarios and simulated survey periods of the number of age groups into which the data were disaggregated for comparison with the model predictions. # In relation to the seasonal pattern in the EIR. £ Model predictions for this objective are compared with interpolations between the field data points. n.a. (not applicable) indicates that this objective is not included in the multi-objective optimisation.
We base our models for the effects of personal protection and vector control on discrete-time models of the mosquito feeding cycle (Saul, Graves and Kay, 1990 ;Killeen and Smith, 2007). We have generalized these models to incorporate the heterogeneity of human and non-human animal hosts, and explicitly model the different stages of the mosquito feeding cycle that can be targeted by different interventions, allowing the duration of the feeding cycle to vary (Chitnis, Steketee and Smith, 2007). The model is being parameterized to simulate the effects of Indoor Residual Spraying (IRS) and for simulating effects of Insecticide Treated Nets (ITNs) at various performance levels of the health system. Combinations of both interventions can also be simulated. For the latter, we are considering the dynamics of mosquito net coverage, to include the effects of deterioration of nets and loss of insecticidal effects. These models will be validated by simulating field trials of ITNs. Once these models are in the overall stochastic simulation platform, we will be able not only to simulate plausible vector control strategies, but also integrated control involving multiple interventions.

S O F T W A R E A N D F I T T I N G T O D A T A
Our stochastic simulation models are highly computation-intensive, and are implemented in the general purpose programming languages Fortran 95 and C++ for performance and flexibility. The specifications of the scenario to be simulated are provided in an input file written in Extensible Markup Language (XML), and output is returned in the form of text files containing predictions of age-and timespecific epidemiological quantities (infection prevalence and density, multiplicity of infection, incidence and severity of morbidity, mortality, and levels of intervention coverage), allowing for effects of misdiagnosis where necessary.
In our initial overview publication we listed a total of 38 parameters in the overall model of malaria epidemiology. Considering that this comprises simulations of the whole transmission cycle, relevant aspects of immunity, and of a wide range of morbidity and mortality outcomes, this is not excessive, but optimisation within a space of this dimension is challenging.
Some of the current components of this model are estimated independently of the others. Six of the parameters are part of a statistical model for how anaemia depends on infection status (Carneiro et al. 2006). This anaemia component does not feed back into the rest of the model. Nor do values of five parameters that quantify infectivity of human hosts to mosquitoes affect the estimates of the rest of the parameters and these were estimated in stand-alone exercises Killeen, Ross and Smith, 2006). Two other components of the models are estimated directly from field data, namely, the time-course of parasite densities in individual infections in a previously untreated host, which is based on a statistical description of malaria therapy profiles (Maire et al. 2006 b) ; and the relationship between the force of infection (parasitological inoculation rate) and the EIR, which is estimated from the data of a field study that specifically addressed this question (Beier et al. 1994). Malaria attributable neonatal mortality rates are taken from systematic reviews .
The other components of the model, corresponding to 25 different unknown quantities (parameters), are fitted by treating the stochastic simulation as an implicit statistical model (a list of these parameters is given by Smith et al. (2006 a). A total of 61 standard scenarios have been constructed, corresponding to sites where both the pattern of transmission and one or more epidemiological variables have been measured (Table 2). These epidemiological variables define a set of 10 different objective functions (likelihoods or sums of squares) corresponding to different epidemiological outcome measures that must be optimised. In the first phase of the project we were able to fit our model to all of these objectives using a simulated annealing algorithm. We used a desktopgrid approach to distribute the simulations across our local computer network, using up to 40 processors at any one time. However this was only possible by fitting the different objectives and corresponding model components sequentially, each conditional on the upstream objectives. Even then, many of the optimisation runs required several weeks of computing. The computational demands were particularly high for the large number of scenarios used for predicting rates of severe disease and mortality. Since these are relatively rare events, many simulated individuals were needed to provide stable predictions of rates that can be compared with field data.
In the current phase of the project the computational demands are even higher because we are developing and comparing a number of alternative model formulations. These involve explicit simulations of longitudinal patterns of parasitaemia within hosts ; effects of heterogeneity in transmission (to simulate areas of focal transmission) ; decay in acquired immunity against malaria; parasite drug resistance ; and different models for preventive and curative interventions, singly and combined.
To optimise all these models we are harnessing the spare capacity of computers made available across the internet by volunteers via the Berkeley Open Infrastructure for Network Computing (BOINC) (http://boinc.berkeley.edu/). The BOINC middleware manages the allocation of tasks to volunteer computers (scheduling), information exchange about the project, and the corresponding security issues both for our servers and for the volunteers. Using this infrastructure we can fit many models in parallel using genetic algorithms (Holland, 1975). Each iteration of the fitting process involves sending each of the 61 standard scenarios to the volunteers, combining their results to give measures of model fit (loss function) for each of the 10 objectives, and computing an overall loss function, thus making possible simultaneous multi-objective optimisation. The project server generates new parameterisations and tracks convergence. Fig. 2 illustrates how the overall loss function computed from all 61 scenarios improved during a typical implementation of this iterative process. This run illustrates slow convergence when many similar parameterisations are run in parallel using the current genetic algorithm.
Within this framework, many different optimizations can be carried out in parallel, with the main limitations being the server capacity, the minimum turn-around time on the work units sent out to the volunteers, and the capacity of the research team to process the results. These in turn mean that the time needed to perform one optimization remains of the order of several weeks or months with the current algorithm.

O U T L O O K
The development of a comprehensive simulation model for malaria epidemiology and control requires a substantial trans-disciplinary effort, and extraordinary computing power. To this end, the Swiss Tropical Institute has assembled a research team that draws on skills from epidemiology, clinical medicine,  immunology, entomology, laboratory sciences, health systems, mathematics, statistics, economics, and computer sciences. The project is guided by an international technical advisory group of malaria and modelling experts. Finally, development of a valid and useful comprehensive model such as this requires access to numerous large and diverse field data sets informing the many dimensions required to validate the model. For this we are fortunate to have the collaboration of many partners and field projects from areas with endemic malaria and intend to broaden further these links to obtain the strongest possible validation. It seems unlikely that any of the intervention strategies we are simulating will prove to be a magic bullet that will solve the problem of malaria for good. On the other hand, the application of all existing control interventions in all endemic areas is likely to be highly cost-ineffective. Practical malaria control experience indicates that the application of a well selected limited menu of interventions in a given area can capitalise on the advantages of each one. Modelling integrated control strategies will thus be of great utility for informing malaria control programmes about which strategies are most likely to be cost-effective under specific conditions.
In the light of the many uncertainties inherent in such an exercise, it is crucial to supplement the model predictions with sensitivity analyses, development of new models, and validation of model outputs against accumulated field data. The availability of the www. malariacontrol.net volunteer computing resources enables us not only to fit many models, but also to carry out the extensive analysis of model predictions, and in the same way as climate modellers now make predictions from ensembles of different models, we will be most confident about the predictions of disease models if we see that they are consistent across a range of different sets of assumptions. To judge how robust are the simulation results we can also perform probabilistic sensitivity analysis (PSA) to identify the sensitivity of model predictions to the different parameters, thus helping to identify which model assumptions are most important in determining the predictions, and where further data may be needed (Briggs, 2000). By including costing parameters in the PSA we can also construct acceptability curves for specific intervention strategies.
The final set of model results will be made publicly accessible. We are engaged in a consultation process with various users including health planners ; control programme managers ; international policy makers ; researchers, and members of industry, in order to determine what kinds of user-interfaces will be required. These will allow the users to query specific eco-epidemiological and health system scenarios, and hence enable the models to take their place as important tools for rational planning of malaria control.
A C K N O W L E D G E M E N T S This project depends on the efforts of many researchers and field staff engaged in malariological fieldwork in endemic areas, and many thousands of volunteers who make their computers available to malariacontrol.net. The first phase of the study was funded by the PATH Malaria Vaccine Initiative and GlaxoSmithKline Biologicals S.A. The second phase is currently supported by the Bill & Melinda Gates Foundation. This publication and the contents hereof do not necessarily reflect the endorsement, opinion, or viewpoints of the PATH Malaria Vaccine Initiative, GlaxoSmithKline Biologicals S. A., or the Bill & Melinda Gates Foundation.