Models with higher effective dimensions tend to produce more uncertain estimates

Mathematical models are getting increasingly detailed to better predict phenomena or gain more accurate insights into the dynamics of a system of interest, even when there are no validation or training data available. Here, we show through ANOVA and statistical theory that this practice promotes fuzzier estimates because it generally increases the model’s effective dimensions, i.e., the number of influential parameters and the weight of high-order interactions. By tracking the evolution of the effective dimensions and the output uncertainty at each model upgrade stage, modelers can better ponder whether the addition of detail truly matches the model’s purpose and the quality of the data fed into it.

Figure: Dynamics of the PSACOIN Level 0 model [36]. The time t is in years.

Figure S4: Dynamics of the SIR(S), the SIR(S) with vaccination and the extended SIR(S) proposed by Saad-Roy et al. [45,46]. The time t is in weeks and covers 5 years (260 weeks). See section 2.3 for a description of the models.

The PSACOIN model
The PSACOIN model describes the ideal behavior of a set of selected radionuclides buried deep in an underground disposal facility for nuclear waste, packed into sealed canisters and surrounded by a buffer material designed to delay their transit in case of canister corrosion. The nuclides are separated from the biosphere by several hundred metres of a geological formation. The simulations normally span ten million years, a time assumed to be characteristic of the nuclide transit time through the various media (barriers). The case is part of a series of benchmarks run by the Nuclear Energy Agency of the OECD to test the agreement among several computer codes involved in the safety analysis of nuclear waste disposal. The description that follows is a summary of the first and simplest case, PSACOIN Level 0 [37].

First barrier: waste form
The leach rate $R_{wf}(t)$ (kg m$^{-2}$ a$^{-1}$) is given by

$$R_{wf}(t) = R_0 \, H(\tau_D - t),$$

where $wf$ stands for waste form, $R_0$ is a time-invariant leach rate, $t$ is time, $\tau_D$ is a characteristic leach time and $H$ is the Heaviside step function. The leach time $\tau_D$ is given by

$$\tau_D = \frac{Q}{R_0 S},$$

where $Q$ is the initial amount of waste and $S$ its surface area, both constants. The release rate of nuclide $i$, $F_w^i(t)$, is derived as

$$F_w^i(t) = R_{wf}(t) \, S \, I^i(t),$$

where $I^i(t)$ is the inventory of nuclide $i$ in mol per kilogram of waste form, given by

$$I^i(t) = I_0^i \, e^{-\lambda_i t},$$

where $I_0^i$ is the initial inventory of nuclide $i$ and $\lambda_i$ its decay constant in a$^{-1}$.
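The waste-form barrier can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark code. It assumes that leaching proceeds at the constant rate R_0 until the waste is exhausted at τ_D = Q / (R_0 S), that the release rate is the product of leach rate, surface area and inventory, and that the inventory decays exponentially. The function names and the numerical values in the usage note are hypothetical.

```python
import math


def leach_time(q, r0, s):
    """Characteristic leach time tau_D (a), assumed to be Q / (R_0 * S)."""
    return q / (r0 * s)


def leach_rate(t, r0, tau_d):
    """Leach rate R_wf(t) (kg m^-2 a^-1): R_0 before tau_D, zero afterwards.

    Assumption: the Heaviside step switches leaching off once t exceeds tau_D.
    """
    return r0 if t < tau_d else 0.0


def inventory(t, i0, decay_constant):
    """Inventory I^i(t) (mol kg^-1) under simple radioactive decay."""
    return i0 * math.exp(-decay_constant * t)


def release_rate(t, q, r0, s, i0, decay_constant):
    """Release rate F_w^i(t) (mol a^-1) of one nuclide out of the waste form."""
    tau_d = leach_time(q, r0, s)
    return leach_rate(t, r0, tau_d) * s * inventory(t, i0, decay_constant)
```

For instance, with the made-up values `q=1000.0` kg, `r0=0.01` kg m^-2 a^-1 and `s=10.0` m^2, `leach_time` returns a leach time of about 10,000 years, after which `release_rate` is zero.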

Second barrier: buffer form
A buffer of thickness $X_B$ around the waste form constitutes the second barrier to the migration of radionuclides. The flow of nuclides out of the buffer, in mol a$^{-1}$, is given by:

Since the buffer is assumed to be a purely diffusive barrier, the value of $\tau_B^i$ is given by (S7).

Third barrier: the geosphere
The nuclear waste is separated from the biosphere by a geological formation (geosphere) of thickness $X_G$, which both delays and spreads the nuclides. The migration into the geosphere is driven by advection (transport by water flow) and dispersion. The flow, in mol a$^{-1}$, is given by the set of expressions that ends with (S12).

Fourth barrier: biosphere
The model for the biosphere considers that the flow $F_G^i$ coming from the geosphere is entirely intercepted by an abstraction well used for drinking water. The concentration $C^i$ of a given nuclide in the water is given in Bq m$^{-3}$, where Bq stands for becquerel, the SI unit of radioactivity:

$$C^i(t) = \frac{A^i \, F_G^i(t)}{W},$$

where $A^i$ is the molar specific activity of nuclide $i$ in Bq mol$^{-1}$, and $W$ the abstraction rate in m$^3$ a$^{-1}$. Thus the resulting dose to humans is simply

$$\mathrm{dose}^i(t) = C^i(t) \, W_m \, D^i,$$

expressed in Sv a$^{-1}$, where Sv stands for sievert, the SI unit of dose equivalent describing the biological effect of ionizing radiation. The other terms in the equation are $W_m$, the water consumption rate by a human drinking the water of the well, in m$^3$ a$^{-1}$, and $D^i$, a dose factor converting the ingested becquerels into sieverts.

Uncertain parameters
Tables S1-S2 present the probability distributions used to characterize the uncertain parameters and the constant values of PSACOIN Level 0 respectively.

The irrigation water withdrawal model
Many global hydrological models compute irrigation water withdrawals with variations of the following equation:

$$y = \frac{I_a \, (ET_c - P)}{E_p},$$

where $y$ is a scalar representing irrigation water withdrawals [m$^3$], $I_a$ is the extension of irrigation [m$^2$], $ET_c$ is the crop evapotranspiration [m], $P$ is the precipitation [m] and $E_p$ is the irrigation efficiency [-]. The crop evapotranspiration is computed as $ET_c = k_c \, ET_0$, where $ET_0$ is the reference crop evapotranspiration (usually grass or alfalfa) and $k_c$ [-] is a coefficient that accounts for the differences between $ET_0$ and the crop under study (wheat in our case). In the paper we consider two different equations for $ET_0$, the Priestley-Taylor and the FAO-56 Penman-Monteith [42]. The former reads as

$$ET_0 = \alpha \, \frac{\Delta A}{\gamma + \Delta},$$

whereas the latter reads as

$$ET_0 = \frac{0.408 \, \Delta A + \gamma \, \dfrac{900}{T_a + 273} \, w \, v}{\Delta + \gamma \, (1 + 0.34 \, w)},$$

where $A$ is the net radiation minus the soil heat flux (MJ m$^{-2}$ d$^{-1}$), $\Delta$ the gradient of saturated vapour pressure (kPa °C$^{-1}$), $\gamma$ the psychrometric constant (kPa °C$^{-1}$), $\alpha$ the Priestley-Taylor constant, $T_a$ the mean daily air temperature at 2 m (°C), $w$ the average daily wind speed at 2 m (m s$^{-1}$) and $v$ the vapor pressure deficit (kPa). See Allen [70] for an explanation of the constants.

Uncertain parameters
See Puy et al. [41,71] for an explanation of the uncertainties involved in the calculation of irrigation water withdrawals, including the selection of the probability distributions used in this paper (Table S3).

The epidemiological models
We use the Susceptible-Infected-Recovered [SIR(S)] models by Saad-Roy et al. [45] (Equations S18-S19) and by Saad-Roy et al. [46] (Equation S20). See Tables S4-S5 for a description of the parameters and coefficients, and Saad-Roy et al. [45,46] for a full explanation of the models' dynamics.

In the simplest SIR(S) (Equation S18), $S_P$ denotes fully susceptible individuals, $I_P$ denotes individuals with primary infection that transmit at rate $\beta$, $R$ denotes fully immune individuals (as a result of recovery), $S_S$ denotes individuals whose immunity has waned at rate $\delta$ and are again susceptible to infection, and $I_S$ denotes individuals with secondary infection.

In the SIR(S) with a vaccination term (Equation S19), $V$ denotes vaccinated individuals.

In the SIR(S) extended with different vaccination strategies (Equation S20), $V_i$ denotes individuals vaccinated with $i$ doses, $S_{Si}$ denotes individuals whose complete $i$-dose immunity has waned and are susceptible again, $I_{Si}$ denotes individuals who were in $S_{Si}$ and have now been infected again, and $I_V$ denotes individuals for whom the vaccine did not prevent infection.
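The compartment structure of the simplest SIR(S) can be illustrated with a small forward-Euler simulation in Python. This is a simplified sketch and not the formulation of Saad-Roy et al. [45]: it assumes no births or deaths, a constant transmission rate, a single recovery rate `gamma`, and relative susceptibility `eps` and infectiousness `alpha` for secondary infections. All parameter values are hypothetical.

```python
def sirs_step(state, beta, gamma, delta, eps, alpha, dt):
    """Advance (S_P, I_P, R, S_S, I_S) by one forward-Euler step of size dt."""
    s_p, i_p, r, s_s, i_s = state
    force = beta * (i_p + alpha * i_s)      # force of infection
    new_primary = force * s_p               # S_P -> I_P
    new_secondary = eps * force * s_s       # S_S -> I_S
    return (
        s_p - dt * new_primary,
        i_p + dt * (new_primary - gamma * i_p),
        r + dt * (gamma * (i_p + i_s) - delta * r),
        s_s + dt * (delta * r - new_secondary),
        i_s + dt * (new_secondary - gamma * i_s),
    )


def simulate(weeks=260, dt=0.01, beta=2.0, gamma=1.4, delta=1.0 / 52.0,
             eps=0.7, alpha=0.9):
    """Simulate population fractions over the given number of weeks."""
    state = (0.999, 0.001, 0.0, 0.0, 0.0)   # hypothetical initial condition
    for _ in range(int(round(weeks / dt))):
        state = sirs_step(state, beta, gamma, delta, eps, alpha, dt)
    return state
```

Because every outflow of one compartment is an inflow of another, the five fractions returned by `simulate()` should still sum to one, which is a convenient sanity check on the bookkeeping.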

Uncertain parameters
Tables S4-S5 respectively present the probability distributions used to characterize the uncertain parameters and the constant values of the epidemiological models.

Table S4: Probability distributions used to describe the uncertainty in the parameters of the epidemiological models, selected from Saad-Roy et al. [45,46].

| Input | Description | Distribution |
| --- | --- | --- |
|  | Reduction in susceptibility to secondary infections relative to primary ones | U(0.4, 1) |
| α, α 1 , α 2 , α V | Reduction in the infectiousness of secondary infections relative to primary ones | U(0.8, 1) |
| ν | Fraction of the fully and partially susceptible populations vaccinated each week | U(0.001, 0.009) |
| t vax | Time at which vaccination is introduced | U(48, 78) |
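Drawing parameter sets from the uniform distributions listed above can be sketched in Python as follows. Plain pseudo-random sampling is used here for brevity, which is not necessarily the sampling design of the paper. The dictionary keys are hypothetical labels, and a single value is drawn for each row of the table, including the row that groups α, α 1, α 2 and α V.

```python
import random

# Bounds of the uniform distributions (lower, upper); keys are hypothetical labels.
PARAMETER_RANGES = {
    "susceptibility_reduction": (0.4, 1.0),
    "alpha": (0.8, 1.0),
    "nu": (0.001, 0.009),
    "t_vax": (48.0, 78.0),
}


def sample_parameters(n, seed=0):
    """Return n parameter sets, each drawn independently from its U(lower, upper)."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lower, upper)
         for name, (lower, upper) in PARAMETER_RANGES.items()}
        for _ in range(n)
    ]
```

Each returned dictionary can then be passed to a model run, so that the spread of the outputs across the `n` runs reflects the uncertainty in the inputs.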