Hierarchical Bayesian Learning Framework for Multi-Level Modeling using Multi-level Data

A hierarchical Bayesian learning framework is proposed to account for multi-level modeling in structural dynamics. In multi-level modeling the system is considered as a hierarchy of lower-level models, starting at the lowest material level, progressing to the component level, then the subsystem level, before ending up to the system level. Bayesian modeling and uncertainty quantification techniques based on measurements that rely on data collected only at the system level cover a quite limited number of component/subsystem operating conditions that are far from representing the full spectrum of system operating conditions. In addition, the large number of models and parameters involved from the lower to higher modeling levels of the system, constitutes this approach inappropriate for simultaneously and reliably quantifying the uncertainties at the different modeling levels. In this work, comprehensive hierarchical Bayesian learning tools are proposed to account for uncertainties through the multi-level modeling process. The uncertainty is embedded within the structural model parameters by introducing a probability model for these parameters that depend on hyperparameters. An important issue that has to be accounted for is that parameters of models at lower levels are shared at the subsystem and system levels. This necessitates a parameter inference process that takes into account data from different modeling levels. Accurate and insightful asymptotic approximations are developed, substantially reducing the computational effort required in the parameter uncertainty quantification procedure. The uncertainties inferred based on datasets available from the different levels of model hierarchy are propagated through the different levels of the system to predict uncertainties and confidence levels of output quantities of interest. A simple dynamical system consisting of components and subsystems is employed to demonstrate the effectiveness of the proposed method.


Introduction
Computational finite element (FE) models are extensively employed to represent physical structures and perform analyses in structural dynamics [1,2]. FE model updating has been an essential step to assess and improve the accuracy of the computational models [3,4]. In the process of modeling a structure, uncertainties arise from variabilities in experimental tests, environmental and operational conditions [5], manufacturing variabilities [6], material properties variabilities, as well as model error [5,7]. However, physical structures consist of multiple components, similar or dissimilar, that are assembled to subsystems that comprise the overall system [8]. Such structures often exhibit complexities resulting in the uncertainty due to variability arising from the assembling process, manufacturing process as well as nonlinearities manifested at various modeling levels during operation under harsh environments [9,10].
Several contributions have already been made for capturing such uncertainties and improving the accuracy of the computational models in multi-level structures. Statistical model calibration technique for a hierarchy system was proposed to improve the predictive capability of the computational models from the lowest level to the highest level [11]. Model parameters at a lower level was calibrated using a conventional Bayesian modeling approach from test data at the same level. Data from higher levels of model hierarchy had no effect on learning parameters involved at lower modeling level. An alternative Bayesian learning procedure was also used in [12], where data at the lowest level were used to construct prior probability distribution of the model parameters and then data from the next level of model hierarchy were used to update the model parameters. A hierarchical model updating strategy using uncorrelated modes was also developed for complex assembled structure to ensure a high accuracy of the updated 2 FE model [13]. Uncertainties inferred at different levels were propagated to uncertainties to quantify the confidence in predictions at different levels [11] and at the system level [14]. Design of validation experiments for multi-level models was also proposed in [8]. The aforementioned developments in hierarchical modeling based on Bayesian calibration methods do not use experimental data from different levels of model hierarchy to simultaneously inform the parameters of models at all levels of hierarchy and infer the variability of the parameter estimates even at the lowest level of model hierarchy. This is proposed to be addressed in this work using a hierarchical Bayesian modeling framework.
The hierarchical Bayesian modeling (HBM) framework was recently proposed in various engineering fields to capture the uncertain variability [15][16][17][18]. Full sampling-based HBM framework is commonly used to calibrate the model parameters, quantify the uncertainties and predict their uncertainties to the system quantities of interest (QoI) using data obtained from the system level [19][20][21][22][23]. The up-to-date full-sampling HBM approach is also applied to a two-level masonry structural [24] as well as a red blood cell multi-level model to inform the parameters of models at all levels of model hierarchy [25]. Undoubtedly, the above-mentioned approaches show a good performance on quantifying the uncertainties or improving the models for a multi-level structure. However, there is still room for improvement about the process of calibrating the models in a multi-level structure. For example, model error is neglected in a multi-level structure leading to the missing information of the discrepancy between the responses of physical structures and the outputs of computational models [11]. Additionally, the sampling procedure utilized during the process of updating the models makes the HBM approach inefficient and often provides no insight into the estimated uncertainties. For improving the efficiency and obtaining valuable insights of HBM, approaches based on asymptotic approximations have been developed to handle the uncertainty quantification and propagation in structural dynamics problems [26][27][28][29]. The existing HBM framework is conducted using information collected from the system level. However, the framework has great potential to be implemented to a multi-level modeling of the system using multi-level data for the purpose of model validation, uncertainty quantification and propagation.
To this end, this paper extends the recently proposed HBM framework [29] to a hierarchy structure, aiming to take into account the diverse sources of the uncertainty through the hierarchy of models of the systems and propagate it through the computation models for improving the confidence in the response predictions. It is noted that, given the hierarchy of models for a structure, the information from a lower level of the structure is also inferred from the data at the higher levels. This guides us to build reliable models at all system levels that account in the modeling and parameter estimation process for the uncertain variability arising from component manufacturing and assembling process. This is achieved by assigning a probabilistic distribution for the model parameters, where the hyper parameters are inferred from datasets available at lower and higher levels of modeling. The estimation process of the hyper parameters is conducted based on the available datasets, and the response QoI are predicted according to the uncertainties obtained from the parameter estimation process. Asymptotic approximations are introduced to the framework, providing insightful expressions and efficiently improving the computational cost of the proposed framework.
This work is structured as follows. Section 2 describes the proposed hierarchical Bayesian modeling framework for multi-level models, including the formulations for quantifying the uncertainties and propagating the uncertainties through the computational models manifested in the system hierarchy. Section 3 demonstrates the effectiveness and accuracy of the proposed method using a carefully designed numerical case study. Section 4 reports the conclusions of this study.

Hierarchical Bayesian Modeling Framework for Multi-level Models
The HBM framework developed in [29] is used as a tool to integrate the multiple experimental datasets and the hierarchy of models within a dynamical structure. The major difference between this work and the work in [29] is that, instead of using the available information from a full-scale system or a single component, the proposed strategy aims to take into account the diverse sources of the uncertainty through the multi-level structure to estimate model parameters and their uncertainties as well as propagate uncertainties through the computational models for the response predictions. The details for the multi-level models along with their uncertainties are introduced below.
, are considered to predict the output quantities of interest (QoI) at the system level. Note that each individual dataset can consist of either response time histories for linear and nonlinear models of components and/or subsystems, or modal properties identified from the response time histories for linear models of components or subsystems. For the ease of the further use, both types of datasets are denoted by the notation y . As an example, the i-th dataset in the component C1 is represented as (1) (1) = ii D y . Similar notations (2) j y and (3) k y are used for the datasets of component C2 and subsystem SS1, respectively.

Bayesian learning of model parameters and hyperparameters
The objective in this work is to fuse all available information in the multi-level modeling approach in order to quantify different sources of uncertainties and to propagate such uncertainties to predict output QoI.

Prior hypothesis for model parameters
To take into account in the modeling and parameter estimation process the uncertainties of model parameters due to experimental, environmental, operational and manufacturing variabilities, these uncertainties are embedded within the model by introducing a hierarchy in the model parameters. This is accomplished by postulating the 1 1, 2, , iN  and (3) k D , 3 1, 2, , kN  available at the component C1 and the subsystem SS1 levels. Similar definition is described by hyperparameter 2 ψ , denoted with blue in Fig. 1

Parameter estimation
Assume that the model outputs from either the components or subsystem levels are linked to their computational models corresponding to a particular value of the model parameters. The discrepancy between the model outputs and the measured data can be modeled based on the prediction error equation, given as follows: Substituting (8) and (6) into (4) one derives the posterior distribution of all parameters which can be populated with samples using available sampling algorithms such as Gibbs sampling [15] or transitional Markov Chain Monte Carlo (TMCMC) [23,31,32]. However, the large number of parameters that may be involved in structural model parameter set  results in computationally very expensive sampling procedures.

Posterior distribution of hyperparameters using Asymptotic Approximations
Two asymptotic approximations for the likelihood functions, developed in [29] and asymptotically valid for large number of data within a dataset, are employed herein to simplify the analysis and derive analytical expressions for the marginal posterior distribution of the hyperparameters ψ . These approximations leverage the use of Taylor expansion, and approximate each likelihood function involved in Eq. (7) to a simplified form of a Gaussian distribution by expanding the negative logarithm of the likelihood function for each dataset with respect to either the model and prediction error variance parameters (approximation 1, A-1 for short) or only the model parameters (approximation 2, A-2 for short) about their maximum likelihood estimate (MLE). Details are presented in [29]. By utilizing approximation A-1, Eq. (7) can be rewritten as the product of several Gaussian distributions as follows: where the MLEs (  and they are referred to as the identification uncertainty. Likewise, by utilizing approximation A-2 in [29], Eq. (7) takes the form:    in Eq. (10) for modal properties data is defined in [29], while for time histories data, it is defined as:   (14) Note that the model parameter θ in Eq. (14) consists of the model parameters for each dataset, and in effect it can be calibrated alongside the hyper parameter ψ and prediction error variance parameter 2 σ using any Markov Chain Monte Carlo (MCMC) algorithm. However, the large number of parameters involved in Eq. (14) poses difficulties for the calculation process of sampling methods. Also, for the purpose of getting analytical expressions of the posterior distributions of hyper parameters, a marginalization procedure is preferred. To this end, the marginal distribution of the hyper and prediction error variance parameters is first calculated. It is achieved by integrating the posterior distribution in Eq. (14) over the model parameter space  , resulting in A large number of multidimensional integrals over the model parameters for each dataset is involved. Using the fact that the integrals of the product of two Gaussian distributions in Eq. (15) result in a Gaussian distribution [33], the marginal distribution 2 ( , | ) p ψ σ D can be readily derived in the analytical form: According to Eq. (16) or (17), the samples of the hyper and prediction error variance parameters can be generated though any Markov Chain Monte Carlo (MCMC) algorithm such as the transitional MCMC (TMCMC) [31,32] or the nested sampling algorithm [34]. It should be pointed out that the number of parameters involved in the marginal posterior distributions (16) or (17) is substantially less than the number of parameters involved in the original posterior distribution (6), which makes the sampling approach for the distributions (16) or (17) computationally very efficient.

Most probable values (MPV) of hyperparameters
The MPVs of the hyper parameters and prediction error variance parameters can be estimated by minimizing the negative logarithm of the joint distribution 2 ( , | ) p ψ σ D in Eqs. (16) and (17). The analytical solutions are derived by selecting the identification uncertainty to be the same for each dataset and equal to the average of the identification uncertainty over all datasets. Details are given in Appendix A. For approximation A-1, the results are given as: For approximation A-2, the MPVs of all the parameters are given as follows: It is reminded that the identification uncertainty reduces as the number of data in a data set increases [29]. This is evident also in (13), where the parameter identification uncertainty for each data set is inversely proportional to the number td NN of data points in each dataset. Therefore, given a sufficient number of data points, the identification uncertainties become negligible and thus the MPVs of the hyper covariance equal to the uncertainty due to variability arising from the multiple datasets. This is consistent with a frequentist point of view where the hyper covariance is defined as a variation of the parameters computed from one dataset to another dataset.

Posterior distribution of model parameters
Once the samples of the hyper parameters and the prediction error variance parameters are available, the posterior predictive distribution of new model parameters can be calculated through the following procedure: where new θ defines the new model parameters whose samples will be used to predict the output QoI. The multidimensional integral in Eq. (28) can be estimated based on the samples obtained from Eq. (16) or Eq. (17), given as: where each sample    Fig. 2 shows the framework for propagating the uncertainties to the response predictions. It is noted that both the uncertainties from the model parameters and the prediction error parameters are considered for the observed output QoI, while for the unobserved output QoI only the uncertainty from the model parameters can be considered because of the absence of information for the prediction error term. The probability distribution of the unobserved QoI Unob y can be estimated by evaluating the conditional distribution for a new output of the model given the observations, as follows:

Response predictions
where   where the number of the samples o N can be either the same or different with the number of samples u N .

Fig. 2 Uncertainty propagation for response predictions
Some points for the prediction in a multi-level modeling approach using multi-level datasets are outlined as follows:  For the predictions at component levels C1 and C2, only the uncertainty of the model parameters   1 θ or   2 θ is considered for the unobserved QoI in component C1 or component C2. However, due to the fact that both parameters are shared in the subsystem level SSl, they are also inferred from the datasets available in the sub-system level.  For the predictions at subsystem level, all three sets of model parameters are involved for predicting the observed and unobserved QoI, with the parameters informed from the datasets available at the components C1, C2 and subsystem SS1 levels.  For the system level, since there is no dataset available, the prediction error variance parameter may not be calibrated, and all the QoI in this level will only take into account the uncertainty from the model parameters.
The uncertainties in the parameters are informed from the dataset available at component C1, C2 and subsystem SS1 levels.

Algorithms
Algorithm 1 shows the steps for estimating the hyper and prediction error parameters using the proposed hierarchical Bayesian modeling framework, and Algorithm 2 summarizes the process of propagating uncertainties for predicting response QoI. In Algorithm 1, it is reminded that a full sampling (FS) procedure [23,31,32] could be used as an alternative to obtain the samples of the hyper and prediction error parameters. This can be accomplished by drawing the samples of structural model and prediction error parameters from the likelihood function for each dataset available at components C1, C2 and subsystem SS1, and the available samples are subsequently utilized for computing the posterior distributions of the hyper parameters and prediction error parameters. 1 1, 2, , iN  , 2 1, 2, , jN  , 3 1, 2, , kN  For A-2: (1.1) Minimize the negative logarithm of the likelihood functions given in Eq. (8) to with respect to model parameters to compute the MLEs

Numerical Example
This section illustrates the proposed method using a numerical case study. A six degree of freedom (DOF) springmass chain model of a mechanical system is presented in Section 3.1, where the model with its decompositions in subsystems and components and the available observations are described in detail. The parameter estimation process is conducted based on the available datasets in the different levels of model hierarchy, and the response QoI are predicted according to the uncertainties obtained from the parameter estimation process. Results are discussed in detail in Section 3.2.

Problem description
As shown in Fig. 3, the six DOFs model is represented as the system in this study. The system is decomposed into one subsystem and three components. Component C1, denoted with green, consists of the first two links of the model, component C2, denoted with blue, contains the third and the fourth links, and component C3, denoted by orange, contains the fifth link. The sub-system SS1 in Fig. 3 consists of the components C1, C2 and C3. The system S is assembled from the subsystem SS1 and a second subsystem that consist of the sixth link with known mechanical properties. Parameterized computational models for the system, subsystem and components are introduced to make simulation-based predictions of structural responses. The information for the computational model and the parameterization is given in Table 1. As shown, the model is fully parameterized using three stiffness-related model parameters . Specifically, the first parameter (1)  is linked to the stiffness of the first two springs of the component C1. The parameter (1)  is shared between component C1 and subsystem SS1. The second parameter (2)  is linked to the stiffness of the third and the fourth spring, shared between component C2 and subsystem SS1. The last parameter (3)  corresponds only to the stiffness of the fifth spring, shared between the component C3 and subsystem SS1. Each parameter is multiplied by the corresponding nominal stiffness values reported in Table 1, representing the model stiffness which is used for the process of updating the models.   Fig. 3(bd). Different types of tests are carried out for components C1, C2 and subsystem SS1, reporting different sources of the uncertainty due to variability. In lieu of experimental data, datasets are simulated as follows.
For component C1 a type A testing is performed based on the testing configuration in Fig. 3c. Responses are simulated under the condition where model error is present. For this, the first mass is perturbed by 5% from the nominal mass value reported in Table 1 and the resulting perturbed model of component C1 is used to produce the responses that will be used as datasets. 1 =10 N datasets of acceleration time histories are simulated subjected to 10 independent base excitations. The excitation herein is assumed as a Gaussian sequence with mean zero and standard deviation equal to one. Sensors for recording the data are assumed to locate at both the first and second link, and therefore 2 sets of acceleration time histories exist in a dataset. The sampling rate for each set is taken as 0.01s corresponding to a sampling frequency 100Hz for total of 5.2 seconds. The test-to-test variability in component C1 is attributed to the presence of model error. Different loading conditions subjected to the same model result in different responses, and will further lead to different identification values of the model parameters from dataset to dataset.
For component C2 a Type A testing is also performed. Data sets consisted of modal properties are generated under the assumption of zero model error. 2 =20 N datasets in total are simulated based on a Gaussian distribution for (2)  with mean 1 and standard deviation 0.05. Specifically, to generate a dataset j D , 2 1, , jN  , a realization (2) j  is drawn from the distribution  (2)  from dataset to dataset are statistically independent, different datasets of the modal properties are thus statistically independent. For component C2, the test-to-test variability can be due to the alterations of environmental conditions and material properties. For example, temperature or humidity changes make the material properties different, and it can further affect the modal features. Such changes are represented as the variations of the model parameters, following a Gaussian distribution with a significant variance. For subsystem SS1 level a type B testing is performed. The subsystem is assembled from the fixed component C1, the fixed component C2, and a component C3 that is a member of a population of identically manufactured components (links). The subsystems generated through this process form a population of identically assembled subsystems, although variability exists due to manufacturing errors caused by the manufacturing variabilities arising from the component C3. Since the subsystem SS1 contains the component C1, it is apparent that the model error also exists in any one of the assembled subsystems. 3 =20 N subsystems are assembled through this process and then a single test is performed for representative members in the population of subsystems SS1. Acceleration datasets are collected from the test carried out for each subsystem. Datasets k D , 3 1, , kN  , 3 =20 N , are generated based on samples of the model parameters (2) (3) [ , ] T  generated from a Gaussian distribution with mean [1 1] T and a diagonal covariance matrix with the same diagonal elements 0.05 2 . For each acceleration dataset, 3 sensors are placed at the first, third and fifth links, and thus 3 sets of the time histories data are included in a dataset. Each set consists of =520 t N data points in the increment of 0.01s, the same with the one in component C1. The process to assemble the system is to connect the subsystem SS1 (with the component C3 chosen from the population of identically manufactured components C2) with a fixed link (the 6 th link in Figure 3) representing the second subsystem to form the final system S. The uncertainty at the system level arises due to the variability from the manufacturing process of components C3 used to assemble the subsystem SS1, due to the variability from the different environmental conditions under which the component C2 was tested, and due to the model error present in component C1. For system level, no dataset is available.
Results and discussions for identifying and propagating uncertainties are presented in the next subsection. Although the components, subsystems and systems are simple spring mass chain models, the model is representative of more involved models of industrial components that are assembled to form the subsystems and eventually the systems. The differences are in the complexity of the models for the components, the number of parameters per component and the higher computational cost that would be required to perform computations. A simple model was purposely chosen to represent the features for the proposed method and provide valuable insight into uncertainty quantification and propagation process.

Parameter estimation
The foregoing asymptotic approximation A-1 for the likelihood function in Eq. (9) is utilized to compute the MLE and the identification uncertainty in each dataset. It is noted that the first two parameters (1)  and (2)  are included in both the components and subsystem levels. The MLEs of these two parameters (1)  and (2)  along with their identification uncertainties (1)   and ( 2)   can be calculated according to each dataset from either the components or subsystem levels. Results are shown in Fig. 4. The blue dots and error bars indicate the MLEs and the identification uncertainties from the components level, while the red ones display the results from the subsystem SS1 level. It is seen that the MLEs of the first two parameters for a specific dataset computed from the components level differ from those from the subsystem SS1 level. Also, the MLEs of the model parameters from either the components or subsystem levels significantly fluctuate across the datasets. For the first parameter (1)  , the identified values from the different datasets vary considerably due to the presence of model error. Similar variations are observed for the second and third parameter (2)  and (3)  which are due to the changes of environmental conditions and manufacturing process. It is also seen that the identification uncertainties keep rather constant and extremely small values for each parameter. It is reminded that for the first and third parameters, such small uncertainties are produced by the sufficient number of data points available for each dataset due to the fact that experimental data consist of response time histories. For the second parameter, the small uncertainties are attributed to the assumptions of zero model error in component C2. It should be observed that such small identification uncertainties cannot capture the variations of the MLEs, failing to represent the uncertainty of model parameters due to variability. For describing the uncertainty due to variability, the distributions of the hyper parameters can be computed based on the approximation A-1 in Eq. (15) and the approximation A-2 in Eq. (17). For exploring the effect of the datasets at different levels of hierarchy on the posterior distribution of the hyper parameters, results are presented for three cases where the datasets are considered from components only, subsystem only and both of them. Results based on the approximation A-1 are shown in Fig. 5. Observing the hyper mean values, it is noted that the hyper mean values inferred using datasets from C1 only and the subsystem SS1 only are shifted from the nominal values, while the hyper mean value inferred using datasets from C2 is fairly close to the nominal values. This is due to the effect of the model error present in C1 and SS1, so that no model parameter values provide predictions that exactly match the measurements. However for the case of component C2, the perfect model assumed can yield the exact nominal values when given a sufficient large number of datasets.
It is also seen that the MPV of the hyper mean 1   from both components and subsystem is around 1.047 which corresponds approximately to an average value of the MPV 1.041 and 1.049 computed respectively from only the component C1 and only the subsystem SS1. For the hyper standard deviations (square root of the diagonal entry in hyper covariance), it is seen that a relatively larger value is computed (around 1.5%) for the first hyper parameter 1   in C1 while a smaller one (0.5%) is obtained in SS1, and the hyper standard deviations from both the components and subsystem SS1 is around 1% by considering the overall information. Similar remarks can be concluded for other hyper parameters, which indicates that the overall hyper parameter values consider the information from both the components and subsystem levels. This scenario is consistent with the analytical expression for the hyper means in Eq. (18) and the hyper covariance in Eq. (19). Finally, it is also noted that the identification uncertainties shown in Fig. 4 are small compared to the parameter variability shown by the hyper standard deviations in Fig. 5.
The posterior distribution of model error for C1, C2 and SS1 can be computed together with the posterior distribution of the hyper parameters according to the approximation A-1 in Eq. (15). Results for the posterior distribution of model error are depicted in Fig. 6. As seen, due to the fact that the model error is present in the first mass, both the component C1 and subsystem SS1 exhibit a large model error. It is also seen that the model error in C1 is larger than that of SS1. This can be attributed to the fact that at the subsystem level the number of parameters involved are more than the number of the parameters involved at the component level, making the three-parameter model more flexible to provide a better fit to the datasets than the one-parameter model involved at the component level, thus reducing the model error that is quantified by the prediction error parameters. The model error for C2 is fairly close to zero implying excellent matches between the updated model and the measurements. This is due to the absence of the model error assigned in component C2. For the purpose of comparing the accuracy of the proposed approximation A-1 and A-2, the results of the hyper parameters alongside the model error from A-1 and A-2 as well as results for these hyperparameters obtained from the analytical solutions derived in Subsection 2.2.4 are compared to the full simulation (FS) approach [25] based on TMCMC algorithm [31,32] and reported in Table 2. The FS procedure provides an accurate solution for a large number of samples. It is seen that both the results from A-1 and A-2 are in good agreements with the FS method. This is due to the fact that the asymptotic approximation holds true when the identification uncertainty is small compared to the parameter variability. The likelihood of the model parameters can be asymptotically approximated as a Gaussian distribution when given a large number of data points in a dataset (for C1 and SS1) or given a perfect computational model (for C2). The lightly difference between A-2 and the FS shown in Table 2 may arise from the number of insufficient samples used for FS. Most importantly, both the results from A-1 and A-2 exhibit acceptable accuracy. The A-2 exhibits higher accuracy than A-1 due to the fact that the asymptotic approximation for the prediction error parameters considered in A-1 is less accurate. Compared to the analytical results with A-1 and A-2, it is found that the analytical results are also slightly different from the results obtained by the A-1 and A-2 approximations. This is due to the assumptions of the identification uncertainty that was used to simplify derivations and obtain the analytical expressions. For gaining more insightful expressions, the analytical solutions was derived by selecting the identification uncertainty to be the same for each dataset and equal to the average of the identification uncertainty over all datasets. However, the identification uncertainty may not be exactly the same in each dataset. These assumptions have an effect on the accuracy of the analytical expressions. The analytical solutions do provide a better understanding of the estimates of the hyper parameters and the model error parameters. However, from the results in Table 2 they also provide very good estimates of the hyperparameter and model error parameter values.
The computational gain of using the approximations A-1 and A-2 compared with FS is also investigated and reported in Table 3. Herein the sample procedure employs the parallel version of the TMCMC algorithm [32] so that the parallel tools can be used during the computation. For guaranteeing the accuracy of FS, the number of the samples is chosen to be 10000. All the calculations are made in a 32-core computer. It is clear that the computational effort of A-1 and A-2 are much faster than FS, although the parallel tools are used for the FS method. Specifically, the running time for A-1 and A-2 are relatively close and less than 1.5 minutes. However, it takes around 38.3 minutes to get the same results by using FS. As a consequence, the proposed approximations A-1 and A-2 not only guarantee good results, but more importantly they can significantly improve the computational efficiency, providing faster computational tools for quantifying the uncertainties in a multi-level modeling approach. Table 2 Estimates of the means of hyper parameters and model error  Given the hyperparameter uncertainties in Fig. 5, the posterior distribution of each model parameter is obtained under the same cases with Fig. 5 where the datasets are considered from components only, subsystem only and both of them. Results are shown in Fig. 7. It is seen that the means of the parameter (1)  and (2)  computed by considering the overall information is a kind of average of the values obtained using datasets from components only and subsystem only. It is also seen that the uncertainty of parameter (1)  using datasets available at component C1 only is larger than that using both the datasets available at component C1 and subsystem SS1, while the opposite case occurs for the uncertainty of the parameter (2)  . This is attributed to the effect that datasets from the subsystem level has on the estimates of these parameters. As shown in Fig. 5, a small standard deviation is computed for parameter (1)  using the subsystem level datasets, while a larger one is obtained for parameter (2)  using datasets from the subsystem level. For the third parameter (3)  , since there is no dataset available for component C3, the results are expected to be exactly the same using the datasets available at subsystem only and at both components and subsystem.  Fig. 8. Such posterior distributions are calculated based on all the datasets from all levels of hierarchy, and thus the model parameters involve all the information from the components and subsystem levels. It is apparent that large uncertainties are obtained for the model parameters which are irreducible as the number of datasets at all levels of hierarchy increases. In contrast, the identification uncertainties obtained from each dataset in Fig. 4 yield unreasonable thin uncertainty bounds for the model parameters. Most importantly, the irreducible uncertainties can be propagated to the un-observed QoI, which may provide a realistic uncertainty bound for the response predictions. Results and discussions for predicting the responses through the system level are conducted in the next subsection.

Response predictions through the computational model at system level
The parameter uncertainties are next propagated through the hierarchy of computational models for computing the response predictions at the system level. The acceleration input with 5.2s and 100Hz sampling frequency shown in Fig. 9 is applied to the base of the system model in Fig. 3. For exploring the effect of datasets from the different levels of hierarchy on the uncertainty in the response predictions, results for the response predictions considering datasets from components/sub-system only are compared with the ones considering the datasets from both the components and sub-system. Fig. 10 shows the predicted results of displacement of the second and fourth DOFs by propagating the uncertainties through the system model. The light red shaded area shows the 95% uncertainty bounds (UB) of displacements considering the uncertainty of model parameter (1)  inferred from the datasets of C1 only, and the uncertainties of (2)  and (3)  inferred from the overall datasets of components and subsystem, while the light blue shaded area displays the 95% UB of displacements considering the uncertainties of model parameters θ inferred from all datasets available from the components and subsystem. The cyan and blue solid lines show the mean predictions corresponding to the light red shaded area and light blue shaded area, respectively, and the red line shows the actual measurements generated from the exact system model subjected to the base input in Fig. 9. It is observed that a slightly difference can be found for the mean predictions between the case where the model parameter (1)  is inferred from the dataset at component C1 level only and the case where the model parameter (1)  is inferred from the datasets available from both component C1 and subsystem SS1 levels. This is due to the fact that the hyper mean estimated from component C1 level is slightly different with the one identified from both component C1 and subsystem SS1 levels, as seen in the first column of Fig. 5. Regarding the uncertainty bounds of 23 the displacements, it is observed that the 95% UB of displacement of the 2 nd DOF calculated for the case where (1)  is inferred from component C1 level datasets is somewhat larger than that computed for the case where (1)  is inferred from both component C1 and subsystem SS1 level datasets. This is reasonable as the hyper standard deviation of the first model parameter computed from component C1 only is larger than the one estimated from both component C1 and subsystem SS1 (see Fig. 5). This scenario reveals that the uncertainty in response predictions depends on the datasets used at the different levels of modeling hierarchy to infer the parameters of the models involved. Herein, the uncertainties of the model parameter (1)  , estimated based on a comprehensive effect considering both the information from component C1 and subsystem SS1, provides different uncertainty bounds than the ones provided by the uncertainty in the model parameter (1)  inferred by neglecting the data at the subsystem level. The same finding is revealed for the displacement of the fourth DOF.  (1)  inferred from C1 only and also inferred from all information Results for the predictions of displacement of the 2 nd and 4 th DOFs are also conducted by considering the uncertainty of parameter (2)  inferred from component C2 level datasets only, as shown in Fig. 11. It is found that the mean predictions of displacements considering the uncertainty of parameter (2)  inferred from component C2 datasets only are different from the ones inferred from the datasets available from all levels. Again, this is attributed to the difference of the identified values of the model parameters between component C2 level datasets only and both component C2 and subsystem SS1 level datasets (see Fig. 5). For the UB of displacements, different scenario is found compared to the cases in Fig. 10. It is obvious that the 95% UB of displacements considering only the uncertainties of model parameters from component C2 are much smaller than the uncertainty bounds computed based on the uncertainties from the overall information, revealing that the information of the model parameters is also taken from the subsystem SS1. This finding is crucial for a multi-level modeling approach, especially when the uncertainties of the model parameters identified from the components level is insignificant. Besides, due to the large variability obtained from component C2 and subsystem SS1 levels, the actual displacement generated based on a new dataset through system model falls within the 95% UB of predicted displacement. Cases are also investigated for the UB of the modal properties. Fig. 12 and Fig. 13 show the predictions of modal frequencies and mode shapes under the same conditions with Fig. 11. For the ease of comparison, the modal frequencies are normalized by the nominal values of the modal frequencies. It is seen that the uncertainty bounds of the predictions, obtained considering only the model parameter uncertainties inferred from component C2 datasets, contain the real measurement. This is due to the fact that the parameter variability in component C2 is significant compared with the zero model error assumed in component C2. When the uncertainties of model parameters estimated from both component C2 and subsystem SS1 are considered, larger uncertainty bounds of modal frequencies are observed in Fig. 12(b), indicating that datasets from different levels may significant affect prediction uncertainties. The results also show the measured modal properties corresponding to a new dataset. It is clear that the new measurement is inside of the 95% UB of the predictions. For the predictions of mode shapes, similar results are obtained although the uncertainty seems to be smaller, especially for the lowest four modes, than the uncertainty observed for modal frequencies. This could be due to the insensitivity of the lowest four modeshapes to the model parameters.
25 Fig. 11 Predictions of displacement of (a) the second and (b) the fourth DOFs by considering the uncertainty of parameter (2)  inferred from C2 only and inferred from all information (a) (b) Fig. 12 Prediction of the ratio of modal frequencies to their nominal values obtained by propagating the uncertainty of the model parameter (2)  (a) inferred from C2 level datasets only (b) inferred from all available datasets 26 (a) (b) Fig. 13 Prediction of the mode shapes by propagating the uncertainty of the model parameter (2)  (a) inferred from C2 level datasets only, and (b) inferred from all available datasets Fig. 14 depicts the predicted acceleration of the fifth DOF by propagating different types of uncertainties through the system model. The light red shaded area shows the 95% UB of predictions considering the parameter uncertainties inferred from subsystem SS1 level datasets only, while the blue one shows the 95% UB considering the parameter uncertainties inferred from all available datasets. The model error is also considered for the predicted 5 th DOF acceleration, as shown in the green shaded area in Fig. 14(b). It is reminded that in Fig. 5, the hyper standard deviation 1   from subsystem SS1 datasets only is smaller than that from both subsystem SS1 and component C1 datasets, while the opposite case is observed for hyper standard deviation 2   . Interestingly, the 95% UB of acceleration considering the uncertainties of model parameters from subsystem SS1 only is wider than that from both subsystem SS1 and components. This can be due to the fact that the hyper standard deviation 2   is larger than the hyper standard deviation 1   , and the former one contributes more on the uncertainty bounds of the predictions. When the model error is considered for the prediction affect the UB from 0 to 1.5 sec while the UB for higher that 1.5 sec remain unaffected, indicating that in this case that a large part of the uncertainty is captured by the model parameters.

27
The parameter uncertainties computed from the overall information are also used to predict the UB of displacement and acceleration of the 6 th DOF which is at the system level and not at the levels of the tested components or subsystem. Actual measurements are also generated from the perturbed model. Results shown in Fig. 15 depict the 95% UB of the predictions along with the mean predictions and measurements. It is expected that the measurements are far away from the mean predictions due to the presence of model error assumed in the first mass. However, wide uncertainty bounds are obtained for both the displacement and acceleration. The measurements over almost the whole duration fall within the 95% UB of the predictions, which demonstrates the validity and effectiveness of the proposed approach.

Concluding Remarks
A systematic hierarchical Bayesian learning framework is developed to account for model hierarchy in structural dynamics utilizing multiple datasets obtained from different levels of the hierarchy. The framework can incorporate the uncertainty in the model parameters manifested in the different levels of model hierarchy by assigning hyper distributions for model parameters and inferring the hyperparameters using datasets collected from testing at the different levels of model hierarchy. The proposed approach captures the uncertainty in the model parameters due to variabilities in experimental data, environmental conditions, material properties, manufacturing process, assembling process as well as different mechanisms activated under different loading conditions. Asymptotic approximations developed in [29] are integrated within the framework, considerably simplifying the computational process of estimating the uncertainties in the hyper parameters and model prediction error parameters. The accuracy of the approximations is guaranteed for sufficiently large number of data within a dataset. Analytical expressions are derived for the most probable values of the hyper parameters and model error parameters, providing useful insight on the effect of datasets from the different levels of model hierarchy on the estimates of the model hyper parameters. Ultimately, the uncertainties are propagated for predicting confidence levels for output QoI by considering the overall information provided from the datasets available at the different levels of model hierarchy.
The effectiveness of the proposed framework is demonstrated by applying it to a simple system that includes two lower levels of hierarchy, with datasets available at the component level for two out of the three components considered and also at the subsystem level. Selective results are presented to demonstrate various issues related to the dependence of the uncertainties in the parameters and the predictions obtained by propagating the uncertainties through the model hierarchy on the availability of datasets at different levels of model hierarchy, the effect of the level of model error and level of manufacturing variability. The accuracy of the asymptotic approximations and the analytical expressions for estimating the MPV of the hyperparameters were also investigated by comparing results with the ones obtained from the full sampling approach. Good accuracy is observed which makes the asymptotic approximations suitable to address the computational burden that is involved in uncertainty estimation and propagation problems. Specifically, for the example application presented, the computational effort of the proposed approach is significantly less, by one to two order of magnitude, than the computation effort required from full sampling approaches. Predictions of the (un)observed QoI are conducted by considering different cases of uncertainty modeling in the model hierarchy.
The proposed method properly accounts for the uncertainties by considering the information contained in all datasets available from the different levels of hierarchy, providing a realistic tool for calibrating the parameters manifested at different levels of model hierarchy, and propagating the data-informed uncertainties through the model hierarchy to compute output QoI at the system level, thus improving confidence in response predictions. In this work, the formulation is presented using a specific structure of model hierarchy for the system assembled from two subsystems and one of the subsystems is assembled by three components. Datasets are available at component level for the two out of the three components and also at the subsystem level for one subsystem. General structures of model and data hierarchies were not treated. Moreover, the 6-DOF spring-mass chain model used to demonstrate the framework was selected due to its simplicity, with the procedure be representative of more complex models. The application of the framework in more complex computational models is only expected to substantially increase the computational effort due to the presence of more parameters per component level and the time-consuming operations for performing model simulations at various levels of model hierarchy. It is the subject of future work to generalize the proposed approach to account for complex computational models and data dependencies.

Appendix A. Calculations of the MPVs of hyper parameters and prediction error variance parameters for approximations A-1 and A-2
The MPVs of hyper parameters and prediction error variance parameters can be obtained by minimizing the negative logarithm of the joint distributions 2 ( , | ) p ψ σ D in Eqs. (16) for approximation A-1 and (17) for approximation A-2, given by (within a constant):  Similar results can be found for other two prediction error parameters, given as: 33  The formulas (24) to (27) for the MPVs of all parameters for the approximation A-2 are obtained in a similar manner by utilizing the chain rule and existing solutions of the MPVs of hyper and prediction error parameters for a full-scale system [29].