AI-enhanced iterative solvers for accelerating the solution of large scale parametrized linear systems of equations

Recent advances in the ﬁeld of machine learning open a new era in high performance computing. Applications of machine learning algorithms for the development of accurate and cost-eﬃcient surrogates of complex problems have already attracted major attention from scientists. Despite their powerful approximation capabilities, however, surrogates cannot produce the ‘exact’ solution to the problem. To address this issue, this paper implements up-to-date ML tools and delivers customized iterative solvers of linear equation systems, capable of solving large-scale parametrized problems at any desired level of accuracy. Speciﬁcally, the proposed approach consists of the following two steps. At ﬁrst, a reduced set of model evaluations is performed and the corresponding solutions are used to establish an approximate mapping from the problem’s parametric space to its solution space using deep feedforward neural networks and convolutional autoencoders. This mapping serves a means to obtain very accurate initial predictions of the system’s response to new query points at negligible computational cost. Subsequently, an iterative solver inspired by the Algebraic Multi-grid method in combination with Proper Orthogonal Decomposition, termed POD-2G, is developed that successively reﬁnes the initial predictions towards the exact system solutions. The application of POD-2G as a standalone solver or as preconditioner in the context of preconditioned conjugate gradient methods is demonstrated on several numerical examples of large scale systems, with the results indicating its strong superiority over conventional iterative solution schemes.


Introduction
In scientific computing, there is a constant need for solving larger and computationally more demanding problems with increased accuracy and improved numerical performance.This holds particularly true in multi-query scenarios such as optimization, uncertainty quantification, inverse problems and optimal control, where the problems under investigation need to be solved for numerous different parameter instances with high accuracy and efficiency.In this regard, constructing efficient numerical solvers for complex systems described by partial differential equations is crucial for many scientific disciplines.The preconditioned conjugate gradient method (PCG) [1,2,3,4] and the preconditioned generalised minimal residual method (PGMRES) [5,6,7] are amongst the most powerful and versatile approaches to treat such problems.The choice of a suitable preconditioner plays a major role on the convergence and scalability of these solvers and notable examples include the incomplete Choleski factorization [8] and domain decomposition methods [9,10], such as the popular FETI methods [11,12,13] and the additive Schwarz methods [14,15].In a similar fashion, Algebraic and Geometric Multigrid (AMG, GMG, resp.)[16] are equally well-established methods that are commonly employed for accelerating standard iterative solvers and they may also service as highly efficient preconditioners for PCG [17,18,19] or PGMRES [20,21,22].
Nevertheless, optimizing the aforementioned solvers so as to attain a uniformly fast convergence for multiple parameter instances as required in multi-query problems, such us uncertainty quantification, sensitivity analysis and optimization, remains a challenging task to this day.To tackle this problem, several works suggest the use of interpolation methods tasked with constructing approximations of the system's inverse operator for different parameter values [23,24,25], which can then be used as preconditioners.Another approach can be found in [26], where primal and dual FETI decomposition methods with customized preconditioners are developed in order to accelerate the solution of stochastic problems in the context of Monte Carlo simulation, as well as intrusive Galerkin methods.Augmented Krylov Subspace methods showed great promise in handling sequences of linear systems [27], such as those arising in parametrized PDEs, however, the augmentation of the usual Krylov subspace with data from multiple previous solves led in certain cases to disproportional computational and memory requirements.To alleviate this cost, optimal truncation strategies have been proposed in [28], as well as deflation techniques [29,30,31].
In recent days, the rapid advancements in the field of machine learning have offered researchers new tools to tackle challenging problems in multi-query scenarios.For instance, deep feedforward neural networks (FFNNs) have been successfully employed to construct response surfaces of quantities of interest in complex problems [32,33,34,35,36].Convolutional neural networks (CNNs) in conjuction with FFNNs have been employed to predict the high-dimensional system response at different parameter instances [37,38,39].In addition, recurrent neural networks demonstrated great potential in transient problems for propagating the state of the system forward in time without the need of solving systems of equations [40,41].All these non-intrusive approaches utilize a reduced set of system responses to built an emulator of the system's input-output relation for different parameter values.As such, they are particularly cheap to evaluate and can be very accurate in certain cases, however, they can be characterized as physics-agnostic in the sense that the derived solutions do not satisfy any physical laws.This problem is remedied to some extent from intrusive approaches based on reduced basis methods, such as Principal Orthogonal Decomposition (POD) [42,43,44] and proper Generalized Decomposition [45,46,47].These methods rely on the premise that a small set of appropriately selected basis vectors suffices to construct a low-dimensional subspace of the system's high-dimensional solution space and the projection of the governing equations to this subspace will come at minimum error.In addition, several recent works have investigated the combination of either linear or nonlinear dimensionality reduction algorithms and non-intrusive interpolation schemes to construct cheap emulators of complex systems [48,49,50,51,52,53,54,55]. Nevertheless, none of these surrogate modelling schemes can guarantee convergence to the exact solution of the problem.
In the effort to combine the best of two worlds, a newly emergent research direction is that of enhancing linear algebra solvers with machine-learning algorithms.For instance, POD has been successfully employed to truncate the augmented Krylov subspace and retain only the high-energy modes [56] for efficiently solving sequences of linear systems of equations characterized by varying right-hand sides and symmetric-positivedefinite matrices.In [57], neural networks were trained for predicting the geometric location of constraints in the context of domain decomposition methods, leading to enhanced algorithm robustness.Moreover, the close connection between multigrid methods and CNNs has been studied in several recent works, which managed to accelerate their convergence by providing data-driven smoothers [58], prolongation and restriction operators [59].
The present work focuses on the problem of developing a ML-enhanced linear algebra solver to combat the excessive computational demands of detailed finite element models in multi-query scenarios.A novel strategy is proposed to utilize ML tools in order to obtain system solutions within a prescribed accuracy threshold, with faster convergence rates than conventional solvers.Specifically, the proposed approach consists of two steps.Initially, a reduced set of model evaluations is performed and the corresponding solutions are used to establish an approximate mapping from the problem's parametric space to its solution space using deep FFNNs and CAEs.This mapping serves a means of acquiring very accurate initial predictions of the system's response to new query points at negligible computational cost.The error in these predictions, however, may or may not satisfy the prescribed tolerance threshold.Therefore, a second step is proposed herein, which further utilizes the knowledge from the already available system solutions, in order to construct a datadriven iterative solver.This solver is inspired by the idea of the Algebraic Multigrid method combined with Proper Orthogonal Decomposition, termed POD-2G, that successively refines the initial predictions towards the exact system solutions with significantly faster convergence rates.
The paper is organised as follows.In Section 2 the basic principles of the PCG and AMG iterative solvers are illustrated.In Section 3, the elaborated methodology for developing an ML-enhanced linear algebra solver is presented.Section 4 presents a series of numerical examples that showcase the performance of the method compared to conventional iterative solvers.Section 5 summarizes the outcomes of this work and discusses possible extensions.

The Finite Element Method
This work focuses on linear elliptic PDEs defined on a domain Ω ⊆ R dim , dim = 1, 2, 3, which are parametrized by a vector of parameters θ ∈ Θ, with Θ ⊆ R n being the parameter space.The variational formulation of the PDE can be stated as: given θ ∈ Θ, find the solution v = v(θ) from the Hilbert space for every w ∈ V(Ω) with compact support in Ω.The Lax-Milgram lemma proves that eq. ( 1) has a unique solution for every θ, provided that the bilinear form κ(•, •; θ) is continuous and coercive and f (•; θ) is a linear and continuous one-form.
In practice, however, obtaining an exact solution v is not feasible for most applications of interest and instead, an approximate solution is sought using numerical techniques, such the finite element method focused in this work.In this case, a finite-dimensional subspace V h ⊆ V is considered, spanned by a finite number of basis vectors {N i } N i=1 , consisting of polynomials.These polynomials are compactly supported on a set of small polyhedra (finite elements) that partition the domain Ω and within each element e the approximate displacement vector field v h ∈ V h and test functions w h can be expressed as: The Galerkin method relies on the linearity of the forms κ, f and the orthogonality of the polynomial basis vectors, in order to obtain the coefficients u e = [u e i , • • • , u e N ] T ∈ R N in the expansion of the unknown field approximation.In particular, since eq.( 1) must hold within each finite element e and for any test function w, the system of linear equations follows: or, Equation ( 5) describes an N × N linear system of equations to be satisfied within the e-th element.Repeating this procedure for all elements and appropriately assembling the respective equations will result in the following d × d linear system with d being the total number of unknowns in the system, K ∈ R d×d is a real symmetric positive definite matrix, u ∈ R d is the unknown solution vector and f ∈ R d the force vector.Solving such a linear system for a detailed discretization (d 1) can be computationally intensive.This holds particularly true for multiquery problems that require numerous system evaluations for various instances of parameters θ, such as optimization, parameter inference, uncertainty propagation, sensitivity analysis, etc.Based on this, it becomes evident that efficient numerical solvers for large-scale linear systems play a crucial role in science and engineering.This section revisits the basic ideas behind two of the fastest methods for solving such systems, namely, the PCG and the AMG methods.

Preconditioned conjugate gradient method
The Conjugate Gradient method was originally proposed by Hestenes and Stiefel as a direct method [60] for solving linear systems, but its full potential is demonstrated as an iterative solver for large-scale sparse systems of the form Ku = f , with K being a symmetric positive definite matrix.The goal of CG is to minimize the quadratic form which is equivalent to setting the residual r = −∇Q(u) = f − Ku to zero.Let us consider the Krylov subspaces, These subspaces are nested, K 0 ⊆ K 1 ⊆ . . ., and have the key property that K −1 f ∈ K d .Then, a Krylov sequence consists of the vectors {u (k) } such that and based on the previous property, it follows that u (d) = K −1 f .In this regard, CG is a recursive method for computing the Krylov sequence {u (0) , u (1) , . . .}.It can be proven that the corresponding (nonzero) residuals r (k) form an orthogonal basis for the Krylov subspaces, that is and that the 'steps' 1) are conjugate (K-orthogonal): Therefore, the vectors q i form a conjugate basis for the Krylov subspaces Introducing the coefficients with p k being the scaled versions of q k given by the recursion: then, the Krylov sequence and the corresponding residuals are given by the relations: The conjugate gradient algorithm starts by an initial guess u (0) with corresponding residual r (0) = f − Ku (0) and updates this guess according to equations ( 15)-( 16) for k = 0, 1, . . ., until r (k) is suffiently small.In theory CG terminates in at most d steps, however, due to rounding errors it may take more than d steps or even fail in practice.Also, the improvement in the approximations u (k) is determined by the condition number c(K) of the system matrix K; the larger c(K) is, the slower the improvement.
A standard approach to enhance the convergence of the CG method is through preconditioning, namely the application of a linear transformation to the system with a matrix T , called the preconditioner, in order to reduce the condition number of the problem.Thus, the original system Ku − f = 0 is replaced with T (Ku − f ) = 0, such that c(T K) is smaller than c(K).The preconditioned CG consists of the following steps: s k+1 = T r (k+1) 10: T s k 11: The choice of the preconditioner T in PCG plays a crucial role in the fast convergence of the algorithm.Some generic choices include the Jacobi (diagonal) preconditioner , where K = L LT is an approximation of K with cheap Cholesky factorization, and the incomplete LU factorization.Moreover, multigrid methods such as the AMG, elaborated on the next section, apart from standalone iterative schemes, are also very effective as preconditioners to the CG method.

Algebraic Multigrid Method
AMG was originally introduced in the 1980's [61] as an efficient numerical approach for solving large ill-conditioned sparse linear systems and eigenproblems.Its main difference from the (geometric) multigrid method lies only in the method of coarsening.While multigrid methods require knowledge of the mesh, AMG methods extract all the needed information from the system matrix.AMG methods have been successfully applied to numerous problems including PDEs, sparse Markov chains and problems involving graph Laplacians (e.g.[62,63,64,65,66]).The key idea in AMG algorithms is to employ a hierarchy of progressively coarser approximations to the linear system under consideration in order to accelerate the convergence of classical simple and cheap iterative processes, such as the damped Jacobi or Gauss-Seidel, commonly referred to as relaxation or smoothing.Relaxation is very efficient in eliminating the highfrequency error modes, but inefficient in reducing the low-energy modes.AMG overcomes this problem through the coarse-level correction, elaborated below.
Let us consider the linear system of eq. ( 6), which describes the fine problem and let u (0) be an initial solution to it.The two-level AMG defines a prolongation operator P , which is a full-column rank matrix in R d×dc , d c < d and a relaxation scheme such as the Gauss-Seidel (GS).Then, the two-level AMG algorithm consists in the following steps: Algorithm 2 Two-level AMG algorithm residual tolerance δ and an initial approximation Pre-relaxation: Perform r 1 iterations of the relaxation scheme on the current approximation and obtain ũ(k) as: ũ(k) ← G u (k) ; r 1

5:
Compute the residual: Restrict the residual to the coarser level and solve the coarse level system K c e (k) c = P T r (k) , where Prolongate the coarse grid error e (k) = P e   To better illustrate algorithm 2 and its convergence properties, let us consider the GS algorithm as the relaxation scheme, where the matrix K is split into K = L + V , L being a lower triangular matrix that includes the diagonal elements and V is the upper triangular part of A. The iterative scheme of the GS method is as follows: where the subscripts m, m + 1 in the above equation denote the iteration number of the GS algorithm.If u is the exact solution to the system and e m = u − u m the error after the m-th iteration, then where Returning to alg. 2, the error at the end of the k-th cycle of the two-level AMG can be computed as: In the above equation, the matrix M r2 CM r1 determines the convergence behavior of the two-level cycle.
The relaxation matrix M plays a role, however, in practice the selection of the prolongation operator P is the key to designing an efficient algorithm.In this regard, the most popular variations of AMG include the Ruge-Stüben method [61] and the smoothed aggregation based (SA) AMG [67].Lastly, another factor the affects the number of iterations in AMG to reach the prescribed threshold of accuracy, is the choice of the initial solution.In absence of other information, u (0) = 0 is usually considered.

Problem statement
The aim in this section is to develop an efficient data-driven solver for the parametrized system of eq. ( 6), by combining linear algebra-based solvers with machine learning algorithms.More specifically, the idea proposed herein, is to utilize a reduced set of high-fidelity system solutions, obtained after solving eq. ( 6) for specified parameter instances, in two different yet complementary ways.First, a surrogate model will be established in the form of a 'cheap-to-evaluate' nonlinear mapping from the problem's parameter space to its solution space using convolutional neural networks (CNNs) and feedforward neural networks (FFNNs).Even though CNNs and FFNNs have been shown to produce astonishing results even for challenging applications [68,39,38], nevertheless, their black-box and physics-agnostic nature doesn't provide any means to improve the solutions they produce.To combat this problem, POD is performed on this data set of solutions and an efficient iterative solver is developed based on the idea of AMG, where in this case the prolongation operator is substituted by the projection matrix to the POD reduced space.

Surrogate model
A surrogate model is an imitation of the original high fidelity model and serves as a 'cheap' mapping from the parametric space θ ∈ R n to the solution space u ∈ R d .In general, it is built upon an initial data set {u i } N i=1 , which is created by solving the problem for a small, yet sufficient number, N , of parameter values.It is essential to span the problem's parametric space effectively, thus sophisticated sampling methods are often utilized, such as the Latin Hypercube [69].Many surrogate modeling techniques have been introduced over the past years, including linear [47,43,44] and nonlinear [37,38,50] dimensionality reduction methods.The selection of the appropriate method is problem dependent.
In the present work, a surrogate modeling scheme based on convolutional autoencoders (CAEs) and feedforward neural networks (FFNNs) that was introduced in [37] for parametrized time-dependent PDEs is employed.It consists of two phases, namely the offline and the online phase.The offline phase begins with the training of a CAE that consists of an encoder and a decoder, in order to obtain low dimensional latent representations, z i ∈ R l , through the encoder with l d and a reconstruction map by the decoder.It is trained over the initial data set {u i } N i=1 to minimize the objective function: where ũi is the reconstructed input.After the training is completed, the latent space data set {z i } N i=1 is obtained.The second step of the offline phase is the training of the FFNN, which is used to establish a nonlinear mapping from the parametric space θ ∈ R n to the latent space z ∈ R l .Again, the aim of the training is the minimization of the loss function: where zi is the network's output.Consequently, the online phase utilizes the fully trained surrogate model, which is now capable of delivering accurate predictions of the system's response for new parameter values θ j as follows:

Multigrid-inspired POD solver
POD, also known as Principal Component Analysis, is a powerful and effective approach for data analysis and dimensionality reduction aimed at indentifying low-order modes of a system.In conjunction with the Galerkin projection procedure it is commonly utilized as an efficient method to reduce the dimensionality of large linear systems of equations [70,71,72].The theory and application of POD is covered in many publications, however, to keep this paper as self-contained as possible the POD procedure used within this framework is summarized below.Let us denote with U ∈ R d×N the matrix consisting of N solution vectors [u 1 , ..., u N ] for different parameter values {θ i } N i=1 and with R = U U T ∈ R d×d the correlation matrix.Then POD consists in the following steps.
1. Compute the eigenvalues and eigenvectors of R that satisfy RΦ = ΦΛ.This step can be very demanding when d 1, however, in practice N d and since R, R T have the same non-zero eigenvalues, it is computationally more convenient to solve instead the eigenvalue problem U T U Ψ = ΨΛ.Then, the eigenvectors Φ and Ψ are linked according to the formula.
2. Form the reduced basis Φ r , be retaining only the r first columns of Φ, corresponding to the largest eigenvalues.3.Under the assumption that each solution to eq. ( 6) can be approximated as: with u r ∈ R r being the unknown coefficients of the projection on the truncated POD basis, then the reduced-order linear system becomes: Solving equation ( 26) for u r is significantly easier since K r ∈ R r×r , with r small.4. Retrieve the solution to their original problem: Based on the above, a similarity between the 2-level AMG method and POD can be observed, under the identification of Φ r as the prolongation operator and Φ T r the corresponding restriction.Then, the algorithm 2 remains practically the same.In this case, the iterative scheme becomes 3.4.Proposed data-driven framework for parameterized linear systems The final step is to combine the surrogate model of section 3.2 and the multigrid-inspired POD solver of the previous section into a unified methodological framework for solving efficiently large-scale parametrized linear systems.In particular, an initial data set of system solutions {u i } N i=1 is performed for specified instances of the parameter vector {θ i } N i=1 .Then, these solution vectors are utilized as training data for the CNN and FFNN and the surrogate model can be established.The error between the exact solution and the surrogate's prediction for a given θ can be given as: Despite one's best efforts, however, e sur = 0 and the surrogate's predictions will not satisfy exactly equation 6.At this point, instead of simply performing iterations of PCG or AMG to improve the surrogate's predictions, we propose to further utilize the knowledge available to us from the data set of solution vectors, in order to enhance the performance of these iterative solvers.In particular, we perform POD to the solution matrix U = [u 1 , ..., u N ] and obtain the projection matrix Φ T r and apply the iterative scheme of equation ( 28) either directly, or as a preconditioner T in the PCG algorithm.The proposed methodology is data-driven and, as such, it is not possible to provide a priori estimates of the error for general systems.Nevertheless, under the assumption that the training data set U is 'large' enough to contain almost all possible variations of the solution vector, then an estimate for the error can be provided as follows: In the above, • denotes the l2-vector norm, when the input is a vector, and induced operator norm (spectral norm) when the input is a matrix.We know, by construction, that M is a stable matrix and that its spectral radius ρ(M ) is less than 1.Therefore, there exists γ ∈ (0, 1) and L 1 > 0 such that: Therefore, M r1 , M r2 ≤ 1 for r 1 , r 2 large enough.Now, focusing on the term then, by definition: Under the assumption that a given u ∈ R d can be decomposed as u = Φ r u r + u ⊥ , with Φ r u r ∈ Φ and u ⊥ ∈ Φ ⊥ , where Φ = span {φ 1 , ..., φ r } and Φ ⊥ its orthogonal complement in R d , then, thus, for some L 2 > 0. Due to the orthogonality of Φ r u r and u ⊥ , it follows that In fact, by choosing an appropriate number of eigenvectors r in POD, we can obtain u ⊥ ≤ 1 L2 and then, inequality (34) becomes with C := C(u ⊥ ) and C ∈ (0, 1).Inserting the inequalities ( 31) and ( 36) into (30), we have: Applying the above inequality recursively, we conclude: The above inequality provides us with some valuable insight regarding the performance of the proposed data-driven solver.Most importantly, we notice the critical role that the surrogate's predictions play in the convergence, since the error is bounded be the surrogate's error e sur .Even though this result agrees with common intuition, nevertheless, being rigorously proven excludes the possibility of good initial predictions requiring more iterations for the solution to converge.Secondly, be retaining more eigenvectors to construct the reduced space Φ reduces the norm of u ⊥ ∈ Φ ⊥ , resulting in faster convergence.In the following section, we test the solver on numerical applications of scientific interest and assess its performance in comparison with conventional solvers.

Numerical applications
The proposed methodology is tested on two parametrized systems.The first case is the indirect tensile strength (ITS) test, which is treated with the theory of 2D linear elasticity, while the second one is a 3D deformable porous medium problem, also known as Biot problem.

Indirect tensile strength test
A popular test to measure the tensile strength of concrete or asphalt materials is the ITS test.It contains a cylindrical specimen loaded across its diameter to failure.The specimen is usually loaded at a constant deformation rate and measuring the load response.When the developed tensile stress in the specimen under loading exceeds its tensile strength then the specimen will fail.In this application, we restrict our analysis to the linear regime and model the cylinder as a 2D disk under plain strain assumptions, as shown in 3.In this case, the weak form of the problem reads: the strain tensor and f the loading.Also, µ and λ are the Lamé's constants, which are linked to the Young modulus E and the Poisson ratio according to equations 41: In this example, the specimen has a diameter of 150 mm and due to symmetry in geometry and loading we only need to model one quarter of the disk, as illustrated in figure 4. The approximate solution to eq. ( 39) u ∈ R d is obtained through the finite element method using a mesh that consists of triangle planestrain finite elements with a total of d = 5656 dofs.The Young modulus E and the load P are considered uncorrelated random variables following the lognormal distribution as described in table 2. The Poisson ratio is considered a constant parameter ν = 0.3.Figure 4    The first step of the proposed solver is to generate a sufficient number of samples.To this purpose, the Latin Hybercube sampling method was utilized to generate N = 200 parameter samples {[E i , P i ]} N i=1 .Subsequently, the corresponding problems are solved with the finite element method and the solution vectors obtained, {u i } N i=1 , are regarded as 'exact' solutions.Next, a surrogate model is trained over these solutions in order to establish a 'cheap' mapping from the parametric to solution space.The methodology for the surrogate model is described in section 3.1.The selected CAE and FFNN architectures are presented in figure 5.
To tackle the problem of overfitting the standard hold-out approach was exploited.In particular, the data set was randomly divided into train and validation subsets with a ratio of 70%-30% and each network's performance on the validation data set was assessed in order to avoid overfitting.The CAE is trained for 40 epochs with a batch size of 10 and a learning rate of 5e − 4, while the FFNN is trained for 3000 epochs with a batch size of 20 and a learning rate of 1e − 4. The average normalized l 2 norm error of the surrogate model in the test data set is 0.54%.The second step is to form the POD basis Φ r by performing eigendecomposition on the correlation matrix U U T (or equiv.on U T U ), with U = [u 1 , . . ., u N ] being the solution matrix.In this case, the number of eigenvectors kept is r = 8, which correspond to over 99.99% of the variance in the training data.Consequently, when all components of the proposed POD-2G solver are defined and fully trained, the methodology described in section 3.2 can be applied to obtain new system's solutions for different parameter values.
In order to test the proposed POD inspired solver, a number of N test = 500 test parameter samples {[E j , P j ]} Ntest j=1 were generated according to their distribution.The corresponding problems were solved with the Ruge-Stüben AMG solver for 2,3 and 5 grids and the proposed POD-2G solver for different values of tolerance.The mean value of the CPU time and the number of cycles required for convergence to the desired number of tolerance are displayed in figure 6.The results indicated that the proposed POD based solver surpasses significantly traditional AMG solvers in computational cost.In particular, for ε = 1e − 5, which is a typical target tolerance in the majority of science problems and u (0) = 0, a reduction of computational cost of ×2.53 is observed between the proposed and the 3-grid AMG solver.A key component of the proposed methodology is to obtain a close estimation of the solution by the surrogate model, u (0) = u sur .As observed from the convergence analysis, an initial solution u (0) with normalized error in the magnitude of 1.00% is capable of drastically reducing the computational cost.Specifically, for ε = 1e − 5, the decrease in CPU time is of ×13.89 when compared with the case of u (0) = 0.
Furthermore, the convergence behaviour of the proposed method when used as a preconditioner in the context of the PCG method is presented in figure 7.  Again, the results obtained proved that the proposed methodology is superior than classic AMG preconditioners.In particular, for ε = 1e − 5 and u (0) = 0, a reduction of computational cost of ×1.53 is observed between the proposed method and the 3-grid AMG.In addition, the initial solution delivered by the surrogate model, u (0) = u sur , is again a crucial factor of fast convergence.In that case, the reduction of computational time is of ×4.00 when compared with using u (0) = 0. Finally, in order to highlight the computational gain of the proposed framework in the context of the Monte Carlo method, N M C = 1e+05 simulations are performed to determine the probability density function (PDF) of the vertical displacement u top y of the top node (where the load P is applied).The calculated PDF is presented in figure 8a.Each simulation is solved with PCG and two different proconditioners, namely the proposed POD-2G method and a standard three grid Ruge-Stüben AMG preconditioner.The results are displayed in figure 8b and prove that the proposed method is superior to classic AMG when dealing with parametrized systems.In particular, the conventional method needed 21109 s to complete the 10 5 simulations, while the proposed data-driven solver required 4013 s for the same task including the offline cost (initial simulations and training of the surrogate model).This translates to a remarkable decrease in CPU time of ×5.26.

Biot problem -deformable porous medium
Biot's theory describes wave propagation in a porous saturated medium, i.e., a medium made of a solid matrix, fully soaked with a fluid.Biot does not take into account the microscopic level and assumes that continuum mechanics can be applied to measurable macroscopic quantities [73].Biot problem in weak form can be stated as: with A, D being the Biot coefficient tensor and diffusion tensor, respectively.In this test case, the domain Ω is a cube and each side has a length of L = 1.00 mm.Regarding the boundary conditions, a pressure distribution p lef t := p| x=0 = 1.0 M P a is applied on the left face of the cube along with a displacement load u top y := u y | z=1 = 0.20 mm on the top face, while all displacements u x , u y and u z are restrained in the bottom face (z = 0).The problem definition is presented in figure 9.    were generated according to their distribution and the corresponding problems were solved with the proposed POD inspired solver and different Ruge-Stüben AMG solvers.The results are very promising in terms of computational cost.In particular, for ε = 1e − 5 and u (0) = 0, a reduction of computational cost of ×7.65 is achieved when comparing the proposed solver with the 5-grid AMG solver.Furthermore, obtaining an accurate initial solution u (0) is again a very important component of the proposed framework.Specifically, for ε = 1e − 5, the decrease in CPU time is of ×5.94 when compared with the case of u (0) = 0.The mean value of the CPU time and the number of cycles required for convergence to the desired number of tolerance are displayed in figure 12.    Again, the results delivered by the proposed methodology showed its superiority not only over AMG preconditioners but also over ILU and Jacobi preconditioners.In particular, for ε = 1e − 5 and u (0) = 0, a reduction of computational cost of ×2.37 is observed between the proposed method and the 5-grid AMG, of ×1.63 with the ILU and of ×1.16 with the Jacobi.Last but not least, the initial solution delivered by the surrogate model, u (0) = u sur , is again a crucial factor of fast convergence.In that case, the reduction of computational time is of ×2.12 when compared with using u (0) = 0. Finally, in order to highlight the computational gain of the proposed framework in the context of the Monte Carlo method, N M C = 2e + 05 simulations are performed to determine the probability density function (PDF) of the displacement magnitude ||u|| of the monitored node (see figure 9).The calculated PDF is presented in figure 14a.As in the previous example, each simulation is solved with PCG and two different proconditioners, namely the proposed POD-2G and a standard three grid Ruge-Stüben AMG preconditioner.Again, the results obtained by the proposed methods demonstrate significant computational advantage over conventional preconditioners.In particular, the Jacobi preconditioner needed 4.23e + 05 s to complete 2e + 05 simulations, while the proposed data-driven solver required 1.75e + 05 s for the same task including the offline cost (initial simulations and training of the surrogate model).This translates to a decrease in CPU time of ×2.42.

Conclusions
The present work introduced a framework for accelerating the solution of parametrized problems that require multiple model evaluations.The proposed framework consisted of two distinct yet complementary steps.The first step in the methodology was the construction of a 'cheap-to-evaluate' metamodel using FFNNs and CAEs, trained over a reduced set of high-fidelity system solutions.Despite giving very accurate predictions at new parameter instances, these predictions were bound to exhibit some discrepancy with respect to the actual system solutions, since they are not constrained by any physical laws.The second step in the methodology aimed precisely at fixing this by proposing a data-driven iterative solver, inspired by the AMG method, that will refine the metamodel's predictions until a prescribed level of accuracy has been attained.In particular, using again the already available set of high-fidelity system solutions, POD was performed on this set to identify the subspace that captures most of the variation in the system responses.Next, a 2-level multigrid scheme was developed, termed POD-2G, using the projection operator from POD as the prolongation operator.This scheme was tested on numerical applications as a standalone solver, as well as a preconditioner to PCG, and in both cases, its superior performance with respect to conventional iterative solvers was demonstrated.

8 :
Correct the fine grid solution: ũ(k) = ũ(k) + e (k)9:Post-relaxation: Perform additional r 2 relaxation iterations and obtain u (k) ← G ũ(k) ; r 2 10:k = k + 1 11: end whileIn the above algorithm, lines 4-10 describe what is known as a V -cycle and it is schematically depicted in figure1a.The multi-level version of the above algorithm is easily obtained as the result of recursively applying the two-level algorithm, as shown in fig.1bfor the 3-level setting.

Figure 2 :
Figure 2: Schematic representation of the surrogate model

Figure 3 :
Figure 3: Diametrically point loaded disk also displays a contour plot of the displacement magnitude u for E = 2000 M P a and P = −1000 N .Parameter Distribution Mean Standard deviation E(M P a)

Figure 5 :
Figure 5: Surrogate model architecture (a) Comparison of mean CPU time (b) Comparison of mean number of cycles

Figure 6 :
Figure 6: Comparison of mean CPU time and mean number of cycles over 500 analyses for different multigrid solvers (a) Comparison of mean CPU time (b) Comparison of mean PCG iterations

Figure 7 :
Figure 7: Comparison of mean CPU time and mean number of PCG iterations over 500 analyses for different preconditioners (a) PDF of u top y (b) Comparison of computational cost

Figure 8 :
Figure 8: PDF of u top y for 1e + 05 MC simulations and comparison of computational cost.

Figure 9 :
Figure 9: Geometry and boundary conditions of Biot problem

Figure 10 :Figure 9
Figure 10: Displacement magnitude u and pressure distribution p for λ = 1.70 M P a and µ = 0.30 M P a

Figure 11 :
Figure 11: Surrogate model architecture (a) Comparison of CPU time (b) Comparison of cycles

Figure 12 :
Figure 12: Comparison of CPU time and number of cycles for different multigrid solvers (a) Comparison of CPU time (b) Comparison of PCG iterations

Figure 13 :
Figure 13: Comparison of CPU time and number of cycles for different preconditioners (a) PDF of ||u|| (b) Comparison of computational cost

Table 1 :
Random parameters of the ITS test

Table 2 :
Random parameters of the Biot problem