Massively Parallel Modeling of Battery Energy Storage Systems for AC/DC Grid High-Performance Transient Simulation

Extensive integration of power electronics apparatuses complicates the modern power grid and consequently necessitates time-domain transients study for its planning and operation. In this work, a heterogeneous computing architecture utilizing the CPU and graphics processing unit (GPU) is proposed for the efficient study of interactions between a power grid network and massive utility-scale battery energy storage systems (BESSs). The device-level electromagnetic transient (EMT) simulation aiming at enhanced fidelity of the BESS is conducted simultaneously with electro-mechanical transient stability (TS) simulation which suffices system-level dynamic security assessment. Since the reservation of a large amount of energy storage units is computationally intensive for the CPU, the concurrent multi-streaming, multi-threading capability of GPU is exploited to achieve asynchronous sequential-parallel processing, so that the proposed EMT-TS co-simulation can flexibly harness all available computing resources. The multi-rate scheme is adopted for further computational burden alleviation in addition to achieving timely information exchange. It shows that the heterogeneous computation of an IEEE 118-bus system integrated with a substantial number of distributed batteries becomes feasible following the achievement of a remarkable speedup of over 200, and the device- as well as system-level accuracy are validated by MATLAB/Simulink and DSAToolsTM/TSAT simulation, respectively.


I. INTRODUCTION
M ASSIVE integration of power-converter-based apparatuses such as renewable energies and microgrids into a regional transmission and distribution grid diversifies its power supply, whilst posing a challenge to the system dynamic security [1], [2]. The semi-autonomy of these devices leads to more Manuscript  complex electric network operation scenarios as it implies the very likelihood of their sudden connection or disconnection that consequently causes a powerflow redistribution or momentary disequilibrium which could gradually intensify and eventually cause instability. Hence, extensive utility-scale battery energy storage systems (BESSs) are envisioned for voltage and frequency support [3], and some work with aggregated units have been carried out [4], [5]. To gain an accurate insight into their efficacy for grid operation and planning purpose, it requires a system-wide study with each individual unit modeled in detail [6], which is infeasible to commercial simulation software due to a heavy computational burden. Dynamic security assessment with electro-mechanical modeling in the phasor domain is prevalent regarding investigating the voltage, angle, and frequency stability of AC and DC grids [7], [8]. It is carried out at the system level where various types of converter-based energy sources and loads are lumped or simplified [9], [10]. Although it suffices to show the entire grid operation status, the positive-sequence transient stability (TS) simulation falls short of providing detailed device-level information, and the impact of an individual energy system with an actual configuration and an exact control scheme on the bulk AC network could also be unavailable. Additionally, a regular time-step of a few milliseconds excludes typical power converter transients.
The electromagnetic transient (EMT) simulation based on time-domain model discretization and linearization, in contrast, can capture fast transients even in sub-microseconds [11]. The detailed models incorporated enable it to be a highly accurate approach to evaluate electric power components and systems [12]- [14], especially a power converter controller. In the meantime, a mandatory time-step typically ranging from a few to dozens of microseconds implies that it is more time-consuming than TS assessment [15], [16]. Nevertheless, owing to a dominant sequential processing manner, the efficiency of commercial EMT simulation software plunges alongside an increasing number of components as well as an expanding scale of the network. To cater to the tremendous computing resource demand, multi-core CPUs and their clusters are adopted as a solution [17]- [19].
The graphics processing unit (GPU) with an improving computational capability supported by thousands of cores is an emerging platform for high-performance computing (HPC) of power systems and power electronics [20], [21], and remarkable speedups were gained with very limited computing hardware resources in some scenarios which possess an explicit homogeneity, or have an ideal system configuration consisting of multiple identical subsystems which topologically exhibit perfect symmetry that caters to coarse parallelism [22]. Nevertheless, the GPU is not superior to the CPU in handling inhomogeneity which is common in an electrical system. The evolving PCIe and RAM technologies in the meantime enable efficient data exchange with other processors [23], especially the CPU, making a thorough utilization of all available computing resources in a computer feasible for fast online and offline study of practical power grids.
Therefore, in this work, heterogeneous CPU-GPU computing of large-scale electric power systems with both homogeneity and inhomogeneity for more efficient simulation is investigated as an extension of pure-GPU application since it would be computationally overwhelming to conventional methods. A TS-EMT co-simulation taking their respective merits into account is formed for transient analysis of a regional energy network integrated with distributed grid-supporting BESSs. EMT modeling is carried out for power converters with fast transients, while its counterpart targets the bulk power system. A detailed power converter model based on the transmission line link (TLL) model and state-space equation is proposed for a matrix dimension reduction as well as converting the BESS inhomogeneity into its opposite so that fine-grained parallelism can be achieved and ultimately a faster simulation speed. In addition to the single-instruction multiple-threading (SIMT) paradigm, the multi-stream (MS) asynchronicity is explored for further efficiency improvement. Since the proposed heterogeneous HPC methodology can address different types of computational burdens accordingly with limited CPU and GPU resources, it provides a benchmark solution for the efficient and comprehensive offline simulation of future power grids which are topologically becoming increasingly complex.
The paper is organized as follows: Section II introduces the detailed parallel BESS model, followed by AC/DC power system modeling and TS-EMT co-simulation in Section III. Section IV describes heterogeneous computing architecture for simulation acceleration, and the results are given in Section V. Section VI presents the conclusions.

A. Vectorized Battery Model
The battery model can be depicted as a Thévenin equivalent circuit with a voltage source V Bat in series with an internal impedance Z B . The controlled voltage source V Bat contains 5 parts, i.e., the constant voltage E 0 , the polarization voltage E pol , the exponential zone voltage E exp , and voltages induced by the dynamic charge or discharge process E chg and E dsc , as expressed below, where S ch is a binary indicating the charging status with 1. As various types of batteries have different E exp and E chg , the battery voltage is vectorized for parallel processing. Arranged in the sequence of lithium-ion, lead-acid, and nickel-cadmium batteries, the simple 3-dimensional vectors for exponential zone voltage and the charging dynamics in an arbitrary EMT time-step take the forms of where A, B, K Q are coefficients, i andĩ are the battery current and its low frequency value, t is the time instant, K means the polarization constant, Q denotes the capacity, and i sat is current capacity [24]. Similarly, the discharging dynamics of various battery types can also be assembled as a vector that shows an identical expression for all elements, It can be inferred that the vectors in (2)-(4) expand alongside an increase of BESSs in the regional grid, with all elements duplicated by corresponding times. The presence of the s-domain function in (2) means vector E exp is not imminently available for EMT simulation. Application of the Inverse Laplace Transform L −1 yields the following differential equation in the time domain, which can then be discretized by approaches such as the Backward Euler method, i.e., where Δt is the simulation time-step.
It is noticed that Ae −Bi sat , the exponential zone voltage of the lithium-ion battery, can be obtained directly, whereas that of the other two types is iterative according to (6) and consequently a proper initialization of each E exp is needed.
The state-of-charge (SOC) defined as the percentage of remaining charge to its nominal value is a critical factor that determines the operation status of a BESS. Knowing the value at an initial time instant t 0 , it can be formulated as When all elements are grouped into corresponding vectors and matrices for parallel processing, element-wise operations are applied to their discrete forms. Take the dynamic charging voltage vector of an array of batteries, for example, each element is calculated independently, involving element-wise multiplication • and division , Similarly, the general expression for the charging and discharging processes of various batteries can be vectorized for parallel processing on the GPU, where V Bmin , V Bmax are vectors denoting the cut-off voltages and fully-charged voltages.
In the circuit EMT simulation, when nodal voltages, instead of mesh currents, are to be solved using Kirchhoff's current law, the Thévenin equivalent circuit of the battery model V Bat -Z B needs to be transformed into its Norton counterpart with an equivalent current source I Beq in parallel with the conductance G B . Assuming the internal resistances of all batteries under study are converted into the conductance and then grouped as a vector G B = [G B1 , G B2 ,... ], the current contribution of the batteries can be expressed as where the vector I Beq = [I Beq1 , I Beq2 ,... ]. The Norton equivalent circuit G B -I Beq of a battery can be extended to an array of batteries for design, online monitoring, and management purposes. As shown in Fig. 1(a), all the batteries of a BESS are organized in an N p × N s array so that each basic unit can be monitored. However, this micro-level modeling leads to a computational burden much higher than that of a lumped single unit. A scalable model is thus proposed, as given in Fig. 1(b), where an array of N p2 × N s batteries can be lumped, and in each of the remaining N p1 ) batteries need to be modeled individually. Then, the Norton equivalent circuit for the array can be expressed as where G BS0 -I BeqS0 and G BSj -I BeqSj represent the Norton equivalent circuit of the N p2 × N s array and a N p1 × N Sj array, respectively. As can be seen, the scalable model can flexibly be as detailed as an entire N p × N s array, and as computationally efficient as a lumped equivalent unit by simply adjusting N p1 and N sj . For a grid-wide study of an AC/DC grid, the lumped battery array model is sufficient when the main focuses are converter transient performance and the consequent impact on system stability.

B. Grid-Connected BESS
A power transmission and distribution network may incorporate a substantial number of utility-scale BESSs, which work as part of a virtual power plant (VPP) for grid resilience enhancement, or for temporary power supply. Despite different functions, they interact with the AC grid via 2 main structures [25], i.e., a single 3-phase voltage-source converter (VSC) denoted as Type 1, and the multi-section Type 2 which typically combines several serial or parallel bidirectional DC-DC converters and a VSC, as given in Fig. 2(a).
The VSC controller in the d-q frame as shown in Fig. 2(b) applies to both BESS grid-integration manners. When the SOC is below its threshold SOC 0 , the battery starts to charge or remain idle; otherwise, in addition to smoothing the power curve of synchronous generators, it may be utilized for rapid frequency control of the grid. In this case, the power order is dependent on the frequency deviation, where ζ is the grid frequency deviation tolerance, f nom is the nominal frequency, K fp and K kI are control parameters. The active power order is then taken as a reference of the outerloop controller in the d-q frame, and the output i * d is then used in inner-loop current control that generates the VSC switching signals V gate along with the q-axis regulation which is virtually identical. The VSC is also able to control its DC bus voltage by setting the d-axis reference d * R as voltage; similarly, in the q-axis, the control object q * R can be either the voltage or reactive power at the point of common coupling (PCC).

1) Voltage Source Converter:
The VSC in both schemes connects to the AC grid via a step-up transformer. In Fig. 3, taking its DC side as a Norton equivalent circuit G dc and J eq , the converter contains up to 11 nodes if the transformer is taken into consideration, briefly expressed as where J Tr and J L are the companion current of the transformer model and the 3-phase inductor, respectively, G Tr is the transformer admittance matrix, G L is the inductor conductance, and Matrix Y ij containing the status of converter switches has a minimal dimension of 5 × 5. The amount of computations arising from solving the 11-D matrix equation is dependent on the way that matrix Y ij is handled. Detailed VSC modeling methods can be categorized as the device-level model and two-state switch model (TSSM) [26].
To reflect the capacity limit as well as to reproduce the behaviors of a real converter under an actual configuration and control scheme, the diode unidirectional conduction should be considered, alongside the static I-V characteristics, which are two main contributors to the non-linearity of the power switch. Fig. 3 shows a complete model of the IGBT and its anti-parallel diode, which can eventually be converted into a Norton equivalent circuit.
The nonlinear semiconductor switch model determines that the entire VSC circuit undergoes Newton-Raphson iteration for numerical convergence and accuracy, thus prolonging the execution time. The TSSM-based converter model, on the other hand, omits the nonlinear static characteristics of the power switch. Instead, it uses a distinct small and large conductance for OFF-and ON-state, respectively. The consequent exemption from Newton-Raphson iteration accelerates the matrix equation solution as well as the simulation.
When bus voltages of the AC network are known, the VSC PCC voltage can be obtained instantly according to the transformer turn ratio. In this case, (14) takes the form of where Knowing the PCC voltage U 4−6 results in the direct solution of converter nodal voltages, 2) Generic TLL-State-Space Model: In the Type 2 scheme, a battery group has a small capacity and the bidirectional DC-DC converter has an ultrahigh switching frequency typically above dozens of kilohertz. Consequently, it brings a considerable challenge to computation efficiency, as the time-step should not exceed a few microseconds to maintain the accuracy, especially when pulse-width modulation which requires a high density of carrier data is utilized.
The state-space model is adopted for the DC-DC converter to ensure an alignment in the time-step. As shown in Fig. 3, the battery represented by its Thévenin equivalent circuit is taken as an input; on the other hand, the VSC is nevertheless incompatible with the state-space model. The transmission line link model [27] is thus introduced noticing that both converters share the same DC bus. Take battery discharging mode for instance, when switches S 1 and T 2 are under conduction, the mixed TLL-state-space model is Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
and when S 1 and T 1 are on, it is Averaging the 2 equations according to the switching duty D leads to the following general state-space equatioṅ where Discretization of (20) by the Trapezoidal rule yields the solution to proceed with EMT simulation, The bidirectional DC-DC converter is linked to the VSC by the TLL signal v i C . Following the derivation of output voltage v o at time instant t, the TLL reflection pulse at the DC-DC converter side which is also deemed as the incident pulse of the VSC at the next time-step can be formulated as As mentioned, the DC side of VSC is always represented by a Norton equivalent circuit, regardless of the component it connects to for a better model compatibility with various IGBT models. Therefore, for BESS Type 2 integration, a Thévenin-Norton transformation is conducted and G dc and J eq are dependent on v r C and G Cp . When the DC terminal voltage is solved by (17), the reflected pulse from the VSC should be updated and subsequently taken as the incident pulse v i C for its DC-DC counterpart solution in the next time-step. It is noted that a TLL decouples the two converters, which can be processed in parallel for simulation acceleration.  time-step is required to test the controller whilst ensuring numerical convergence. Thus, a hybrid AC/DC grid can be deemed as a multi-rate simulation system, and the general modeling is carried out for the AC and DC grids irrespective of their layouts or configurations.

A. Transient Stability Simulation
The AC grid under TS simulation can be divided into 2 categories from a numerical solution perspective, i.e., nonlinear components which are expressed by a set of differential equations, and the linear network with its nodal voltages solved by an algebraic matrix equation, i.e., where Y N represents the network admittance matrix. Linearization of the differential equation leads to the state-space form of (20) where A becomes the Jacobian matrix, The generator is such a nonlinear component that the state-space equation applies, with a fundamental sixth-order model [28], two of which, i.e., rotor angle δ and speed ω, contributed by the rotor mechanical characteristics, Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
where R fd , R 1d , R 1q , and R 2q are winding parameters, H and D are generator inertia and damping coefficient, ω R is the rated speed, and the remaining variables constitute the input vector u. The generator model order will have a further increase when its controller is taken into account. For example, 3 additional states induced by the following excitation system, where T 1 , T 2 , T R , T W , and K P SS are AVR and PSS constants, and Λ() denotes a constant as a function of initial values. Using (20) and (26), the state vector x in (27)-(35) can be solved. The subsequent acquirement of the current vector I enables the network nodal voltages to be derivable from (25) since the admittance matrix Y N is always known.

B. DC Links for Renewable Energy Integration
The high-voltage direct current (HVDC) link based on the modular multilevel converter (MMC) is favored in delivering renewable energies from remote areas to the main grid. The HVDC converters and renewable generation units can be modeled in detail in a similar vectorized manner for massively parallel processing; however, when their powerflow conditions are the primary focus, the lumped averaged model is sufficient. The rectifier acts as a grid-forming converter that provides a stable three-phase voltage for the wind farm and PV plant, while the inverter regulates the DC link voltage, as given in Fig. 5, where the controller has an identical d-q frame scheme given in Fig. 2(b) to yield the three-phase modulation signals m abc . On the rectifier side, the current reference is obtained by where quantities with superscript * denote reference values, V * d and V * q decide the AC voltage v MMC , and K vp and K vi are controller parameters.
The submodule capacitor dynamics can be omitted in TS analysis since the converter output active and reactive powers are the focus. Therefore, using phase-shift control, the instantaneous output phase voltage of the grid-forming MMC can be determined by the upper-arm and lower-arm modulation signals m u and m l , as where N L is the number of submodules per arm, c denotes the carrier signals and in each phase, Then, based on a DC link equivalent circuit on the rectifier side, the phase voltage at the PCC can be expressed as where i RW is the current injection of the WF and PV plant, G L f , G C f , I Leq , and I Ceq compose the discretized forms of the L f -C f filter in the AC yard. The 3-phase PCC voltage as the control object subsequently undergoes Park's transformation so that V d and V q in (36) can be obtained. The inverter stations, as well as BESSs, are connected directly to the main grid. Following the AC network solution of (25), the PCC voltage magnitude V pu and angle θ are known, which participate in the BESS's VSC control and solution, as in (17), and also in the MMC phase current derivation, where V nom is the PCC nominal voltage. Then, the power of each inverter station and BESS can be calculated and used for the main grid simulation.

A. Modular BESS Kernel Design
Since a grid-connected BESS contains multiple types of components, including the battery, VSC, DC-DC converter, and controllers, each of them is taken independently and designed into a CUDA C++ kernel [29], i.e., a global function that can launch a predefined number of threads corresponding to components of the same type when being invoked on the GPU using the SIMT paradigm. The IGBT/diode as a fundamental component of power converters, on the other hand, is taken as a CUDA C++ device function which can be directly called by a kernel, as shown in Fig. 6(a). This modular software architecture provides a kernel-based library that enables a flexible combination of batteries and converters in the high-performance simulation.
Based on the vectorized parallel EMT modeling, electrical quantities including inputs and outputs (I/Os) of each BESS kernel are represented by arrays in the CUDA C++ program design. For instance, the main I/Os of the battery kernel include its type, current, voltage, state of charge, and the Thévenin and Norton equivalent circuits. The VSC kernel and its controller kernel are shared by Type 1 and Type 2 utility-scale BESSs due to the proposal of the generic TLL-state-space model, where the DC-DC converter kernel is peculiar to Type 2.
Initialization of the I/O arrays is generally required, and since kernels are invoked from the host CPU, these variables are defined and initialized in the host before being copied to the device GPU. Nevertheless, not all signals shall be handled in the same manner -the simulation time instant and step size, as well as the VSC PWM carrier c, can be accessed directly by the GPU as an individual quantity, albeit they are not defined on the device. Fig. 6(b) shows that once a kernel is invoked, a CPU signal is shared by all threads, and elements in an array will be dispatched to each thread according to their addresses in the memory and corresponding thread indices.
The coexistence of both BESS types determines that the total thread number of the battery kernel N Bat is a summation of the number of DC converters and Type 1 VSCs, The battery internal conductance array G B is the input of both VSC and DC converter models. Thus, its elements are dispatched to all threads launched by these two kernels.

B. Heterogeneous Multi-Stream Computing Paradigm
Due to a lower clock frequency and thread synchronization, the GPU is not always more efficient than its counterpart, especially when handling a system with insufficient homogeneity or dominated by inhomogeneity. Hence, the heterogeneous CPU-GPU HPC paradigm which leads to a higher computing resource utilization is adopted for simulating the AC/DC grid more efficiently. The extent of homogeneity is taken as a core criterion in terms of task assignment to the processors, along with the expected simulation duration. CPU functions are designed for the IEEE 118-bus system and the 4-terminal DC grid including its source renewable energies since they demonstrate insufficient homogeneity and consequently are all inherently processed successively, as shown in Fig. 7.
Though the SIMT paradigm enables concurrency of threads launched by the same CUDA C++ kernel, different types of BESS components are still implemented sequentially. A pipelined scheme is applied to these kernels so that additional parallelism can be gained. Then, the three components, i.e., the VSC, DC-DC converter, and the battery, are invoked in separate CUDA C++ streams, while the controllers are grouped with the corresponding converters so that they are still implemented sequentially in the program. In the end, these streams are synchronized before stepping out of the device.
The converters and AC grid are processed with a time-step of 20 μs and 5 ms, respectively, and this multi-rate scheme is realized by adopting two indices t and T representing their time instants. Once the host takes over again, it either continues the converters computation when t is smaller than T , or, if the reverse is the case, proceeds with the AC grid dynamic simulation, which starts with solving the synchronous generators' differential equation (24), followed by the network equation (25) to derive bus voltages. A full simulation loop completes after all history variables are updated.
Simulations of a 20-second duration are conducted on a server with 20-core Intel Xeon E5-2698 CPU, 192 GB RAM, and an NVIDIA Tesla V100 GPU. The computing speeds with Type 1 BESS under different scales are summarized in Table I. With the nonlinear IGBT model, the default GPU simulation speeds are lower than that of the CPU when the number of BESS is small. The multi-stream GPU execution gains about 1.8 times of speedup over CPU when the computing objective is 100 BESSs, and the acceleration rate SP keeps increasing alongside the BESS scales, reaching up to 228 when 50,000 BESSs are simulated. A similar phenomenon is witnessed for the ideal TSSM-based systems, and without Newton-Raphson iteration, the simulation speed becomes faster. It can also be noticed that the multi-stream scheme brings approximately an extra 10% speedup over the default mode. With each VSC linking to 20 parallel DC-DC converters, the speedups could reach around 178 and 163 respectively for the nonlinear IGBT and TSSM cases when 200,000 batteries are involved, as listed in Table II. With the AC/DC grid taking merely 1.5 seconds for the same simulation duration, it becomes feasible to accurately study a large-scale system with numerous BESSs by the proposed heterogeneous multi-stream computing architecture, e.g., the total computational time for the AC/DC grid shown in Fig. 4 is around 20 secs when all BESSs are Type 1, or 27 secs even if all BESSs integrated into are Type 2.
As for the scalable model, the computational burden reaches its heaviest when all the battery units need to be simulated. Fig. 8 provides the speeds of computing all the battery arrays in 1 BESS and 100 BESSs, where each array is N p × N s . It can be seen that while a larger N p or N s leads to a higher speedup by the GPU, the actual execution time t exe is also longer. Since a more detailed model always exists, the modeling level is determined by the study purpose. Therefore, the individuality of a battery unit can be omitted in grid-scale analysis.

V. AC/DC GRID SIMULATION RESULTS
Since the TS analysis of an AC/DC grid integrated with massive quantity of BESS EMT models is infeasible by pure TS or EMT simulation packages or even conventional hybrid EMT-TS approach due to the simulation type constraint or a tremendous computational burden, the proposed modeling method and heterogeneous computing concept are validated in a two-step manner using commercial tools MATLAB/Simulink and DSATools TM /TSAT, respectively.

A. Model Testing
The battery parallel EMT model is validated by the charging and discharging processes of a group of lead-acid batteries with a total rated capacity of 1350 Ah. Since GPU programming allows threads to have different parameters, the proposed computing method only requires one simulation for various circuit conditions. In Fig. 9(a), 3 GPU threads representing a discharging resistance of 1 Ω, 0.67 Ω, and 0.33 Ω are plotted. It shows that both the battery voltages and SOCs have a good agreement with Simulink results. With the same resistances, the batteries are charged till SOC = 100% in Fig. 9(b), which also gives identical curves. The results demonstrate that the duration of both processes is approximately proportional to the charging or discharging resistance. Fig. 9(c) provides 1 MW step-up and step-down test results of a Type 1 BESS unit in the HPC program and MATLAB/Simulink. The identical curves indicate an accurately implemented power converter model, including its controller.
The power step test is also conducted for AC grid model validation and assuming that the BESS group at Bus 83 has a sufficient capacity, the results of a single-bus injection are given in Fig. 10. At t = 10 s, an extra 500 MW load is added to Bus 59. Consequently, the generators witness a dramatic perturbation with several bus voltages sagging below 0.9 p.u. momentarily, and decreasing generator frequencies due to a deficiency of active power. Then, 3.5 s later, 834 MW is injected into the network via Bus 83, the generator voltages and frequencies are generally restored after a few oscillations, and the rotor angles enter a new steady-state. The DSATools TM /TSAT results are also provided for comparison to verify the modeling and simulation method of the AC grid.

B. Post-Contingency Stability Restoration
As a single BESS group has limited capacity, power injections at the 5 buses which have BESS installations as indicated in Fig. 4 are then carried out, assuming that each of the 500 BESS units has a capacity of 2.0 MVA. When the same overload occurs, the BESSs are activated in a sequence of Buses 56, 63, 43, 33, and 83 to provide an adaptive active power compensation and a constant 0 MVAr injection, and Fig. 11 provides the results from the proposed heterogeneous simulation. Fig. 11(a), (b) give generator frequencies under various power compensation schemes. Frequency instability is witnessed if all 5 BESSs remain idle. Similarly, the 3 BESS groups in Zone 2 are inadequate to maintain the AC/DC grid stable as the bus frequencies continue to drop. The participation of Zone 1 BESSs enables the generators to restore their frequencies, and the remedial process can be expedited with an extra contribution from its Zone 3 counterparts. Additionally, Fig. 11(c), (d) show that the entire grid remains stable under a new equilibrium. The activation of an individual BESS unit in these 5 groups is given in Fig. 11(e), (f), which show that the subsequent group partakes grid recovery only when its immediate previous counterpart reaches its maximum capacity, and since a power surplus is available, the last group remains below its limit.
In Fig. 12, the corresponding stability reinstatement results from DSATools TM /TSAT are provided for validation under the condition that all 5 groups of BESS are available. The output power of each BESS unit from EMT simulation is averaged  with a window equal to the time-step of TS simulation and then injected into the AC bus. The fact that the generator quantities are virtually identical to those from the heterogeneous simulation in Fig. 11 demonstrates a sufficient accuracy of the proposed modeling and computing method.
The stochasticity of renewable energies also has a profound impact on the AC grid stability. Fig. 13(a) shows the active The inadequacy of active power caused a frequency instability, as well as potential voltage insecurity, if no further action is taken, as shown in Fig. 13(b), (c). The 5 groups of BESSs intervene in a reverse sequence for resilience enhancement. It is noted that with 1 group in Zone 3 or 2 groups in Zone 1 and 3, the grid is still insecure in terms of frequency stability; the involvement of the third group in Zone 2 yields abundant active power for remedy of the frequency as it begins to rise. Fig. 13(d) demonstrates that all bus voltages of this large-scale system can recover. It is observed from Fig. 13(e), (f) that once the contingency occurs, the first group immediately participates in the remedial action at t 1 ; nevertheless, it quickly reaches its reactive power capacity so that the second and third groups are activated at time instant t 2 and t 3 . The grid frequency begins to rise at around t 4 , and then at t 5 , when the grid frequency has already been within its deviation tolerance, the third BESS group reduces its output power so that the bus voltages can eventually maintain at the pre-contingency level.
When a severe contingency, such as simultaneous faults on Bus 33, 30, and 23, appears at t 0 , out-of-step and consequent zone separation could occur if Tie Line 19-34 has a small capacity. As given in Fig. 14(a), generators in Zone 1 have a rising frequency, whilst their counterparts in Zone 2 and 3 are slowing down. Due to the asynchronism, the relative angles of generators in Zone 1 keep climbing and therefore, the entire system is insecure in terms of angle stability, along with voltage instability since some bus voltages approach 1.08 pu, as shown in Fig. 14(c) and (e). In contrast, with 3 BESS groups on Bus 33, 43, and 83 activated, no violation of any of the 3 system stability indices is witnessed. Fig. 14(b) demonstrates that the generator frequencies are restricted within ±0.03 Hz. In the meantime, a maximum angle difference slightly above 200 degrees as given in Fig. 14(d) implies that the AC/DC grid does not have any concern on angle stability. Fig. 14(f) shows that with the d-q frame controller of BESSs, the bus voltages can restore their pre-contingency levels. Fig. 15 provides some power converter transients after the grid separation. The rising and falling frequencies in Zone 1 and Zone 2-3 respectively indicate a power surplus and deficiency in these zones. Therefore, each BESS unit in Zone 1 absorbs approximately 0.5 MW, and units in the other 2 zones are discharging with a power of 0.4 MW and 0.1 MW. Similarly, a BESS unit in Zone 2 and 3 also provides about 0.25 MVAr and 0.6 MVAr to maintain stable bus voltages. The DC grid, as expected, is not significantly affected by the grid separation since the MMCs possess the functionality of regulating the DC voltages so long as their PCC voltages are not subjected to severe distortions.

VI. CONCLUSION
This work illustrates heterogeneous sequential-parallel modeling and high-performance computing of an entire large-scale AC/DC grid integrated with massive utility-scale battery energy storage systems for accurate power system stability study. Prompted by its extensive distribution in the modern power system for resilience enhancement, various types of battery EMT models are vectorized and modularized for element-wise parallel multi-threading processing. The proposal of a generic TLL-state-space model improves parallelism so that prevalent BESS configurations can be programmed uniformly in the same CUDA C++ kernels once they are implemented on the manycore GPU. In the meantime, the high clock frequency of CPU suggests the deployment of less homogeneous components in the AC/DC grid on it, so that a twofold hybrid simulation with a dramatically improved efficiency is formed, i.e., EMT-TS and CPU-GPU co-simulation. Remarkable speedups over pure CPU processing were obtained by the GPU in handling BESSs; nevertheless, an extra margin can still be sought by utilizing the multi-stream architecture. Therefore, it demonstrates that the HPC methodology utilizing diverse processors expands the scope of AC/DC systems that can be simulated efficiently since it caters to an increasingly heterogeneous energy network for which a grid-wide study of the interactions among numerous energy systems requires that their components are modeled in detail and computed which would otherwise be computationally overwhelming for conventional simulation methods.