Vulnerability modellers toolkit, an open-source platform for vulnerability analysis

Vulnerability functions describe the expected loss for a given ground shaking intensity level and are an essential component in probabilistic seismic risk assessment. This manuscript presents a novel open-source platform for the derivation of analytical fragility and vulnerability models, covering state-of-the-art methodologies, and addressing critical issues in vulnerability modelling such as uncertainty propagation, validation/verification of results and sufficiency/efficiency of intensity measure types. This framework is divided into seven modules designed to guide users through the different stages of analytical vulnerability modelling from the selection of ground motion records to the validation and verification of the models. The platform was implemented in the Python programming language and it is freely accessible through a public GitHub repository. A graphical user interface is included with the toolkit and is intended to be a general-purpose method for modellers to interact with the vulnerability modellers toolkit (VMTK). Experienced users are encouraged to use Python’s scripting capabilities to explore all the features of the VMTK source code and to contribute to future releases of the toolkit.


Introduction
Fragility models define the probabilities of exceeding a number of damage states conditional on a ground shaking intensity measure, whilst vulnerability models establish the probability of loss ratio conditional on a ground shaking intensity measure (e.g. Yepes-Estrada et al. 2016). Such models, alongside the seismic hazard and exposure counterparts, are an essential component in probabilistic seismic risk analysis or in modelling earthquake scenarios. Considering that the majority of economic losses and casualties due to earthquakes are a consequence of poor performance of manmade structures, understanding seismic vulnerability is essential for the development and implementation of risk mitigation strategies.
In the last three decades there have been hundreds of studies that focused on the development of methodologies for vulnerability assessment, derivation of fragility functions, and more recently, on the creation of tools for vulnerability analysis. Some of these efforts have been described by Calvi et al. (2006), D'Ayala et al. (2015, Rossetto et al. (2014), Maio and Tsionis (2016) and Silva et al. (2019a, b). Some of the fragility and vulnerability models that resulted from past vulnerability studies have been compiled in an open database, as described Yepes-Estrada et al. (2016). The latter study indicated that while the availability of fragility and vulnerability models is particularly evident in the western world, there is still a clear paucity of models for some regions or particular types of construction. Moreover, some of the existing models have been developed for general building classes, whose applicability might be inadequate for building-specific analysis. For these reasons, earthquake engineers are frequently required to develop new fragility and vulnerability models. These functions are usually developed using scripts developed by the authors, which in the vast majority of the cases are not released along with the associated fragility and/or vulnerability functions. This lack of transparency might not allow users to fully comprehend the modelling options and assumptions involved in the development of the models, which might be fundamental to understand the range of applicability of the results.
It is thus important to improve the transparency in vulnerability modelling, through the employment of open-source tools whose methodologies and assumptions can be reviewed, and even adapted, by the users. Such approach is already followed in other fields such as probabilistic seismic hazard analysis (e.g. OpenSHA-Field et al. 2003, OpenQuake-Pagani et al. 2014, seismic risk and loss assessment (e.g. CAPRA Cardona et al. 2012;OpenQuake-Silva et al. 2014a) and structural modelling (e.g. OpenSees-McKenna et al. 2000). In fragility analysis, it is worth mentioning the SPO2IDA (Vamvatsikos and Cornell 2006), and SPO2FRAG (Baltzopoulos et al. 2017) tools. The former allows estimating the results of incremental dynamic analysis (IDA-Vamvatsikos and Cornell 2002) from a static pushover curve, while the latter leverages on the results of SPO2IDA to derive a set of fragility curves. Also in the field of fragility assessment, FRACAS by Rossetto et al. (2016) proposes a new methodology to develop fragility function based on the capacity spectrum.
The Vulnerability Modellers ToolKit (VMTK) described in this manuscript is intended to provide earthquake engineers with a comprehensive platform to develop vulnerability models, while allowing a wide flexibility in terms of seismic demand, structural capacity, damage criteria and damage-to-loss conversion. The toolkit was divided in seven modules intended to provide a logical workflow through the different stages of analytical vulnerability modelling using nonlinear dynamic analyses. These modules are organized into: (i) selection of ground motion records, (ii) definition of structural capacity, (iii) nonlinear dynamic analyses on single degree of freedom (SDOF) oscillators to estimate the response of the building to earthquake loads, (iv) conversion of the results from the dynamic analyses into fragility functions, (v) derivation of vulnerability functions, (vi) comparison with other models and (vii) validation of the results through the calculation of average annual losses and other common risk metrics.
The VMTK was developed in the Python programming language (Lutz 2006), in order to ensure that it can be easily integrated with other software that share the same programming language (e.g., OpenQuake, OpenSees), and to leverage on the various scientific libraries (e.g. SciPy, NumPy). To improve user experience, a graphical user interface (GUI) with plotting capabilities is included with the toolkit. Even though the GUI is capable of performing a complete vulnerability assessment, more experienced modellers are encouraged to develop their own scripts using the available source code, as well as to contribute with their own methodologies. The integration in a single environment of all phases of seismic vulnerability assessment is a unique feature of the VMTK. This platform also integrates in its workflow critical issues in vulnerability modelling related with the propagation of sources of uncertainty (e.g., record-to-record, building-to-building, damage criterion), validations/verification of the resulting functions through risk metrics (e.g., average annual loss ratios), and assessment of the performance of the chosen intensity measures in terms of sufficiency and efficiency.
This manuscript provides a description of the capabilities of both the source code and the GUI. The complete toolkit, including source code and GUI, is currently hosted in a publicly available GitHub repository. 1

Demand module
The ground motion demand is known to be one of the main sources of uncertainty in structural vulnerability assessment (Shome and Cornell 1999). For this reason, it was considered necessary to include in the VMTK a module to assist the vulnerability modeller in the selection of suitable ground motion records. In the current implementation of the VMTK, there are three options to select and scale ground motion records. The first is to simply use all the records provided by the user. This option leaves the record selection to the user's criteria (i.e., it is assumed that the user has already selected a collection of ground motion records that fulfils the requirements of the vulnerability analysis, or it is using a wellknown set of ground motion records-e.g., FEMA 2009).
The second option uses a list of intensity measure (IM) levels (defined by the user) to create bins of target ground motion intensity levels and populate them with ground motion records. The algorithm initiates by computing intensity measures for the complete collection of ground motion records provided by the user. For convenience, a list of IMs has already been pre-programmed in the GUI (PGA, SA (0.2 s), SA (0.3 s), SA (0.5 s), SA (0.6 s), SA (1.0 s) or SA (2.0 s)) and can be selected by the modeller. However, it should be noted that if a specific vulnerability assessment requires a different IM, this is available to the modeller through the VMTK's source code. An additional advantage of leveraging on Python's scripting capabilities combined with the VMTK's source code, is that it allows modellers to consider more robust IMs in their analyses (e.g. average spectral acceleration- Baker and Cornell (2006); Kohrangi et al. (2018)). An important note to consider is that each input ground motion file must be formatted as two comma delimited column arrays where the first column contains the time series (in seconds) and the second column contains the corresponding ground acceleration (in the units of g). Once the intensity measures for each ground motion record are computed, for each intensity bin the algorithm randomly selects suitable records until each bin contains a predefined number of ground 1 3 motion records. In case there are not enough ground motion records to reach the minimum number of records for a given intensity bin, the algorithm will attempt to fill the missing number of records by scaling, conditional on the minimum and maximum scaling factors input by the user. If after scaling there are still not enough records to comply with the minimum number of records per bin, the algorithm continues to the next intensity bin.
The third selection method is a translation to Python of the algorithm proposed by Baker and Lee (2018) 2 of the conditional spectrum method (CSM). The first stage of CSM involves performing a seismic hazard disaggregation to understand the distribution of magnitude, distance and epsilon for a given location. For convenience, the VMTK was made fully compatible with the hazard disaggregation analysis outputs (i.e. magnitude, distance and epsilon distribution) of the OpenQuake-engine (Pagani et al. 2014). The seismic hazard disaggregation is then used to produce an estimate of the conditional mean spectrum. For the intended intensity measure level, the records whose spectrum minimize the distance to the conditional mean spectrum are selected. It should be noted that record selection following this approach is performed for a given intensity level, and the hazard disaggregation and conditional mean spectrum estimation is repeated for each intensity measure level.
The graphical user interface for the demand module is illustrated in Fig. 1. Throughout this manuscript, the European Engineering Strong Motion Database (ESM) (Luzi et al 2016(Luzi et al , 2020 was used as the source of ground motion records necessary to perform the demonstration of the VMTK's capabilities. Figure 2 presents the resulting response spectra for three suites of records selected using the previously described options.

Capacity module
In the VMTK the capacity of the structures is defined through bilinear, trilinear or quadrilinear backbones (an explanation on the most suitable type of backbone curve for a given building class is available in Martins and Silva 2020). The capacity module processes capacity curves in the acceleration-displacement response spectrum (ADRS) format (i.e. spectral displacement vs spectral acceleration) that can either be directly introduced using Response spectra for ground motion records (Top) entire strong motion database, (Middle) selected based on intensity intervals from PGA = 0.05-2.0 g and (Bottom) selected based on the CSM conditioned on a single target SA value for Lisbon the GUI, or imported from a comma delimited file containing the coordinates of the capacity curves in columns arrays. A benefit of the latter option is that users can explicitly account for the building-to-building variability by introducing multiple capacity curves for the same building class (e.g., Silva et al. 2014b). The graphical user interface for the capacity module is illustrated in Fig. 3.
To illustrate all the features of the VMTK, it was decided to consider the structural capacity curves developed within the framework of the Horizon 2020 European project SERA (Romão et al. 2020a, b). For the purposes of this manuscript only low-code reinforced concrete structures with masonry infills were considered. According to Romão et al. (2020a), for each number of storeys, 300 prototype structures with varying geometry and material properties were generated using Monte Carlo simulation, to account for the building-to-building variability, and each was then designed to 'low-code' standards. Numerical models of each of the 300 structures were then developed, pushover analysis was undertaken, and the results were transformed to obtain SDOF capacity curves. Figure 4 Romão et al. 2020b) the median capacity curves, which were used to demonstrate the functionalities of the remaining modules of the VMTK. The GEM taxonomy (Brzev et al. 2013) was adopted throughout this manuscript to categorize each building class (i.e. CR = Reinforced concrete, LFINF = infilled frame and CDN = Low code level).

Structural response module
To compute the nonlinear response of the single-degree-of-freedom (SDOF) oscillators, the VMTK was integrated with the open-source finite element model software OpenSees (McKenna et al. 2000) through its Python implementation (Zhu et al. 2018). The nonlinear behaviour was introduced in numerical models through the uniaxial material Pinching4 and the capacity curves previously introduced by the user. The numerical model allows for cyclic strength degradation that is implemented in three stages: (i) unloading stiffness degradation, (ii) reloading stiffness degradation and (iii) strength degradation. It should be noted that the users can deactivate strength degradation by setting it to None on the GUI or False in the source code. The default degradation implementation was adapted from simpleSDOF4.tcl by Vamvatsikos (2011) and makes use of the energy option of Pinching4 (for more information the reader is encouraged to visit OpenSeesWiki 3 ). Advanced users are welcomed to edit the degradation implementation to better suit their applications. Moreover, due to the open-source nature of the VMTK, other numerical elements (already available within the OpenSees software) can be used for the assessment of the nonlinear response of the SDOF oscillators.
In addition to energy dissipation through hysteresis, the toolkit also allows modellers to include damping in their analyses. The default implementation is a mass proportional Rayleigh damping, but other damping options are available, as described in the OpenSees manual. When using the GUI, damping should be introduced through a comma delimited file containing two columns in which the first holds the building classes and the second the respective damping ratio. This format allows for batch computation of structural response across different building classes, only requiring the user to input the path to the directory containing the capacity curve files. The graphical user interface for the structural response module is illustrated in Fig. 5.

3
After performing the structural analyses, the VMTK produces a distribution of Engineering Demand Parameters (EDPs) versus intensity measure (IM) levels (see example in Fig. 6), which are then used for the derivation of the fragility functions. Users can export either the maximum displacement (Max displ.) or the maximum acceleration (Max accel.) as the EDPs.

Fig. 6 EDP vs IMLs distribution for case study and linear regression
Plotting these distributions can be useful to identify unexpected numerical results (e.g. unrealistically large deformations). Section 2.4 presents a methodology that aims at minimizing the influence of some of these effects. Nevertheless, early identification of errors in the numerical analyses is still a valuable feature, and therefore it was deemed necessary to provide the GUI with plotting capabilities that allows users to visually inspect their results for potential errors (see example in Fig. 5).

Fragility module
The fourth module of the VMTK is devoted to the derivation of fragility functions. After the definition of the relationship between structural response and ground shaking intensity (see Fig. 6), fragility functions are derived using the cloud analysis approach proposed by Jalayer et al. (2015). Two regression algorithms were implemented in the toolkit. In the first option, a simple linear regression is used, in which all data points are equally used for the best-fit curve. In the second case, displacements deemed excessively high are treated differently through a censored regression analysis (Stafford 2008). The reasoning for the second regression method lays in the fact that in numerical analyses convergence can be attained for levels of displacement that are incompatible with structural stability. This occurs mainly due to limitations in numerical modelling and it can introduce bias in the best-fit curve. Censored regression analysis requires the definition of a threshold (i.e. maximum displacement or acceleration), after which the data points are assumed to be affected by numerical errors. This threshold is defined by a censoring factor defined as the ratio between the maximum admissible EDP and the threshold for last damage state considered (e.g., complete damage). For example, a censoring factor equal to 1.5 means that EDPs 1.5 times above the threshold for the last damage state will be treated differently. A step-bystep description of this process (with or without the censored regression) can be found in Martins and Silva (2020).
If the user has not explicitly accounted for the building-to-building variability (i.e. only one capacity curve has been provided per building class), it is possible to increase the uncertainty to account for this source of variability. The VMTK performs this adjustment by adding to the dispersion of the record-to-record variability (σ rec_to_rec ) the contribution of the building-to-building variability (σ bld_to_bld ) according to Eq. (1). Gokkaya et al. (2016) and Casotto et al. (2015) recommend a value of 0.30 for this parameter. Nonetheless, this parameter can be set to any value in both the GUI and the source code to better accommodate the modellers needs. It should be noted that, natively, this operation is performed to the dispersion of EDPs conditional on IM. If the users wish to add the contribution of building-to-building variability to the final fragility function, then they should set this parameter in the VMTK to zero and later add the contribution of building-to-building to the sigma of the fragility function through square root of sum of squares.
The graphical user interface for the fragility analysis module is illustrated in Fig. 7.
For demonstration purposes, the damage model proposed by Martins and Silva (2020) was adopted herein. The damage model consists of four damage states (DS): Slight damage (DS1), Moderate damage (DS2), Extensive damage (DS3) and Complete damage (DS4). The damage thresholds are defined using the yield (Sd y ) and ultimate (Sd u ) displacements of the capacity curves, as depicted in Fig. 8.
The resulting fragility functions are defined using cumulative lognormal functions, as presented in Fig. 9 for the previously described building classes.

Vulnerability module
For the derivation of vulnerability functions, two methodologies was implemented in the VMTK. In the first approach, the probability distribution of loss ratio is computed using the resulting fragility model and a damage-to-loss model (i.e., relation between each damage state and the corresponding loss ratio). The damage-to-loss model is For the demonstration of this module, we adopted the damage-to-loss model described in Martins and Silva (2020), whose mean loss ratios for each damage state are illustrated in Fig. 10.
The mean loss ratio of the vulnerability functions at each intensity measure level (LR|IM) is equal to the product between the probability of each damage state (P[DS = ds i |IM]) and the associated mean loss ratio (LR dsi ), as described in Eq. (2).
Following this procedure and using the fragility functions (see Fig. 9) and damageto-loss model (see Fig. 10) presented previously, the vulnerability functions depicted in Fig. 11 were obtained.
The second methodology included in the toolkit follows the procedure described in Silva (2019), in which the loss ratios (LR|IM) are computed directly from the results of the nonlinear dynamic analyses using a relationship between the engineering demand parameter and the expected loss ratio (hereafter referred as EDP-to-loss model). This approach allows propagating directly the record-to-record and building-to-building variability into the distribution of loss ratios, and avoids approximation errors related with the use of cumulative lognormal functions to define each fragility curve. The EDP-to-loss model is defined in a two-column array comma delimited file. To demonstrate this workflow, the EDP-to-loss model presented in Fig. 12 was used, leading to the vulnerability functions depicted in Fig. 13. A cumulative lognormal function was used to define each curve.
The graphical user interface for the vulnerability analysis module is illustrated in Fig. 14. (

Comparison and validation modules
The final two modules of the VMTK offer the possibility to compare the results with other available models, or to estimate risk metrics that can either be compared with the results from other models, or to assess the relative risk between the various building classes. For convenience, the toolkit is provided with the complete set of fragility and vulnerability functions from the Global Earthquake Model vulnerability database (Martins and Silva 2020) (see example in Fig. 15). The user can also compare their results with other existing models by providing the adequate fragility or vulnerability arrays. When utilizing the graphical user interface, the modeller can visualize the comparison directly in the toolkit's environment, as illustrated in Fig. 15. The toolkit also allows estimating common risk metrics using the resulting fragility or vulnerability functions and seismic hazard curves defined in the OpenQuake format. In this process, average annual loss ratios (AALR) or the annual probability of collapse (e.g., Eads et al. 2015) for each building class are estimated and presented in the same plot. This comparison can be particularly useful to compare fragility or vulnerability defined in terms of different IMs. Moreover, these risk metrics can shed light to whether the associated fragility or vulnerability functions are realistic. For areas with significant seismic hazard, an annual probability of collapse lower than 10 -6 might be unusual, as such levels of safety are more common for critical facilities such as power plants.
On the contrary, annual probability of collapse above 10 -3 should be plausible only for informal non-engineered construction, and with clear evidence of widespread damage from past events. Risk metrics that fall outside of this range could be due to an under-(or over) estimation of the structural capacity of the associated numerical models, or a rather conservative (or optimistic) damage criterion. This module of the VMTK aims at supporting users in these sanity checks. The graphical user interface for this module is illustrated in Fig. 16.

Final remarks
It is clear that the assumptions and methodologies used for vulnerability assessment can influence the final results (e.g., Strasser et al. 2008;Silva et al. 2013). However, the fact that some of these assumptions are often not disclosed and the code used for the vulnerability derivation is rarely released, users are left without sufficient information to properly understand the applicability of the resulting vulnerability functions. The use of an opensource platform for fragility and vulnerability analysis can improve the transparency of future models, and allow an unbiased comparison between models developed by different researchers. This study describes an open-source toolkit aimed at earthquake risk analysts. The toolkit is currently implemented in Python language and fully compatible with all the major operating systems. For the seismic hazard and structural analyses, the toolkit leverages on existing and well-tested open-source software such as the OpenQuake engine (Pagani et al. 2014;Silva et al. 2014a, b) and OpenSees (McKenna et al. 2000). The current implementation of VMTK is organized in seven modules (ground motion selection, capacity curve, structural analysis, fragility assessment, vulnerability assessment, comparison and verification) intended to assist the modeller through an analytical vulnerability assessment. Due to the open-source nature of the toolkit, users can modify and expand any of the modules, and contribute with their code to the scientific community (similar to what is currently done with the development of the OpenQuake engine). This possibility might be particularly useful for the integration of more sophisticated numerical models to simulate structural response, a feature that is often necessary for the assessment of damage in non-structural components or for vulnerability in terms of fatality rates (e.g. Beck et al. 2002;Formisano and Marzo 2017;Zhang 2019).
In order to demonstrate the capabilities of the toolkit, this study described the development of vulnerability functions for European reinforced concrete infilled structures from the recent European SERA project (Romão et al. 2020a, b), while illustrating the various modules of the graphical user interface. It should be noted, however, that advanced users might be interested in using directly the source code of the toolkit, which contains additional functionalities. For example, the source code allows users to verify the efficiency and sufficiency (Luco and Cornell 2007) of combinations of intensity measures and engineering demand parameters, which are fundamental indicators to evaluate the expected performance of fragility and vulnerability models. Finally, the VMTK aims at going beyond the derivation of vulnerability functions, and support users in the validation and verification of the results, which is a step frequently neglected in vulnerability analysis, but critical to improve eventual probabilistic risk analysis or earthquake scenario simulations using the resulting functions.