On the Employment of Machine Learning in the Blockchain Selection Process

Given the rapidly growing number of blockchain (BC) platforms, cryptocurrencies, and tokens, non-technical individuals face a complex question when selecting a BC that meets their requirements (e.g., performance or security). In addition, current approaches that aid such a selection process present drawbacks (e.g., they require specific BC knowledge or are not automated and scalable), which hinders the decision process even further. Fortunately, techniques such as Machine Learning (ML) allow the creation of selection models without human interaction by identifying the BC features that match user-provided requirements in an automated and flexible manner. Thus, this work presents the design and implementation of an ML-based BC selection approach that employs five ML models to select the most suitable BC given user requirements (e.g., BC popularity, fast block inclusion, or Smart Contract (SC) support). The approach follows an ML-specific data flow and defines a novel equation to quantify the popularity of a BC. Furthermore, it details the models' accuracy and functionality in two distinct use cases, which show their good accuracy (>85%). Finally, discussions on (a) the usefulness of ML, (b) advantages over rule-based systems, and (c) the most relevant features for the BC selection are presented.

where more than twenty thousand cryptocurrencies, tokens, and BC platforms are listed. With such a myriad of platforms, selecting the most appropriate BC based on specific requirements becomes a complex and cumbersome task for services that rely on the management of data stored in different BC platforms and require BC interoperability [37].
The combination of this myriad of BC platforms and the frequent lack of in-house BC capabilities (i.e., skills and understanding) of companies [11] catalyzes the BC selection problem. In addition, the scarcity of standards regarding the development of BC platforms [32] results in a variety of requirements, performance characteristics, and security features. Also, the volatility of cryptocurrency prices [19] requires BCs to be evaluated not only along technical but also economic dimensions. A prominent example is the need for interoperability within Decentralized Exchanges (DEX), which requires knowledge of the different underlying technologies as well as security, legal, and economic implications to perform cross-chain swaps [2]. Another example is found in the healthcare sector, in which the increasing digitization of processes has led to several BC proposals [18], [27] to cope with the fragmentation of individually maintained Electronic Health Records (EHR). Here, the myriad of available solutions creates an additional challenge of handling different EHRs siloed across BC solutions, further emphasizing the need for interoperable solutions.
As the number of distinct BCs and their characteristics increases, the technical knowledge required to identify the most suitable BC for a given use case makes the BC selection a non-trivial task [2], [3]. Based on the current BC scenario, there is thus a need for specialized solutions that support the BC selection task in a straightforward manner (i.e., reducing management complexity for an end-user). Concepts widely established in network management can be used in the selection process to (i) provide a simplified interface in which a user determines, through policies, high-level guidelines driving their preferences, and (ii) automate selection processes based on historical events recorded on-chain, previous user configurations, and economic data. While the use of policies in the selection process has been previously explored by the authors [35], [38], this work lays a specific focus on solutions that rely on Machine Learning (ML) techniques to understand requirements and patterns that can be used to simplify the process of BC selection and, consequently, contribute to an adequate usage of BCs by the research community and industry.
Recent studies on ML applications for communications and networking systems [15], [26] highlight their ability to process large amounts of structured and unstructured data, extract valuable patterns, learn from historically collected records, and make accurate predictions. Examples of applications combining BC and ML include (a) solutions that help increase the reliability and trustworthiness of mobile networks [21], (b) novel decentralized network security applications [33], and (c) cost-effective cloud management [46].
Given the pattern-learning and pattern-identification capabilities of ML, solutions employing such techniques can be an ally for analyzing BC features during the selection process. The current state-of-the-art on BC selection comprises schemes to decide the most suitable BC platform and type. However, they either (a) follow a manual approach with flowcharts [3], [45] or (b) implement a rule-based selection [23], [35]. Although addressing the BC selection problem, these approaches present drawbacks, such as the need to revise the flowchart upon the inclusion of a new BC platform or to add new rules to consider new parameters. Hence, they are not flexible and require extensive human interaction to be developed and readjusted.
This article addresses such issues by presenting the employment of ML to select the most suitable BC driven by user-defined requirements expressed as policies [35]. Based on that, the key contributions of this article are the following:
• The design and implementation of an ML-based approach to select the BC that is most suitable to address user requirements (e.g., popularity, BC tps, SC support, and type) using five different ML algorithms instead of a rule-based system.
• The definition of a novel metric to quantify the subjective popularity of a BC platform, which is used as a feature during the selection process.
• An in-depth evaluation, using a dataset containing real-world BC platform data, of the ML models in terms of accuracy, performance, and feature correlation.
• A discussion, based on the evaluation results, regarding (i) the usefulness of ML in the BC selection, (ii) the comparison between rule-based and ML-based selection, and (iii) the key features for the selection process.
The proposed ML-based BC selection solution provides users with the choice of which ML model to employ in the selection and recommends BCs that fit the requirements if there is no exact match. Hence, the user is not limited to a single selection approach, and the solution can be used as a recommender system, which shows its greater flexibility compared to the current state-of-the-art. The solution's code and dataset are available at [20].
The remainder of this article is structured as follows. Section II describes current efforts on the BC selection topic. Section III presents the solution's design, key selection features, and the ML models. Section IV focuses on the evaluation of the models' accuracy and presents discussions. Lastly, Section V concludes this article and lists future work.

II. CURRENT BLOCKCHAIN SELECTION EFFORTS
Due to diverse technical details, not only is the process of selecting a BC tedious, but determining whether a BC is an appropriate technical solution for an application at all is also key. Thus, the following approaches address this decision question with different methods, such as decision models and flowcharts.

A. Current Efforts
Flowchart-based BC selection was first presented by [45]. A simple flowchart was proposed based on six main questions that can be answered with Yes or No, leading to an answer to the central question of whether or not a BC is an appropriate solution to a problem and which deployment type (i.e., Permissionless, Public Permissioned, Private Permissioned) is the most appropriate. The questions can be understood by non-technical users and help to avoid BCs being employed in applications that do not necessarily need them; these include applications that (i) do not require state storage or (ii) can rely on an always-online Trusted Third Party (TTP) to provide trusted information.
Reference [3] likewise does not present a decision model to select the most suitable BC platform based on user requirements; instead, it proposes a flowchart, similar to [45], to guide the user through deciding when to use a BC and which type (e.g., open permissioned, full-permissioned, or permissionless) to select. Although the flowchart does not recommend a specific platform, [3] presents the technical background on BCs necessary to help users in the selection.
Reference [31] proposes a criteria catalog-based approach to help users select the most suitable BC implementation for a use case. The criteria considered include, besides BC-specific ones, (i) software quality, (ii) open source software, and (iii) software maturity, since all BC platforms are intrinsically software implementations. Thus, [31] claims that software-specific quality criteria must be taken into consideration during the selection. Even though the catalog covers the main aspects of almost all of these criteria, users still need to manually quantify each BC platform's criterion of the catalog, without any interface being available.
Further, [29] proposes a methodology for the selection of a suitable BC platform to develop an enterprise system. The methodology consists of a Multi-Criteria Decision-Making (MCDM) approach applying the Simple Multi-Attribute Rating Technique (SMART) to perform the selection. It comprises five manual steps: (i) identifying the BC platform alternatives (e.g., Ethereum, EOS, Hyperledger Sawtooth, Multichain, or IOTA), (ii) identifying the criteria used to evaluate the alternatives (e.g., consensus mechanism, security, Application Programming Interface (API) support, and cost), (iii) determining weights for each criterion depending on its importance in the project, (iv) assigning values to the criteria of each BC platform and multiplying them by their respective weights, and (v) applying the SMART method to rank the platforms. This method can be applied to different scenarios. However, it requires experts to manually define the criteria and weights, calculate the values for each BC, and apply the SMART method.
Following such an MCDM approach, [22] proposes a method for the selection of the optimal BC software for a business using Buckley's Fuzzy Analytical Hierarchical Process (AHP). AHP is employed to determine the selection criteria and their weights. The criteria considered are BC costs, speed (e.g., transaction speed), privacy (e.g., security and anonymity), logistics issues, functionality, and developer availability. The method requires BC experts to provide opinions and weights for the criteria and to manually apply the AHP to calculate the most important ones. Even though the method identifies the relevant criteria for the selection process, it does not recommend a suitable BC platform.
Reference [23] applies an MCDM approach in a particular context, the selection of a BC-based solution for a maritime organization. The MCDM approach considers criteria for the selection such as fast and inexpensive transactions, transparency and speed, security and privacy, and operation without a TTP. Decision-makers from the organization must evaluate each criterion based on its importance to the organization (e.g., 1 meaning "Very Low" and 5 meaning "Very High"), and a set of 12 BC-based solutions is evaluated using the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) to calculate which is closest to an ideal solution.
Likewise, [12] describes the problem of selecting a BC platform as an MCDM problem and presents a comprehensive list of BC features (e.g., Turing-complete SC support, transaction speed, and PoW-based consensus) mentioned by BC experts during interviews. Several BC platforms were mapped based on these features to serve as a knowledge base for the decision model. To select a BC platform, users prioritize their desired features using the Must, Should, Could, Won't (MoSCoW) technique. These features and priorities are then input to the decision model, which outputs feasible BC platforms that must be manually selected and deployed by the user.
Reference [25] provides a method based on Multi-Attribute Group Decision Making (MAGDM) to aid the decision and ranking process of which BC service provider fits an enterprise's BC project. This requires BC experts (i.e., showing advanced BC technical knowledge) to provide evaluations according to different criteria, such as (i) service support capability (e.g., security, anonymity and privacy, and technical barriers), (ii) green development capabilities (e.g., effective use of energy and reduced emissions), (iii) integration capabilities of the provider with Internet-of-Things (IoT) scenarios, and (iv) technological capacities.
A framework to automatically determine the most beneficial BC platform for a user and migrate (i.e., copy) data from one BC to another is proposed by [16]. The framework allows users to define their BC demands based on a set of metrics and weights, which are used by the framework to calculate the most beneficial platform. Furthermore, it monitors and reacts to changes in BC metrics, switching to the newly selected BC. Still, users have to manually define weights for each metric, which is cumbersome and requires technical knowledge.
Reference [35] focuses on policy-based management, requiring users to define specific technical details of their BC requirements, such as target block interval, tps threshold, and data size. Hence, it requires more technical knowledge compared to an ML-based solution (i.e., this article). Nevertheless, such an approach is able to interact with different BCs using a notary-based interoperability solution [34], and the selection of the most suitable BC is performed automatically.

B. Comparison and Key Findings
Table I summarizes the current efforts on BC selection. Flowchart-based approaches [3], [45] are useful in the early stages of a project's design to verify whether a BC is the best solution and which deployment type is the most suitable one, without considering a specific BC platform. Most of these solutions [12], [22], [23], [25], [29], [31] follow a criteria-based approach using criteria-based methods (e.g., MCDM or MAGDM) to select the most suitable BC platform. Moreover, such approaches often depend on manual inputs from either (a) experts or (b) decision-makers. Only [12], [31] abstract technical details from the process, allowing users with basic BC knowledge to perform such a selection. Reference [12] is able to recommend possible BCs to be employed; however, it neither selects a suitable one automatically nor reacts to changes in an automated manner as related solutions [16], [35] do. Hence, non-automated approaches are not suitable, given the increasing number of BC platforms.
From this related work research, this article derived the following two key findings, relating to (1) the approach used for the selection solution and (2) the automation of such a process given the increase in the number of BCs.
Finding 1: Compared to the state-of-the-art of BC selection approaches, employing Policy-based Management (PBM) provides a more flexible approach to allow users to express high-level requirements.
When applying PBM in the BC selection context, policies express user requirements that a BC must satisfy in order to be selected. Although such a selection process can be accomplished with different approaches (e.g., based on criteria and weights), policies do not need to be modified if underlying values of a BC change. In contrast, other approaches require (a) users to change their criteria to reflect the BC platform or (b) an expert to adjust weights. Hence, combining policies defining user requirements with an ML-based selection approach is viable for creating selection models without expert intervention and for reducing the technical knowledge required to perform the selection.
Finding 2: Automating the BC selection process is crucial to guarantee scalability of the BC selection approach and system as the number of BC platforms grows.
Criteria-based and flowchart-based BC selection approaches are not automated; they are either statically defined or require manual inputs from experts. Given that the number of BC platforms tends to grow, such non-automated approaches do not scale to accommodate this myriad of platforms. Further, new parameters must be included and the model redesigned to reflect each new selection flow. Thus, it is imperative that BC selection is automated (i.e., no human interaction is required in the selection process) to be feasible in the current BC scenario, which is the case for an ML-based approach that automatically adapts (i.e., with retraining) to new parameters and scales to accommodate several BCs.

III. AN APPROACH TO MACHINE LEARNING-BASED BLOCKCHAIN SELECTION
Based on the conducted research on BC selection, the employment of ML is promising to address this selection process in a flexible and scalable manner: flexible, because new ML models can be added; scalable, because the ML models adapt to new parameters and data (i.e., BC platforms) without requiring the addition of new rules as in rule-based approaches.

A. Solution Workflow
The building process of the ML model follows the workflow depicted in Figure 1. It involves two main artifacts: Data and Model, which are generated in three different phases: Data Acquisition, Data Processing, and Model Engineering.
Firstly, in the Data Acquisition phase, the block time of the available BCs is collected from external APIs, such as Bitinfocharts [6] and Blockchair [7] (1). Besides collecting raw data, this phase includes identifying and generating features, as well as data labeling. The result of this data collection is the initial dataset, which is detailed in Section III-B. This data is further processed in an intermediate Data Processing phase, which prepares the data for model training (2). Since raw data typically cannot be used directly for model training, it needs to be processed and transformed into a numeric representation. The processed data is divided into a training and a test set (3). This split is necessary to estimate the performance of the ML models, because it allows predictions to be made on data that was not used to train the model. In the Model Engineering phase, different ML algorithms are applied to the training data to obtain ML models for the BC selection (4). In the last phase, the Deployment phase, these models are deployed through a REST API and integrated into PleBeuS [35], where the framework asks a model for predictions by passing feature values through an API call (5).
PleBeuS is a PBM-based BC selection framework allowing users to store arbitrary data in multiple BCs given a set of requirements; for example, users can create policies stating that data from IoT sensors is stored in different BCs during the day and the night to optimize costs. Its current BC selection is based on rules and two algorithms, with the drawbacks presented in Section II. Thus, the addition of an ML-based selection to PleBeuS was investigated in this article.

B. Features Considered in the Selection
A feature represents an individual, measurable attribute or characteristic, commonly depicted as a column in a dataset. It generally represents an attribute plus its value (e.g., "Popularity = High", where High denotes the highest relative popularity across the dataset of BCs). The features included in the dataset are used as input to build an ML model and predict target values. Thus, as the building blocks of a dataset for model training, such features ultimately impact the quality of the ML model. The BC selection process can be regarded as a classification task that falls under supervised ML. Hence, the main goal is to predict a suitable BC based on input data consisting of different attributes or features. This section provides an overview of the features used for model training.
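Since the models require numeric input, categorical feature values must be encoded before training. The mapping below is a minimal sketch; the feature names and encoding order are illustrative assumptions, not the article's exact scheme.

```python
# Minimal sketch of encoding categorical BC features numerically.
# Feature names below are illustrative assumptions based on Section III-B.
ORDINAL = {"low": 0, "medium": 1, "high": 2}

def encode_row(row: dict) -> list:
    """Map ordinal categories and booleans to numbers for model training."""
    return [
        ORDINAL[row["transaction_speed"]],  # low/medium/high -> 0/1/2
        ORDINAL[row["popularity"]],         # low/medium/high -> 0/1/2
        int(row["sc_support"]),             # boolean -> 0/1
        int(row["turing_complete"]),        # boolean -> 0/1
    ]

# Example: a hypothetical BC with high speed, low popularity, SC support,
# and no Turing completeness.
encoded = encode_row({"transaction_speed": "high", "popularity": "low",
                      "sc_support": True, "turing_complete": False})
# encoded == [2, 0, 1, 0]
```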
1) Platform Transaction Speed: For this feature, the block time of a BC was considered. The block time defines the time it takes to produce a new block in the BC network [36]. The platform transaction speed was grouped into three categories on an ordinal scale: low, medium, or high. For the categorization of the public BCs into these categories, the block time was retrieved from external APIs that provide information about the current state of the BC. In total, 40 requests were sent to these APIs, and the average block time was computed from these values, which are depicted in Table II. Finally, the category into which each BC falls was determined based on percentiles: block times below the 33rd percentile (5.30 seconds) were assigned a high, block times between the 33rd and 66th percentile a medium, and block times above the 66th percentile (19.10 seconds) a low transaction speed.
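The percentile-based categorization can be sketched as follows; the block-time values here are placeholders (the actual averages are listed in Table II).

```python
import numpy as np

# Placeholder average block times in seconds (the real values are in Table II).
block_times = {"BC-A": 3.0, "BC-B": 5.0, "BC-C": 13.0, "BC-D": 19.5, "BC-E": 60.0}

times = np.array(list(block_times.values()))
p33, p66 = np.percentile(times, [33, 66])

def speed_category(block_time: float) -> str:
    """Shorter block times mean faster block inclusion, hence higher speed."""
    if block_time < p33:
        return "high"
    if block_time <= p66:
        return "medium"
    return "low"

categories = {bc: speed_category(t) for bc, t in block_times.items()}
```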
For private BCs (e.g., Hyperledger, Multichain, Corda, or Stratis), which are configurable in terms of consensus mechanism and transaction rules, the block time cannot be determined per se, as it depends on how the consensus mechanism is configured. However, due to the limited number of network participants in private BCs, they have higher throughput and are able to process more transactions per second than public BCs. In [30], a performance analysis of Ethereum and Hyperledger was performed, where the authors measured the execution time, latency, and throughput of both BCs in varying scenarios. They observed that Hyperledger outperformed Ethereum across all scenarios. Therefore, all supported private BCs were assigned a high platform transaction speed. The final classification is presented in Table II.
2) Popularity Score Calculation: The final popularity score in Table III is calculated using Equation 1. This equation considers the number of Twitter followers of the BC platform (N_follow) retrieved using the official Twitter API [42], the 12-month average number of monthly Google searches for the BC name as keyword (G_search) retrieved from [1], and the number of academic papers mentioning the BC published in IEEE Xplore (C_papers). Each variable is multiplied by an arbitrary weight ({W1, W2, W3}) defined by the user, where W1 + W2 + W3 = 1.0. For example, if the user defines that Google searches account for 50% of the score (i.e., 0.5), then W2 = 0.5 and W1 and W3 must sum up to 0.5, e.g., W1 = 0.25 and W3 = 0.25, each accounting for 25% of the score.
The weights used to calculate the overall score in Table III were determined empirically, based on the experience of this article's authors, as follows for evaluation purposes. Twitter followers account for 15% of the score (W1 = 0.15). This factor is kept relatively low because private BCs tend to have a significantly lower number of followers, which does not mean they are unpopular among enterprises. Google searches account for 50% of the score (W2 = 0.5), as they clearly indicate popularity among regular users and developers. Finally, academic papers have more impact than Twitter followers because they show popularity among academia and researchers; therefore, their factor is set to W3 = 0.35. The final scores (i.e., low, medium, and high) followed the same reasoning as the transaction speed scores, using the 33rd and 66th percentiles described in Section III-B1.
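Equation 1 is a weighted sum of the three indicators. Since this excerpt does not reproduce the normalization used in the article, the min-max-style scaling below is an illustrative assumption.

```python
def popularity_score(n_follow: float, g_search: float, c_papers: float,
                     maxima: tuple, w1: float = 0.15, w2: float = 0.5,
                     w3: float = 0.35) -> float:
    """Weighted popularity score with weights summing to 1.0 (cf. Equation 1).

    `maxima` normalizes each indicator to [0, 1] across the dataset -- this
    normalization is an assumption, not the article's exact formula.
    """
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9, "weights must sum to 1.0"
    return (w1 * n_follow / maxima[0]
            + w2 * g_search / maxima[1]
            + w3 * c_papers / maxima[2])

# A hypothetical BC maxing out every indicator scores 1.0.
score = popularity_score(2_000_000, 500_000, 3_000,
                         maxima=(2_000_000, 500_000, 3_000))
```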
Quantifying the popularity of a BC is not a straightforward task, as it is a subjective parameter; hence, the metrics used to calculate the scores presented in this article are neither exhaustive nor is the equation definitive. Reference [41] focuses only on Twitter parameters, while [48] considers Twitter followers and Google searches but also includes market capitalization and GitHub aspects. However, neither considers the academic perspective that Equation 1 includes. Table III lists the values for the respective BC platforms. Such a table is not exhaustive, with only an arbitrary subset of the more than twenty thousand listed platforms [10] being selected for this article's prototype.

C. Decision Model
Five ML algorithms were selected and individually evaluated in terms of performance and accuracy (cf. Section IV) to build a decision model: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and k-Nearest Neighbors (kNN). These models have been trained for 16 real and heterogeneous BC platforms (cf. Table III) that served as true labels.
For the dataset used in training, for each BC, new observations were added with the data sizes that other BCs store and that this BC can also store. For example, Bitcoin has 4 occurrences, with 80 Byte, 40 Byte, 28 Byte, and 20 Byte. The resulting dataset had 124 observations; removing duplicate observations resulted in 107 observations. To deal with data imbalance, Random Oversampling was performed to randomly duplicate examples from the minority classes. The final dataset contained 208 observations, where each of the 16 BCs had 13 observations (16 × 13 = 208).
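Random Oversampling is also available in libraries such as imbalanced-learn; a dependency-free sketch of the duplication step reads:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=42):
    """Randomly duplicate minority-class samples until every class
    reaches the majority-class count (Random Oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lbl in enumerate(y) if lbl == label]
        for _ in range(target - count):
            i = rng.choice(indices)
            X_res.append(X[i])
            y_res.append(y[i])
    return X_res, y_res

# Toy example: class "b" is duplicated once to match class "a".
X_bal, y_bal = random_oversample([[1], [2], [3]], ["a", "a", "b"])
```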
In a first step, these algorithms were trained by providing them with the relevant features and their corresponding true labels for selected observations. This was followed by the assessment of the algorithms, where only the features were provided and the algorithms were expected to predict the respective labels. Different tools provided by ML libraries are available to split data into training and test sets. For example, the train_test_split method available in [39] divides the data into two parts according to a specified partitioning ratio. This method was used for model training, where the dataset was split between training and test sets with an 80:20 ratio (i.e., 80% for training and 20% for testing). The training set was used to train the models and the test set to measure the performance of the models [17] presented in Section IV.
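The 80:20 split can be sketched as follows (toy data standing in for the 208-observation dataset; `random_state` is added here only for reproducibility and is an assumption):

```python
from sklearn.model_selection import train_test_split

# Toy feature matrix and labels standing in for the real dataset.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# 80% of the data for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```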
A DT was built using the DecisionTreeClassifier algorithm available in [39]. The DT model is instantiated with the default parameters (i.e., the criterion parameter set to "gini") and fitted (i.e., trained) on the data attributes and labels. The DT can be visualized as a flowchart, which makes the decision process easy to understand. For example, Figure 2 depicts an excerpt of the DT model used in this work. Each node (white rectangles) in the DT represents a test case for an attribute/feature, and each edge (light gray and dark gray rectangles) descending from a node corresponds to a possible answer to the test case (e.g., Stratis or R3 Corda). The root node starts with all sixteen data points (i.e., features and attributes). In this model, the feature that best splits the different classes is Turing Completeness.
Instead of using the entire dataset to train a single DT, the RF method randomly picks a sample of the dataset to train each DT of the RF. The RF classifier RandomForestClassifier was instantiated with the default parameters, i.e., n_estimators set to 100, which defines the number of trees in the forest.
For the SVM model, the Support Vector Classification (SVC) from [39] was used. The model was built with different kernel functions, e.g., linear, polynomial, radial basis function, and sigmoid. Additionally, the decision function shape was specified as one-versus-one. The penalty term C was set to its default value (i.e., 1), as increasing or decreasing the value did not further increase the performance of the algorithm. For the NB model, a Naive Bayes classifier from [39] was used, which provides a probabilistic learning approach for classification with discrete features. The model is instantiated and fitted to the data. Once the model was trained, further tests and evaluations were performed to measure its applicability to the BC selection process, as described in Section IV. For the kNN model, the KNeighborsClassifier was employed; in this model, a class is predicted based on the distance to its k nearest neighbors. For the accuracy tests in Section IV, the value of k was set to 1. Although k = 1 might lead to overfitting, because only one occurrence is considered for the distance, it was the value that yielded the best accuracy.
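A condensed sketch of training the five classifiers with the stated parameters follows. The synthetic data and the choice of `CategoricalNB` as the discrete-feature NB variant are assumptions, since the article does not name the exact NB class in this excerpt.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import CategoricalNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the encoded BC dataset: 208 observations,
# 5 discrete features in {0, 1, 2}, 16 BC classes (cf. Section III-C).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(208, 5))
y = rng.integers(0, 16, size=208)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "DT": DecisionTreeClassifier(criterion="gini"),
    "RF": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(C=1, decision_function_shape="ovo"),
    "NB": CategoricalNB(),
    "kNN": KNeighborsClassifier(n_neighbors=1),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
```

On the real dataset, the same loop yields the accuracy comparison discussed in Section IV; here the data is random, so the scores carry no meaning.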

IV. EVALUATION AND DISCUSSION
The results of the performance evaluation of the selected ML algorithms are based on two use case scenarios, in which the practical application of the trained ML models to the BC selection process was evaluated. Based on the evaluation setup with an Intel Core i7-6600U CPU at 2.60 GHz and 16 GB of RAM, different aspects are discussed, such as (i) the usefulness of ML, (ii) a comparison with a rule-based approach, and (iii) feature importance.

A. Comparison of ML Algorithms
Five ML algorithms have been trained to aid in the BC selection process, as described in Section III-C. kNN presented the worst accuracy among them (cf. Table V); hence, it was not included in the use cases, as its results would not be relevant given its low accuracy. For the evaluation and comparison of these algorithms, different performance metrics were considered, e.g., confusion and correlation matrices. To calculate a confusion matrix, the dataset was split into two sets, (i) the training set and (ii) the test set. The algorithms were trained using the training set to form the decision models and then tested against the test set. During testing, the models are used to predict the class labels (predicted labels), which in turn are compared to the actual labels (true labels). Figure 3 depicts the confusion matrices for the considered models. It can be observed that the DT, RF, and SVM models correctly predicted the classes in most cases (i.e., above 86%). In some cases, samples belonging to IOTA were predicted as Cardano or vice versa; similarly, EOS samples were classified as Wanchain and vice versa. This is because IOTA and Cardano, as well as EOS and Wanchain, share the same properties except for the data size variable, which is apparent in the dataset depicted in Table III. Overall, these models performed with a high accuracy score (cf. Table V). The accuracy scores of the DT, RF, and SVM algorithms are similar, with the RF and SVM models slightly outperforming the DT model by 2%. In contrast, the NB and kNN models delivered the worst performance, as they misclassified test samples in more than 30% of the cases.
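A confusion matrix tabulates true against predicted labels; the toy example below (with hypothetical predictions) shows how an IOTA sample predicted as Cardano appears off the diagonal.

```python
from sklearn.metrics import confusion_matrix

labels = ["Bitcoin", "Cardano", "IOTA"]
y_true = ["Bitcoin", "IOTA", "Cardano", "IOTA"]
y_pred = ["Bitcoin", "Cardano", "Cardano", "IOTA"]  # one IOTA->Cardano mix-up

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true, y_pred, labels=labels)
```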
The poor performance of the NB model is explained by its sensitivity to correlated features, since it assumes independence of all attributes, which does not hold in this dataset. This is evident in the correlation matrix (cf. Figure 4), where a high correlation (0.63) between Turing Completeness and SC Support, and between Popularity and BC Type (0.49), exists. Further, there is a negative correlation (−0.39) between the Transaction Speed of a BC and its Type, as well as further positive correlations between other features.
The training time was also evaluated. The models were trained thirty times on the dataset, and the respective mean time (in seconds) was measured for each algorithm. The evaluation results are depicted in Table V. kNN presented the fastest training time of 1.7 ms, but the DT and NB classifiers also presented fast training times of around 2.5 ms. The RF algorithm has a higher training time than a single DT, as in each run 100 DTs are built for the RF classifier. The SVM classifier presented the slowest training time, at around 1.6 seconds. This can be attributed to its high computational complexity: a typical SVM requires O(n^2) training time, which is impractical for large datasets [43]. Thus, models can be automatically retrained periodically to reflect changes in dynamic features (e.g., block time or popularity) without a major impact.

B. Use Case Scenarios
For the evaluation of the solution presented herein, two scenarios were considered. In the first scenario, BC features that exactly match one of the underlying BCs are used as input for the models; hence, the models are expected to output the correct BC. In the second scenario, BC features that do not correspond to a specific BC are taken as input; hence, the models are expected to predict the closest BC that matches the input parameters. The predictions of the second scenario can be used as suggestions and evaluated by the user to verify whether the BC matches his/her requirements. In summary, these scenarios evaluate how accurate the models are in selecting a BC and the variance of the BCs they suggest.
1) Scenario #1: The results of the first scenario are presented in Table VI. The table shows the prediction made by each trained model and whether the BC of interest was predicted correctly, indicated by a "✓" mark, or wrongly, indicated by a "✗" mark. The DT and RF models correctly predicted the BCs in all cases, while the SVM model classified the BCs correctly for the most part, except for Cardano and Wanchain, where it predicted IOTA and EOS, respectively. However, this classification is still reasonable because these BCs share the same defined properties and fit the input parameters; hence, IOTA can replace Cardano and EOS can replace Wanchain. Compared to the generalization capabilities of the DT, RF, and SVM models, the NB model once more achieved relatively poor results, incorrectly classifying the instances in half of the cases, which was anticipated considering its low accuracy score.
2) Scenario #2: For the second scenario, BC characteristics were chosen that do not fully match the properties of any of the supported BCs. This reflects a real-world situation in which the user has a set of requirements but is unaware of which BC is the most suitable. The goal was to evaluate whether the models' predictions are reasonable and could serve as recommendations to the user, allowing this ML-based solution to be used as a recommender system, such as [13], [14]. It is worth mentioning that the regular selection algorithm in [35] does not offer such a flexible recommendation mechanism, as the user would be prompted with an error message stating that no BC with the provided parameters is available. For this scenario, the NB model was omitted due to the poor quality of its predictions. Five cases were defined and used as input to the models to predict a BC. Even though it is a concise evaluation, this scenario indicates that the methodology is feasible and that ML can be used for BC selection and recommendation, which is the article's focus. The cases and corresponding predictions are presented in Table VII.
In case No. 1, four different BC implementations (Ethereum, EOS, Neo, and QTUM) would be eligible if only the Type, SC, Turing Completeness, and supported Data Size variables are considered. However, none of these BCs has both a high Transaction Speed and a high Popularity. The predicted BC was the same across all models, namely EOS, which covers all variables except for having a medium Popularity score. None of the models chose Neo or QTUM, because neither has a high Transaction Speed nor a high Popularity. The other option would have been Ethereum, which shows a high Popularity but a medium Transaction Speed. To understand why the models favored EOS over Ethereum, the feature importance of the DT and RF models was computed, which indicates the relative importance of each feature toward the output variable. The DT and RF models provide a built-in feature_importances_ attribute that returns the relative importance scores of the features, illustrated in Figure 5. Transaction Speed is the top contributing feature in both models, thus favoring BCs with a matching Transaction Speed.
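Retrieving these scores can be sketched as below via scikit-learn's feature_importances_ attribute; the feature names and label construction are illustrative placeholders rather than the actual dataset:

```python
# Sketch: relative feature-importance scores of DT and RF classifiers.
# Labels are synthetically driven by Tx Speed and Popularity so that
# those two features dominate, mirroring the behavior discussed above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
features = ["Type", "SC Support", "Turing Compl.",
            "Tx Speed", "Popularity", "Data Size"]
X = rng.integers(0, 3, size=(300, len(features)))
y = (X[:, 3] + X[:, 4]) % 5  # driven by Tx Speed and Popularity

for name, model in [("DT", DecisionTreeClassifier(random_state=0)),
                    ("RF", RandomForestClassifier(random_state=0))]:
    model.fit(X, y)
    # Scores sum to 1; higher means more discriminant for the prediction
    scores = dict(zip(features, model.feature_importances_))
```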
In case No. 2, three BCs qualify for selection (Stellar, ICON, and VeChain) when considering all parameters except the Transaction Speed and the Popularity score. The RF model predicted Stellar, weighting the Popularity more than the Transaction Speed, given that Stellar has a high Popularity but medium Transaction Speed; for RF, the importance of Transaction Speed and Popularity is almost equally high. The DT and SVM models predicted ICON, which has a high Transaction Speed but medium Popularity. The models did not predict VeChain, because it has neither a high Transaction Speed nor a high Popularity.
In case No. 3, the input variables are chosen such that two BCs, IOTA and Cardano, would qualify for selection, except that neither supports 2000 bytes in one transaction. All models returned IOTA, presumably due to IOTA's higher supported data size of 1300 bytes, compared to Cardano's 500 bytes. Generally, if the defined input corresponds to one specific BC apart from the data size, that BC is selected nevertheless, which might be a disadvantage, e.g., when the specified data size is crucial for the user.
In cases No. 4 and 5, predictions of private BCs were the main focus of the analysis. The only private BC that supports a data size of 100 bytes, as defined in case No. 4, is Corda. However, it has a low Popularity, in contrast to the medium Popularity specified in the input. In this context, the DT and RF models predicted Corda as the most suitable BC, whereas the SVM returned Wanchain, which conforms with all input variables besides the BC Type. Consequently, it is possible that a model outputs a public instead of a desired private BC, which could be a significant drawback, especially when privacy is required.
In case No. 5, all models correctly predicted a private BC. DT and RF returned Multichain as the best-fitting BC, while SVM predicted Stratis. Multichain satisfies the SC and Turing Completeness constraints but provides a higher Transaction Speed and Popularity than defined in the input values, whereas Stratis does not fulfill the SC constraint but meets the low Popularity score and offers a higher Transaction Speed.

C. Discussion
Three discussion points were identified from the performed evaluation: (i) how useful ML is for the BC selection process, (ii) how such a system compares to a rule-based one, and (iii) which features are key for the selection.

1) ML Usefulness and Suitable Algorithms:
The research on the applicability of ML to the selection process showed that the models can be highly accurate and can be used for a multi-class classification task such as the BC selection process. However, it also showed that the NB algorithm is not suitable for this dataset, due to its low accuracy score and poor performance in the use case scenario in Table VI. The three other models (i.e., DT, RF, and SVM) performed significantly better than the NB model across all tests, except for the training time evaluation, where the NB model was the second-fastest learning algorithm. There was no significant difference in accuracy among these three models. Their confusion matrices, as well as their accuracy scores (cf. Table V), demonstrated that most test instances were correctly classified. In particular, if the input variables completely match an underlying BC, the models predict that specific BC.
In cases where two different BCs share the same properties, e.g., EOS and Wanchain, and both meet the criteria defined in the parameters, only one is output by the model. As seen in the model evaluation, this led to misclassification in a few cases, despite the models predicting one of the two possible BCs, which is also why the models did not yield a higher accuracy score. This could be addressed by a multi-label classification algorithm, which predicts properties of samples that are not mutually exclusive, i.e., each instance can be assigned to multiple categories simultaneously instead of only one, as in multi-class classification. Such an approach could also be applied to cases where the input variables do not fully match an underlying BC implementation, providing users with a recommendation of multiple BCs based on their requirements instead of only one.
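Such a multi-label setup could be sketched as follows, with each BC as a binary output column so that equally suitable BCs are returned together; the BC names, features, and label construction are illustrative, not part of the evaluated system:

```python
# Hedged sketch of multi-label BC recommendation: one binary output
# per BC, so an instance can activate several BCs at once.
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

bcs = ["EOS", "Wanchain", "IOTA", "Cardano"]
rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(200, 5))

# EOS/Wanchain share properties, so their labels coincide; likewise
# IOTA/Cardano, mirroring the ties observed in the confusion matrices
eos_like = (X[:, 0] > 0).astype(int)
iota_like = 1 - eos_like
Y = np.column_stack([eos_like, eos_like, iota_like, iota_like])

clf = MultiOutputClassifier(DecisionTreeClassifier(random_state=0)).fit(X, Y)
pred = clf.predict(X[:1])[0]
suggested = [bc for bc, hit in zip(bcs, pred) if hit]
# 'suggested' now contains both equally suitable BCs instead of one
```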
2) ML-Based Versus Rule-Based Approaches: Concerning how an ML-based (i.e., probabilistic) approach compares to a rule-based (i.e., deterministic) system when converging to a decision, a rule-based approach determines a suitable BC based on the required features by using a set of decision rules (e.g., if-then-else conditions), which is the case of the selection algorithm used in [35]. These rules instruct the system to use the relevant policy features that a user has configured to identify the most appropriate BC. Although a rule-based system is human-comprehensible, explainable, and can be improved over time, it presents two disadvantages. Firstly, although an ML-based approach also requires domain knowledge, a rule-based approach requires more in-depth knowledge of the relations between parameters to manually create new rules for new parameters, which can become a time-consuming task, especially when the system scales to many rules. Secondly, it can result in low scalability, as adding new rules can impact existing rules if filtering is not performed to reduce the complexity.
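The deterministic style discussed above can be sketched as hand-written conditions; the toy rules and BC entries below are simplified placeholders, not the actual rules of [35]:

```python
# Illustrative rule-based (deterministic) selection: every new
# parameter or BC forces additional hand-written rules like these.
def rule_based_select(bc_type, sc_support, tx_speed):
    if bc_type == "private":
        if sc_support and tx_speed == "high":
            return "Multichain"
        return "Corda"
    if sc_support:
        return "Ethereum" if tx_speed == "medium" else "EOS"
    # No rule matched: the system fails instead of recommending a close fit
    return None

print(rule_based_select("public", True, "high"))      # EOS
print(rule_based_select("consortium", False, "low"))  # None -> error message
```

The final `return None` branch illustrates the inflexibility contrasted below: unlike a trained model, the rules cannot fall back to the closest match.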
In contrast, a supervised ML approach learns how to classify data based on a dataset (cf. Table III) without human interaction to define rules. Further, new instances or features can easily be added to the dataset, such that new tasks can be learned from the new data without encoding new rules. However, data acquisition and preparation are a crucial part of building ML models, usually involving multiple steps and essential decisions (e.g., how to assign values, which feature values to use, and which dataset to use) that affect the overall model quality and performance. This process is prone to becoming a time-consuming and tedious task involving many challenges, such as a lack of necessary data or unbalanced data. Moreover, the performance of the models needs to be closely evaluated to validate their effectiveness and whether the outcomes are acceptable and applicable in a real-case scenario (i.e., user feedback is required). As shown with the NB model, ML models can have poor applicability and might not be suitable for a specific use case.
An ML-based solution can provide valuable recommendations to users when the input variables do not fully match an underlying class in a multi-class classification task such as the BC selection process, as opposed to a rule-based algorithm, which would not make a classification (selection) in such a case. For example, the use case in Scenario #2 (cf. Table VII) showed that the ML-based selection algorithm can make appropriate predictions when no BC implementation fits all the chosen parameters. In contrast, the existing rule-based algorithm of [35] would inform the user that there is no matching BC implementation.
Even though ML and Artificial Intelligence (AI)-based systems can provide such recommendations, users must evaluate whether they trust the decisions made by the models, not only in the specific BC selection use case but in general. Trust in ML and AI models is a widely researched and discussed topic in the literature; researchers have proposed using visualizations to foster trust in such a context [47] and identified the relation between trust in AI and trustworthy ML technologies [40]. Although not in the scope of this article, aspects such as accuracy, fairness, robustness, accountability, and explainability contribute to providing trust in the decision for the user [44]. In this context, the feature correlation (cf. Figure 4) and feature importance (cf. Figure 5) help users understand how the decision was reached.
3) Key Features for the Selection Process: Concerning the key features, the most common explanation technique for classification models is the feature importance score, which provides insights into model behavior [5]. Feature importance describes how important a feature was for the classification, i.e., how discriminant a feature is in a dataset. A specific feature might be more important for one classification model than for another, which is evident when comparing the feature importance scores of the trained DT and RF classifiers depicted in Figure 5. Not all features are equally important for predicting the BC, i.e., the target value. While the Transaction Speed feature has relatively high importance in both models, other features vary in importance depending on the considered model. However, none of the features has zero importance on the target variable; thus, all features contribute to the models' predictions.
Figure 4 illustrates the correlation between the features. There is a considerable correlation between the Turing Completeness of a BC and its SC Compatibility, which is logical, as Turing Completeness is only available when the BC supports SCs. The Transaction Speed is correlated with the BC Type, suggesting that private BCs generally have a higher Transaction Speed than public BCs. Its relatively high correlation with the SC feature indicates that BCs that support SCs tend to have a higher Transaction Speed. Moreover, the Type of a BC appears to impact the Popularity score, as they are positively correlated. However, since there is no strong (i.e., correlation score ≥ 0.9) or perfect correlation (i.e., correlation score = 1) between any of the features, no variable conveys redundant information that could be removed without losing valuable information. Hence, all features are important to the BC selection process.
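A feature-correlation matrix like the one in Figure 4 can be computed with pandas; the small numeric encodings below are illustrative toy values, not the entries of Table III:

```python
# Sketch of a pairwise feature-correlation matrix (Pearson), as used
# to check for redundant features. Toy encodings, not the real dataset.
import pandas as pd

df = pd.DataFrame({
    "SC Support":    [1, 1, 0, 1, 0, 1, 1, 0],
    "Turing Compl.": [1, 1, 0, 1, 0, 1, 0, 0],
    "Tx Speed":      [2, 1, 0, 2, 1, 2, 1, 0],
    "Type":          [0, 0, 1, 0, 1, 0, 1, 1],  # 0 = public, 1 = private
})
corr = df.corr()  # correlation between every feature pair
# In this toy data, SC Support and Turing Compl. correlate strongly,
# since Turing completeness only occurs together with SC support
```

A feature would only be droppable if some off-diagonal entry approached 1, which is the redundancy check applied above.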

V. SUMMARY, CONCLUSION, AND FUTURE RESEARCH
The number of blockchain (BC) platforms is growing at a fast pace with the increasing popularity of BCs and cryptocurrencies. In addition, different use cases, such as supply chain tracking and decentralized notaries, benefit from the fact that BCs can be viewed as a service providing a public and immutable database. Each BC claims to address particular issues, e.g., allowing more data to be stored in a transaction or providing higher transactions per second. Although such platforms provide benefits, identifying which one of the myriad of platforms fits given requirements is a complex task, as the technical BC knowledge required for this decision grows with the number of BCs. Hence, this article detailed the design, implementation, and evaluation of a Machine Learning (ML)-based BC selection approach that automates such a decision, providing a flexible and accurate BC selection that can be used in systems where different stakeholders provide different requirements regarding their preferred BC platform.
Based on the evaluations of the models' accuracy and performance, it was identified that the employment of ML is viable: three out of the five ML algorithms trained and evaluated (i.e., Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM)) achieved a high accuracy of more than 85%. Further, the comparison of the ML-based BC selection solution presented in this article with a rule-based solution [35] revealed that the ML-based solution is able to recommend a BC platform even when the requirements do not entirely match a BC, whereas a rule-based system would not perform the selection, given its inflexibility. In addition, as the approach is not limited by the dataset, it can be extended to include new BC platforms or parameters, requiring only the retraining of the ML models to predict such new platforms. Finally, it was identified that all features considered in the dataset are essential for the selection, with the Transaction Speed of the BC platform being important across all models, while others, such as Popularity and BC Type, vary in importance depending on the model. Hence, it can be concluded that ML provides a suitable mechanism to address the BC selection issue in an automated and flexible fashion.
Future work encompasses (a) the quantification of additional BC parameters, either subjective metrics (e.g., security or code maturity) or objective metrics (e.g., costs, interoperability, or transaction anonymization support), (b) the refinement of the popularity metric, including aspects such as GitHub parameters and market capitalization, (c) the evaluation of multi-label classification algorithms, and (d) evaluations with a more extensive set of BCs and parameters.