Machine Learning Approach for Metal Oxide based Polymer Composites as Charge Selective Layers in Perovskite Solar Cells

Abstract: A library of metal oxide-conjugated polymer composites was synthesized, comprising WO3-polyaniline (PANI), WO3-poly(N-methylaniline) (PMANI), WO3-poly(2-fluoroaniline) (PFANI), WO3-polythiophene (PTh), WO3-polyfuran (PFu) and WO3-poly(3,4-ethylenedioxythiophene) (PEDOT). These composites were probed as hole selective layers in perovskite solar cells (PSCs). We adopted machine learning approaches to predict and compare the performance of PSCs based on WO3 and its composites. The experimental and predicted results agree when the electro-optical properties of the PSCs are computed. Notably, for the evaluation of PSC performance, the decision tree model was best suited to the WO3-PEDOT composite, the random forest model to WO3-PMANI, WO3-PFANI and WO3-PFu, and the K Nearest Neighbors model to WO3, WO3-PANI and WO3-PTh. Machine learning models can thus serve as pioneering tools for predicting and validating PSC performance.


Introduction
Perovskite solar cells (PSCs) are being intensively researched owing to their broad light absorption (300-800 nm), high absorption coefficient, long carrier diffusion length, high charge carrier mobility and tuneable bandgap. [1][2] Organic-inorganic halide PSCs have gained significant attention due to their simple fabrication process and high power conversion efficiency (PCE). [3] Perovskites are represented by the general formula ABX3, where A is an organic cation such as methylammonium (MA) or formamidinium (FA); B is a metal (typically Pb); and X is a halide anion (I, Br, Cl, or a mixture of these). [3] MA-based PSCs lag behind the desired characteristics, suffering from limited thermal stability, moisture-induced degradation and hysteretic I−V behaviour, which limits their applications. [3] Mixed cation and anion based perovskites show advantageous properties, [3][4][5] and the mixed formamidinium/methylammonium (FAMA) cation gives enhanced performance due to intense light absorption, improved stability and reduced J−V hysteresis. [3] Further, the incorporation of cesium (Cs) can effectively decrease the crystallization temperature during the annealing process and improve reliability. [3,5] Depending on the direction of light incidence, the architecture of PSCs can be either n-i-p or p-i-n, where n and p denote the electron and hole selective materials, respectively, and i denotes the light harvesting layer. [6] PSCs with a p-i-n planar architecture display negligible hysteresis, solution processability at low temperature, and the potential for scale-up using a continuous coating method. [7] Fullerene-type acceptors, especially [6,6]-phenyl-C61-butyric acid methyl ester (PCBM), are typically used as the n-type charge transport layer, [7] while poly(3,4-ethylenedioxythiophene):poly(styrene sulfonate) (PEDOT:PSS) is employed as a transparent hole transport material in lieu of Spiro-OMeTAD. 
[8] However, PEDOT:PSS possesses undesirable features such as a hygroscopic nature, inferior thermal stability and an inability to block electrons, [8] so an effective hole transport layer (HTL) is paramount. Cogal et al. reported graphene-based polymer composites as HTLs. [9] Recently, transition metal oxides such as WO3 have been used as HTLs owing to their high work function, high carrier mobility and excellent thermal stability. [8] Comprehensive studies on metal oxide based HTLs suggest that hybrid organic-inorganic composites are promising candidates for transferring carriers from the perovskite absorber to the electrode. [10][11][12] Machine learning (ML) is a set of methods that can learn patterns in data without explicit programming and use the uncovered patterns to predict future data. [13] It can replace the traditional trial-and-error approach, which demands more time and resources, in predicting the performance and reliability of PSCs (stability, PCE, fabrication techniques and material synthesis). [14][15][16] Data-driven ML needs to be coupled with experimental data for the modelling. [17] An extensive amount of data is not required, and the computing time is short compared with more complicated models. [18] By employing ML techniques in solar cells, material properties, optimized device architectures and fabrication processes can be predicted, and data reconstruction is attracting significant interest for research and development. [18] Reports dealing with PSCs through an ML approach are scarce; [19] the majority of predictions concern materials, electro-optical features and J-V performance, and are associated with dye-sensitized and organic solar cells. [18] The thermodynamic stability of 20,000 randomly selected materials was predicted using seven different ML methods. [20] A random forest (RF) model predicted the band gap of perovskites by employing 18 descriptors, and 10 stable perovskites and their band gaps were elucidated. 
[21] Similarly, materials with an optimal bandgap for single-junction cells were predicted. [22] Geometric and electronic data of 300 octahedral oxyhalides were used to train an ML model, and band gap predictions were made on test data of 5000 oxyhalides. [23] ML methods were used to develop a model containing 333 data points from 2000 scientific articles to guide the design of new perovskites and predict performance. [16] Using an alternating conditional expectations ML approach, the nonlinear mapping between the band gap and the properties of the constituent elements was studied. [24] A methodology was presented to predict the bandgap of 5158 undiscovered hybrid perovskites for PSCs; for this, six different ML regression algorithms were constructed and implemented.
The Gradient Boosting Regressor (GBR) achieved the highest performance during the train-test process and was used to predict the 5158 band gap values. [25] Saidi et al. developed a dataset consisting of structural and band gap features of 862 halide perovskites to build a predictive ML model that captures the complex trends and correlations of this chemical space. This study showed that a well-designed hierarchical ML approach has superior accuracy in predicting the features of halide perovskites. The root-mean-square errors for the lattice constants, octahedral angle and bandgap of halide perovskites were calculated as 0.01 Å, 5°, and 0.02 eV, respectively. The hierarchical convolutional neural network (CNN) was also reported as a suitable approach in materials design. [26] In another study, a GBR model was applied to structural and elemental features for perovskite formation energy prediction. The larger training set was then used to train a convolutional neural network model (the screening model) on the generic Magpie elemental properties with high prediction power. The root mean square error (RMSE), mean absolute error (MAE) and R² for the descriptor with Magpie elemental properties were 0.11, 0.25 and 0.83, respectively. The screening model was used to filter promising perovskite materials out of 21,316 hypothetical perovskite structures, a large portion of which were evidenced in the previous literature. [27] An automated ML-based identification tool was developed to locate the dominant loss mechanism, using light intensity-dependent performance as input. With an RF classifier applied to more than 2 million simulations, the highest prediction accuracy (>82%) was obtained when the performance at all simulated light intensities and the mobility of the layers were used. The prediction of the dominant recombination using the ML approach can be enhanced by adding the performance under different light intensities, doping levels and mobilities. 
[28] Machine learning has thus been applied to predict the properties of perovskite materials (such as phase stability, band gap and electronic transport features) and to perovskite design, which can be beneficial for the development of innovative perovskites. [29] Our aim is to put forward a direct comparison between experimental data and machine learning predictions for PSCs. A radio frequency (rf) plasma-enhanced method was used to synthesize WO3-conjugated polymer composites as charge selective layers. A library of composites including tungsten trioxide-polyaniline (WO3-PANI), tungsten trioxide-poly(N-methylaniline) (WO3-PMANI), tungsten trioxide-poly(2-fluoroaniline) (WO3-PFANI), tungsten trioxide-polythiophene (WO3-PTh), tungsten trioxide-polyfuran (WO3-PFu) and tungsten trioxide-poly(3,4-ethylenedioxythiophene) (WO3-PEDOT) was prepared using a rotating capacitively-coupled rf plasma process. Preparing the composites through rotating rf plasma modification is advantageous because the process is solvent-free and nontoxic and offers well-controlled deposition. [9][30][31][32] The fabricated PSC architecture and the energy level diagram using WO3 as charge selective layer are shown in Figure 1a and 1b. Four different machine learning algorithms were used to build the models for predicting UV-Vis spectra, J-V curves and external quantum efficiency (EQE) spectra. The values in the dataset are continuous, so the ML algorithms were built as regression models rather than classification models. We report hitherto unreported findings on composites of WO3-conjugated polymers as hole transport layers in triple cation-based PSCs, together with the validation between experimental data and the predictions of the machine learning models (Figure 1c).

Results and Discussion
Machine Learning Model Performance
Machine learning was implemented systematically in four steps: (i) defining the research objectives for UV-Vis spectra, EQE spectra and J-V curve predictions, (ii) constructing the dataset from experimental results, (iii) applying the preferred machine learning algorithms and (iv) evaluating the models based on the dataset in Table 1 with R² and RMSE scores. The machine learning algorithms were applied to experimental results from seven different films (WO3 and its six composites). In the first two models, wavelength (nm) was the input used to predict absorption and external quantum efficiency (EQE), respectively. In the third model, current density (mA cm⁻²) was predicted from voltage (V). The constructed dataset had no missing values, so no imputation was required, but it did contain outliers, which may affect the prediction performance and were therefore removed. Before building the machine learning models, the data were standardized to zero mean (μ = 0) and unit standard deviation (σ = 1) with a standard scaler. For fitting and evaluating the models, the dataset was split into an 80% train set and a 20% test set. Five-fold cross validation was applied to the train set to monitor over-fitting and under-fitting, and the final evaluation was made on the unseen 20% test set, usually called the hold-out set.
To evaluate the electrical properties of WO3 and its composites, we measured the electrical conductivity at room temperature by the four-probe method. Variation in the conductivity was noted upon formation of the composites, and pristine WO3 gave the lowest conductivity value (7 × 10⁻⁵ S/cm) among the samples, while the WO3-PANI composite showed the highest (36 × 10⁻⁵ S/cm). The WO3-PMANI, WO3-PFANI, WO3-PTh, WO3-PFu and WO3-PEDOT composites measured 29, 3, 13, 2 and 24 × 10⁻⁵ S/cm, respectively. 
The conductivity values are in accordance with the photovoltaic properties of the devices, and WO3-PANI gave the highest performance among them. We ascribe this to the high charge carrier concentration built up in the devices, which increases the performance. Figure 3 displays the UV-Vis absorption spectra; the bands at ca. 430 and 800 nm can be attributed to the doping level and the formation of polarons and bipolarons. [33][34] The WO3-based composites gave a similar response that resembled that of the pristine WO3 film, owing to the deposition of a very thin polymer coating onto the WO3 particles during the plasma polymerization process. [30][31] In terms of R², GBR and KNN performed better than the other methods (Table 2). Gradient boosting gave good results for WO3, WO3-PFANI, WO3-PMANI, WO3-PFu and WO3-PTh, with R² scores of 0.971684, 0.903560, 0.968168, 0.961903 and 0.973991, respectively. WO3-PANI and WO3-PEDOT were well predicted by KNN, with R² scores of 0.965050 and 0.970469, respectively. The predictions of the best models fit the original dataset for each PSC, as can be seen in Figure 3.
Table 2. UV-Vis prediction performance of machine learning models for WO3 and its composites.
EQE response prediction for perovskite solar cells based on WO3 and its composites
Figure 4 illustrates the experimental and machine learning based external quantum efficiency (EQE) spectra of triple cation based PSCs using WO3 and its composites as HTL. As can be seen in Figure 4, the overall EQE response of the fabricated PSCs is in agreement with the machine learning predictions. We noted >80% conversion in the EQE spectrum in the range of 400-700 nm for the WO3-based composites as hole selective layers in PSCs. Among the PSCs with WO3 composite HTLs, the highest EQE values were shown by the WO3-PANI and WO3-PTh based PSCs, which is supported by their higher photovoltaic performance. 
In contrast to the UV-Vis prediction, a single machine learning approach, random forest, outperformed the others for all PSCs when predicting EQE. Each decision tree has high variance but low bias. Random forest, which consists of various decision trees trained on different portions of the train set, demonstrates its flexibility and its ability to adapt to the EQE data. It constitutes a well-balanced model in terms of the bias-variance tradeoff. The R² scores are given in Table 3.
Prediction of J-V curves based on WO3 and its composites as HTL
Figure 5 shows the comparison between the machine learning based and experimental J-V curves of PSCs based on WO3 and its composites. To evaluate the prediction of PSC performance, we used data points from the experimental results as inputs. The predictions from the machine learning methods showed high accuracy and were supported by the experimental results. The highest PCE was displayed by the composites based on PANI and its derivatives, and a PCE of 7.56% was measured, which is in accordance with the electrical properties of WO3-PANI. The enhanced PCE is ascribed to the high fill factor (FF) values arising from faster charge transfer. [35] PANI, with its intriguing hole extraction feature, contributes significantly to PSCs based on WO3 composites. The introduction of methyl as an electron-donating group or fluorine as an electron-withdrawing group influences the performance of PSCs containing WO3 composites with PANI and its derivatives. The variation in the performance of PSCs with WO3 composites can be ascribed to the differences in the electron-withdrawing features and aromaticity of the conjugated polymers as donor types in the WO3 composites. The low performance of the WO3-PFu based PSCs is attributed to the poor interface, which inhibits exciton diffusion and leads to charge recombination pathways. [36] The power-voltage characteristics estimated both experimentally and theoretically displayed a trend similar to that of the J-V characteristics. 
The output power increases linearly in the low voltage range, reaches a maximum and then declines steeply at higher voltage. Similar to the PCE, the power-voltage characteristics (Figure 6) display a point of maximum output power. [37]
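As a minimal sketch of how a power-voltage curve and its maximum can be derived from a J-V sweep, the Python snippet below multiplies each voltage by the corresponding current density and picks the largest product. The J-V points are made up for illustration and are not the measured data. The function name `max_power_point` and the incident power of 100 mW cm⁻² (the usual value for 1 sun AM1.5G) are assumptions of this sketch, as is the convention that the first point is the short-circuit point and the last point the open-circuit point.

```python
from typing import Sequence, Tuple


def max_power_point(voltage: Sequence[float],
                    current_density: Sequence[float]) -> Tuple[float, float, float]:
    """Return (V, J, P) at the point where P = V * J is largest."""
    power = [v * j for v, j in zip(voltage, current_density)]
    best = max(range(len(power)), key=power.__getitem__)
    return voltage[best], current_density[best], power[best]


# Hypothetical J-V sweep: voltage in V, current density in mA cm^-2.
voltage = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
current_density = [15.0, 14.8, 14.5, 13.0, 6.0, 0.0]

v_mp, j_mp, p_max = max_power_point(voltage, current_density)  # p_max in mW cm^-2

# Assumed incident power for 1 sun AM1.5G illumination, in mW cm^-2.
P_IN = 100.0
pce_percent = 100.0 * p_max / P_IN

# Fill factor from the assumed short-circuit (first) and open-circuit (last) points.
j_sc, v_oc = current_density[0], voltage[-1]
fill_factor = p_max / (j_sc * v_oc)

print(f"V_mp = {v_mp} V, P_max = {p_max:.2f} mW cm^-2")
print(f"PCE = {pce_percent:.2f} %, FF = {fill_factor:.3f}")
```

Applying the same function to both the experimental and the predicted J-V data would give two power-voltage curves that can be compared point by point.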

Conclusion
We report a library of WO3 and its composites (WO3-PANI, WO3-PMANI, WO3-PFANI, WO3-PTh, WO3-PFu, WO3-PEDOT) prepared using a rotating radio frequency plasma method. The derived composites were then employed as hole transport materials in perovskite solar cells. The influence of WO3 and its composites on the photovoltaic performance was validated with machine learning methods. Machine learning approaches are valuable for uncovering concealed patterns and deciphering the relationships between variables. By adopting the modelling of UV-Vis spectra, performance-related curves such as J-V and EQE can be deduced to validate solar cell performance. We demonstrated pathways to build machine learning models for solar cell characteristics such as EQE, J-V and UV-Vis prediction. This can reduce the associated laboratory cost, and the presented ML models can accelerate the advancement of perovskite solar cells. Four machine learning methodologies, ranging from simple to complex, were adopted to predict three key characteristics. K nearest neighbors and gradient boosting gave better estimates of the UV-Vis data, while K nearest neighbors, random forest and decision tree successfully predicted the J-V measurements. Furthermore, random forest was found to be suitable and outperformed the other models for EQE prediction. During the modelling, five-fold cross validation was applied to the 80% training data set and the models were evaluated on the 20% test data. To calculate the power output of the solar cells, both experimental and predicted J-V datasets were utilized. Our predicted results are in agreement with the experimental results and validate the potential of machine learning models for predicting performance, with promising R² and RMSE scores. Arguably, machine learning models can drastically reduce experimental effort, workforce and associated material design expenses through quick performance prediction of perovskite solar cells and charge selective layers.

Perovskite Solar cells fabrication
Materials: The perovskite precursors were purchased from Dyesol, except PbI2 and CsI, which were procured from Tokyo Chemical Industry (TCI); all were used as received.
[60]PCBM (>99.5 %) and bathocuproine (BCP) were purchased from Solenne BV and TCI, respectively. Clevios P VP AI 4083 PEDOT:PSS was acquired from Heraeus (Germany) and used after filtration through a 0.45 μm PVDF filter.
Device characterization: Current density−voltage (J−V) curves were recorded under an AAA Oriel solar simulator (Newport) producing 1 sun AM1.5G illumination while applying an external potential bias to the devices. The generated photocurrent was recorded at a scan rate of 10 mV/s (pre-sweep delay: 10 s) with a Keithley 2400 source meter, using a black metal mask that defined an active area of 0.09 cm². IPCE measurements were carried out using a 150 W xenon lamp attached to a Bentham PVE300 motorized 1/4 m monochromator.
The electrical conductivity of WO3 and its composites was measured at room temperature by a standard four-probe method using a PCIDAS6014 current source, a voltmeter and a temperature controller. Dry powders were pressed into pellets using a steel die of 13 mm diameter in a hydraulic press under a pressure of 700 MPa. X-ray diffraction (XRD) analyses were performed on powder samples using a Bruker D8 Advance diffractometer with Cu Kα radiation (λ = 1.54 Å).

Prediction Models
Machine learning can be classified into three main categories: supervised, unsupervised and reinforcement learning. [13] Supervised learning is applied when the specific target labels are known. Unsupervised learning, in contrast, clusters the data, since the dataset contains no information about the target labels. Reinforcement learning uses a reward and punishment system to improve the model over the iterations.
We adopted the supervised learning route since the target values are continuous and known. K Nearest Neighbours (KNN), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GBR) are the preferred machine learning algorithms, used here as regressors, and are briefly represented in Figure 7. KNN and GBR were selected to unravel the influence of model complexity on the predicted results. Besides these algorithms, tree-based structures were applied; they are flexible and simple, but can become complex when many hyperparameters are involved. The KNN algorithm uses a defined number k of nearest neighbours to make a prediction about the output. [38] Although it is one of the simplest machine learning approaches, it performs well because its working principle is that similar observations behave in a similar way. Each observation to be predicted in the test data set receives its output from its k nearest neighbours, and assigning a proper k value is essential to avoid over-fitting and under-fitting in the model. DT creates small, non-overlapping subspaces, and these regions are represented by the corresponding nodes. [38] To shorten the learning process and achieve a high-accuracy tree, it is critical to decide which attributes in the data will serve as the root node and the inner nodes of the tree. The features that contribute most to the model are selected for the initial nodes to generate proper trees. Information spreads through the internal nodes via basic Yes/No or True/False questions; the tree starts with the root node and ends with the leaf nodes. A combination of a user-defined number N of trees creates the RF, where each tree has a single vote on an input vector. [39] Because it builds on the basic structure of decision trees, it has a more modular framework and can improve model efficiency. Each tree in the algorithm is built from randomly chosen data. 
Consequently, each tree is unique in its own right and makes its own decisions. After the forest has been constructed from the training data, the random forest algorithm passes the test data through each tree it has generated. The final prediction is determined by majority voting or, for regression tasks such as those considered here, by averaging the outputs of the trees. Another tree-based yet more complex algorithm is GBR, which learns from each tree it builds. The logic behind this algorithm is to achieve higher accuracy after each iteration by reducing the loss function. [40] The previous tree is used to create the next tree structure by reducing the error rate. The algorithm continues to run until it reaches the defined error rate or the defined number of iterations.
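The boosting idea described above, in which each new tree is fitted to the error left by the previous ones, can be sketched in a few lines of standard-library Python. This is an illustrative toy for a single input variable and squared-error loss, not the implementation used in this work. The function names (`fit_stump`, `gradient_boost`, `training_mse`), the use of depth-1 trees, the learning rate and the synthetic data are all assumptions of the sketch.

```python
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Model:
    """Fit a depth-1 regression tree (a single threshold) to the targets r."""
    xs = sorted(set(x))
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighbouring x values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        constant = mean(r)
        return lambda q: constant
    _, t, lm, rm = best
    return lambda q: lm if q <= t else rm


def gradient_boost(x: Sequence[float], y: Sequence[float],
                   n_trees: int = 50, learning_rate: float = 0.1) -> Model:
    """Squared-loss boosting: every stump is fitted to the current residuals."""
    base = mean(y)
    stumps: List[Model] = []
    pred = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda q: base + learning_rate * sum(s(q) for s in stumps)


def training_mse(x: Sequence[float], y: Sequence[float], model: Model) -> float:
    """Mean squared error of the model on the data it was fitted to."""
    return mean([(yi - model(xi)) ** 2 for xi, yi in zip(x, y)])


# Synthetic data for illustration only.
x_demo = [float(i) for i in range(10)]
y_demo = [xi ** 2 for xi in x_demo]

for n in (1, 10, 50):
    model = gradient_boost(x_demo, y_demo, n_trees=n)
    print(n, "trees, training MSE:", training_mse(x_demo, y_demo, model))
```

The training error shrinks as trees are added, which is why a stopping rule such as a maximum number of iterations is needed to limit over-fitting.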

Performance Metrics
R² and Root Mean Squared Error (RMSE) are well-known and widely applicable metrics for evaluating the performance of regression models. There are some key differences between these two metrics. RMSE gives equal attention to each data point, whereas R² is more sensitive to outliers when calculating the prediction error. R² and RMSE are given in equation 1 and equation 2:
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}    (1)
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}    (2)
where y_i is the measured value, \hat{y}_i the predicted value, \bar{y} the mean of the measured values and n the number of data points. In k-fold cross validation, instead of a traditional single train and test split, the data set is divided into K pieces, and in each iteration K-1 pieces form the train set while the remaining piece is used for validation, which promotes objectivity. After the performance metrics are derived for each iteration, the final metric of the model is the average of the K performance metrics.
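The two metrics and the fold-averaging procedure can be sketched in standard-library Python as follows. This is a hedged illustration rather than the code used in this work: the helper names (`r2_score`, `rmse`, `k_fold_indices`, `fit_line`, `cross_validated_rmse`) are hypothetical, the least-squares line is only a stand-in for any of the four regressors, the folds are contiguous without shuffling, and the data are synthetic.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mean([(y - p) ** 2 for y, p in zip(y_true, y_pred)]))


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, validation) pairs."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    pairs = []
    for held_out in range(k):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        pairs.append((train, folds[held_out]))
    return pairs


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Ordinary least-squares line, used here only as a placeholder model."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    return lambda q: slope * q + intercept


def cross_validated_rmse(x: Sequence[float], y: Sequence[float], k: int = 5) -> float:
    """Average the validation RMSE over the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(x), k):
        model = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(x[i]) for i in val_idx]
        scores.append(rmse([y[i] for i in val_idx], preds))
    return mean(scores)


# Synthetic, exactly linear data for illustration only.
x_demo = [float(i) for i in range(10)]
y_demo = [2.0 * xi + 1.0 for xi in x_demo]
print("five-fold RMSE:", cross_validated_rmse(x_demo, y_demo, k=5))
```

In practice the indices would normally be shuffled before folding so that each fold samples the whole input range.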