Predicting Perovskite Bandgap and Solar Cell Performance with Machine Learning

Perovskite semiconductors are of profound interest, and investigating distinct perovskite compositions is arguably paramount to fabricating efficient devices and solar cells. The roles of the anion and cations and their impact on optoelectronic and photovoltaic properties are probed. A machine learning (ML) approach to predict the bandgap and power conversion efficiency (PCE) using eight different perovskite compositions is reported. The predicted solar cell parameters are validated against the experimental data. The adopted random forest model gives a good match, with high R² scores of >0.99 and >0.82 for the predicted absorption and J–V datasets, respectively, and shows minimal error rates with a precise prediction of bandgaps and PCEs. The results suggest that the ML technique is an innovative approach to aid the preparation of perovskites and can accelerate the commercialization of perovskite solar cells without fabricating working devices, minimizing fabrication steps and saving cost.

The ORCID identification number(s) for the author(s) of this article can be found under https://doi.org/10.1002/solr.202100927. DOI: 10.1002/solr.202100927
in light emission and harvesting. The bandgap of the perovskites can be varied from 1.5 to 3.2 eV through the anion selection, the composition, and the A-site cations (Cs, Rb, formamidinium [FA], methylammonium [MA], etc.). [3] A random forest (RF) model has been used to predict the bandgap of Li- and Na-based perovskites using 18 physical descriptors, and 9328 types of materials with ideal bandgaps to capture solar light were estimated. [14] Similarly, a linear regression model was developed to predict the bandgap of mixed-halide hybrid perovskites with higher accuracy (root mean squared error [RMSE] of 0.05 eV). [3] Zheng and co-workers compared different ML models, that is, RF, ridge regression (RR), and support vector regression (SVR), to predict four target variables, including the perovskite bandgap, from seven descriptors, and the authors noted the high accuracy of the RF model. [15] Compositional engineering of the perovskite is an effective approach to fabricate efficient PSCs. It is imperative to track how the cationic and anionic engineering of the perovskite influences the optical bandgap and impacts device performance. Unraveling such information is crucial for understanding these materials and for predicting or designing materials with added merits. To our knowledge, most of the ML approaches for PSCs have been carried out using literature data as the input variables.
Arguably, data collected under different laboratory conditions increase the error and may reduce the accuracy of the prediction. Here we use descriptor datasets obtained under the conditions of a single laboratory (our own), which could improve the performance of the ML model.
Here, we applied the ML approach in two steps, to predict i) the bandgap and ii) the PSCs' performance using eight different perovskites. First, we derived the bandgap of the perovskites from Tauc plots (UV–vis spectroscopy) using both the experimental and ML approaches. Second, we built the model for J–V curve prediction to evaluate the PSC performance. Our work suggests that solar cell performance can be predicted in a way that reduces the need to fabricate working devices, which in turn saves costs and avoids environmental hazards.

Prediction Models
ML can unravel concealed patterns and generate representative models from the data without assigning specific instructions to the machines. [16] ML focuses on prediction using general-purpose learning algorithms to uncover patterns in occasionally complex and cumbersome datasets. Even when data are collected without a tightly controlled experimental design and in the context of complex nonlinear interactions, it returns effective results. In contrast, statistical approaches emphasize inferences that are made through the design and fitting of a project-specific probability model. [17] In chemical processes, each experiment typically builds data to explore, and the task of revealing the patterns becomes demanding as the number of experiments and datasets increases. Arguably, ML is an efficient means to monitor the data in chemical processes. Numerous ML-based approaches can be used for processing different datasets, from simple to complex. RF is an ensemble method that consists of several individual tree structures and can be run for different tasks such as classification and regression. When the task is classification, the RF interprets each tree and makes decisions based on majority voting, while the mean of the individual trees is returned for the regression task. The basic principle behind RF is the wisdom of crowds, which is a simple yet effective approach. A large number of reasonably uncorrelated trees working as a committee will outperform any of the individual constituent models. The key point is to have a low correlation between the trees to obtain a more generalizable model. Therefore, the RF is an attractive model that combines good performance with a flexible structure.
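The averaging behavior described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data (the sine curve, the noise level, and all variable names are illustrative and not taken from this work); it only shows that a regression forest's prediction is the mean of its individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic 1-D regression problem (illustrative only, not data from this work).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Predict with every individual tree, then compare the mean with the forest output.
X_new = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
forest_pred = forest.predict(X_new)

# For regression, the forest prediction is the average over its trees.
assert np.allclose(per_tree.mean(axis=0), forest_pred)

# The spread between trees gives a rough feel for how much they disagree.
tree_spread = per_tree.std(axis=0)
```

The per-tree spread is not a calibrated uncertainty, but it makes the "committee" picture concrete: individual trees differ, and the reported value is their mean.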
In this method, we randomly constructed more than one decision tree (Figure 1b) to allow robust prediction, and we used the default hyperparameter configuration defined by the sklearn library. For instance, the number of trees, min_samples_split, and min_samples_leaf were set to 100, 2, and 1, respectively, while the random_state seed was fixed to obtain a reproducible model. Combining multiple randomly structured decision trees into one model results in enhanced predictions. [18] To accurately evaluate model performance, we adopted both k-fold cross validation and the typical train-test split. The presented datasets (Table 1) were first split into an 80% train set and a 20% test set, and k-fold cross validation was applied on the train set to check for overfitting. For the cross validation, we split the data into k = 5 pieces and, at each iteration, k−1 pieces are used to train the model, while the remaining piece is used for evaluation. [19] The final evaluation measure was obtained on the 20% test set. To measure the success of the regression models, performance metrics such as R² and RMSE are required. When computing the prediction error, RMSE assigns equal weight to each data point, whereas R² is more sensitive to outliers. The R² score in Equation (1) was chosen as the performance indicator in this study to also consider outliers in the data.
The functional relationship between the descriptors (UV–vis absorption, J–V, and external quantum efficiency [EQE] curves) and the target variables (optical bandgap and photovoltaic parameters) has been decoded with the RF regression method. To probe the influence of A- and X-site variations in the lead halide perovskites, we selected eight different perovskites (Figure 1a), including the typical and most studied MAPbI₃, mixed perovskites, and
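A minimal sketch of the evaluation protocol described above, assuming scikit-learn. The sigmoid "absorption edge", the noise level, and the seed value 42 are hypothetical stand-ins, since the measured datasets and the actual seed are not given here. The hyperparameters, the 80/20 split, and k = 5 follow the text; R² is computed with the usual definition, R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)².

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Hypothetical stand-in for one measured curve (e.g., absorbance vs wavelength).
rng = np.random.default_rng(42)
wavelength_nm = np.linspace(400.0, 900.0, 500)
absorbance = 1.0 / (1.0 + np.exp((wavelength_nm - 780.0) / 15.0))
absorbance += rng.normal(0.0, 0.01, size=wavelength_nm.size)

X = wavelength_nm.reshape(-1, 1)
y = absorbance

# 80% train / 20% test split; the seed value is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters named in the text (these are also the sklearn defaults).
model = RandomForestRegressor(
    n_estimators=100, min_samples_split=2, min_samples_leaf=1, random_state=42
)

# 5-fold cross validation on the train set only, to check for overfitting.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_r2 = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")

# Final evaluation on the held-out 20% test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
```

A large gap between the mean of `cv_r2` and `test_r2` would be one sign of overfitting; similar values suggest the model generalizes within the sampled range.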

Optical Properties' Prediction
We evaluated the influence of stoichiometric alterations on the optical properties through absorption spectroscopy. The experimental and predicted UV–vis absorption spectra of the eight different perovskites are shown (Figure 3). From the experimental data, the absorption onsets of RbCsFAMAPI, CsFAMAPI, CsFAPI, FAPI, MAPI, MAPI–Cl, FAPI + MAPBr, and FAMAPI–Br were calculated as ≈777, ≈790, ≈812, ≈848, ≈804, ≈784, ≈778, and ≈835 nm, respectively. The absorption onset of the perovskites has a strong negative correlation with the electronegativities of the halide components, that is, the more electronegative component shows a lower absorption onset. The blueshifted absorption onset of MAPI–Cl compared with MAPI suggests Cl inclusion. In contrast, the A-site substitutions were evaluated through their lattice constants, which show a positive correlation with the absorption onset. [20] The extended absorption onset of FAPI relative to MAPI has been attributed to the higher lattice constant of FA over MA and, as expected, the other mixed perovskites showed absorption onsets within this range. However, in the case of quadruple cations, with the addition of the Rb cation (higher lattice constant), the absorption onset of the CsFAMAPI layers is shifted to a lower wavelength (≈13 nm), suggesting a higher bandgap. We ascribe this blueshift to the perovskite surface, owing to band filling and/or reduced surface traps. [21] The performance of the RF model is indicated by a high R² score (Table 2); all the studied perovskites showed an exceptional R² value of >0.99, indicating a strong correlation between the predicted and experimental curves. After evaluating the R² scores of each material, to demonstrate the model's generalization performance, the average R² score and standard deviation based on the eight perovskites were  [22] (Figures S1–S8, Supporting Information), and the resultant Eg values are tabulated (Table 2).
The bandgaps derived from the experimental UV–vis data and from the RF model were consistent, and the predictions displayed a low deviation of <1.4% from the experimental results. In our case, MAPI–Cl was predicted with the highest success (deviation of 0.00062), while the FAMAPI–Br prediction was comparatively the least successful of all the samples, with an error rate of 0.01321. Notably, MAPI yielded a top R² score; however, it also showed a comparatively higher deviation in the bandgap prediction, as the RF model was obtained by the random selection of data points in the UV–vis dataset. Bandgap calculations from the absorption dataset can deviate if the randomly selected data points do not lie on the linear region. Having investigated eight different perovskites with A- and X-site variations, we suggest that the RF model is an appropriate model to accurately predict the optical bandgaps of lead halide perovskites. These findings point toward the rational design of the perovskite structure to push the performance.
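For orientation, the absorption onsets listed above can be converted to photon energies with E = hc/λ (hc ≈ 1239.84 eV nm). The sketch below is only a unit conversion of the quoted wavelengths; the resulting onset energies are not the Tauc-derived bandgaps of Table 2 and need not coincide with them. The function and dictionary names are illustrative.

```python
# h*c expressed in eV·nm, so that E [eV] = HC_EV_NM / wavelength [nm].
HC_EV_NM = 1239.841984

# Absorption onsets (nm) as quoted in the text.
onsets_nm = {
    "RbCsFAMAPI": 777,
    "CsFAMAPI": 790,
    "CsFAPI": 812,
    "FAPI": 848,
    "MAPI": 804,
    "MAPI-Cl": 784,
    "FAPI + MAPBr": 778,
    "FAMAPI-Br": 835,
}


def onset_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV corresponding to a wavelength in nm."""
    return HC_EV_NM / wavelength_nm


onset_ev = {name: round(onset_energy_ev(nm), 3) for name, nm in onsets_nm.items()}

# A longer onset wavelength corresponds to a lower photon energy.
lowest_energy_material = min(onset_ev, key=onset_ev.get)
```

The conversion makes the trend in the text explicit: the composition with the longest onset wavelength (FAPI) has the lowest onset energy, and a shift to shorter wavelength corresponds to a higher energy.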
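The sensitivity to the fitting region noted above can be sketched with a simple Tauc extrapolation. This is a minimal illustration, not the procedure used for Table 2: it assumes a direct allowed transition (exponent 2), takes absorbance as proportional to the absorption coefficient, and uses an idealized synthetic absorption edge with a hypothetical gap of 1.55 eV. The function name and fit windows are also assumptions.

```python
import numpy as np


def tauc_bandgap(photon_energy_ev, absorbance, fit_window_ev):
    """Fit (A*E)^2 vs E linearly inside fit_window_ev and return the x-intercept."""
    energy = np.asarray(photon_energy_ev, dtype=float)
    tauc = (np.asarray(absorbance, dtype=float) * energy) ** 2
    low, high = fit_window_ev
    mask = (energy >= low) & (energy <= high)
    slope, intercept = np.polyfit(energy[mask], tauc[mask], 1)
    return -intercept / slope


# Idealized direct-gap edge: (A*E)^2 = (E - Eg) above the gap, zero below it.
true_gap_ev = 1.55  # hypothetical value, for illustration only
energy_ev = np.linspace(1.40, 1.90, 251)
absorbance = np.sqrt(np.clip(energy_ev - true_gap_ev, 0.0, None)) / energy_ev

# Window entirely on the linear region: the extrapolation recovers the gap.
estimated_gap_ev = tauc_bandgap(energy_ev, absorbance, (1.60, 1.80))

# Window straddling the onset: the fit is pulled toward lower energy.
shifted_gap_ev = tauc_bandgap(energy_ev, absorbance, (1.45, 1.65))
```

In this idealized case the well-placed window returns 1.55 eV, while the window that includes sub-gap points returns roughly 1.50 eV, which is the kind of deviation the text attributes to data points lying off the linear region.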

Photovoltaic Parameters' Prediction
The goal of this work is to predict the performance of PSCs from the predicted current density–voltage (J–V) and power–voltage characteristics. The J–V data from the PSCs fabricated using the eight different perovskites as absorber layers were modeled using RF regression to test the PCE predictability. Figure 4 depicts the experimental and predicted J–V curves, and the corresponding PCEs from the experimental data and the RF model are tabulated (Table 3). We used R² scores to evaluate the RF efficacy for the J–V models (Table 3), and the average R² score and standard deviation based on the eight perovskites were measured as 0.9010 and 0.0534, respectively. This highlights the model's generalization performance. A reasonable fit of the RF regression model is achieved for all the PSCs, that is, 0.82 < R² < 0.97.
To minimize the error factors, all the PSCs were fabricated under the same conditions in a single laboratory, and notably, this allowed us to reach a good R² value. The PCEs calculated from the experimental J–V curves show that the FAPI-based PSCs displayed the lowest PCE value of 15%, and FAMAPI–Br gave the maximum value of 19.3%. The other fabricated PSCs fall in between, which is in agreement with the PCEs predicted by our RF regression model. It is worth noting that MAPI and MAPI–Cl displayed maximum deviations of 0.290 and 3.176%, respectively, between the measured and predicted PCEs, while the other PSCs showed deviations of ≈1%. As depicted in Figure 1c, the MAPI–Cl-based PSC was fabricated in a p–i–n configuration, while the rest of the PSCs were in the n–i–p configuration. This factor was not taken into consideration during ML. To avoid complexity, we have also not taken into account the effects of the charge-transporting layers and the interfaces on the device performance, as this work is mainly focused on the light-harvesting layer. In comparison with the bandgap prediction, the RF regression model displayed a reduced accuracy in PCE predictions, and we attribute this to the influences of charge transport layers, device architectures, interface properties, halide segregation, and induced losses. Further, we calculated the power–voltage (P–V) curves from both the experimental and RF-simulated J–V datasets (Figure S9, Supporting Information). The observed correlation between the experimentally calculated P–V curves and the RF-simulated curves supports the commendable performance of our ML approach. External quantum efficiency (EQE) measures the ratio of the number of charge carriers collected to the number of incident photons of a given energy under illumination. We further assessed the performance of the proposed RF model for the EQE dataset (Figure S10, Supporting Information).
It can be deduced that the overall EQE response of the PSCs is in agreement with the RF simulation. This validates our findings and demonstrates the suitability of the adopted simulation model. We adopted R² values (Table S1, Supporting Information) to track and validate the performance of the RF-based EQE models. The average R² score and standard deviation based on the eight perovskites were measured as 0.9717 and 0.0239, respectively.
Here, we assessed the suitability of the RF regression model to predict the optical and photovoltaic properties of lead halide perovskites with A- and X-site variations. The predicted and experimental PCEs as a function of the predicted and calculated bandgaps are plotted (Figure 5), suggesting the efficacy of our RF regression model. Though the FAPI perovskite showed the lowest bandgap of ≈1.49 eV in both the experimental and RF simulation methods, it yielded the lowest PCE of 15% here, due to the method adopted for perovskite preparation. However, FAMAPI–Br with a lower bandgap of 1.514 eV gave the highest PCE of >19%. Expectedly, CsFAPI and CsFAMAPI with a comparatively lower bandgap of <1.6 eV showed a slight decrement in PCE. In contrast, RbCsFAMAPI with a bandgap of >1.6 eV measured >18.6% PCE. Analyzing the outputs of our RF regression model, we noted that the RF model is reliable for predicting the optical bandgaps of lead halide perovskites. The predicted optical bandgap was not directly correlated with the performance of the corresponding devices, due to factors limiting the electrical properties and interfacial phenomena. In this context, we anticipate that widening the descriptor pool with more inputs on the charge transport layers, device architecture, interface properties, crystal size, halide segregation, ion migration, phase stability, and induced losses is essential to the success of ML in predicting solar energy conversion, and this could be further extended to other energy devices such as light-emitting diodes, batteries, and photodetectors.
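As a reminder of how a PCE follows from a J–V curve, the sketch below computes the power–voltage curve and the usual photovoltaic parameters. It is an illustration under stated assumptions: an incident power of 100 mW cm⁻² (standard 1-sun illumination), a current density that decreases monotonically with voltage, and a synthetic ideal-diode curve whose parameters (23 mA cm⁻², 1.1 V, ideality factor 1.5) are hypothetical and not taken from Table 3. The same function could be applied to a measured or a model-predicted curve.

```python
import numpy as np


def jv_metrics(voltage_v, current_density_ma_cm2, incident_power_mw_cm2=100.0):
    """Return Jsc, Voc, FF, and PCE (%) from a J-V curve.

    Assumes photocurrent is positive and J decreases monotonically with V.
    """
    v = np.asarray(voltage_v, dtype=float)
    j = np.asarray(current_density_ma_cm2, dtype=float)
    order = np.argsort(v)
    v, j = v[order], j[order]

    power = v * j                            # P-V curve in mW cm^-2
    p_max = float(power.max())               # maximum power point
    j_sc = float(np.interp(0.0, v, j))       # current density at V = 0
    # np.interp needs increasing x values, so reverse the decreasing J array.
    v_oc = float(np.interp(0.0, j[::-1], v[::-1]))

    fill_factor = p_max / (j_sc * v_oc)
    pce_percent = 100.0 * p_max / incident_power_mw_cm2
    return {"jsc": j_sc, "voc": v_oc, "ff": fill_factor, "pce": pce_percent}


# Synthetic ideal-diode curve (hypothetical parameters, illustration only).
thermal_voltage_v = 0.02585
ideality = 1.5
j_photo = 23.0      # mA cm^-2
target_voc = 1.1    # V
j_sat = j_photo / (np.exp(target_voc / (ideality * thermal_voltage_v)) - 1.0)

voltage = np.linspace(0.0, 1.2, 601)
current = j_photo - j_sat * (np.exp(voltage / (ideality * thermal_voltage_v)) - 1.0)

metrics = jv_metrics(voltage, current)
```

Because the PCE depends on the maximum of the P–V curve, small errors in the predicted J–V curve near the maximum power point translate directly into PCE deviations, even when the overall R² of the curve is high.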

Conclusion
We developed an RF model to predict the bandgap of different halide perovskites and to evaluate their performance in PSCs. We investigated the influence of different perovskite compositions on optoelectronic features and photovoltaic performance and validated the results using the RF model. Our model showed exceptional performance in predicting the optical bandgaps, with a high R² value of >0.99, and demonstrated that this knowledge can be used to design new lead halide perovskites through accurate bandgap predictions. Further, our RF model showed a reasonable fit of the J–V curves and predicted the PCEs in agreement with the experimental data. This signals the suitability of the prediction approach used in this work as an effective, reliable, and fast method that can be applied to a variety of materials for solar cell applications, accelerating materials discovery and rapid screening.

Experimental Section
Materials: All chemicals were purchased from Sigma Aldrich unless otherwise stated and were used as received without any further purification. CsI, MA, FA, and PbI₂ were procured from TCI, while chlorobenzene (CB), isopropanol (IPA, 99.9%), anhydrous dimethyl sulfoxide (DMSO, 99.8%), and N,N-dimethylformamide (DMF, 99.8%) were purchased from Acros Organics. Perovskite precursors were purchased from Dyesol, while PbI₂ and CsI were procured from Tokyo Chemical Industry (TCI).
Perovskites: Eight different types of perovskite layers were deposited as follows.
RbCsFAMAPI: The quadruple-cation perovskite precursor solution was prepared using FAI ( The precursor solution was spin coated in a two-step spin-coating program (1000 and 6000 rpm for 10 and 30 s, respectively). 112 μL of chlorobenzene was dripped 10 s before the end of the second spin step, followed by annealing at 100 °C for 1 h.
MAPI: The MAPbI₃ precursor solution was prepared by dissolving equimolar amounts of MAI and PbI₂ (1.2 M) in DMSO. The precursor solution was spin coated in a two-step spin-coating program (1000 and 4000 rpm for 10 and 30 s, respectively). 112 μL of chlorobenzene was dripped 10 s before the end of the second spin step, followed by annealing at 100 °C for 1 h.
FAPI and CsFAPI: Instead of conventional precursor materials, presynthesized nonperovskite yellow powders were used as the precursor materials for the FAPI and CsFAPI precursor solutions; the powder precursor synthesis was reported in our previous work. [4] 1.25 M precursor solutions were prepared by dissolving 791.25 mg of δ-FAPbI₃ and 800 mg of δ-CsFAPbI₃ powders in 1 mL of an anhydrous solvent mixture of DMF and DMSO with a 4:1 (v/v) ratio. The FAPI and CsFAPI perovskites were fabricated by spin coating the precursor solutions at 1000 rpm for 5 s and 5000 rpm for 20 s. 100 μL of chlorobenzene was dripped during the final 5 s of spinning, and the FAPI and CsFAPI thin films were annealed at 150 and 80 °C, respectively.
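The quoted FAPI mass can be checked with a short molarity calculation. The sketch assumes that FA is formamidinium, CH(NH₂)₂⁺, that the final solution volume is approximately the 1 mL of solvent, and standard atomic masses; the helper names are illustrative. The CsFAPI case is omitted because its Cs/FA ratio is not stated here.

```python
# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "Pb": 207.2, "I": 126.904}

# FAPbI3 with FA = CH(NH2)2, i.e. C1 H5 N2 Pb1 I3 (assumed composition).
FAPBI3 = {"C": 1, "H": 5, "N": 2, "Pb": 1, "I": 3}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def required_mass_mg(molarity_mol_l: float, volume_ml: float, molar_mass_g_mol: float) -> float:
    """Mass in mg: (mol/L) * mL gives mmol, and mmol * (g/mol) gives mg."""
    return molarity_mol_l * volume_ml * molar_mass_g_mol


fapbi3_molar_mass = molar_mass(FAPBI3)
fapbi3_mass_mg = required_mass_mg(1.25, 1.0, fapbi3_molar_mass)
```

With these assumptions the molar mass is about 633 g/mol, so a 1.25 M solution in 1 mL needs about 791 mg, consistent with the 791.25 mg quoted above.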
MAPI–Cl: The MAPbI₃₋ₓClₓ perovskite films were fabricated by a two-step deposition method. [23] The PbI₂ solution was prepared by dissolving PbI₂ in DMF and stirring at 70 °C for 12 h. The mixed-cation solution was prepared by dissolving MAI and MACl at concentrations of 50 and 5 mg mL⁻¹, respectively, in 2-propanol (IPA). The PbI₂ films were spun at 4500 rpm for 20 s using warm PbI₂ solution at 70 °C, and then a drop of the mixed-cation solution was dropped on the center of the spin-coated film for 30 s. The as-prepared samples were annealed at 100 °C for 3 min.
FAMAPI–Br: The FAMAPbI₃₋ₓBrₓ films were also fabricated by a two-step deposition method. First, 1.3 M PbI₂ was dissolved in a mixed solvent (DMF/DMSO = 9.5/0.5) at 70 °C overnight. The warm PbI₂ solution was spin coated at 4000 rpm for 20 s, and then a drop of the mixed organic solution with FAI/MABr/MACl = 60 mg/6 mg/6 mg in 1 mL isopropanol was added on the spinning substrate for 30 s. The as-prepared samples were annealed at 150 °C for 15 min under ambient conditions with 30–40% relative humidity (RH).
Device Fabrications: Both types of architectures were adopted for device fabrication.
n–i–p type: The solar cells were fabricated on commercial laser-etched FTO glass electrodes (10 Ω sq⁻¹, NSG). All of the electrodes were cleaned by sonication in sequence with Hellmanex II solution, Milli-Q water, acetone, and 2-propanol for 20 min each (precleaning). The cleaned substrates were dried with a stream of compressed air and were further treated by UV–ozone for 15 min before device fabrication. For the n–i–p (1) PSC, the SnO₂ electron-transporting layer was prepared by spin coating the 3 wt% SnO₂ nanoparticles (Alfa Aesar) at 5000 rpm for 30 s; it was then postheated at 150 °C for 15 min. For the n–i–p (2), a compact TiO₂ (c-TiO₂) layer was deposited by spray pyrolysis at 500 °C, using 1 mL of titanium diisopropoxide bis(acetylacetonate) precursor solution (75% in IPA) in 19 mL of pure ethanol with oxygen as the carrier gas, followed by annealing for another 30 min at 500 °C to acquire the anatase phase. SnO₂ quantum dots (SnO₂-QDs) synthesized by a previously reported method [24] were spin coated on the FTO:c-TiO₂ substrate, followed by annealing at 150 °C for 45 min. In contrast, for the n–i–p (3), the mesoporous TiO₂ (mp-TiO₂) layer (1:8 w/v in ethanol) was spin coated over the FTO:c-TiO₂ substrate at 4000 rpm with 2000 rpm s⁻¹ acceleration for 30 s, followed by progressive heating steps up to 500 °C for 30 min. Then, the substrates were treated with UV–ozone for 30 min and transferred immediately to the argon-filled glove box in the room, and the different perovskite layers were deposited as discussed earlier.
For the hole transport material (HTM) layer, 70 and 60 mM spiro-OMeTAD solutions were prepared by dissolving the desired amount of material in 1 mL of chlorobenzene. Doping was achieved by the addition of 4-tert-butylpyridine (38.4 and 28.8 μL for 70 and 60 mM, respectively) and bis(trifluoromethylsulfonyl)imide lithium salt solution with a concentration of 520 mg mL⁻¹ (21.1 and 17.5 μL for 70 and 60 mM, respectively). The 60 mM spiro-OMeTAD solution was used as the HTM for the FAPI and CsFAPI perovskites, and the 70 mM solution was used for all other perovskites. 35 μL of HTM solution was dropped on the perovskite layer and spin coated at 4000 rpm for 20 s. An Au electrode (80 nm) was thermally evaporated under a pressure of 2 × 10⁻⁶ Pa to complete the device fabrication.
p–i–n type: The PEDOT-PSS film was spin coated on the precleaned and UV–ozone-treated ITO substrates at 5000 rpm for 30 s in air and postheated at 150 °C for 15 min. The ITO/PEDOT-PSS substrates were transferred to the glove box for the perovskite layer deposition. The electron-transporting layer (10 mg mL⁻¹ PC₆₁BM in chloroform) was spin coated on the perovskite layer at 1200 rpm for 30 s. Then, the thin BCP film was spin coated on the samples at 5000 rpm for 30 s using a 0.5 mg mL⁻¹ solution in isopropanol. Finally, the 100 nm-thick Ag electrode was deposited by thermal evaporation.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.