Chemometrics for Raman Spectroscopy Harmonization

Raman spectroscopy is used in a wide variety of fields, and in a plethora of different configurations. Raman spectra of simple analytes can often be analyzed using univariate approaches and interpreted in a straightforward manner. For more complex spectral data such as time series or line profiles (1D), Raman maps (2D), or even volumes (3D), multivariate data analysis (MVDA) becomes a requirement. Even though there are some existing standards for the creation, implementation, and validation of methods and models employed in industry and academia, further research and development in the field must contribute to their improvement. This review will cover, in broad terms, existing techniques as well as new developments in MVDA for Raman spectroscopic data, in particular their use for instrument and data calibration. Chemometric models are often generated via fusion of analytical data from different sources, which enhances model discrimination and prediction abilities as compared to models derived from a single data source. For Raman spectroscopy, raw or unprocessed data are rarely used. Instead, spectra are usually corrected and manipulated, 1 often by case-specific rather than universal methods. Calibration models can be used to qualitatively and/or quantitatively characterize samples measured with the same instrumentation that was used to create the model. However, regular validation is required to ensure that aging or incorrect maintenance of the instrument does not alter the model's predictions, particularly when applied in regulated fields such as pharmaceuticals. Furthermore, a model transfer may be required for different reasons, such as replacement or significant repair of the instrumentation. Modeling can also be used to consistently harmonize Raman spectroscopic data across several instrumental designs, accounting for variations in the resulting spectrum induced by different components.
Data for Raman harmonization models should be processed in a protocolled manner, and the original data should remain accessible to allow for model reconstruction or transfer when new data is added. Important processing steps will be the calibration of the spectral axes and of instrument-dependent effects, such as spectral resolution. In addition, data fusion and model transfer are essential for allowing new instrumentation to build on existing models to harmonize their own data. Ideally, an open access database would be created and maintained, for the purpose of allowing for continued harmonization of new Raman instruments using an outlined and accepted protocol.


Introduction
In recent years, the application of Raman spectroscopy has been extended to a variety of fields. Amongst its unique features are non-destructiveness, capability to analyze underneath the sample surface, low interference by water, high spatial and spectral resolution, and material fingerprinting. These render Raman spectroscopy a powerful technique, for example, for pharmaceutical development, 2 as well as for life 3 and materials sciences. However, Raman users are still sometimes concerned with accuracy, reproducibility, and exchangeability of Raman data, particularly for chemometric modeling. This highlights the importance of harmonization for Raman instrumentation, data acquisition, and processing. 4 The quality of a model depends on the modeling procedure as well as on the quantity and quality of the data used for modeling.
In this review, we analyze different approaches to robust and reliable chemometric models, describing the proposed steps, procedures, and protocols, with a particular focus on harmonization. The objective is to lead the reader through the various steps required to construct a chemometric model for calibration and harmonization of Raman data. Within each step, several techniques will be introduced. It is not the objective of this review to give detailed instructions on how to perform any or all of these techniques but merely to highlight their major differences and the need for them. Ultimately, the review of these steps and techniques will be related to their applicability to harmonization of Raman spectroscopy.

Spectrum Collection
To ensure that different instruments can be harmonized, an initial step is to calibrate each instrument to an appropriate standard (see the Existing Standards, Guides, and Protocols section below). This creates a common ground to be used as a starting point for other instrumental factors to be accounted for. After instrument calibration, the acquisition protocol must be optimized. The performance of a prediction model from Raman spectra will be directly related to the quality of the data used for model generation. As such, the minimum acceptable data quality must be determined, which can be examined primarily by the signal-to-noise ratio (S/N). The parameters that determine the S/N, together with typical values and limitations, are listed in Table I. Despite the wide range of possible Raman instrument configurations, dispersive Raman spectrometers are characterized by three main parameters when collecting data through a specific optical path: laser power, integration time for each accumulation, and number of averaged accumulations in the spectrum. In the case of focusing optics, the laser focus will also have a significant impact. Along with sample properties such as Raman scattering cross-section, topography, or transmissivity, the incident light intensity will ultimately determine the S/N of a single spectrum, which in the shot-noise limit scales as 5,6

S/N ∝ √(P · t · N)    (1)

where P is the laser power at the sample, t is the integration time per accumulation, and N is the number of averaged accumulations. The model's quality depends as well on the quantity of data used for modeling. As for all other forms of statistical and predictive analytics, the general rule is that the larger the training set, the better the result. Specifically with regard to regression models, the number of observations should be at least one order of magnitude larger than the number of variables considered in the model, as a lower ratio of observations to variables will lead to misleadingly high covariance statistics.
7 Moreover, the quantity of data to be collected must take into account whether the model validation method requires samples that are held out from the primary dataset; see the Model Validation (Performance) section below.
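The √N benefit of averaging accumulations can be illustrated with a short simulation (all numbers here are synthetic and chosen purely for illustration):

```python
import numpy as np

def estimate_snr(spectrum, signal_idx, noise_region):
    """Estimate S/N as peak amplitude over the standard deviation
    of a flat, signal-free region of the spectrum."""
    return spectrum[signal_idx] / np.std(spectrum[noise_region])

rng = np.random.default_rng(0)
x = np.arange(1000)
peak = 50.0 * np.exp(-0.5 * ((x - 500) / 10.0) ** 2)  # synthetic Raman band

# Simulate 64 accumulations with additive Gaussian detector noise
accumulations = peak + rng.normal(0.0, 5.0, size=(64, x.size))

snr_single = estimate_snr(accumulations[0], 500, slice(0, 300))
snr_mean = estimate_snr(accumulations.mean(axis=0), 500, slice(0, 300))
# Averaging 64 accumulations should improve S/N by roughly sqrt(64) = 8
```

In this shot-noise-free toy model the improvement factor approaches √64 = 8 exactly; in real measurements, intensity-dependent (Poisson) noise and detector drift reduce the gain somewhat.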

Spectrum Processing
In order to be used for model generation, the input spectra must be comparable with one another. Thus, in general, multivariate data analysis should not be applied to unprocessed spectra, which can contain significant artifacts and structural noise. Models will be sensitive to these variations, which do not describe the analytes being modeled and therefore will lead to misinterpretation. As such, processing techniques should be applied to the data to remove erroneous and misleading variations, while maintaining the integrity of the data itself. 8 Methods of data processing for vibrational spectroscopy may be divided into two categories, namely, filtering and model-based methods. Filtering methods, like calculating second derivatives or performing vector normalization, simply transform spectra into a presumably "better" version of the same data, removing some undesired types of variation. Meanwhile, model-based methods allow for quantifying and separating different types of physical and chemical variations in the spectra, for example, when a fluorescent baseline is modeled ("guessed") and separated from the remaining data. 9,10,11 Ryabchykov et al. 12 distinguish three phases of Raman data processing: (i) pre-treatment (including, e.g., data import/conversion, spike removal, and calibration), (ii) pre-processing (including, e.g., smoothing, baseline separation, normalization, and nonlinear scaling), and (iii) processing or model generation. Following this scheme, several Raman spectrum data processing methods typically employed by users are described in the following sections. Note that a complete list is beyond the scope of this review.

Pre-Treatment
Pre-treatment of input spectra has a strong impact on harmonization models. During Raman data collection, raw data from the detector, usually a charge-coupled device (CCD) array, are converted into intensity versus primary Raman shift. In commercial devices, this first pre-treatment step, which often also applies the instrument calibration, is usually performed by the acquisition firmware and hidden from the user, although some corrections may be optional. Then, the obtained set of spectra must be checked for spikes and outliers before proceeding with (pre)processing.

Instrument Calibration
Each detector and spectrometer in any given Raman device must be calibrated and verified. During the setup of a Raman device, well-established calibration procedures are performed that require a consistent and careful approach. Raman spectrometers must be calibrated independently for both axes, that is, the x-axis (wavelength calibration) and the y-axis (intensity correction). Relevant calibrations require executing multiple, carefully designed steps to generate calibration data for the whole device. For this, Raman spectrometers and detectors are prepared in the same way as for other spectroscopy techniques, based on correlating detector pixels with the known signal generated by external references, such as gas emission lines. 13 Raman shift x-axis calibration is then performed by defining the incident laser wavelength/wavenumber as zero Raman shift, with the rest of the axis being redefined as the wavenumber difference between the pixel assignment and the laser line. There are several standard protocols and guides for the calibration of Raman instrumentation, and it is recommended to follow the one appropriate for the instrument's use (i.e., pharmacopeia for pharmaceutical instruments). Ideally, calibrations should be done not only per instrument, but rather per mode of measurement, depending, for example, on the excitation laser wavelength, sample mounting, sampling technique, or laser power output. However, this is not a standardized practice across system manufacturers and users. Since the instrumental response is subject to a steady and significant temporal drift, the calibrations (or at least their verifications) should be repeated frequently. 14

Wavenumber Calibration
Raman spectrometers are typically calibrated on the wavenumber (x) axis using standard light sources, such as neon or mercury-argon lamps, with sharp emission lines between 250 and 1700 nm wavelength. The known line positions are compared to the measured peak positions, and a wavenumber calibration is generated accordingly. This procedure often requires technical modifications of the Raman setup, such as re-wiring of optical fibers, which sometimes cannot be performed by standard users. Furthermore, the procedures often bypass the optical path that the Raman-scattered light rays would normally follow, and thus deviations incurred by some components along this path may not be accounted for.

Figure 1. Raman x-axis calibration using a pair of target and reference spectra. (a) Spectra of the same polystyrene sample are recorded using the target and reference instruments. (b) The relative shift (RS) at a number of prominent, well-separated peaks is calculated (black dots), and the shifts are fitted to a third-order polynomial (blue curve). (c) The resulting calibration can be used to compensate for the relative x shift, boxed region in (a). Note that the shift calibration outside the range of measured x shifts (<∼500 cm⁻¹ and >∼3000 cm⁻¹) in (b) was set to the boundary values, since the polynomial rapidly diverges and may result in unreasonably large shift values.
The Raman shift denotes the difference, in wavenumbers, between the excitation laser and the Raman-scattered light. As such, the Raman shift conversion of the resulting wavenumber axis is only possible if the excitation source's wavelength is accurately known. In addition to the initial wavelength calibration, x-axis calibration is required, which can be performed by typical users using standard samples (Fig. 1). The procedure can be described as follows: (i) A spectrum of a defined standard sample is measured with the Target instrument. (ii) A defined set of anchor points x_i is located, which are typically chosen as the positions of prominent peak maxima. (iii) For each anchor point, the wavenumber shift s_i(x_i) with respect to the Reference is calculated. The latter can be a list of known peak positions, or a reference spectrum recorded with a well-calibrated instrument. (iv) A continuous, parametrized curve s(x) is fitted through the graph of wavenumber-dependent shifts (e.g., a polynomial or spline). This wavenumber calibration curve is stored, assigned to the specific Target instrument and mode, and given a timestamp. (v) For all following measurements performed with the Target instrument, the original x-axis is substituted by the new, calibrated axis x', with x' = x + s(x).
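The steps above can be sketched as follows, using hypothetical polystyrene-like peak positions as anchor points (the numbers are illustrative, not certified reference values). The shift curve is clamped to the measured anchor range, in line with the boundary-value treatment noted in the caption of Fig. 1:

```python
import numpy as np

# Hypothetical anchor peaks: known reference positions (cm^-1) and the
# positions at which the same peaks appear on the target instrument
reference_peaks = np.array([620.9, 1001.4, 1031.8, 1602.3, 3054.3])
target_peaks    = np.array([622.5, 1003.5, 1034.0, 1605.0, 3058.5])

# (iii) wavenumber shift at each anchor point, s_i = reference - target
shifts = reference_peaks - target_peaks

# (iv) fit a continuous shift curve s(x); a low-order polynomial is typical
coeffs = np.polyfit(target_peaks, shifts, deg=2)
s = np.poly1d(coeffs)

# (v) apply the calibration to the target instrument's x-axis: x' = x + s(x),
# clamping s(x) outside the anchor range to avoid polynomial divergence
x = np.linspace(400, 3200, 1600)
x_calibrated = x + s(np.clip(x, target_peaks.min(), target_peaks.max()))
```

After calibration, the anchor peaks of the Target instrument land on (or very close to) the Reference positions, while the rest of the axis is shifted by the interpolated curve.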
Dörfer et al. reported that they could not eliminate instrument-dependent differences in Raman spectra of the same material even after applying a wavenumber calibration following the standard procedure described above. 4 They also indicate that the results could be improved by additional wavenumber adjustment using an alternative chemometric procedure based on cross-correlation (or HQI). This suggested procedure is described in the Spectral Alignment section below.

Intensity Calibration
Calibration of the intensity (y) axis is performed to account for the variation in quantum efficiency and dynamic response of the detector across the spectrometer's spectral range. This is especially important for Raman harmonization, as the spectral wavelength range varies depending on the excitation wavelength used by the Raman device. This leads to detectors with a wide variety of quantum efficiency curves which, without calibration, would make quantitative inter-instrument comparison impossible. Furthermore, the spectrometer's response to light depends on multiple factors, most significantly the wavelength of the light hitting the pixel. For instance, in a 532 nm Raman spectrometer the wavelength of the Raman scatter at 3200 cm⁻¹ is approximately 641 nm, while in a 785 nm Raman instrument it is approximately 1048 nm. The quantum efficiency of a CCD detector is significantly lower at 1048 nm than at 641 nm, and as such the intensity of the scatter appears reduced in that region.
A wavenumber calibration, as described in the previous section, and, if required, a baseline separation (see the Background Correction and Baseline Separation section) must be performed prior to intensity calibration. Instead of a set of sharp peaks, a standard material for intensity calibration should generate an intense, constant signal over a wide wavenumber range, such as the standard SRM 2242. 15 A wavenumber-dependent intensity modulation is then calculated by dividing the Reference spectrum by the Target spectrum. This will exacerbate any noise in the regions of a spectrometer with a low response to light, which is particularly noticeable when correcting 785 nm based systems. One method proposed to avoid this problem is to apply the intensity correction in reverse to an intensity-corrected library spectrum and then compare that with the spectrum collected on a separate instrument. 16
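A minimal sketch of the Reference/Target division, using an entirely synthetic detector response curve (the functions below are assumptions for illustration, not the emission of SRM 2242 or a real instrument's response):

```python
import numpy as np

x = np.linspace(200, 3400, 1600)  # Raman shift axis, cm^-1

# Hypothetical smooth emission of an intensity standard, and the target
# instrument's measurement of it, attenuated by an assumed response curve
reference = np.exp(-0.5 * ((x - 1500) / 1200.0) ** 2)
response  = 0.3 + 0.7 * np.exp(-0.5 * ((x - 900) / 900.0) ** 2)
measured  = reference * response

# Wavenumber-dependent intensity correction: Reference / Target
correction = reference / measured

# Applying the correction to a sample spectrum from the target instrument
# restores the relative intensities the reference response would yield
sample_true = (np.exp(-0.5 * ((x - 1001) / 5.0) ** 2)
               + 0.5 * np.exp(-0.5 * ((x - 2900) / 8.0) ** 2))
sample_measured = sample_true * response
sample_corrected = sample_measured * correction
```

In this noise-free toy case the correction is exact; in practice, noise in low-response regions is amplified by the division, which motivates the reverse-correction approach cited above.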

Spike Removal
Spikes are a common artifact in Raman spectroscopy and appear as a consequence of cosmic rays (X-rays) hitting the detector during the acquisition of a spectrum. Cosmic rays can cause an intense and sharp signal burst at a single pixel position on a CCD camera that typically spreads to a few neighboring pixels, thus giving rise to a corresponding spike in the digital Raman spectrum (Fig. 2). Spikes represent prominent features in an unprocessed Raman spectrum and, if uncorrected, can lead to errors in subsequent data processing steps such as normalization, peak search, calibration, or dimensionality reduction. Numerical spike correction is thus mandatory if these methods or other multivariate analyses are to be applied, unless the affected section of the spectrum is ignored. Following Ryabchykov et al., 12 the correction can be performed by: (i) (Unspecific) filtering of the entire spectrum, for example, by median smoothing or wavelet (Gabor) transformation. (ii) Setting markers that locate channels affected by spikes, followed by replacing the large count values in the affected areas. Markers can be generated, for example, by comparing nearest-neighbor differences, via the discrete Laplace operator (numerical derivative, see Fig. 2), against the standard deviation.
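A minimal sketch of the marker-based correction, approach (ii) above, here using a first-derivative threshold rather than the Laplace operator and interpolation from adjacent channels (a simplified variant, demonstrated on synthetic data):

```python
import numpy as np

def remove_spikes(y, z_thresh=8.0, gap=4):
    """Marker-based spike removal: channels with an anomalously large
    numerical derivative are marked, short gaps between markers are closed
    so multi-channel spikes are fully covered, and the marked interval is
    replaced by values interpolated from adjacent, unaffected channels."""
    y = np.asarray(y, dtype=float).copy()
    d = np.diff(y)
    hits = np.where(np.abs(d) > z_thresh * np.median(np.abs(d)))[0]
    mask = np.zeros(y.size, dtype=bool)
    mask[hits] = True
    mask[hits + 1] = True
    idx = np.where(mask)[0]
    for a, b in zip(idx[:-1], idx[1:]):   # close short gaps inside one spike
        if b - a <= gap:
            mask[a:b] = True
    y[mask] = np.interp(np.where(mask)[0], np.where(~mask)[0], y[~mask])
    return y

rng = np.random.default_rng(1)
spectrum = 100 + rng.normal(0, 1, 500)
spectrum[250:253] += 5000  # simulated cosmic-ray spike spanning three channels
cleaned = remove_spikes(spectrum)
```

The median of the absolute derivative is used as a robust noise estimate, so the spike itself does not inflate the threshold.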

Outlier Detection/Rejection
An outlier is a single Raman spectrum within a larger dataset, such as a time series, that shows significant differences from the dataset average which are not related to sample properties. Outliers can occur for a multitude of reasons, including errors related to the user, instrument, sampling, or processing, and can have a significant effect on the predictive capability of a model. As such, their removal from the training dataset is key to creating a reliable model. Detection of outliers can be performed manually during data collection when obviously uncharacteristic features are observed in the spectrum. In addition, algorithmic methods can be employed to identify outlier data, which removes user bias and speeds up the review of large datasets. Examples are: (i) Parametric methods: RMSSR and SRVIV 17 (ii) Linear modeling methods: PCA and SVM 18,19 (iii) Proximity-based methods: NND, NNMD, cluster analysis, and local outlier factor 17,20,21 (iv) High-dimensional methods: HCS 22
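As a minimal sketch of a proximity-based screen (a deliberately simplified stand-in for the NND/LOF methods cited above, run on synthetic replicate spectra):

```python
import numpy as np

def flag_outliers(spectra, z_thresh=3.5):
    """Flag spectra whose Euclidean distance to the dataset median spectrum
    exceeds z_thresh robust standard deviations (median/MAD scaling)."""
    median_spec = np.median(spectra, axis=0)
    dists = np.linalg.norm(spectra - median_spec, axis=1)
    mad = np.median(np.abs(dists - np.median(dists)))
    robust_z = (dists - np.median(dists)) / (1.4826 * mad)
    return robust_z > z_thresh

rng = np.random.default_rng(2)
dataset = 10 + rng.normal(0, 0.5, size=(30, 200))   # 30 replicate spectra
dataset[7] += 20.0                                  # one corrupted spectrum

outliers = flag_outliers(dataset)
```

Using the median spectrum and MAD-scaled distances keeps the screen itself robust against the very outliers it is meant to find.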

Pre-Processing
Pre-processing is required to give relevance to data. Special care must be taken to avoid overprocessing, where the part of the data removed from the processed spectrum is of significance to the sample's response and not due to variance from other sources (e.g., instrumental or environmental effects). Overprocessing can also lead to sample-dependent variations being disproportionately weighted within the model, and thus to over- and underfitting (see the corresponding section) as well as misleading model classifications or predictions. 23 Some examples: (i) Over-smoothing can remove a shoulder peak from a larger peak. (ii) Rigorous baseline separation can make peak maxima shift significantly away from their original position. (iii) Multiple peaks can be fit to a single response peak, and while this may improve the fit residual, it may not be based on reality.
In the following sub-sections, four steps of Raman spectrum pre-processing are described, which are considered essential to data harmonization.

Noise Reduction
According to common literature on Raman spectroscopy, two main types of noise occur in digital Raman spectra: 24 (i) Additive noise with Gaussian distribution, originating mainly from detector readout and dark current. (ii) Intensity-dependent noise with a Poisson distribution.
Both types of noise can be reduced by spectrum smoothing, typically applying Savitzky-Golay (Fig. 3), mean, median, or Gauss filters. 2,4 Note that all smoothing algorithms will lead to some information loss, since they correspond to low-pass filters in Fourier space. 2,6 Therefore, smoothing should only be used where absolutely necessary, or for intermediate steps such as baseline model fitting. Information loss by smoothing can be mitigated by the implementation of maximum likelihood estimation (MLE). 25 By restricting the behavior of the smoothing algorithm based on the distribution of the noise, MLE is able to reduce the impact of smoothing on the Raman signal. A discrete wavelet transform (DWT) can also be used to reduce the noise. 26 DWT can detect and filter out Gaussian noise based on a thresholding value but does not have a significant impact on the intensity-dependent noise. This process can be iterative, using different threshold values for each iteration to remove the noise.
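A Savitzky-Golay filter can be sketched with plain NumPy as follows; production code would typically use a library implementation such as scipy.signal.savgol_filter, and the window and order below are illustrative:

```python
import numpy as np

def savitzky_golay(y, window=11, order=3):
    """Savitzky-Golay smoothing: a polynomial of the given order is fitted
    to each window by linear least squares, which reduces to a fixed
    convolution kernel (the first row of the design matrix pseudoinverse)."""
    half = window // 2
    A = np.vander(np.arange(-half, half + 1), order + 1, increasing=True)
    kernel = np.linalg.pinv(A)[0]
    ypad = np.pad(y, half, mode="edge")   # simple edge handling
    return np.convolve(ypad, kernel[::-1], mode="valid")

x = np.linspace(0, 10, 200)
noisy = np.sin(x) + np.random.default_rng(3).normal(0, 0.1, x.size)
smooth = savitzky_golay(noisy, window=11, order=3)
```

A useful property: any signal that is locally a polynomial of degree ≤ order passes through the filter unchanged (away from the padded edges), which is why Savitzky-Golay distorts peak shapes less than a plain moving average of the same width.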
An enhancement of the signal-to-noise ratio can be achieved without information loss through averaging of multiple spectra, for example, of a spectral time series. Here, the S/N will improve with N^(1/2), where N is the number of spectra averaged. The same holds for many methods of dimensionality reduction relying on averaging, such as principal component analysis (PCA) or non-negative matrix factorization (NMF) (see the Spectrum Processing section above).

Figure 2. Example of a simple, marker-based spike removal algorithm working on a noisy Raman spectrum of solid polyethylene. First, the spike is located via a threshold to the modulus of the numerical spectrum derivative along the x-axis (boxed area). Then, the y values in the spike interval (red), usually a few spectrum channels (four in this case), are substituted by the average counts of six channels adjacent to the spike interval.

Background Correction and Baseline Separation
Spectral baseline is affected by several factors: environmental, instrumental, and sample-related. Background correction is the subtraction of any environmental and instrumental signatures detected by the spectrometer when there is no laser excitation. Background correction, if correctly applied, will improve the quality of the data. The background includes, but is not limited to: (i) Detector dark current and baseline offsetting. (ii) Environmental emissions detected by the spectrometer. (iii) Self-emission from the sample.
The baseline is also often altered by factors that are observed only under laser excitation. Relevant sources of baselines in a Raman spectrum are: (i) Rayleigh scattering (non-constant, more intense at small Raman shifts). (ii) Fluorescence (non-constant, low-frequency curve). (iii) Raman signatures originating from impurities, contamination, or substrates.
Baselines can interfere with the analysis of the underlying Raman spectrum, and sometimes even dominate the Raman peak intensity, such as in the case of fluorescent biological or polymer samples. Therefore, an essential pre-processing step is to separate the Raman signal, typically consisting of a set of sharp peaks, from its baseline. We deliberately avoid the term "baseline correction", since all components of a Raman spectrum can potentially contain sample information and should thus be preserved in general. Therefore, we refer to "baseline separation". Separating and preserving both the sharp Raman peaks and the baseline avoids information loss, since it allows for recovery of the original spectrum.
Baselines can often be modeled by a smooth function, which is then subtracted from the raw spectrum. Methods for separation of broad baselines include asymmetric least squares (ALS; Fig. 4), 27 advanced penalized least squares approaches, 28 statistics-sensitive nonlinear iterative peak-clipping (SNIP) algorithms, 29 Goldindec, 30 spline-fitting methods, such as morphology-based cubic p-spline fitting 31 or b-spline fitting, piecewise polynomial fitting, 32 wavelet feature points, 33 convolutional neural networks, 34 morphological and mollifier-based baseline corrections, 35 numerical derivatives, and rolling ball techniques. 36 A number of the listed methods are available as part of the SNIP, 29 ISREA, 37 and hyperSpec 38 software packages.
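As an illustration, the ALS approach can be sketched in a few lines of NumPy. This is a dense-matrix simplification of Eilers-style asymmetric least squares (adequate for single spectra; sparse matrices are preferable for large datasets), and the smoothness and asymmetry parameters below are illustrative:

```python
import numpy as np

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimation: minimize a weighted
    least-squares misfit plus a second-difference smoothness penalty,
    with small weights for points above the current baseline estimate."""
    k = y.size
    D = np.diff(np.eye(k), n=2, axis=0)       # second-difference operator
    penalty = lam * (D.T @ D)
    w = np.ones(k)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w = np.where(y > z, p, 1.0 - p)       # asymmetric re-weighting
    return z

x = np.linspace(0, 1, 400)
peaks = (np.exp(-0.5 * ((x - 0.3) / 0.01) ** 2)
         + 0.6 * np.exp(-0.5 * ((x - 0.7) / 0.01) ** 2))
baseline_true = 0.5 + 0.8 * x                 # assumed broad background
spectrum = peaks + baseline_true
baseline_est = als_baseline(spectrum)
raman_only = spectrum - baseline_est
```

Keeping both baseline_est and raman_only, rather than discarding the baseline, follows the "baseline separation" philosophy above: the original spectrum can always be recovered as their sum.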
Like other pre-processing steps, baseline separation can drastically impact the performance of Raman-based parameter prediction models. Schulze et al. compared and evaluated several baseline separation techniques for automated implementation. 39 Guo et al. report a method of automated selection of a baseline algorithm that optimizes a specific model. 40 This potential model specificity of the baseline separation method underlines the need to preserve the original training data with any model to modify the baseline separation procedure if needed.

Normalization
Intensity variations of Raman spectra can be dramatic and occur between laboratories, instruments, measurements, and even within one measurement (e.g., maps or time series). They can originate from instrumental or experimental factors, for example, changes in laser focus or detector response. A global, that is, wavenumber-independent, scaling of Raman spectra can be mitigated by a number of normalization techniques, including 41 vector normalization, area normalization (i.e., normalization to the integrated spectral intensity), standard normal variate (SNV), and min-max scaling. All algorithms listed above are based on spectrum modification by two constants a and b, such that each spectrum channel y_i is modified according to

y_i' = (y_i − a) / b    (2)

For SNV, a and b are the mean and standard deviation of the spectrum,

a = ȳ, b = σ_y    (3)

while for min-max scaling,

a = min(y), b = max(y) − min(y)    (4)

For vector and area normalization, a = 0 and b is the Euclidean norm or the integrated intensity, respectively. Spectrum normalization is widely applied to improve comparability of Raman data, and will also improve the performance of many prediction and decomposition models. However, it cannot account for wavenumber-dependent gain artifacts, where different scaling applies to different Raman peaks. Such effects must be compensated by appropriate intensity calibration (see the Intensity Calibration section above).
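The normalizations above can be written compactly, each amounting to a different choice of the constants a and b:

```python
import numpy as np

def snv(y):
    """Standard normal variate: a = mean, b = standard deviation."""
    return (y - y.mean()) / y.std()

def min_max(y):
    """Min-max scaling: a = min, b = max - min."""
    return (y - y.min()) / (y.max() - y.min())

def vector_norm(y):
    """Vector normalization: a = 0, b = Euclidean norm."""
    return y / np.linalg.norm(y)

spectrum = np.array([1.0, 2.0, 5.0, 2.0, 1.0])
scaled = 3.7 * spectrum   # same spectrum at a different global intensity
```

Because all three methods divide out a global factor, the original and the globally rescaled spectrum map onto identical normalized spectra, which is exactly the inter-instrument scaling effect normalization is meant to remove.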

Non-Linear Scaling
Raman spectrometers output data on a linear intensity scale, assuming an intensity-corrected spectrum (see the Intensity Calibration section). For quantitative analysis, it is often assumed that the relative amplitude of a substance-characteristic peak is proportional to its relative content within the sample. However, the relationship between peak intensity and sample properties is not always linear, as can be seen in the Raman peak intensities from graphene. 42 Raman spectra can be scaled in a non-linear fashion to increase the relative weights of some spectral features, for example through a logarithmic, square root, or power transformation. For example, if (x, y) represents a Raman spectrum, then (x, log(y)) is taken as the processed spectrum. This is often done to increase the performance of prediction models, such as partial least squares (PLS) regression. 43
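A minimal example of logarithmic scaling applied to a synthetic spectrum with one dominant and one weak band (the band positions and amplitudes are arbitrary):

```python
import numpy as np

x = np.linspace(100, 1800, 900)
y = (1.0                                                 # constant offset keeps log defined
     + 100 * np.exp(-0.5 * ((x - 520) / 4.0) ** 2)       # dominant band
     + 2 * np.exp(-0.5 * ((x - 1350) / 10.0) ** 2))      # weak band

# Logarithmic scaling compresses the dominant peak and raises the
# relative weight of the weak one
y_log = np.log10(y)
```

After the transformation, the intensity ratio between the strong and the weak band shrinks by roughly an order of magnitude, so the weak feature contributes more to a subsequent model.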

Processing and Model Generation
Chemometric models are statistical or machine learning models, typically implemented in the form of numerical structures (classes), that is, computer programs, which predict sample properties or parameters from digital Raman spectra. In general, those models can be generated through supervised or unsupervised learning. 44 Models should be trained on a sufficiently large number of known Raman spectra, the "training set" (also referred to as fitting; see the Model Validation (Performance) section). Afterward, the trained model is saved as a data structure, which can later be used for predicting properties of unknown samples from their Raman spectra (queries). This process is also referred to as model deployment. To consistently harmonize Raman spectroscopic data across different instrumental designs, a chemometric model must account for the effect of component variations on the resulting spectrum.
Predictions can be categorical, in the case of a classification (e.g., recognition of bacterial species), or numerical, in the case of a regression (e.g., for predicting the composition of a polymer blend). A recently reported example of classification for Raman spectra is the SIMCA method, a supervised pattern recognition method in which a PCA model is constructed for each material class; for classification of a spectrum, its residual difference with each class is calculated. 45,46 Particularly for pharmaceutical applications, the classification criterion HQI > 0.950 is often used as a simple model for material verification (classification). 47 As an example, calibration and deployment of Raman models for pharmaceutical online process monitoring has been described to involve four main steps: 43 (i) Collection of well-characterized spectral and analytical data for the parameters of interest, either from existing databases ("historical" data) or from new measurements, (ii) spectral data pre-processing for S/N improvement, (iii) correlation of pre-processed spectra to the parameters of interest by statistical methods. The resulting model must be validated, and (iv) construction of a visual tool to use the calibrated model online for real-time monitoring and control applications.
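The HQI criterion mentioned above is commonly computed as the squared correlation coefficient between a query spectrum and a library spectrum; definitions vary between vendors, and this is one common form:

```python
import numpy as np

def hqi(library, query):
    """Hit quality index as the squared Pearson correlation between a
    library spectrum and a query spectrum (one common definition)."""
    lib = library - library.mean()
    qry = query - query.mean()
    return (lib @ qry) ** 2 / ((lib @ lib) * (qry @ qry))

x = np.arange(1000.0)
s1 = np.exp(-0.5 * ((x - 500) / 20.0) ** 2)   # library spectrum
s3 = np.exp(-0.5 * ((x - 900) / 20.0) ** 2)   # spectrally distinct material
```

Because the correlation is scale-invariant, a globally rescaled copy of the library spectrum still scores HQI = 1, while a spectrally distinct material scores far below the typical 0.950 acceptance threshold.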
A large number of statistical and machine learning models have been developed, and Fig. 5 shows some of the most relevant. Most of them are readily available within libraries of established scientific programming languages, such as Python's Scikit-Learn library, 48 Matlab, R, and C++.
Unsupervised models are used if no training data with known properties (targets) are available, for example clustering; vertex component analysis; 49 or unmixing, for example, using NMF, PCA, independent component analysis (ICA), modified alternating least squares, 50 and multivariate curve resolution alternating least squares (MCR-ALS).
Supervised machine learning algorithms, like regression or classification models, can be applied if training data with targets are available to extract high-level information, such as linear classification and regression methods (e.g., multilinear regression or PLS regression); kernel support vector machines (SVMs); 51 artificial neural networks; 52 random forest (RF); 53 and linear discriminant analysis. 54 A supervised model (e.g., Fig. 6) is typically generated in several steps (Fig. 7) that will be commented on in the following sections: (i) (Centralized) data fusion: The complete dataset is converted to a matrix composed of the individual spectra as lines. This requires appropriate preprocessing and interpolation of all spectra to a common x-axis. The columns of the resulting matrix are referred to as "features", while its lines are "instances" (measurements). (ii) Dimensionality reduction: Use of algorithms that reduce the number of features (Fig. 10) and thus enable a robust prediction.
(iii) Test/train split or data sampling: The dataset is separated into a training and a testing subset. The testing subset must not be used for training to avoid bias. The training set must have significant variation to describe the population. (iv) Model training (fitting): Also referred to as model "calibration" by some publications. 43 This is not to be confused with spectrometer calibration (see Instrument Calibration section above).
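Steps (i)-(iv) can be sketched end to end on synthetic data, here with ordinary least squares on PCA scores as a minimal stand-in for PLS or other regression models (all spectra, components, and concentrations below are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)

# (i) Fused data matrix: n spectra as rows, k channels as columns
n, k = 120, 300
x = np.linspace(0, 1, k)
conc = rng.uniform(0, 1, n)                      # target property per sample
comp_a = np.exp(-0.5 * ((x - 0.3) / 0.03) ** 2)  # analyte spectral component
comp_b = np.exp(-0.5 * ((x - 0.7) / 0.05) ** 2)  # matrix spectral component
X = np.outer(conc, comp_a) + np.outer(1 - conc, comp_b)
X += rng.normal(0, 0.02, X.shape)                # detector noise

# (ii) Dimensionality reduction via PCA (SVD of the centered matrix)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:3].T                           # keep m = 3 features

# (iii) Test/train split; the test subset is never used for fitting
idx = rng.permutation(n)
train, test = idx[:90], idx[90:]

# (iv) Model training: least squares on the PCA scores
A = np.c_[scores[train], np.ones(train.size)]
coef, *_ = np.linalg.lstsq(A, conc[train], rcond=None)
pred = np.c_[scores[test], np.ones(test.size)] @ coef
rmsep = np.sqrt(np.mean((pred - conc[test]) ** 2))
```

The root mean square error of prediction (rmsep) is evaluated only on the held-out test samples, which is the unbiased estimate of model performance that step (iii) is designed to provide.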

Data Fusion
Raman data fusion (DF) refers to merging a Raman dataset with other data, which can be either Raman data from different measuring modes, instruments, or laboratories, or data from sources other than Raman spectroscopy. In principle, one can differentiate two types of data fusion: (i) (Horizontal) DF: The same set of samples is analyzed by Raman plus other analytical methods. 56,57 In this case, only columns/features are added, while the samples or data instances, that is, the matrix rows, are the same for all datasets. After appropriate dimensionality reduction or feature extraction, the resulting method-specific data matrices are merged (concatenated) horizontally to obtain a unified data matrix (Fig. 8).
(ii) Data augmentation: 58 A Raman dataset is merged with another Raman dataset either from the same or from a different set of samples. In this case, the features, that is, matrix columns, are the same for all datasets. This requires some overlap on the Raman shift (x) axis and either dimensionality reduction or interpolation of spectra to a common x-axis (Fig. 9).
Furthermore, Ryabchykov et al. 12 describe two levels of data fusion: (i) Centralized DF is carried out directly after preprocessing, possibly even before dimensionality reduction, 57 by merging data from different sources into a single data matrix with subsequent simultaneous analysis. (ii) High-level DF, also called distributed DF, analyses each data type separately and combines the scores at the final step. 59 In any case, the fusion of data following either of the described schemes will lead to some information loss, resulting either from dimensionality reduction or from x-axis interpolation and rescaling. Therefore, fused datasets should be temporary and only generated for a specific analytic purpose, for example, the training of a chemometric model. The original data should always be preserved alongside the fused dataset.

Figure 8. Scheme for high-level data fusion of Raman data with data from other (hyphenated) methods. Raman data is subjected to dimensionality reduction to three scores (features), and then combined with data from two other analytic methods carried out on the same samples. The features are combined as columns of a fused data matrix, which can then be used for analysis and model generation. In this case, the result is an n × 8 data matrix.

Peak Fitting
A common mode of quantitative analysis is fitting of a mathematical model function to one or multiple peaks (bands) in a Raman spectrum. 60 Typically, Gaussian, Lorentzian, Voigt, or Pearson-type profiles are used, since they often yield a good fit to experimental data. 61 For some materials, inhomogeneous peak broadening occurs, which requires asymmetric models for peak fitting. 62 The model functions yield a number of univariate peak features such as amplitude, maximum position, centers of mass, or full width at half-maximum. All features derived from the intensity (y) axis should either be measured in a normalized spectrum or given relative to a reference peak. Note that the representation of a Raman spectrum by univariate peak features, such as intensity ratios or parameters of a fitted Gaussian profile, represents a form of dimensionality reduction (see the corresponding section).
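As an illustration of peak fitting, the following sketch fits a Lorentzian profile to a synthetic band using SciPy's `curve_fit`; the band position, width, and noise level are arbitrary example values, not taken from any measurement.

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(x, amp, x0, fwhm):
    """Lorentzian profile with amplitude amp, center x0, and FWHM fwhm."""
    return amp * (fwhm / 2) ** 2 / ((x - x0) ** 2 + (fwhm / 2) ** 2)

# Synthetic band: a Lorentzian at 1001.5 cm^-1 plus noise (illustrative).
x = np.linspace(950, 1050, 400)
rng = np.random.default_rng(1)
y = lorentzian(x, 100.0, 1001.5, 8.0) + rng.normal(0, 1.0, x.size)

# Fit with rough initial guesses for amplitude, position, and width;
# popt then holds the univariate peak features (amp, x0, fwhm).
popt, pcov = curve_fit(lorentzian, x, y, p0=[80.0, 1000.0, 10.0])
amp, x0, fwhm = popt
```

The fitted parameters are exactly the univariate peak features discussed above; replacing `lorentzian` with a Gaussian, Voigt, or asymmetric profile changes only the model function.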

Dimensionality Reduction (Model-Based)
Raman spectroscopy often produces large datasets that can have high dimensionality. For example, a dataset containing 10 Raman maps of 100 × 100 spectra each and 1600 channels per spectrum will have four dimensions and consist of 160 million numbers. In order to make such datasets accessible to human observers, and to enable a robust analysis and model generation, the dimensionality must thus be reduced, while preserving as much dataset information as possible. A schematic example is shown in Fig. 10. In many cases, dimensionality reduction (DR) is a prerequisite for fusion of Raman data from different sources, or of Raman data with data from other analytical methods (see Data Fusion section above). The most common DR algorithms are unsupervised feature extraction methods, such as PCA, 63 independent component analysis (ICA), 64 NMF, 65 and multivariate curve resolution alternating least squares (MCR-ALS). 66 Factor methods reduce dimensionality in terms of some factorization of the original data matrix, which has the individual n Raman spectra with k channels each as rows. 12 After DR, the original n × k matrix is factorized into the m × k matrix of m spectral components and the n × m matrix of weights or scores, with m << n, k. A visualization of the m components is usually called a loading plot (see Fig. 10). Once the factorization model is fitted, any Raman spectrum with k channels can be transformed into m scores, indicating the weights resulting from the decomposition of the target spectrum into the m inherent components of the model.

Figure 9. Scheme for data augmentation (centralized or high-level) of Raman data. For unprocessed data, features can be the detector counts for a specified set of Raman shifts, while for processed data they may, for example, be weights of PCA components (scores). The result is an (m + n) × k data matrix.
In particular, NMF and MCR-ALS can provide physically and chemically meaningful decompositions, where loading plots correspond to Raman spectra of the pure components.
An alternative route to dimensionality reduction is variable or feature selection, where a subset of spectrum channels is chosen bearing the highest relevance for a subsequent prediction or decomposition model, for example, by means of a genetic algorithm. 67 Dataset dimensionality can also be reduced by extraction of a number of univariate features, such as (fitted) peak amplitudes, widths, or ratios.
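A minimal numerical sketch of the n × k → (n × m)(m × k) factorization described above, using a plain truncated SVD in place of PCA or NMF and a synthetic mixture of three "pure" components as the dataset:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 200, 1600, 3  # n spectra, k channels, m components (hypothetical)

# Build a synthetic dataset as a nonnegative mixture of m "pure" spectra.
components = np.abs(rng.normal(size=(m, k)))   # m x k spectral components
weights = np.abs(rng.normal(size=(n, m)))      # n x m mixing weights
X = weights @ components + 0.01 * np.abs(rng.normal(size=(n, k)))

# Truncated SVD as a simple factor model: X_centered ~ scores @ loadings.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :m] * S[:m]   # n x m matrix of scores
loadings = Vt[:m]           # m x k matrix of spectral components

# The m-score representation replaces n*k numbers by n*m + m*k numbers.
print(scores.shape, loadings.shape)
```

In practice one would use a dedicated implementation (e.g., scikit-learn's PCA or NMF), but the data flow, fitting a factorization once and then representing every spectrum by m scores, is the same.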

Over-and Under-Fitting
An overfitted model is defined as a "model that contains more unknown parameters than can be justified by the data". 69 This means that the model is too precisely tailored to the training dataset, so that the model cannot fit new data or reliably make predictions. The model therefore accounts less for the defining variables than for artifacts and/or noise.
An underfitted model is defined as a "model where some parameters or terms that would appear in a correctly specified model are missing either by mistake or by design." 69 This means that the model is excluding variables that would better define the observations. Both overfitting and underfitting will increase the error in classification or prediction performed by a model. By following the principle of parsimony, the user should find the adequate parameters to define a model. One can both loosen and tighten the parameters used, comparing the error of the model, to find the lowest possible number of parameters. The parameter analysis and model creation can be guided by theory, which can help streamline the process. Tools such as Akaike's information criterion can be used to compare several models constructed from the same dataset and indicate which is best in terms of statistical inference. 8 In the case of supervised models, over- or under-fitting can be mitigated, for example, by grid search optimization over relevant model parameters, such as the number of components in PLS. For unsupervised approaches, the application of chemical or physical constraints can be appropriate, such as setting the number of PCA or NMF components equal to the number of expected chemical components in the sample.

Figure 10. Schematics of dimensionality reduction using a factor method (NMF) for a confocal Raman map, an example of a large Raman dataset (left). First, all Raman spectra included in the map are stacked as rows of a common matrix. To the latter, an NMF model is fitted, which contains four components (center): the polymer matrix (blue) and three types of flame retardant additives (orange, green, and red). The transformed dataset (right) has reduced dimensions, and each spectrum of the original dataset is represented by four scores W n, corresponding to the components c n. If the original dataset is a Raman map, the transformed dataset can be represented as a series of images (scale bar: 5 µm). 68

Figure 11. Demonstration of chemometric spectrum adjustment implemented in a CHARISMA software prototype. 97 (left) A wavenumber-dependent negative shift (top hat plus ramp) was applied to a Raman spectrum. The shift is correctly detected by maximizing the HQI of original and shifted spectrum through simulated annealing (orange circles). A continuous shift function is modeled using a cubic spline (blue curve). (right) The target (shifted) spectrum is adjusted (calibrated) using the shift model.
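The parsimony search described above can be illustrated with a deliberately simple stand-in: choosing the degree of a polynomial by cross-validated RMSEP, where too low a degree underfits and too high a degree chases noise. The data are synthetic; in a chemometric setting, the tuned parameter might instead be the number of PLS components.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy target

def cv_rmsep(degree, folds=5):
    """Cross-validated RMSEP for a polynomial of the given degree."""
    idx = np.arange(x.size)
    sq_errors = []
    for f in range(folds):
        test = idx % folds == f                    # hold out one fold
        coeffs = np.polyfit(x[~test], y[~test], degree)
        pred = np.polyval(coeffs, x[test])
        sq_errors.append((pred - y[test]) ** 2)
    return float(np.sqrt(np.concatenate(sq_errors).mean()))

# Grid search over the complexity parameter (here: polynomial degree).
cv_scores = {d: cv_rmsep(d) for d in range(1, 15)}
best = min(cv_scores, key=cv_scores.get)
# Degree 1 underfits and very high degrees tend to overfit; the CV error
# is typically minimized at an intermediate, parsimonious degree.
```

The same grid-search-by-CV pattern applies unchanged when the model is a PLS regression and the parameter is its number of latent variables.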

Data Structure of Chemometric Models
In contemporary scientific programming languages such as Python, models can be generated, trained, saved, and used in the form of class instances. Classes are data structures that provide a means of bundling data and functionality together. 70 Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by its class) for modifying its state. A model class for prediction from Raman spectra typically has a .fit() method for training and a .predict() method for prediction, plus attributes storing a numeric representation of the trained model, for example, a linear regression or neural network classifier. An instance of a model class can, for example, be serialized and saved to disk as a binary file. Similar implementations exist for other programming languages such as Matlab or R.
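A minimal sketch of such a model class in Python, with .fit()/.predict() methods and pickle-based serialization of the trained instance; the class itself is a toy mean-centered linear regressor invented for illustration, not a published implementation.

```python
import pickle
import numpy as np

class MeanCenterRegressor:
    """Toy model class: bundles state (coefficients, means) with
    .fit()/.predict() methods, mirroring the scikit-learn convention."""

    def fit(self, X, y):
        self.x_mean_ = X.mean(axis=0)
        self.intercept_ = y.mean()
        # Least-squares fit on mean-centered data.
        self.coef_, *_ = np.linalg.lstsq(X - self.x_mean_,
                                         y - self.intercept_, rcond=None)
        return self

    def predict(self, X):
        return (X - self.x_mean_) @ self.coef_ + self.intercept_

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + 1.0

model = MeanCenterRegressor().fit(X, y)

# Serialize the trained instance to bytes (or to disk) and restore it:
# the attributes x_mean_, intercept_, and coef_ carry the model state.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

The restored instance predicts identically to the original, which is exactly the property exploited when a deployed model file is shared between laboratories.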

Model Validation (Performance)
Validation and continued monitoring of a model are important to its performance and longevity, respectively. For harmonization, an increased longevity of the model is desired, which can be achieved by the reliable maintenance of the instruments throughout their lifetimes. Therefore, the model performance must be constantly monitored after deployment, and re-training must be applied if required.
Models can be quantitatively validated and scored using one of the following sampling schemes for a test/train split of the dataset: 71 (i) Cross-validation splits the data into a given number of folds, usually 5 or 10, 72 holding out one fold at a time for testing; the model is trained using the other folds. This process is repeated until all the folds have been held out. (ii) Leave-one-out is similar, but it holds out just one instance at a time, training the model with all the others and then testing it with the held-out instance. This validation process is very stable and reliable, but also slow. (iii) Random sampling randomly splits the data into a training and a testing set in a given proportion (e.g., 65:35) for a specified number of times.
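The three sampling schemes reduce to index manipulations; a minimal NumPy sketch (fold count, split ratio, and dataset size are arbitrary example values):

```python
import numpy as np

n = 12
idx = np.arange(n)
rng = np.random.default_rng(5)

# (i) k-fold cross-validation: hold out one fold at a time (here k = 4).
folds = [idx[idx % 4 == f] for f in range(4)]

# (ii) Leave-one-out: each instance is its own test set (n splits).
loo = [(np.delete(idx, i), np.array([i])) for i in idx]

# (iii) Random sampling: a 65:35 train/test split (repeatable via seed).
perm = rng.permutation(n)
n_train = int(0.65 * n)
train, test = perm[:n_train], perm[n_train:]
```

Libraries such as scikit-learn provide these splitters ready-made (KFold, LeaveOneOut, ShuffleSplit); the sketch only shows what they do under the hood.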
Using an appropriate sampling, one or more scores are then calculated following established metrics for regression and classification. 71 In regression analyses, many metrics for regression performance are derived from the error of a single prediction, defined here as the difference between the true value of the dependent variable and the value predicted by the model. First, the mean error of prediction (MEP) is the average error of predictions for the test dataset. It is equal to the bias of a model. A model is biased if it makes predictions with a nonzero MEP. Second, the mean absolute error of prediction (MAEP) is the arithmetic average of the absolute errors (how close forecasts or predictions are to eventual outcomes). The MAEP corresponds to an l1 norm and is generally less sensitive than the root mean square error of prediction (RMSEP). Third, the mean squared error of prediction or deviation (MSEP or MSD) is the average of the squares of the prediction errors for the test dataset. Finally, the root mean square error of prediction or deviation (RMSEP or RMSD) is the square root of the MSEP. It is equal to the Euclidean or l2 norm, and therefore more sensitive towards large deviations as compared to the MAEP. The RMSEP is the most common measure of performance for regression models. The coefficient of determination (R²) is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable.
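The regression metrics defined above translate directly into code; a short sketch with a small made-up example:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MEP (bias), MAEP, MSEP, RMSEP, and R^2 as defined in the text."""
    err = y_true - y_pred           # error of a single prediction
    mep = err.mean()                # mean error = bias
    maep = np.abs(err).mean()       # mean absolute error (l1-type)
    msep = (err ** 2).mean()        # mean squared error
    rmsep = np.sqrt(msep)           # root mean square error (l2-type)
    ss_res = (err ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot        # coefficient of determination
    return mep, maep, msep, rmsep, r2

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
mep, maep, msep, rmsep, r2 = regression_metrics(y_true, y_pred)
```

For these toy values the model is unbiased (MEP = 0) while MAEP = 0.15 and RMSEP ≈ 0.158, illustrating that the l2-based RMSEP weights the larger errors more strongly.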
In classification analyses, the area under the receiver operating characteristic (ROC) curve 73 is a measure of sensitivity and specificity for a binary classifier, that is, a model that distinguishes two classes. A perfect classifier would generate a ROC with an area of 1.0, whereas totally random predictions will result in an area of 0.5. It should be noted that an appropriate external validation set, ideally acquired with instruments or parameters other than those of the training data, needs to be chosen to test a classifier using the ROC. If discrimination is based on instrumental or other artifacts rather than on chemical differences, the resulting ROC will probably be misleading. Classification accuracy is the proportion of correctly classified examples. 74 The accuracy can be misleading, especially if classes are unequally represented in the training dataset. For example, a trivial model that always predicts the same class will achieve a high accuracy if 90% of the training instances belong to this class. The balanced F-score, or F1, is a weighted harmonic mean of precision (positive predictive value) and recall (sensitivity), 75 which are the proportion of true positives among instances classified as positive 76 and among all positive instances, respectively. The F-score favors classifiers for which precision and sensitivity are similar. This may not always represent the best metric, especially for models that require either a particularly high sensitivity or precision.
For classification, combined metrics like the sum of ranking difference have been applied, which can allow for more stable model evaluations than the single metrics listed above. 77
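For the classification metrics, a minimal sketch computing the ROC area (via its rank-statistic formulation) and the F1 score on a tiny made-up example:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the rank (Mann-Whitney) formulation: the probability that
    a randomly chosen positive is scored above a random negative."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for a binary classifier."""
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y = np.array([0, 0, 1, 1, 0, 1])          # true class labels
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # classifier scores
auc = roc_auc(y, s)                        # 1.0 = perfect, 0.5 = random
f1 = f1_score(y, (s > 0.3).astype(int))    # threshold chosen arbitrarily
```
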

Model Transfer
Without model transfer, chemometric harmonization across multiple Raman instruments would be impossible. The requirement for transferability of a prediction model is best illustrated using a typical use case for chemometric modeling with Raman spectroscopy data: (i) A prediction model for a material property is trained during a measuring campaign, where a large set of training spectra is recorded by some laboratory, using a defined set of instruments. This data, upon which the model is generated, is referred to as the primary dataset (PD). For training, the primary dataset is split into training and test datasets. (ii) The performance of the model is validated using the primary test dataset. Then, the trained model is stored in a standardized format and made available for future use (deployed).
(iii) At some point in the future, a second laboratory intends to use the deployed model to predict the sample properties using its own current data (secondary dataset, SD), acquired using instruments and modes different from those used to record the primary dataset.
When a trained chemometric model (built using PD) is applied to newly measured data (SD), predictions are often less precise than those based on the original test data (PD). This issue becomes more pronounced if the new data differs significantly from the training data. Large, undesirable spectral variations often pose great challenges for chemometrics, manifested as a failed prediction of the trained model for newly measured datasets. Even if both primary and secondary datasets are stored in the same data format with a common x-axis, which in principle makes them mutually comparable, the prediction may fail for a secondary dataset that has been measured under different conditions compared with those of the training set. 78 The reason for this is often wavenumber (x) shifts and intensity (y) variations due to instrumental artifacts, or changes of experimental conditions such as laser wavelength, power, or sample temperature. 79 Moreover, the established calibration procedures yield calibration models that are reliable only in the operating conditions they were calibrated in (see Data Fusion section above). Typically, it will not be possible for the secondary laboratory to reproduce a training dataset that allows for the generation of a new model, since this requires a large number of samples, which may be expensive or even impossible to obtain. Therefore, reliable model transfer procedures are necessary to tackle problems related to the abovementioned undesirable variations (model transfer and calibration transfer are often used synonymously in the literature). 12,80 A number of model transfer approaches have been proposed, [80][81][82][83] such as discrete wavelet transformation (DWT) 86 and direct standardization (DS). 85 For near-infrared spectra, piecewise standardization 86,87 and finite impulse response filters 88 have proven to enable the successful transfer of calibration models.
Existing model transfer approaches can be categorized into three strategies: (i) Build robust models: Raman spectral features are sensitive to measurement variations. To build a robust model, one option is to involve metadata of measurements in model construction, as in the generic solution proposed by Kalivas et al. to enable the application of Tikhonov regularization. 79 (ii) Standard-free spectral alignment: Use of numeric calculations to remove spectral variations between the primary and secondary dataset. Examples of spectral standardization methods used in near-infrared spectroscopy, such as Procrustes analysis, orthogonal signal correction, parametric time warping, or piecewise direct standardization, are still limited in Raman spectroscopy due to the high noise they introduce. Moreover, as purely data-driven procedures, they may remove both sample- and setup-induced variations without distinction. In this context, methods based on bilinear modeling, such as replicate extended multiplicative signal correction 89 and ANOVA-simultaneous component analysis, make it possible to estimate the setup-induced spectral differences and remove them from the data while preserving the sample-induced variations. This can be a more reliable mechanism, although it requires the experiments to be well designed. Such a method is also limited if the sample- and setup-induced spectral variations are not independent of each other. 90-92 (iii) Model updating: The aim is to update the model by introducing the differences between the primary and the secondary datasets to obtain a new model that tolerates interfering variations.
The following sections focus on spectral alignment and model updating, approaches that require model training based on modified primary and secondary datasets. Both model transfer strategies can be combined, which may optimize the model performance, as shown for the case of biological Raman datasets. 82

Spectral Alignment
Spectral alignment does not require the generation of new training data by the second laboratory, because the data, both primary and secondary, are processed in order to make them more similar. However, it does require the primary dataset to be stored with the model. Two methods of spectral alignment are possible: spectrum calibration to a standard, and standard-free spectrum adjustment. For spectrum calibration, the wavenumber (x) and counts (y) axes of both datasets are calibrated using a standard material, for example, 4-acetamidophenol (x-axis) and reference material SRM 2242 (y-axis); 93 for standard-free spectrum adjustment, both datasets can be adjusted to each other by using the total spectral average of both datasets combined as the reference, and the averages of each individual dataset (primary and secondary) as targets. Then, the targets are aligned to the reference.
In both cases, a set of peak maxima is used as anchor points to which local wavenumber shifts with respect to the reference are assigned. A wavenumber calibration function is then fitted to the anchor points, typically a third-order polynomial. As opposed to the standard calibration method, an optimization step is added for spectrum adjustment where the anchor point positions are refined using a maximization algorithm. Such an algorithm may use, for example, the hit quality index (HQI; Eq. 5): 94

HQI = (s · s_ref)² / [(s · s)(s_ref · s_ref)]   (5)

where the s are the vector representations of the averaged spectra and s_ref is the reference spectrum. The HQI is equivalent to the numerical cross-correlation (Fig. 11). The refinement of local wavenumber shifts often leads to a significant improvement of the model accuracy. Parameter optimization methods that may be used for this task include simulated annealing 95 or downhill simplex. 96 Notably, spectrum adjustment must be performed as an ensemble method equally for all spectra in the PD and SD, and not individually for each spectrum, since this may eliminate material-dependent effects. This requires modification of the primary data, and thus a new model training after adjustment. This step may be undesirable and could be avoided if, instead of adjusting both datasets to their average, the secondary data is chosen as the target and adjusted to the PD (reference), while leaving the PD unchanged. However, this is only possible if the PD range is contained in the SD range, so that the SD can be truncated and re-sampled to the PD (Fig. 12).
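A minimal sketch of HQI-based shift refinement, computing the HQI as the squared normalized dot product of two spectra and using a brute-force scan over candidate shifts in place of simulated annealing or downhill simplex; the single synthetic peak and the applied 4 cm⁻¹ shift are illustrative only.

```python
import numpy as np

def hqi(s, s_ref):
    """Hit quality index: squared normalized cross-correlation."""
    return np.dot(s, s_ref) ** 2 / (np.dot(s, s) * np.dot(s_ref, s_ref))

x = np.linspace(200, 1800, 1600)
ref = np.exp(-((x - 1000.0) / 6.0) ** 2)       # reference peak
target = np.exp(-((x - 1004.0) / 6.0) ** 2)    # target with a +4 cm^-1 shift

# Scan candidate corrections d and keep the HQI maximum; shifting the
# target by d = -4 cm^-1 should realign it with the reference.
candidates = np.linspace(-10, 10, 201)
best = max(candidates,
           key=lambda d: hqi(np.interp(x, x + d, target), ref))
```

A production implementation would refine several anchor points simultaneously and fit a smooth shift function (e.g., a cubic spline, as in Fig. 11) rather than a single global shift.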

Model Updating
Model updating has been shown to be effective for prediction models based on Raman data. Rather than modifying the primary or secondary data, model updating requires a small number of training data acquired in the secondary laboratory. Using the primary (spectra X and targets y) and secondary (L, y*) training data, the model is re-generated, assigning an increased weight λ to the SD portion as compared to the PD. This can be achieved by Tikhonov regularization (Eq. 6):

min over b of ||Xb − y||² + ||εI b||² + λ² ||Lb − y*||²   (6)

The weighted unity matrix εI is inserted to avoid singularity of the matrices. b is a matrix representation of the model, e.g., a multilinear regression.
However, the criteria that must be met for applying Tikhonov regularization often limit its utility. For instance, the method is not applicable if there are no secondary targets y* available.
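The update of Eq. 6 can be sketched as an augmented least-squares problem, stacking the primary block X, the ridge block εI, and the λ-weighted secondary block L into one system; the noiseless synthetic data below are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
k = 30                          # number of spectral channels (arbitrary)
b_true = rng.normal(size=k)     # "true" regression vector

# Primary data (many spectra) and secondary data (few, offset spectra).
X = rng.normal(size=(100, k)); y = X @ b_true
L = rng.normal(size=(5, k)) + 0.5; y_star = L @ b_true

lam, eps = 10.0, 1e-3  # weight on the SD block; small ridge term

# Solve min ||Xb - y||^2 + ||eps*I b||^2 + lam^2 ||Lb - y*||^2
# by stacking the three blocks into one least-squares system.
A = np.vstack([X, eps * np.eye(k), lam * L])
rhs = np.concatenate([y, np.zeros(k), lam * y_star])
b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
```

The zero right-hand side of the middle block is what makes εI act as a ridge penalty, while λ up-weights the few secondary spectra relative to the large primary set.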

Other Approaches
Another approach for model generation based on heterogeneous Raman datasets is locally weighted modeling, such as localized regression or PLS. 98 In a nutshell, such algorithms identify those training spectra most similar to the one to be predicted, and build a new local model using these training instances only. Tulsyan et al. 43 propose the use of just-in-time learning (JITL) on global generic models. Here, an extensive training dataset (library) is generated, including many spectra from different experiments, instruments, and laboratories. Fitting a generic model on this "incompatible" global dataset would result in poor model performance. Instead, JITL calibrates "local" models in real time upon measurement of a Raman spectrum to be predicted. For local model fitting, the most relevant local calibration set is selected from the library based on the Euclidean distance, and then the model (e.g., PLS) is fitted with this local training set. Once the model predictions are available, the local model is discarded, and the JITL waits for a new Raman spectrum to calibrate a local model (Fig. 13). The authors report a substantial improvement in the online prediction of component concentrations during various pharmaceutical processes by Raman spectroscopy and mention a "paradigm shift in pharmaceutical manufacturing."

Figure 12. Effect of spectral range on model transferability. If the primary range is included in the secondary range (case a), the original model may be used after appropriate truncation and re-sampling of the secondary dataset. However, the model cannot be used if the primary range is not fully included in the secondary (case b). In this case, an appropriate model transfer is mandatory, which requires the training data of the original model.
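The JITL loop described above can be caricatured in a few lines, with a toy linear library standing in for real spectra and an ordinary least-squares fit standing in for PLS; the distance-based selection and the throwaway local model are the essential ingredients.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical library: 500 spectra (50 channels) with known targets.
library_X = rng.normal(size=(500, 50))
true_coef = rng.normal(size=50)
library_y = library_X @ true_coef

def jitl_predict(query, n_local=60):
    """Calibrate a throwaway local model on the n_local nearest library
    spectra (Euclidean distance), predict, then discard the model."""
    d = np.linalg.norm(library_X - query, axis=1)   # distances to library
    nearest = np.argsort(d)[:n_local]               # local calibration set
    Xl, yl = library_X[nearest], library_y[nearest]
    coef, *_ = np.linalg.lstsq(Xl - Xl.mean(0), yl - yl.mean(), rcond=None)
    return float((query - Xl.mean(0)) @ coef + yl.mean())

query = rng.normal(size=50)
pred = jitl_predict(query)   # local model built, used, and discarded
```

Each incoming spectrum triggers its own short-lived calibration, so no single global model ever has to reconcile the heterogeneous library.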

Existing Standards, Guides, and Protocols
There exist several protocols from multiple sources related to Raman spectroscopy. 99 The main source is the ASTM International organization, [100][101][102][103][104] which has published several standards on multivariate analysis and its use for qualifying and verifying spectrometers. 17,[105][106][107][108] Besides, the main pharmacopeias include specific chapters dedicated to Raman 109,110 and are also leading the guidance on chemometric data analysis. 111,112

Pharmacopoeias

The pharmacopoeias (mainly European and U.S.) have specific chapters fully dedicated to the use of chemometrics. The EU pharmacopoeia created the chapter 5.21 "Chemometric Methods Applied to Analytical Data" in 2016, 111 and the US pharmacopeia created the chapter 1039 "Chemometrics" in 2017. 112 The former is a general text, which aims at guiding the user towards good chemometric practice and introduces chemometric requirements and techniques. It explains the need for error analysis by RMSEP and standard error of calibration (SEC) during the creation of a model, as well as outlining the considerations that should be made when selecting samples and processing methods to ensure a good-quality model. It also outlines the need for maintenance of the models, with the parameters to be validated for both qualitative and quantitative models. While the EU pharma chapter recommends PCA, PCR, and PLS, it also explains the principles and potential uses for measures between objects, SIMCA, clustering, MCR, MLR, support vector machines, and artificial neural networks. The US pharmacopeia is similar, but it also guides the user through calibration, validation, maintenance, and update methods in an integrated workflow for lifetime maintenance.

ASTM International (ASTM)
While the MVDA-focused ASTM standards 17,[105][106][107][108]113,114 are not always specific to Raman spectroscopy, they are all relevant and transferable to multivariate analysis performed on Raman spectroscopic data. The multivariate techniques covered by the standards are limited: the user must employ partial least squares (specifically the PLS-1 algorithm), principal component regression (PCR), or multilinear regression (MLR) to be considered compliant with some of these standards. Moreover, the standards are also limited in their scope, as they do not describe the entire multivariate data analysis method and model creation process; instead, they recommend that a subject matter expert define the process steps that are not described, such as how to treat and process the spectra before model creation without compromising the integrity of the data.
Methodologies and mathematics for qualifying and validating models during development, implementation, and continual running, as well as after model updating, are described. This includes flow charts, considerations for the model and for building a calibration set, as well as several equations for error statistics to highlight issues in the validation as well as outliers: The SEC and its derivative equation, the root mean square error of calibration, which are used to assess the agreement between predicted and actual results.
The T-value of the significance of validation bias, which informs the users whether they should use the standard error of validation (SEV) and the standard deviation of the validation residual (SDV).
The equations for the SEV and SDV are altered depending on the calibration model being used (i.e., single reference value and estimate, multiple reference, and single estimate, multiple of both).
The predicted residual error sum of squares, which is used within the standard error of cross-validation, assesses those models that utilize a cross-validation method. Several outlier statistics are described for assessing the spectra input in creating the model as well as those used to validate the model when implemented and after update. These are the leverage statistic, root mean sum of squares residual (RMSSR), nearest neighbor distance (NND), nearest neighbor Mahalanobis distance (NNMD), and the standard residual variance of the independent variables (SRVIV).
The procedures defined above are described using equations, but the standards also note other techniques for the purposes of model calibration and validation. It is recommended that validation is performed using external and internal data. Internal validation data are obtained from data used to create the model (primary dataset) or by statistical resampling of the analyte. External validation data are independent of the model, and thus can be neither part of the creation dataset, nor collected from the analytes used to train the model (secondary dataset).
The outlier statistics are described to allow the user to identify and exclude data which would be an extrapolation of the model and/or have unusually high leverage on the model. These statistics can be applied to data used to create, validate, or calibrate the model, as well as to data tested against the model.

Outlook
For models to be developed, validated, and used for predictions, the data involved must comply with certain requirements, that is, there must be common variables across the data. When data is collected from multiple instruments, x-axis matching is easily achieved by interpolation to create a new universal x-axis across the data. However, if the x-axis is not properly calibrated, the model's results will be misleading; the same holds for variations in the resolution of the different spectrometers, as a spectrum can lose key features at lower resolution. Therefore, a harmonization effort should be made to account for these effects, and this relies on chemometrics. The implications for data handling and processing can be summarized as follows: (i) Calibration of any instrument should be carried out using a set of standards which generate well-defined features in their Raman spectrum, such as peak positions and relative peak intensities. (ii) Any data treatment or processing should follow a carefully defined and outlined scheme.
Data pre-treatment must be kept to a minimum. That is, outside of calibration, only spike removal should be performed unless further treatments are required to obtain usable data.
(iii) Data (pre)processing should preserve the original data. Processing steps should be reversible, reproducible, and well described in terms of their algorithms (logged) within the data file. (iv) A common database should be built and made accessible for the intended users, to allow for models to be reconstructed for transfer.

Spectrometer Calibration
Wavelength calibration of the spectrometer is not sufficient to calibrate the x-axis. The precise laser wavelength must also be known for an accurate conversion from wavelength in units of nm to Raman shift in units of cm⁻¹. Even in cases where the laser wavelength is known with good precision, not all aspects of the instrumentation that can cause deviations can be fully accounted for. One of the reasons is that the spectrometer is usually calibrated independently, thus ignoring deviations due to other components within the Raman instrument, which should be considered as a whole to reduce the sources of discrepancy when comparing separate Raman instruments. Commonly, a practical solution is analyzing a Raman shift standard and using it to fine-tune the Raman shift calibration. The spectrum adjustment by HQI described by Rodriguez et al. 115 may mitigate artifacts introduced into spectrum wavenumber calibration by unequal spectral resolution, which may initially lead to errors in peak locations, and thus in the Raman shift calibration model function, usually a polynomial. 116 One calibration that is often overlooked, or regarded as an insurmountable hurdle, is that of spectral resolution, which should not be confused with interpolation of the x-axis for variable matching. The most straightforward approach is smoothing the higher-resolution spectrum until its resolution equals that of the lower-resolution spectrum. 47 This will improve the comparability, but also cause some information loss. Alternatively, through use of a double digital projection slit (DDPS), it appears possible to in fact simulate higher resolution in low-resolution data. 80 Using a virtual ideal spectrum of the calibration material, DDPS can improve matching between low- and high-resolution spectra, and visual inspection of the spectra shows deconvolution of peaks that were indistinguishable in the low-resolution spectrum.
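A sketch of the smoothing approach to resolution matching: convolving the higher-resolution spectrum with a normalized Gaussian kernel whose FWHM (here an arbitrary 10 channels) represents the resolution difference between the two instruments.

```python
import numpy as np

def degrade_resolution(spectrum, fwhm_channels):
    """Smooth a high-resolution spectrum with a normalized Gaussian
    kernel so its effective resolution matches a coarser instrument."""
    sigma = fwhm_channels / (2 * np.sqrt(2 * np.log(2)))  # FWHM -> sigma
    half = int(4 * sigma) + 1
    t = np.arange(-half, half + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()                # unit area: band area preserved
    return np.convolve(spectrum, kernel, mode="same")

x = np.arange(1600)
narrow = np.exp(-((x - 800) / 3.0) ** 2)  # sharp band, high resolution
broadened = degrade_resolution(narrow, fwhm_channels=10)
# The band area is preserved while the peak height drops and width grows.
```

Because the kernel is normalized, integrated band intensities survive the smoothing; only the peak height/width trade-off changes, which is the information loss mentioned above.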
The reliance on a virtual ideal spectrum may prevent the use of DDPS for wide-scale harmonization. The use of the HQI as target function for spectral alignment takes all spectral values into account, rather than only the peak positions, and may also be able to correctly align spectra of unequal resolution. 115 Currently, there is no literature that demonstrates intensity correction outside of white light and standard fluorescent glass samples. The HQI can be affected by wavenumber-dependent intensity variations, caused by different excitation wavelengths, which impact the intensity of Raman scatter observed at higher Raman shifts due to the spectrometer quantum efficiency, as well as deviations due to resonance effects. This issue with the HQI is studied by Park et al. to determine the applicability of modified versions of the HQI approach. The authors believe that, following the work of Guo et al. 83 describing the adjustment of wavenumbers (x-axis) using the HQI, the method may in principle also be used for intensity adjustment (y-axis).

Data Fusion and Model Transfer
Data fusion and model transfer are core elements for the harmonization of Raman spectroscopy, especially when applied to a variety of materials. A high-level data fusion is implemented in harmonization models. For the longevity and integrity of the harmonization system, the original data should always be preserved alongside the fused dataset. A centralized data fusion can be implemented in the form of file groups, where a file group is generated each time a number of spectra are compared or used for model generation. This process should be computation-efficient, since it may have to be performed many times during a search in a large database. While appropriate data fusion methods are a prerequisite for developing a broad database, the reusability of existing prediction models should be a core value in the database's creation. However, relatively little literature exists on the transfer of chemometric models based on Raman spectroscopy specifically. Hence, research activities should aim at developing the state of the art further and contribute to publications on this topic.
According to the typical use case for models described in the Model Transfer section, a model transfer will almost always be necessary prior to a model's application, which usually requires the primary dataset to be accessible for re-training. Therefore, the complete training data, along with the pre-processing steps and fitting parameters, should be stored with the model. In contrast, storing only the (serialized) pre-fitted model object file will generally not permit model transfer. However, a pre-fitted, serialized model should also be stored for routine use once the transfer has been performed, because model transfer may be computationally expensive. It is therefore recommended that, wherever possible, the data used in model building and transfer be maintained with all original information in a database to facilitate model reconstruction when required. Since Raman instrument calibrations are subject to steady drift, the performance of chemometric models can be expected to deteriorate over time. Ryabchykov et al. advise instrument calibration on a daily basis, 12 which, strictly speaking, would also apply to model transfers. However, model transfer should not be performed with secondary data consisting of very few spectra, or even a single spectrum, since this may introduce a bias. Therefore, a compromise must be found between frequent model transfer (updates) and the completeness of the secondary transfer data. In any case, model transfer must be redone when predicting from data recorded with a significant time difference, that is, days or weeks, relative to the secondary data used for transfer, or when stored models become obsolete due to changes in the programming language or the serialization libraries (such as Pickle for Python). It is recommended that serialization of models be avoided in the Raman data structure, but this is likely to be unavoidable for the storage of fitted models, such as PCA classifiers or neural networks.
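The recommended storage layout, portable raw training data and pre-processing parameters alongside a serialized convenience copy of the fitted model, could look like the following Python sketch. The function name and file layout are illustrative assumptions, not a prescribed format.

```python
import json
import pickle
from pathlib import Path

import numpy as np

def save_model_bundle(directory, model, training_spectra, wavenumbers, preprocessing):
    """Store a fitted model together with everything needed to rebuild it.

    The raw training data and pre-processing parameters allow the model to be
    re-trained after a model transfer; the pickled object is only a convenience
    copy for routine use, since it may become unreadable after changes to the
    programming language or serialization library.
    """
    d = Path(directory)
    d.mkdir(parents=True, exist_ok=True)
    # Raw training data in a portable text format, not as serialized objects.
    np.savetxt(d / "training_spectra.csv", training_spectra, delimiter=",")
    np.savetxt(d / "wavenumbers.csv", wavenumbers, delimiter=",")
    # Pre-processing steps and fitting parameters as human-readable JSON.
    (d / "preprocessing.json").write_text(json.dumps(preprocessing, indent=2))
    # Convenience copy of the fitted model; regenerate rather than trust long-term.
    with open(d / "model.pkl", "wb") as fh:
        pickle.dump(model, fh)
```

With this layout, a receiving laboratory can ignore `model.pkl` entirely and re-fit from the CSV and JSON files, which is exactly what a model transfer requires.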

Conclusion
The steps and considerations required to create, validate, and transfer a chemometric model using Raman spectra have been outlined. With regard to harmonization, the pre-treatment step is of utmost importance, as inaccurate calibration of the instrument removes any common ground on which to begin harmonization. The systematic calibration of Raman instrumentation with a set of reference materials characterizes the instrument response and allows for digital calibration of subsequently acquired data. Second in significance are model validation and maintenance. The harmonization model needs to be long-lived to make its adoption worthwhile: if laboratories must regularly adopt a new model to maintain a harmonized standard, the benefit of being harmonized will not compensate for its cost. However, pre-treatment and processing will also have a significant impact on the resulting model.
A large array of methods that can be used to harmonize Raman spectroscopic data already exists, and more are being created by a very active chemometrics community. Thus, it can be expected that a complete chemometric method tailored to building a robust and reliable system for Raman harmonization can soon be developed and should be adopted by the Raman community.

Supplemental Material
All supplemental material mentioned in the text is available in the online version of the journal.