Ground-based lidar processing and simulator framework for comparing models and observations (ALCF 1.0)

. Automatic lidars and ceilometers provide valuable information on cloud and aerosols, but have not been used sys-tematically in the evaluation of GCMs and NWP models. Obstacles associated with the diversity of instruments, a lack of standardisation of data products and open processing tools mean that the value of the large ALC networks worldwide is not be-ing realised. We discuss a tool, called the Automatic Lidar and Ceilometer Framework (ALCF), that overcomes these problems and also includes a ground-based lidar simulator, which calculates the radiative transfer of laser radiation, and allows one-to- 5 one comparison with models. Our ground-based lidar simulator is based on the Cloud Feedback Model Intercomparison Project (CFMIP) Observation Simulator Package (COSP) which has been used extensively for spaceborne lidar intercomparisons. The ALCF implements all steps needed to transform and calibrate raw ALC data and create simulated backscatter proﬁles for one-to-one comparison and complete statistical analysis of cloud. The framework supports multiple common commercial ALCs (Vaisala CL31, CL51, Lufft CHM 15k and Sigma Space MiniMPL), reanalyses (JRA-55, ERA5 and MERRA-2) and models 10 (AMPS and the Uniﬁed Model). To demonstrate its capabilities, we present case studies evaluating cloud in the supported reanalyses and models using CL31, CL51, CHM 15k and MiniMPL observations at three sites in New Zealand. We show that the reanalyses and models generally underestimate cloud fraction and overestimate cloud albedo, the common "too few too bright" problem. If sufﬁciently high temporal resolution model output is available (better than 6 hourly), a direct comparison of individual clouds is also possible. We demonstrate that the ALCF can be used as a generic evaluation tool to examine cloud occurrence and cloud properties in reanalyses, NWP models and GCMs, potentially utilising the large amounts of ALC data already available. This tool is likely to be particularly useful for the analysis and improvement of low-level cloud simulations which are not well monitored from space. This has previously been identiﬁed as a critical deﬁciency in contemporary models, limiting the accuracy of weather forecasts and future climate projections.


Introduction
Automatic lidars and ceilometers (ALC) are active ground-based instruments which emit laser pulses in the ultraviolet, visible or infrared (IR) part of the electromagnetic spectrum and measure radiation backscattered from atmospheric constituents such as cloud and fog liquid droplets and ice crystals, haze, aerosol (particulate matter, ash) and atmospheric gases (Emeis, 2010).
Vertical profiles of attenuated backscattered radiation can be produced by measuring received power as a function of time 5 elapsed between emitting the pulse and receiving the backscattered radiation. Quantities such as cloud base height (CBH) and a cloud mask (Pal et al., 1992;Wang and Sassen, 2001;Martucci et al., 2010;Costa-Surós et al., 2013;Van Tricht et al., 2014;Liu et al., 2015a, b;Lewis et al., 2016;Cromwell and Flynn, 2018;Silber et al., 2018), aerosol optical depth (Marenco et al., 1997;Welton et al., 2000Welton et al., , 2002Wiegner and Geiß, 2012;Wiegner et al., 2014;Jin et al., 2015;Dionisi et al., 2018) and boundary layer height (Eresmaa et al., 2006;Münkel et al., 2007;Emeis et al., 2009;Tsaknakis et al., 2011;Milroy et al., 2012; misrepresentation has a strong effect on other components of the model, limiting the ability to accurately represent past and present climate and predict future climate (Zadra et al., 2018). An improved understanding of clouds and cloud feedbacks is one of the focuses of the Coupled Model Intercomparison Project Phase 6 (CMIP6) (Eyring et al., 2016), and comparison of model cloud with observations is one of the key points of the Cloud Feedback Model Intercomparison Project (CFMIP) (Webb et al., 2017). Satellite observations make up the majority of the data used to evaluate model clouds. These include passive 5 visible and IR low earth orbit and geostationary radiometers measuring, among others, features such as cloud cover, cloud top height (CTH), cloud top temperature; passive microwave instruments measuring total column water; active radars and lidars measuring cloud vertical profiles. Ground-based remote sensing instruments include radars, lidars, ceilometers, radiometers and sky cameras. As pointed out by Williams and Bodas-Salcedo (2017), using a wide range of different observational datasets including satellite and ground-based for general circulation model (GCM) evaluation is important due to limitations of each 10 dataset.
Model cloud is commonly represented by the mixing ratio of liquid and ice and cloud fraction (CF) on every model grid cell and vertical level. In addition some models provide the cloud droplet effective radius used in radiative transfer calculations. Remote sensing observations do not match the representation of the atmospheric model fields directly because of their different resolutions, limited field of view (FOV) and attenuation by atmospheric constituents before reaching the instrument's 15 receiver. Instrument simulators bridge this gap by converting the model fields to quantities which emulate those measured by the instrument, which can then be compared directly with observations. One such collection of instrument simulators is the CFMIP Observation Simulator Package (COSP) (Bodas-Salcedo et al., 2011;Swales et al., 2018), which has been used for more than a decade for evaluation of models using satellite, and more recently ground-based, observations. The simulators in COSP include active instruments such as spaceborne and ground-based radars: Cloud Profiling Radar (CPR) on CloudSat 20 (Stephens et al., 2002), Ka-band ARM Zenith Radar (KAZR); lidars: Cloud-Aerosol Lidar Orthogonal Polarization (CALIOP) on CALIPSO (Winker et al., 2009), Cloud-Aerosol Transport System (CATS) on ISS (McGill et al., 2015), the Atmospheric Lidar (ATLID) on EarthCARE (Illingworth et al., 2015a); and spaceborne passive instruments: ISCCP (Rossow and Schiffer, 1991), MODIS (Parkinson, 2003) and MISR (Diner et al., 1998). The more recent addition of ground-based radar (Zhang et al., 2018) and lidar Bastin et al., 2018) opens up new possibilities to use the large amount of remote sensing 25 data obtained from ground-based active remote sensing instruments. In practice, ground-based observational remote sensing data are not straightforward to use without a substantial amount of additional processing. Some previous studies have also compared models and ground-based radar and lidar observations without the use of an instrument simulator (Bouniol et al., 2010;Hansen et al., 2018), though for the reasons identified above this is not advisable.
In this study we introduce a software package called the Automatic Lidar and Ceilometer Framework (ALCF) for evaluating 30 model cloud using ALC observations. It extends and integrates the COSP lidar simulator (Chiriaco et al., 2006;Chepfer et al., 2007Chepfer et al., , 2008 with pre-and post-processing steps, and allows the simulator to be run offline on model output, instead of having to be integrated inside the model. This makes it possible to compare ALC data at any location without having to run the model with a specific configuration. Multiple ALCs, reanalyses and model output formats are supported. The original COSP lidar simulator was extended with Rayleigh and Mie scattering at multiple lidar wavelengths. Observational ALC data from a number of common instruments can be processed by re-sampling to a common resolution, removing noise, detecting cloud and calculating statistics. The same steps can be performed on the simulated lidar data from the model (the output of running COSP on the model data), allowing for one-to-one comparison of model and observations. A particular focus of our work was on applying the same processing steps on the observed and simulated backscatter in order to avoid biases. The ALCF is made available under an open source license (MIT) at https://alcf-lidar.github.io, and as a permanent archive of code and technical 5 documentation on Zenodo at https://doi.org/10.5281/zenodo.3779518.
A relatively small amount of other open source code is available for ALC data processing. A lidar simulator has been developed as part of the Goddard Satellite Data Simulator Unit (G-SDSU) (G-S, 2019), a package based on the instrument simulator package SDSU (Masunaga et al., 2010). The Community Intercomparison Suite (CIS) (Watson-Parris et al., 2016) allows for subsetting, aggregation, co-location and plotting of mostly satellite data with a focus on model-observations intercomparison. 10 The STRAT lidar data processing tools are a collection of tools for conversion of raw ALC data, visualisation and feature classification (Morille et al., 2007).
Here, we provide an overview (Sect. 2) and describe the supported ALCs, reanalyses and models (Sect. 3). We present a set of case studies at three sites in New Zealand (NZ) (Sect. 6) to demonstrate the value of this new tool. Furthermore, we describe the lidar simulator (Sect. 4), observed and simulated lidar data processing steps (Sect. 5) and the results of the case studies 15 (Sect. 7).
2 Overview of operation of the Automatic Lidar and Ceilometer Framework (ALCF 1.0) The ALCF performs the necessary steps to simulate ALC backscatter based on 4-dimensional atmospheric fields from reanalyses, NWP models and GCMs, and to transform the observed raw ALC backscatter profiles to profiles comparable with the simulated profiles. It does so by extracting 2-dimensional (time × height) profiles from the model data, performing radiative 20 transfer calculations based on a modified COSP lidar simulator (Sect. 4), absolute calibration and resampling of observed backscatter to common resolution and performing comparable cloud detection on the simulated and observed backscatter. The framework supports multiple common ALCs (Sect. 3.1), reanalyses and models (Sect. 3.2). The schematic in Fig. 1 illustrates this process as well as the ALCF commands which perform the individual steps. The following commands are implemented: model, simulate, lidar, stats and plot. The commands are normally executed in a sequence, which is also implemented by a 25 meta-command auto, which is equivalent to executing a sequence of commands. The commands are described in detail in the technical documentation available online at https://alcf-lidar.github.io, on Zenodo at https://doi.org /10.5281/zenodo.3779518 and in the Supplementary information. The physical basis is described here.
The model command extracts 2-dimensional profiles of cloud liquid and ice content (and other thermodynamic fields) from the supported NWP model, GCM and reanalysis data (model data in Fig. 1) at a geographical point, along a ship track or a 30 flight path. The resulting profiles are recorded as NetCDF files. Sect. 3.2 describes the supported reanalyses and models. The model data can be either in one of the supported model output formats, or a new module for reading arbitrary model output can be written, providing that the required atmospheric fields are present in the model output. The required model fields are: per-level specific cloud liquid water content, specific cloud ice water content, cloud fraction, geopotential height, temperature, surface-level pressure and orography. No physical calculations are performed by this command. The atmospheric profiles are extracted by a nearest-neighbour selection.
The simulate command runs the lidar simulator described in Sect. 4 on the extracted model data (the output of the model command) and produces simulated backscatter profiles. This command runs the COSP-derived lidar simulator, which performs 5 radiative transfer calculations of the laser radiation through the atmosphere. The resulting simulated backscatter profiles are the output of this command.
The lidar command applies various processing algorithms on either the simulated backscatter data (the output of the simulate command) or the observed ALC data (lidar data in Fig. 1) (Sect. 5). The data are resampled to increase the signal-to-noise ratio (SNR), noise is subtracted, LR is calculated, a cloud mask is calculated by applying a cloud detection algorithm and CBH 10 is determined from the cloud mask. Absolute calibration (Sect. 5.2) can also be applied in this step by multiplying the observed backscatter by a calibration coefficient. This is important in order to obtain unbiased backscatter profiles comparable with the simulated backscatter profiles. Sect. 3.1 describes the supported instruments. The lidar data can be in one of the supported instrument formats. If the native instrument format is not NetCDF, it has to be converted from the native format with the auxiliary command convert or one of the conversion programs: cl2nc (Vaisala CL31, CL51), mpl2nc or SigmaMPL (Sigma Space 15 MiniMPL).
The stats step calculates summary statistics from the output of the lidar command. These include CF, cloud occurrence by height, backscatter histograms and the averages of LR and backscatter.
3 Supported input data: instruments, reanalyses and models

Instruments
The primary focus of the framework is to support common commercial ALCs. Ceilometers are considered the most basic type of lidar (Emeis, 2010;Kotthaus et al., 2016) intended as commercial products designed for unattended operation. They are 25 used routinely to measure CBH, but most instruments also provide the full vertical profiles of backscatter. Therefore, they are suitable for model evaluation by comparing not only CBH, but also cloud occurrence as a function of height. Their compact size and low cost make it possible to deploy a large number of these instruments in different locations, or use them in unusual settings such as mounted on ships (Klekociuk et al., 2019;Kuma et al., 2019). Common off-the-shelf ceilometers are the Lufft CHM 15k, Vaisala CL31 and CL51. Some lidars offer higher power and therefore higher SNR, and capabilities not present 30 in ceilometers such as dual polarisation, multiple wavelengths, Doppler shift measurement and Raman scattering. Below we describe ALCs supported by the framework and used in our case studies: Lufft CHM 15k, Vaisala CL31 and CL51 and Sigma Space MiniMPL. Table 1 lists selected parameters of the supported ALCs.
Lufft CHM 15k (previously Jenoptik CHM 15k) is a ceilometer operating at a wavelength of 1064 nm in the near IR spectrum.
The maximum range of the instrument is 15.4 km, vertical sampling resolution 5 m and sampling rate up to 2 s. The wavelength in the near IR spectrum ensures low molecular backscatter. The instrument produces NetCDF files containing uncalibrated attenuated backscatter profiles and various derived variables.
Vaisala CL31 and CL51 are ceilometers operating at a wavelength of 910 nm in the near IR spectrum. The maximum range 5 of CL31 and CL51 is 7.7 km and 15.4 km and the sampling rate is 2 and 6 s, respectively. The vertical resolution is 10 m.
The wavelength is characterised by relatively low molecular backscatter (but higher than 1064 nm) and is affected by water vapour absorption (Wiegner and Gasteiger, 2015;Wiegner et al., 2019), which can cause additional absorption of about 20% in the mid-latitudes and 50% in the tropics (see also Sect. 5.4). The instruments produce data files containing uncalibrated attenuated backscatter which can be converted to NetCDF (see cl2nc in the Code and data availability section). The firmware 10 configuration option "noise_h2 off" results in backscatter range correction to be selectively applied under a certain critical range and above this range only if cloud is present (Kotthaus et al., 2016, Sect. 3.2). This was the case with our case study dataset (Sect. 6). We apply range correction on the uncorrected range gates during lidar data processing. The critical range in CL51 is not documented, but was determined as 6000 m based on an observed discontinuity.
Sigma Space (part of Hexagon) Mini Micro Pulse Lidar (MiniMPL) (Spinhirne, 1993;Campbell et al., 2002;Flynn et al., 15 2007) is a dual-polarisation micro pulse lidar (meaning that it uses a high pulse repetition rate (PRF) and low pulse power) operating at a wavelength of 532 nm (green) in the visible spectrum. The maximum range of the instrument is 30 km. The vertical resolution is up to 5 m and sampling rate up to 1 s. The shorter wavelength is affected by stronger molecular backscatter than 910 nm and 1064 nm. The instrument can be housed in a protective enclosure, which makes it suitable for deployment in a relatively harsh environment. A scanning head can be mounted on the enclosure which provides configurable scanning by 20 elevation angle and azimuth. The instrument produces data files containing raw backscatter which can be converted to NetCDF containing normalised relative backscatter (NRB) with a vendor-provided tool SigmaMPL (see also mpl2nc in the Code and data availability section).

Reanalyses and models
Below we briefly describe reanalyses and models 1 used in the case studies presented here (Sect. 6). We used publicly available 25 output from three reanalyses and one NWP model. In addition, we performed nudged GCM simulations with high-temporal resolution output with the Unified Model (UM). Table 2 lists some of the main properties of the reanalyses and models.
The Antarctic Mesoscale Prediction System (AMPS) (Powers et al., 2003) is a limited-area NWP model based on the polar fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (Polar MM5), now known as the Polar Weather Research and Forecasting (WRF) model (Hines and Bromwich, 2008). The model serves 30 operational and scientific needs in Antarctica, but its largest grid also covers the South Island of NZ. AMPS forecasts are publicly available on the Earth System Grid (Williams et al., 2009). The forecasts are produced on several domains. The largest domain D01 used in the presented analysis covers NZ and has horizontal grid spacing of approximately 21 km over NZ. The model uses 60 vertical levels. The model output is available in 3-hourly intervals and initialised at 00:00 and 12:00 UTC. The

Lidar simulator
The COSP lidar simulator Active Remote Sensing Simulator (ACTSIM) was introduced by Chiriaco et al. (2006) for the purpose of deriving simulated CALIOP measurements (Chepfer et al., 2007(Chepfer et al., , 2008. The simulation is implemented by applying the lidar equation on model levels. Scattering by cloud particles and air molecules is calculated using the Mie and Rayleigh theory, respectively. CALIOP operates at a wavelength of 532 nm, and calculations in the original COSP simulator use this wavelength. We implemented a small set of changes to the lidar simulator to support a number of ALCs with different operating wavelengths. The lidar equation (Emeis, 2010) is based on the radiative transfer equation (Goody and Yung, 1995;Liou, 2002;Petty, 5 2006;Zdunkowski et al., 2007), which relates transmission of radiation to scattering, emission and absorption in media such as the atmosphere. The lidar equation assumes laser radiation passes through a horizontally homogeneous atmosphere where it is absorbed and scattered. A fraction of laser radiation is scattered back to the instrument and reaches the receiver. Scattering and absorption in the atmosphere is determined by its constituents -gases, liquid droplets, ice crystals and aerosol particles.
Atmospheric model output typically contains 4-dimensional fields of mass mixing ratios of liquid and ice and CF. The lidar 10 equation can be applied on these output fields to simulate the backscattered radiation received by the instrument. Table 4 lists physical quantities used in the following sections. Here, we use radiative transfer notation similar to Petty (2006) and the notation of the original lidar simulator (Chiriaco et al., 2006).

Rayleigh and Mie scattering
The Rayleigh volume backscattering coefficient β mol (m −1 sr −1 ) in ACTSIM is parametrised by the following equation (Eq.

15
(8) in Chiriaco et al. (2006)): where for lidar wavelength λ = 532 nm, C mol = 6.2446 × 10 −32 ; k B is the Boltzmann constant k B ≈ 1.38 × 10 −23 JK −1 , p is the atmospheric pressure and T is the atmospheric temperature. We multiply this equation by exp(4.09(log(532) − log(λ))) (where the value of λ is in nm) to get molecular backscatter for wavelengths other than 532 nm, which allows us to support 20 multiple commercially available instruments. The strength of molecular backscattering is usually lower than backscattering from clouds for the relevant wavelengths.
The lidar signal at visible or near IR wavelengths is scattered by atmospheric constituents of similar size in the Mie scattering regime (Mie, 1908). In the most simple approximation, one can assume spherical dielectric particles. The scattering from these particles depends on the relative size of the wavelength and the (spherical) particle radius r, expressed by the dimensionless 25 size parameter x: While the wavelength is approximately constant during the operation of the lidar 2 , the particle size comes from a distribution of sizes, typically approximated in NWP models and GCMs by a Gamma or log-normal distribution with a given mean and 2 The actual lidar wavelength is not constant and is characterised by a central wavelength and width. The central wavelength may fluctuate with temperature (Wiegner and Gasteiger, 2015). standard deviation. Some models provide the mean as effective radius r eff . If the effective radius is not provided by the model, the lidar simulator assumes a value r eff = 30 µm by default, which is the value assumed by the original ACTSIM simulator.
In order to support multiple laser wavelengths, it is necessary to calculate backscattering efficiency due to scattering by a distribution of particle sizes. We use the computer code MIEV developed by Warren J Wiscombe (Wiscombe, 1979(Wiscombe, , 1980 to calculate backscattering efficiency for a range of the size parameter x and integrate for a distribution of particle sizes. 5 The resulting pre-calculated LR (extinction-to-backscatter ratio) as a function of the effective radius is included in the lidar simulator for fast lookup during the simulation.
Cloud droplet and ice crystal size distribution parameters are an important assumption in lidar simulation due to the dependence of Mie scattering on the ratio of wavelength and particle size (the size parameter x). NWP models and GCMs traditionally use the effective radius r eff and effective standard deviation σ eff (or an equivalent parameter such as effective variance ν eff ) to 10 parametrise this distribution. Knowledge of the real distribution is likely highly uncertain due to a large variety of clouds occurring globally and the limited ability to predict microphysical cloud properties in models. In this section we introduce theoretical assumptions used in the lidar simulator based on established definitions of the effective radius and effective standard deviation and two common distributions. Edwards and Slingo (1996) discuss the effective radius in the context of model radiation schemes, and we will primarily follow the definitions detailed in Chang and Li (2001) and Petty and Huang (2011). 15 The practical result of this section (and the corresponding offline code) is pre-calculated backscatter-to-extinction ratios as a function of the effective radius in the form of a lookup table included in the lidar simulator, and used in the online calculations.
The offline code is provided and can be re-used for calculation of the necessary lookup tables for different lidar wavelengths, should the user of the code want to support another instrument.
The effective radius r eff and effective standard deviation σ eff are defined by: where n(r) is the probability density function (PDF) of the distribution. Here, we follow Petty and Huang (2011), who define the effective variance ν eff which relates to σ eff by ν eff = σ 2 eff /r 2 eff . Due to lack of knowledge about the real distribution of particle radii, it has to be modelled by a theoretical distribution, such as a log-normal or Gamma distribution. The original ACTSIM simulator assumes a log-normal distribution (Chiriaco et al., 2006) with the PDF: where µ and σ are the mean and the standard deviation of the corresponding normal distribution, respectively. Chiriaco et al. (2006) use the value of σ = log(1.2) = 0.18 "for ice clouds" (the value for liquid cloud does not appear to be documented).
In our parametrisation we used a combination of r eff and σ eff to constrain the theoretical distribution, where the effective standard deviation σ eff was assumed to be one fourth of the effective radius r eff . This choice is approximately consistent with 30 σ = log(1.2) = 0.18 at r eff = 20 µm (see Table 5, described below). In future updates, the values could be based on in situ studies of size distribution or taken from the atmospheric model output if available.
From the expression for the n-th moment of the log-normal distribution E[X n ] = exp(nµ + n 2 σ 2 2 ) and Equation (3) we calculate r eff and σ eff of the log-normal distribution: 5 We find µ and σ for given r eff and σ eff numerically by root-finding using the equations above. In practice, we find that the root-finding converges well for r eff between 5 and 50 µm, which is the range mostly likely to be applicable in practice.
The Gamma distribution follows the PDF: (see e.g. Eq. 13 in Petty and Huang (2011) or Eq. 1 in Bréon and Doutriaux-Boucher (2005)). In this case, the distribution 10 depends explicitly on r eff and σ eff , and as such does not require numerical root-finding. Figure 3a shows the log-normal and Gamma distributions calculated for a number of r eff and σ eff values, and Table 5 summarises properties of these distributions. The actual mean and standard deviation of the distributions do no necessarily correspond well with the effective radius and effective standard deviation.
In ACTSIM, the volume extinction coefficient α e is calculated by integrating the extinction by individual particles over the 15 particle size distribution: assuming approximately constant extinction efficiency Q e ≈ 2 (which is approximately true for the interesting range of r eff and laser wavelengths), and using the relationship between the cloud liquid (or ice) mass mixing ratio q and ∞ 0 r 2 n(r)dr: 20 where ρ and ρ air are the densities of liquid water (or ice) and air, respectively.
Likewise, the volume backscatter coefficient from particles β p is calculated by integrating backscattering by individual particles over the particle size distribution: where Q s is scattering efficiency and P π (π) is scattering phase function at 180 • . Since the normalisation of n(r) is not known until the online phase of calculation, the backscatter-to-extinction ratio from particles k p = β/α e can be calculated offline instead (the requirement for normalisation of n(r) is avoided by appearing in both the numerator and denominator): We pre-calculate this integral numerically for a permissible interval of r eff (5-50 µm) at 500 evenly spaced wavelengths, 5 and store the result as a lookup table for the online phase. The integral in the numerator is numerically hard to calculate due to strong dependency of P π (π) on r. Figure 3b shows LR as a function of r eff , calculated for log-normal and Gamma particle size distributions with σ eff = 0.25r eff and σ eff = 0.5r eff . This corresponds to the lookup table we use in the online phase of the lidar simulator. As can be seen in Fig. 3, LR depends only weakly on the choice of the distribution type and the effective standard deviation ratio.

Cloud overlap and cloud fraction
Model cloud is defined by the liquid and ice mass mixing ratio and cloud fraction in each atmospheric layer. The lidar simulator simulates radiation passing vertically at a random location within the grid cell. Therefore, it is necessary to generate a random vertical cloud overlap based on the cloud fraction in each layer, as the overlap is not defined explicitly in the model output. Two common methods of generating overlap are the random and maximum-random overlap (Geleyn and Hollingsworth, 1979). In 15 the random overlap method, each layer is either cloudy or clear with a probability given by CF, independent of other layers. The maximum-random overlap assumes that adjacent layers with non-zero CF are maximally overlapped, whereas layers separated by zero CF layers are randomly overlapped. COSP implements cloud overlap generation in the Subgrid Cloud Overlap Profile Sampler (SCOPS) (Klein and Jakob, 1999;Webb et al., 2001;Chepfer et al., 2008). The ALC lidar simulator uses SCOPS to generate 10 random subcolumns for each profile, using the maximum-random overlap assumption as the default setting of a 20 user-configurable option. The backscatter profile and cloud occurrence can be plotted for any subcolumn. Due to the random nature of the overlap, the backscatter profile may differ from the observed profile even if the model is correct in its cloud simulation. The random overlap generation should, however, result in unbiased cloud statistics.

Multiple scattering
Due to a finite FOV of the lidar receiver, a fraction of the laser radiation scattered forward will remain in FOV. Therefore, the 25 effective attenuation is smaller than calculated with the assumption that all but the backscattered radiation is removed from FOV and cannot reach the receiver. The forward scattering can be repeated multiple times before a fraction of the radiation is backscattered, eventually reaching the receiver. To account for this multiple scattering effect, the COSP lidar simulator uses a multiple scattering correction coefficient η, by which the volume scattering coefficient is multiplied before calculating the layer optical thickness (Chiriaco et al., 2006;Chepfer et al., 2007Chepfer et al., , 2008. The theoretical value of η is between 0 and 1 and depends 30 on the receiver FOV and optical properties of the cloud. For CALIOP at λ = 532 nm a value of 0.7 is used in the COSP lidar simulator. Hogan (2006) implemented fast approximate multiple scattering code. This code has recently been used by Hopkin et al. (2019) in their ceilometer calibration method. They noted that η is usually between 0.7 and 0.85 for wavelengths between 905 and 1064 nm. The ALC simulator presented here does not use an explicit calculation of η, but retains the value of η = 0.7.
This value is also used when calculating LR from the vertically integrated backscatter (Sect. 5.2). The code of Hogan (2006) "Multiscatter" is publicly available (http://www.met.reading.ac.uk/clouds/multiscatter/) and could be used in a later version of 5 the framework to improve the accuracy of simulated attenuation and calibration.
5 Lidar data processing Scheme in Fig. 1 outlines the processing done in the framework. The individual processing steps are described below.

Noise and subsampling
ALC signal reception is affected by a number of sources of noise such as sunlight and electronic noise (Kotthaus et al., 2016).
Range-independent noise can be removed by assuming that the attenuated volume backscatter coefficient at the highest range gate is dominated by noise. This is true if the highest range is not affected by clouds, aerosol, and if contributions from molecular scattering are negligible. The supported instruments have a range of approximately 8 (CL31), 15 (CL51, CHM 15k) and 30 km (MiniMPL). By assuming the distribution of noise at the highest level is approximately normal, the mean and standard deviation can be calculated from a sample over a period of time such as 5 minutes, which is short enough to assume 15 the noise is constant over this period, and long enough to achieve accurate estimates of the standard deviation. The mean and standard deviation can then be scaled by the square of the range to estimate the distribution of range independent noise at each range bin. By subtracting the noise mean from the measured backscatter we get the expected backscatter. The result of the noise removal algorithm is the expected backscatter and its standard deviation at each range bin.

Backscatter calibration 20
ALCs often report backscatter in arbitrary units (a.u.) or as NRB (MiniMPL). If they report it in units of m −1 sr −1 , these values are often not calibrated to represent the true absolute backscatter. Assuming that range-dependent corrections (overlap, dead time and afterpulse) have been applied on the backscatter in a. u., the reported backscatter is proportional to the true attenuated backscatter (inclusive of noise backscatter). In order to have a comparable quantity to the lidar simulator and consistent input to the subsequent processing (e.g. cloud detection), calibration is required. Several methods of calibration have been described 25 previously: calibration based on LR in fully attenuating liquid stratocumulus clouds (O'Connor et al., 2004;Hopkin et al., 2019), calibration based on molecular backscatter  and calibration based on a high spectral resolution lidar reference (Heese et al., 2010;Jin et al., 2015). In addition, calibration can be assisted by sunphotometer or radiosonde measurements .
A relatively large variability of the calibration coefficient has been determined for instruments of the same model (Hopkin  (Table 6), but a unit-specific coefficient should be determined for an analysed instrument during the lidar data processing step. 5 Calibration based on LR in fully opaque liquid stratocumulus clouds has been applied successfully on large networks of ALCs. It utilises the fact that given suitable conditions vertically integrated backscatter is proportional to LR of the cloud, which can be theoretically derived if the cloud droplet effective radius can be assumed. The theoretically derived value is about 18.8 sr for common ALC wavelengths and a relatively large range of effective radii (O'Connor et al., 2004). Another factor which needs to be known or assumed is the multiple scattering coefficient, which tends to be about 0.7-1.0 in common ALCs. 10 Due to its relatively simple requirements, this method is possibly the easiest ALC calibration method. The ALCF implements this calibration method by letting the user identify time periods with fully opaque liquid stratocumulus cloud, for which the mean LR is calculated. The ratio of the observed LR and the theoretical LR is equivalent to the calibration coefficient. This implementation, while very easy to perform, has multiple limitations, some of which are highlighted by Hopkin et al. (2019): 1. Aerosol can cause additional attenuation and scattering, which results in LR which is different from the theoretical value 15 by an unknown factor. Therefore, a frequent re-calibration may be necessary.
2. The multiple scattering coefficient assumption may not be accurate for the given instrument.
3. The 910 nm wavelength of CL31 and CL51 is affected by water vapour absorption which causes additional attenuation, which is currently not taken into account in the calculation of LR.
4. Near-range backscatter retrieval is affected by receiver saturation and incomplete overlap. Therefore, using stratocumulus 20 clouds above approximately 2 km for this calibration method is recommended. This range is instrument dependent.
5. The composition of the stratocumulus cloud may be uncertain. At temperature between 0 and -30 • C these clouds may contain both liquid and ice which results in a different LR than expected.
These limitations could be addressed in the future by (1)  hours of integration are required to identify the molecular backscatter . The molecular backscatter is attenuated by an unknown amount of aerosol with unknown LR, and the near-range backscatter is affected by a potentially inaccurate overlap correction. Therefore, this method alone produces calibration coefficient which depend on the atmospheric conditions. We found that all studied ALCs except for the CL31 are capable of observing the molecular backscatter (Section 7).
Therefore, this method may be used in addition to the liquid stratocumulus LR method for cross-validation of the calibration.

Cloud detection 5
Cloud is the most strongly attenuating feature in ALC backscatter measurements. Due to this attenuation, the lidar signal is quickly attenuated in thick cloud and can fall below the noise level before reaching the top of the cloud. This means that the first cloud base can be detected reliably (unless the cloud is too thin or too high and obscured by backscatter noise), while the cloud top or multi-layer cloud cannot be observed reliably under all conditions. The opposite is true for spaceborne lidars, which can detect the cloud top reliably but cannot always detect the cloud base. Therefore, ALC observations can be 10 regarded as complementary to spaceborne lidar observations. By applying a suitable algorithm, one can detect CBH, CTH and identify cloud layers. Instrument firmware often determines CBH and sometimes cloud layers as part of its internal processing, often using an undisclosed algorithm which is not comparable between different instruments and potentially not even different versions of the instrument firmware (Kotthaus et al., 2016). Mattis et al. (2016) compared a large number of ALCs and found differences of up to 70 m between the reported CBH, and others found relatively large differences as well (Liu et al., 2015b;15 Silber et al., 2018). Alternatively to the instrument reported CBH and cloud layers, it is possible to detect cloud based on the backscatter profile. A relatively large number of cloud detection algorithms have been proposed (Wang and Sassen, 2001;Morille et al., 2007;Martucci et al., 2010;Van Tricht et al., 2014;Silber et al., 2018;Cromwell and Flynn, 2019). We use a simple algorithm based on a backscatter threshold applied on the denoised attenuated backscatter, assuming that the noise can be represented by a normal distribution at the highest range, which is unlikely to contain cloud or aerosol if the instrument is 20 pointing vertically (this may not be true, however, for CL31 which has a maximum range of just 7.7 km). This assumption neglects the range-dependent molecular backscatter, which is relatively small at the ceilometer wavelengths examined (910 nm and 1064 nm). A cloud mask is determined positive where the backscatter is greater than a chosen threshold plus three standard deviations of noise at the given range. A threshold of 2 ×10 −6 m −1 sr −1 was found to be a good compromise between false detection and misses. This value is above the maximum molecular backscatter, which is approximately 1.54×10 −6 m −1 sr −1 25 at the surface in the case of the MiniMPL (wavelength 532 nm). Noise is not simulated by the lidar simulator, but the cloud detection algorithm allows for coupling of simulated and observed profiles, whereby the noise standard deviation is taken from the corresponding location in the observed profile. With 5 minute averaging, when the standard deviation of noise is relatively low, we found that the coupling does not make substantial differences to the detected cloud (not shown). While the thresholdbased algorithm is less sophisticated than other methods of cloud detection, the vertical resolution of the simulated backscatter 30 is likely too low and the vertical derivatives of the simulated backscatter too crudely represented (as can be seen in Fig. 5b-f, 6b-f, 7b-e) to apply any algorithm based on the vertical derivatives of backscatter. Using the same cloud detection algorithm on observed and simulated backscatter is essential for an unbiased one-to-one comparison of cloud.

Water vapour absorption
Previous studies have noted that ceilometers which utilise the wavelength of 910 nm such as the Vaisala CL31 and CL51 are affected by additional absorption of laser radiation by water vapour (Wiegner and Gasteiger, 2015;Wiegner et al., 2019;Hopkin et al., 2019). The wavelength coincides with water vapour absorption bands between 900 and 930 nm, while the other common ceilometer wavelength of 1064 nm is not affected. Wiegner and Gasteiger (2015) reported that it can cause absorption 5 of the order of 20% in the extratropics and 50% in the tropics. The lidar simulator does not currently account for this. However, as the water vapour concentration is available from the reanalyses and models, it should be possible to use a line-by-line model to calculate the water vapour volume absorption coefficient for each vertical layer during the integration process. Water vapour also affects calibration of observed backscatter. In order to use the liquid stratocumulus LR calibration method, the backscatter has to be corrected for water vapour absorption to achieve high accuracy of calibration.  Figure   20 2 shows the location of the sites and Table 3 summarises the case studies, which are also described in greater detail below.
The sites were chosen from available datasets to demonstrate the use of the framework with all supported instruments. Two of the sites also had co-located instruments: CL31 and MiniMPL in Lauder, and CHM 15k and MiniMPL in Christchurch. The Observations at the Christchurch site were performed at the University of Canterbury campus on the Ernest Rutherford building rooftop. Christchurch is located on the east coast of the South Island of NZ. Its climate is affected by the ocean, its proximity to the hilly area of the Banks Peninsula, the Canterbury Plains and föhn-type winds (Canterbury northwester) resulting from its position on the lee side of the Southern Alps. The city is affected by significant wintertime air pollution from domestic wood burning and transport. The orography of the city and the adjacent Canterbury Plains is very flat, making it prone to inversions. The Ernest Rutherford building is a 5 floor building situated in an urban area, surrounded by multiple buildings of similar height. We have analysed 24 days of co-located MiniMPL and CHM 15k observations performed in July   5 and August 2019. The MiniMPL was operated in a fixed vertical scanning mode (elevation angle 90 • ). The nudged run of the UM was only available up to year 2018. Therefore, it was not analysed for this site.

Case study results
To demonstrate the ways that the ALCF can be used we compared a total of 50 days of ALC observations with simulated lidar backscatter at three sites in NZ (Sect. 6). The observed backscatter was normalised to calibrated absolute range-corrected 10 total volume backscattering coefficient. The noise mean as determined at the furthest range was removed from the backscatter.
Cloud detection based on an attenuated absolute backscatter coefficient threshold of 2 ×10 −6 m −1 sr −2 and three noise standard deviations was applied to derive a cloud mask and CBH. We compare the statistical cloud occurrence as a function of height above the mean sea level (ASL) (Fig. 4) and individual backscatter profiles (selected profiles are shown in Fig. 5, 6 and 7) in this section. 15 We analysed 13 days of CL51 observations at the Cass field station in late winter. Due to the location of the station at a relatively high altitude in a varied terrain of the Southern Alps, the models with their relatively coarse horizontal grid resolution  Figure 4a shows that predominantly low cloud and precipitation between the ground and 2 km ASL in 25% of profiles was observed. Cloud between 3 and 12 km 25 ASL was observed about evenly in 2% of profiles. While the reanalyses and models were able to partially reproduce the peak of cloud occurrence near 1 km ASL, the peak they displayed is less vertically broad than observed, and in the UM the peak was much weaker than observed. The lack of precipitation simulation might also have contributed to this apparent difference between observed and simulated cloud. Above 3 km ASL, the reanalyses and models tended to overestimate cloud, with only ERA5 and JRA-55 simulating close to the observed cloud occurrence. The observed total CF was 62%. AMPS overestimated  Figure 4b shows that the CL31 observed relatively even cloud occurrence between the ground and 3 km ASL at 8%, falling off to about 3% between 4 and 8 km ASL (the maximum lidar range of CL31 is 7.7 km). The MiniMPL observed much weaker backscatter than CL31 below 3 km ASL, which was identified as an 10 overlap calibration issue in the MiniMPL. The MiniMPL observed substantial amounts of cloud above 8 km, not present in the CL31 observations due to its range limitation. Overall, the observed cloud occurrence had two peaks at ground to 3 km ASL and at about 9 km ASL. The simulated cloud occurrence was generally underestimated between the ground and 5 km ASL, with the exception of the UM which reproduced the lower half of the peak accurately, and ERA5 which reproduced the upper half of the peak accurately. Above 5 km ASL, the cloud occurrence was well reproduced in ERA5 and JRA-55, and strongly 15 overestimated in AMPS, MERRA-2 and the UM. The reanalyses and models also tended to have two peaks at about 2 km ASL and 11 km ASL, but these were quite different from the observed peaks, with the lower peak underestimated by about 5 pp in the reanalyses and models and the higher peak overestimated by about 5-10 pp. The total CF was observed as 44% and 58% by CL31 and MiniMPL, respectively. CF observed by the MiniMPL was likely higher due to its higher maximum lidar range (CL31 missed a substantial amounts of high cloud due to this limitation). The total CF was strongly underestimated by  Figure 4c shows that the co-located CHM 15k and MiniMPL observed a strong peak of cloud occurrence of 26% (CHM 15k) and 20% (MiniMPL) at about 500 m ASL. This was likely due to the combined precipitation 30 and fog as well as false detection of aerosol as cloud. The observed cloud occurrence had a local minimum of 2% at about 5 km ASL, a secondary peak of 5% at 7 km ASL, and fell off 0% at 11 km ASL. The CHM 15k and MiniMPL observations showed inconsistencies of up to 5 pp. The reanalyses and models strongly underestimated low cloud by about 10-20 pp. With the exception of AMPS, they underestimated mid-level cloud by about 5 pp and represented high cloud relatively accurately.
The total CF observed was 69% and 66% for the CHM 15k and MiniMPL, respectively, while the reanalyses and models strongly underestimated CF by up to 36 pp (JRA-55), with underestimates around 20 pp common. Figures 5,6,7 show images of backscatter for three separate days taken from the three case studies. The selected days represent some of the best-matching profiles and demonstrate how well the reanalyses and models can simulate cloud under favourable conditions. As can be seen in the figures, the UM performs the best in terms of temporal and height accuracy of the 5 simulated cloud (Fig. 5f, 6f), followed by ERA5 (Fig. 5c, 6c, 7c). This is likely due to the high output temporal resolution of the UM and ERA5 of 20 min. and 1 h, respectively. The UM and ERA5 were able to represent the relatively fine structure of cloud and to a lesser extent the optical thickness (inferred from the strength of backscattering) of the cloud. Deficiencies, however, are readily identifiable. The low cloud in the UM (Fig. 5f) appears to be shifted by several hours relative to observations (Fig.   5a) and the high cloud has a greater vertical extent in the UM. Likewise, the altocumulus cloud observed in Fig. 6a is shifted 10 by several hours in the UM (Fig. 6f). The stratocumulus and nimbostratus cloud, identified visually based on the backscatter profiles, in ERA5 (Fig. 5c) is markedly lower than observed (Fig. 5a), as well as optically thicker than in reality. The mid-level cloud in ERA5 (Fig. 6c) was located about 2 km higher than observed (Fig. 6a). Precipitation observed in Fig. 7a towards the end of the analysed period was not present in the ERA5 simulated profile (Fig. 7c) due to lack of precipitation simulation in the current lidar simulator (even though rain and show specific content is available from the reanalysis). MERRA-2 was the third- 15 best performing reanalysis in terms of cloud representation accuracy. The reanalysis managed to capture the overall structure of clouds ( Fig. 5c, g, k), but substantial discrepancies were present, some of which were likely due to the relatively low temporal resolution of the reanalysis (3 h). AMPS was identified as the fourth-best in terms of cloud representation accuracy despite the highest spatial resolution of the model. Fig. 5h shows that while the overall structure of the cloud is represented in the reanalysis, it was not able to represent any of the finer details. This is likely partially due to the temporal resolution of the 20 model output of just 3 h (AMPS has, however, relatively high horizontal grid resolution of 21 km). A visual comparison of AMPS and MERRA-2 with observations suggests that the the MERRA-2 simulations are more accurate than AMPS despite the fact that the horizontal grid resolution of AMPS is much finer than MERRA-2. This demonstrates that other factors in the model than resolution have stronger influence on the quality of cloud simulation. JRA-55 was identified as the last in terms of cloud representation accuracy. JRA-55 has the lowest temporal resolution of the studied reanalyses and models of just 6 h, as 25 well as the lowest horizontal grid resolution of 139 km. Therefore, it cannot be expected to capture any fine details of cloud.
In the presented profiles (Fig. 5d, l) one can see that the cloud is only crudely represented. JRA-55 was able to represent the stratocumulus cloud of Fig. 5a, although its temporal extent and optical thickness were overestimated. The mid-level cloud of Fig. 5i was relatively well-represented in terms of height and optical thickness (Fig. 5l), given the low temporal resolution of the reanalysis. We stress that a direct backscatter profile intercomparison is highly dependent on the temporal resolution of The comparison between observations, reanalyses and models can be summarised by considering that the radiative effect of clouds is the product of cloud occurrence/fraction and their albedo. CF can be derived from ALC observations by determining the cloud mask and identifying cloudy profiles which contain detected cloud. Cloud albedo cannot be directly observed with ALCs due to the unknown backscatter-to-extinction ratio of the cloud layers, which depends on the cloud phase, the effective radius and the shape of ice crystals. However, we can use LR, calculated from the vertically integrated backscatter, as a proxy for cloud albedo. Clouds which are not fully opaque result in LR which is greater than the expected theoretical LR based on the Mie theory (Fig. 3). The inverse of LR (extinction-to-backscatter ratio) is therefore approximately proportional to the albedo of the cloud. Figure 8a, b shows an absolute and relative (respectively) scatter plot of the average cloud fraction and the 5 average of the inverse of LR calculated from cloudy profiles. As already shown in Fig. 4, the reanalyses and models generally underestimate CF. However, Fig. 8 shows that this is compensated by overestimated albedo (higher LR −1 indicates higher cloud albedo), which is higher in all reanalyses and models than the corresponding ALC observations. Further examination shows that the UM and ERA5 show the closest proximity to the corresponding observations in terms of the average CF and LR −1 . The Lauder CL31 and MiniMPL also show a stark difference in CF, with the CL31 detecting much smaller CF due to 10 its maximum vertical range limitation of 7.7 km, which causes a significant fraction of high cloud to be omitted. The results shown in Fig. 8 suggest that the "too few too bright" model cloud problem identified in previous studies (Nam et al., 2012;Klein et al., 2013;Wall et al., 2017;Kuma et al., 2019) is also observed in the current case studies, whereby the simulated CF is too low, but its exaggerated albedo means that the top of atmosphere (TOA) shortwave radiative balance can be approximately correct.  (Table 6). The molecular backscattering in the boundary layer is, however, superimposed on backscattering by aerosol and cloud. In the case of the MiniMPL observations at the Christchurch site (Fig. 9i), the molecular backscatter streak has multiple secondary streaks.
These are caused by different levels of attenuation by cloud and aerosol during the period of the observations. These secondary 25 streaks were also partially reproduced by the simulator (Fig. 9j). A smaller portion of the width of the streak is also caused by fluctuations of atmospheric temperature and pressure. Under suitable conditions, the molecular backscatter can be used for absolute calibration of an instrument. With the exception of CL31 (Fig. 9c), the molecular backscatter can be identified in the observed backscatter in each case. Therefore, it is possible to choose a calibration coefficient such that the observed and simulated molecular backscatter overlap. This can be considered a viable alternative to the liquid stratocumulus LR cal-30 ibration method, or as a means of cross-validating the instrument calibration. However, it should be noted that the accuracy of this method is affected by an unknown amount of aerosol attenuation. Cloudy profiles can be filtered when calculating the histogram, and therefore the effect of cloud attenuation can be minimised. In addition to the molecular backscatter streak, there is a zero-centred streak visible in the histograms. This is caused by noise when the signal is fully attenuated by cloud. Lastly, a zero-centred "cone" of noise is visible in the observed backscatter, increasing with the square of range. The size of this cone is particularly large in the case of the CL31 (Fig. 9c), which is most likely the result of its low receiver sensitivity and low power compared to the other instruments. The standard deviation of the cone at the furthest range is used to determine the noise standard deviation used by the cloud detection algorithm (Sect. 5.3). Figure 10 shows the same information as Fig. 9, but for clear sky profiles only. Here, it can be seen that the zero-centred peak caused by the complete attenuation by cloud is no longer present. There is a clear overlap between the centre of the noise cone 5 with the simulated molecular backscatter; i.e. the noise cone is centred at the observed molecular backscatter. This is visible with all instruments including CL31 (Fig. 10c), where the overlap between the observed and simulated molecular backscatter is most clearly visible at about 1 km ASL. Below 1 km ASL, the effect of boundary layer aerosol distorts the molecular backscatter by an unknown quantity. The clear sky histograms as shown in Fig. 9 may therefore be preferable to the all-sky histograms of Fig. 9 for calibration by fitting the molecular backscatter. Note that Figure 10i does not show a good match 10 with the simulated molecular backscatter (Fig. 10j) likely due to miscalibration of the MiniMPL in the lower troposphere, which causes spurious cloud to be detected. The majority of profiles were therefore filtered out as cloudy. The dead time, afterpulse and overlap MiniMPL calibration supplied by the vendor appears to be deficient and causes range-dependent bias in the backscatter profile.
We now examine the noise in each instrument using the ALCF. Figure 11 shows the distribution of standard deviation of 15 backscatter noise determined at the highest observable range of each instrument and range-scaled to 7 km. It can be seen that the CL31 is affected by the greatest amount of noise, peaking at about 2×10 −6 m −1 sr −1 . This is at the threshold of cloud detection of 2×10 −6 m −1 sr −1 . Therefore, thin cloud may be obscured by noise at higher ranges with this instrument. The MiniMPL, operating in the visible spectral range, shows a bimodal distribution of backscatter noise depending on sunlight.
During daytime, it peaks at about 0.7×10 −6 m −1 sr −1 , which is the second highest of the analysed instruments. During night- 8 Discussion and conclusions 25 We presented the Automatic Lidar and Ceilometer Framework, which combines lidar processing and lidar simulation for the purpose of model evaluation. The lidar simulation is based on the COSP spaceborne lidar simulator by accounting for the different geometry and lidar wavelength. New lookup tables for Mie scattering were calculated for a number of ALC wavelengths and noise removal and cloud detection algorithms were implemented. The framework supports the most common ALCs and reanalyses. We demonstrated the use of the framework on ALC observations at three different sites in New Zealand, 30 and applied the lidar simulator to three reanalyses and two models. We found that while some reanalyses and models such as the UM and ERA5 show relatively good correspondence with observed cloud, others performed relatively poorly. All reanalyses and models underestimated the total CF by up to 38 pp, with underestimation by 20 pp common. In some cases, the observed and simulated backscatter profiles matched relatively closely in terms of time and altitude, and a better match was observed with reanalyses with high output temporal resolution such as the UM and ERA5, while reanalyses with low temporal resolution didn't allow for reliable direct (non-statistical) comparison of cloud. However, it is clear that more factors than the horizontal and vertical resolution influence the cloud simulation accuracy; especially the cloud, boundary layer and convection schemes employed by the atmospheric model. The reanalysis and model output temporal, horizontal grid resolution 5 and vertical resolution are not always the same as the internal resolution of the underlying atmospheric model. Both have an impact on the comparison between simulated and observed backscatter and cloud. While the output resolution should not have an impact on the long-term statistics, it can be a limiting factor for direct backscatter profile comparison. We demonstrated that the ALCF could be used to identify substantial differences in cloud backscatter which were present in all reanalyses and models. We showed that all the studied instruments except for the CL31 are capable of detecting molecular backscatter and 10 that this can be used for calibration or for cross-validation of other calibration methods. We found that the nighttime MiniMPL was subject to the least amount of noise of all the instrument examined, followed by the CL51, CHM 15k, daytime MiniMPL and CL31. Noise in the MiniMPL was shown to have a bimodal distribution due to day/nighttime. The ALCF can therefore be useful for testing the quality of collected data.
Currently the framework has several limitations which should be addressed in the future. The water vapour absorption at 15 910 nm likely affects the instrument calibration of the CL31 and CL51 ceilometers and limits the accuracy of the one-to-one comparison, even though due to the relatively high backscatter caused by cloud, the calculated cloud masks are unlikely to be strongly affected. The lidar simulator currently does not simulate backscattering from precipitation. Observed precipitation is generally detected as "cloud" by the cloud detection algorithm, while the simulated profile contains no backscatter at the location of precipitation (backscattering and attenuation by rain drops and snow should be implemented in the lidar simulator 20 in the future). If desired, the backscatter profiles affected by precipitation can be excluded before the comparison or their fraction determined by visually inspecting the observed backscatter to assess their possible effect on the statistical results. In addition, backscattering from ice crystals is currently treated in the same way as backscattering from liquid droplets, i.e. the same effective radius of spheres is assumed and the particle concentration is calculated from the model ice mass mixing ratio in the atmosphere layer. In reality, the non-spherical shape of ice crystals, the different typical effective radius and refractive 25 index of ice result in different backscattering from ice crystals versus liquid droplets for the same mass mixing ratio. The ice mass mixing ratio reported by models is usually much lower than the liquid mass mixing ratio, therefore the simulated backscatter from cold cloud tends to be much lower than from warm cloud. The ALCs also suffer from various measurement deficiencies. Notably incomplete overlap, dead time and afterpulse corrections tend to give sub-optimal results at the near range. It is possible to use semi-automated methods to correct for these deficiencies, such as by calculating the integrated 30 backscatter distribution by height of the maximum backscatter and correcting for the range-dependent bias (Hopkin et al., 2019, Sect. 5.1). This method could be implemented in the framework to enable range-dependent calibration of the observed backscatter.
The presented framework streamlines lidar data processing and tasks related to lidar simulation and model comparison.
The framework was recently used by Kuma et al. (2019) for Southern Ocean model cloud evaluation in the GA7.1 model and MERRA-2 reanalysis. Considering the existing extensive ALC networks worldwide there is a wealth of global data. We therefore think that ALCs should have a greater role in model evaluation. Satellite observations have long been established in this respect due to their availability, spatial and temporal coverage and their well-developed derived products and tools. ALCs, with their diverse formats and decentralised nature, have so far lacked derived products and tools which would make them more accessible for model evaluation. We hope that this software will enable more model evaluation studies based on ALC 5 observations. Development of lidar data processing is currently hampered by closed development of code. We note that code has very rarely been made available with past ALC studies. Continued improvement of publicly available code for lidar data processing is needed to achieve faster development of ground-based remote sensing and make it more attractive for GCM, NWP model and reanalysis evaluation.
Code and data availability.   (Hunter, 2007), netCDF4 (Rew and Davis, 1990) and Astropy (Price-Whelan et al., 2018) and the Python programming language (Rossum, 1995) which we used in the implementation of our code; the R programming language (R Core Team, 2017), the Natural Earth dataset (https://www.naturalearthdata.com) and the Shuttle Radar Topography Mission (SRTM) Version 3 Global 1 arc second digital elevation model (Werner, 2001; NASA JPL, 2013) which we used to produce a map of sites; GitHub which provided free hosting of our code; and the Linux-based (Torvalds, 1997) operating systems Devuan GNU+Linux and Debian GNU/Linux on which we produced this analysis.
The observed backscatter was normalised to absolute units and denoised. The vertical levels were interpolated based on the nearest neighbour assumption. The cloud detection threshold of 2×10 −6 m −1 sr −1 corresponds the low end of the colour scale, i.e. all visible backscatter in the plots was determined as cloud, unless the difference from the threshold was smaller than three noise standard deviations. The first subcolumn of 10 columns generated by the Subgrid Cloud Overlap Profile Sampler (SCOPS) was selected to make the plots. The red line signifies the station altitude.   -0.5, 0.5] for CHM 15k, [-1, 1] for CL31 and CL51 and [-2, 2] ×10 −6 m −1 sr −1 for MiniMPL). The simulated backscatter is based on the ERA5 atmospheric fields. Visible in the plots is backscatter caused by molecular backscatter (the main "streak"), noise when signal is fully attenuated by cloud (the zero-centred "streak"), and the range-dependent noise (the zero-centred "cone"). The molecular backscatter is marked by a red dashed line on the observed backscatter plots, the shape of which is taken from the simulated molecular backscatter for the corresponding instrument and site.