!BOI

\fancyhf{}
\fancyhead[R]{}
\fancyhead[L]{
\includegraphics[scale=0.8]{Figs/nasa.png}
}

!  !TITLE: The GEOS Hybrid Ensemble-Variational \\ Atmospheric Data Assimilation System \\ Version 1.01

!  !AUTHORS: Ricardo Todling and Amal El Akkraoui

!  !AFFILIATION: Global Modeling and Assimilation Office, NASA/GSFC, Greenbelt, MD 20771

!  !DATE: Aug 2014

!  !INTRODUCTION: Package Overview 

\topmargin      -1.2in

\setcounter{secnumdepth}{5}
\setlength{\parskip}{0.5em}
This document describes the implementation and usage of the Goddard 
Earth Observing System (GEOS) Hybrid Ensemble-Variational Atmospheric Data Assimilation 
System (Hybrid-EVADAS). The document is aimed largely at explaining the particulars
of the GMAO implementation and its various options. Only a brief summary of the 
present state of the science related to the hybrid EVADAS is given here, with 
the intention of recording progress. 

{\it Remark: This is a living document, updated as the source code changes.}

%....................................................................


\section{Hybrid-Ensemble Data Assimilation}
\label{sec:HEnDA}
%           ---------------------

%%%%%%%%%%%%%
\subsection{Introduction}
%%%%%%%%%%%%%

The basic idea of hybrid variational data assimilation is to use an ensemble of background 
fields to introduce instantaneous, flow-dependent features into the traditionally
non-evolving (climatological) background error covariance. In 3DVar this can be done by augmenting
the control vector with an extra set of variables, usually referred to as the alpha-control
variables. The cost function of a hybrid incremental 3DVar system can be written as
\begin{equation}
 J(\delta {\bf z}) = \frac{1}{2}  \delta {\bf z}^T 
                                  \left[  \beta_c {\bf B}_c + \beta_e {\bf B}_e \right]^{-1}
                                  \delta {\bf z} + 
                     \frac{1}{2} ( {\bf d} - {\bf H} \delta {\bf z} )^T {\bf R}^{-1} 
                                 ( {\bf d} - {\bf H} \delta {\bf z} ) \, ,
\label{eq:HybCost}
\end{equation}
where the control variable $\delta {\bf z}$ is a combined contribution from the solution $\delta {\bf x}$ 
of the standard variational problem and a component that comes from an $M$-member ensemble, that is,
\begin{equation}
  \delta {\bf z} = \beta_c \delta {\bf x} +  \beta_e \sum_{m=1}^{M} {\bf \alpha}_m \circ \delta {\bf x}^e_m \, .
  \label{eq:HybInc}
\end{equation}
Here, the symbol $\circ$ stands for the Hadamard-Schur (element-wise) product of two vectors, 
${\bf \alpha}_m$ is the $m$-th control vector related to the $m$-th ensemble member, 
and $\delta {\bf x}_m^e = ({\bf x}_m-\bar{{\bf x}})/\sqrt{M-1}$ is the $m$-th 
ensemble perturbation created from the $m$-th member background state ${\bf x}_m$, with respect to the ensemble
mean $\bar{{\bf x}}$. In (\ref{eq:HybCost}),
the matrices ${\bf B}_c$ and ${\bf B}_e$ stand for the climatological and ensemble background error covariances, respectively; 
the last term is the usual observation-fit term involving the observation error covariance matrix ${\bf R}$,
and the observation residual vector ${\bf d} = {\bf y} - {\bf h}({\bf x}^g)$ created from differencing the observation
vector ${\bf y}$ with the projection of the first-guess state-vector ${\bf x}^g$ onto observation space by the 
observation operator ${\bf h}$, whose linearization is represented by the matrix ${\bf H}$. 
The parameters $\beta_c$ and $\beta_e$ 
set the relative weights given to the climatological and ensemble background error covariances. 
The problem reduces to its traditional 3DVar configuration, with solution $\delta {\bf x}$, when $\beta_c =1$ and 
$\beta_e = 0$.  Details of the hybrid variational problem can be found in Hamill and Snyder (2000), Lorenc (2003), 
and Wang et al. (2007).  
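
As a sketch of how the ensemble enters (\ref{eq:HybCost}), the relations below spell out the covariance implied by the 
perturbations in (\ref{eq:HybInc}). The symbols ${\bf P}^e$ (sample covariance) and ${\bf L}$ (localization correlation matrix) 
are notation introduced here for illustration and are not taken from the GSI implementation; the ${\bf \alpha}_m$ are assumed to be 
mutually uncorrelated with a common covariance ${\bf L}$. Because of the $1/\sqrt{M-1}$ scaling of the perturbations,
\[
  {\bf P}^e = \sum_{m=1}^{M} \delta {\bf x}^e_m (\delta {\bf x}^e_m)^T
            = \frac{1}{M-1} \sum_{m=1}^{M} ({\bf x}_m - \bar{{\bf x}}) ({\bf x}_m - \bar{{\bf x}})^T \, ,
  \qquad \bar{{\bf x}} = \frac{1}{M} \sum_{m=1}^{M} {\bf x}_m \, ,
\]
and, using the identity $({\bf a} \circ {\bf u})({\bf b} \circ {\bf v})^T = ({\bf a}{\bf b}^T) \circ ({\bf u}{\bf v}^T)$,
\[
  \left\langle \Bigl( \sum_{m} {\bf \alpha}_m \circ \delta {\bf x}^e_m \Bigr)
               \Bigl( \sum_{n} {\bf \alpha}_n \circ \delta {\bf x}^e_n \Bigr)^T \right\rangle
  = \sum_{m} \left\langle {\bf \alpha}_m {\bf \alpha}_m^T \right\rangle \circ
             \left[ \delta {\bf x}^e_m (\delta {\bf x}^e_m)^T \right]
  = {\bf L} \circ {\bf P}^e \, .
\]
Under these assumptions ${\bf B}_e$ can be read as the localized sample covariance ${\bf L} \circ {\bf P}^e$. Since the 
perturbations sum to zero, ${\bf P}^e$ has rank at most $M-1$ (31 for a 32-member ensemble), which is one reason 
localization and blending with ${\bf B}_c$ are typically needed.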

In the GMAO atmospheric DAS the variational problem of minimizing (\ref{eq:HybCost}) is solved using the Gridpoint 
Statistical Interpolation (GSI; Kleist et al. 2009a) analysis and the preconditioning strategy of Derber and Rosati (1989).
The climatological background error covariance matrix is implemented as a series of recursive filters producing nearly Gaussian 
and isotropic correlation functions following Wu et al. (2002), and tuned from GEOS forecasts (Wei Gu contribution;
see Rienecker et al. 2008). Satellite radiances are processed using the Community Radiative Transfer
Model (CRTM; Kleespies et al. 2004) and the online variational bias-correction procedure of Derber and Wu (1998).
A normal-mode-based balance constraint term following Kleist et al. (2009b) is applied to the climatological part of the increment
as well as to the ensemble part of the increment whenever the hybrid analysis is used.

The ensemble hybrid-capable version of GEOS ADAS relies on the GEOS global atmospheric general circulation model (AGCM), 
developed at Goddard through multiple upgrades to the GEOS-5 AGCM. The AGCM contains a version of the 
finite-volume hydrostatic hydrodynamics of Lin (2004), and a recent upgrade to its more advanced cubed-sphere 
hydrodynamics version (S.-J. Lin and W. M. Putman, personal communication). The GEOS AGCM 
is built under the infrastructure of the Earth System Modeling Framework (ESMF; Collins et al. 2005) and couples together 
various physics packages including a modified version of the Relaxed Arakawa-Schubert convective parameterization
scheme of Moorthi and Suarez (1992), the catchment-based hydrological model of Koster et al. (2000),
the multi-layer snow model of Stieglitz et al. (2001), and the radiative transfer model of 
Chou and Suarez (1999). Furthermore, the AGCM is accompanied by two versions of adjoint models (ADMs): 
one handles the finite-volume hydrodynamics (Giering et al. 2005; Errico et al. 2007), 
and the other handles the newer cubed-sphere hydrodynamics (Jong Kim et al., personal communication). Each has its own 
simplified-physics package, with the latter including a simplified convective parameterization (Dan Holdaway et al.,
personal communication).

\begin{figure}[ht]
\begin{center}
\includegraphics[trim=40 40 10 0,clip,height=0.25\paperheight,width=0.45\textwidth]{Figs/iau_schematic.pdf}
\includegraphics[trim=40 40 10 0,clip,height=0.25\paperheight,width=0.45\textwidth]{Figs/iau_enshyb_schematic.pdf}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Schematic of standard IAU (left) and IAU as implemented in GEOS hybrid ensemble-variational atmospheric 
data assimilation system (right).
 \label{fig:IAUschematics}}
\end{figure}

Assimilation in GEOS ADAS is performed using the incremental analysis update (IAU) procedure of Bloom et al. (1996).  
A schematic representation of standard IAU appears in the left panel of Fig. \ref{fig:IAUschematics}. 
Consider, for example, the availability of observations around 00 UTC and of AGCM background fields. The 
GSI analysis (purple boxes) produces an increment that,
following the IAU procedure, is converted into a tendency used to force a 6-hour (corrector)
model integration (red triangles). This is followed by a 6-hour (predictor) integration period during which the model 
runs free of the analysis forcing so as to produce backgrounds (green, upside-down triangles) for the 
next assimilation cycle. The prediction period can be extended beyond 6 hours to complete, say, a 5-day forecast 
(horizontal orange-dashed lines). The cycle of running GSI and the AGCM, over and over, takes place whether GEOS ADAS 
is performing its traditional 3DVar procedure or its hybrid extension. The only difference between these two options is that in the
latter case, an ensemble of background fields is required for GSI to internally augment its background error covariance
information. Throughout the present document the cycle just described is referred to as the {\it central} ADAS. It is envisioned to
run at a higher resolution than the ensemble ADAS (see below).
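
The conversion of the increment into a tendency can be sketched as follows, assuming a constant forcing over the corrector 
window as in the original formulation of Bloom et al. (1996). The symbols $\delta {\bf x}^a$ (analysis increment on the model grid), 
$\tau$ (length of the corrector window) and ${\bf f}$ (model tendency) are notation introduced here for illustration; the GEOS 
implementation may differ in detail, for instance by applying time-dependent weights.
\[
  \frac{d {\bf x}}{dt} = {\bf f}({\bf x}) + \frac{\delta {\bf x}^a}{\tau} \quad \mbox{(corrector)} \, , \qquad
  \frac{d {\bf x}}{dt} = {\bf f}({\bf x}) \quad \mbox{(predictor)} \, , \qquad
  \tau = 6~{\rm h} = 21600~{\rm s} \, .
\]
For example, under this assumption a temperature increment of 1 K is applied as a forcing of 
$1/21600 \approx 4.6 \times 10^{-5}$ K s$^{-1}$, so that the full increment has been accumulated by the end of the corrector.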

The hybrid data assimilation system involves not only the minimization of the cost function (\ref{eq:HybCost}), but also generation
of an ensemble of background fields to make up the ensemble background error covariance ${\bf B}_e$. In turn, the ensemble of 
backgrounds requires an ensemble of ``initial conditions'' (analyses) to be available.  At least three options exist within
GEOS Ensemble Atmospheric DAS to generate an ensemble of analyses. The standard option follows Whitaker et al. (2008) and 
relies on the ensemble Kalman filter (EnKF) software of J. S. Whitaker, from NOAA/ESRL. This is the same software presently 
used in the NCEP operational global data assimilation system. Alternatively, one can generate an 
ensemble of GSI analyses, but this is considerably more computationally demanding than using the EnKF since it involves a complete 
variational analysis for each member of the ensemble. Lastly, a simplified ensemble generation procedure, referred to as 
the Filter-free Ensemble Scheme, is also available. In this procedure, ensemble members are created by simply
inflating the central (hybrid) analysis with NMC-like perturbations. 
Regardless of the scheme used to generate the ensemble of analyses, once analyses are available, a corresponding set of background fields is 
generated through IAU-based AGCM integrations (similar to those of the central ADAS). This IAU-based ensemble procedure 
is illustrated in the right panel of Fig. \ref{fig:IAUschematics}. As in the central ADAS, once observations and 
an ensemble of backgrounds are available, one of the ensemble analysis options (EnAna; right-placed, purple boxes) 
generates an ensemble of analyses, which can then be turned into an ensemble of tendencies to initialize the ensemble
of AGCM integrations. These are forced during the first 6 hours (light-red triangles) and unforced during the 6-hour background 
prediction period (light-green, upside-down triangles).

As depicted in the right panel of Fig. \ref{fig:IAUschematics} (blue boxes), in addition to the ensemble analysis procedure
(e.g., the EnKF),
the GEOS ensemble ADAS embeds a procedure to re-center and inflate the ensemble of analyses. Re-centering is done in order 
to align the ensemble of analyses with the central GSI analysis and avoid the possibility of divergence of the ensemble. The chance
for divergence is real given the finite (and usually rather small) size of the ensemble. Additive inflation is applied to
compensate for the lack of better ways to represent model error (see Hamill and Whitaker 2005, Whitaker et al. 2008, and Charron et
al. 2010). It should be noted that in the case of the EnKF, an attempt is made to account for sampling error by applying
a multiplicative inflation factor to the analyses themselves (e.g., Whitaker et al. 2008), but this is internal to the EnKF software. 
In GEOS, a single program is responsible for both re-centering and additive inflation
(see  Secs. \ref{subsec:AddPerts} and \ref{subsec:Recenter}).
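
The two operations can be sketched in a single expression. The notation is introduced here for illustration only: 
${\bf x}^a_m$ is the $m$-th member analysis produced by the ensemble analysis step, $\bar{{\bf x}}^a$ is the mean of these analyses, 
${\bf x}^a_c$ is the central analysis (assumed already interpolated to the ensemble resolution), ${\bf p}_m$ is an additive 
perturbation, and $\gamma$ is the additive inflation coefficient. The actual program may order or scale these steps differently.
\[
  \tilde{{\bf x}}^a_m = {\bf x}^a_c + ( {\bf x}^a_m - \bar{{\bf x}}^a ) + \gamma \, {\bf p}_m \, , \qquad m = 1, \ldots, M \, .
\]
The first two terms amount to re-centering: the member perturbations are kept, but their mean is replaced by the central 
analysis. The last term is the additive inflation; if the ${\bf p}_m$ are chosen to have zero mean across the ensemble, the mean of 
the $\tilde{{\bf x}}^a_m$ remains ${\bf x}^a_c$. In the filter-free scheme, where no ensemble analysis is performed, this expression 
would reduce to $\tilde{{\bf x}}^a_m = {\bf x}^a_c + \gamma \, {\bf p}_m$.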
 
The following sub-sections give a brief summary of the state of the science of the GEOS hybrid EVADAS at the time of its first 
release to GMAO Operations and the setup of the first ``parallel'' experiment. 
All illustrations that follow involve experiments with either a single analysis or a fully cycled ADAS. The hybrid analyses are
produced at 0.5-degree resolution with 72 model levels and, when applicable, a 32-member, 1-degree, 72-level ensemble is used.
All AGCM integrations carry along climatological aerosols through the Goddard Global Ozone Chemistry Aerosol Radiation 
and Transport (GOCART; Colarco et al. 2010) model.  Readers already familiar with the state of the science are encouraged to skip to the next section.

%%%%%%%%%%%%%
\subsection{Brief Summary of Latest Scientific Results}
%%%%%%%%%%%%%
 

%%%%%%%%%%%%%
\subsubsection{About the ensemble itself}
%%%%%%%%%%%%%
 
 \begin{figure}[ht]
 \begin{center}
 %\includegraphics[scale=1.2]{Figs/Rel1/rec_inf.pdf}
 \includegraphics[scale=0.22]{Figs/Rel1/hy05f_anainc.png}
 \includegraphics[scale=0.22]{Figs/Rel1/hy05f_rec.png} \\
 \includegraphics[scale=0.22]{Figs/Rel1/hy05f_inf.png}
 \includegraphics[scale=0.22]{Figs/Rel1/hy05f_recinf.png}
 \end{center}
 \captionsetup{margin=10pt,font=small,labelfont=bf}
 \caption{Illustration of the contribution from each step taking place after the EnKF ensemble of analyses is generated. The panels show
 500 hPa temperature: 
 analysis increment for a given ensemble member (top left); effect of re-centering this given member about the central GSI
 analysis (top right); effect of applying additive inflation to the member analysis with a coefficient of 0.25 (bottom left); 
 and resulting increment after both re-centering and additive inflation are applied (bottom right).
  \label{fig:MEMinfrec}}
 \end{figure}

 Re-centering and inflating are such key components of the hybrid implementation that we start illustrating the scientific results  
 from our experiments by showing how much these operations contribute to the ensemble.
 Figure \ref{fig:MEMinfrec} shows the effect of re-centering and additive inflation when applied to a given member
 of an existing EnKF analysis increment. Each panel in the figure shows the individual contribution to the increment: EnKF only (top left);
 EnKF plus re-centering (top right); and EnKF plus additive inflation (bottom left). The final increment when both re-centering 
 and inflation have been applied to the EnKF increment appears in the bottom right panel. 
 Without care, either of these two operations can contribute more to the increment than the EnKF itself.  When the underlying
 ensemble assimilation system is not well tuned, it is possible that increments due to re-centering become so large as to wipe out 
 the EnKF increments; similarly, when additive inflation is too large, it can easily overwhelm increments from the EnKF. An adequate
 balance between tuning the EnKF to have a mean state that is reasonably close to the central analysis and the magnitude of additive
 inflation must be reached to obtain an effective total increment for each member of the ensemble.
 
 As the ensemble cycles, this interplay of re-centering and inflating must result in an ensemble with reasonable spread. Figure
 \ref{fig:TropoEnsSpread} illustrates the time evolution of the global (largely tropospheric) spread of a 32-member ensemble for typical 
 experiments performed with GEOS Hybrid EVADAS. Two ensemble analysis strategies are investigated. The panel on the left uses the EnKF for 
 its ensemble analysis and shows how the initial spread (blue curve) changes as the members evolve within the 9-hour background 
 period (green, red, and black for the 3-, 6- and 9-hour backgrounds, respectively). The resulting hybrid ADAS performs 
 rather well (see below), even though there is not much error growth within the 9-hour background period (note that the green, red, and 
 black curves are very close to each other). The growth of error is nevertheless consistent within this period, with the smallest 
 error seen in the 3-hour background and the largest in the 9-hour background. 
 When an ensemble of analyses is created by simply inflating the central analysis and completely bypassing the EnKF, a procedure we
 call the Filter-free Ensemble, the panel on the right shows the initial spread to be zero (by construction; blue curve) and 
 considerable error growth to take place within the 9-hour background period. Though the spread within the 9-hour forecast is now
 larger than when the EnKF is used to generate the ensemble, the corresponding hybrid performance is rather comparable (see below).  
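
 As a sketch of what such a spread measure looks like, one form consistent with the perturbation definition 
 $\delta {\bf x}_m^e = ({\bf x}_m-\bar{{\bf x}})/\sqrt{M-1}$ is given below. Here ${\bf E}$ is a placeholder, introduced for 
 illustration, for the symmetric positive-definite matrix of energy weights; the definition actually used by the diagnostics 
 is the one in Sec. \ref{subsec:ensspread}.
 \[
   s^2(t) = \sum_{m=1}^{M} [ \delta {\bf x}^e_m(t) ]^T \, {\bf E} \, [ \delta {\bf x}^e_m(t) ]
          = \frac{1}{M-1} \sum_{m=1}^{M} [ {\bf x}_m(t) - \bar{{\bf x}}(t) ]^T \, {\bf E} \, [ {\bf x}_m(t) - \bar{{\bf x}}(t) ] \, .
 \]
 With this form, $s^2$ vanishes whenever all members coincide, which is the situation for the filter-free analyses before 
 inflation is applied.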
 
 \begin{figure}[ht]
 \begin{center}
 \includegraphics[scale=0.18]{Figs/Rel1/hy005_10days_apr_twe_spread.png}
 \includegraphics[scale=0.18]{Figs/Rel1/hyA05_10days_apr_twe_spread.png}
 \end{center}
 \captionsetup{margin=10pt,font=small,labelfont=bf}
 \caption{Global spread of a 32-member ensemble measured in total energy units (J/kg; see Sec. \ref{subsec:ensspread}); when EnKF is 
 used to generate ensemble (left), and when filter-free ensemble scheme is used instead (right). The curves are for: analysis spread 
 before re-centering and inflation (blue); 3-, 6- and 9-hour backgrounds (green, red, and black respectively). Totals exclude levels 
 roughly above 10 hPa.
  \label{fig:TropoEnsSpread}}
 \end{figure}
 
 %%%%%%%%%%%%%
 \subsubsection{Non-cycling hybrid analysis}
 %%%%%%%%%%%%%
 
 When an ensemble of backgrounds is used in a hybrid (central) GSI analysis, one of the first things to examine is how 
 the analysis increment changes with respect to its non-hybrid counterpart. Figure \ref{fig:StaticIncrement} provides an 
 illustration for the change in analysis increment, measured in total energy units, for an analysis calculated at a single synoptic 
 time using: (i) a regular 3DVar GSI, with only the climatological background error 
 covariance (left); (ii) a 3DVar GSI with a background error covariance matrix that is fully determined from a 32-member 
 ensemble (center); and (iii) a 3DVar hybrid GSI in which 50\% of the background error covariance matrix comes from the ensemble and the remaining 
 50\% comes from the regular climatological background error covariance matrix (right).  The ensemble-only case (center) shows considerably more activity 
 in the tropics than the climatological-only case (left); the resulting hybrid increment (right) shows a slight but noticeable
 energy increase at mid-tropospheric and low-stratospheric levels. A little less energy seems to be present along the Southern
 tropospheric jet in the ensemble-only case (center) than in the climatological case (left), with the resulting hybrid retaining 
 the energy in this region (right).
 
 \begin{figure}[ht]
 \begin{center}
   \includegraphics[trim=0 0 0 0,clip,height=0.23\paperheight,width=0.32\textwidth]{Figs/Rel1/hy005_twe_0_zene_log_vnorm_sta_0601_00z.png}
   \includegraphics[trim=0 0 0 0,clip,height=0.23\paperheight,width=0.32\textwidth]{Figs/Rel1/hy005_twe_0_zene_log_vnorm_ens_0601_00z.png}
   \includegraphics[trim=0 0 0 0,clip,height=0.23\paperheight,width=0.32\textwidth]{Figs/Rel1/hy005_twe_0_zene_log_vnorm_hyb_0601_00z.png}
 \end{center}
 \captionsetup{margin=10pt,font=small,labelfont=bf}
 \caption{Zonal mean analysis increment, in total wet energy (J/kg) norm, using a standard 3DVar (left), a 3DVar when 
 the background error covariances are fully determined by the ensemble (center), and a hybrid 3DVar when the covariances 
 are a 50\% weighted sum of the climatological- and ensemble-derived background error covariances (right). 
  \label{fig:StaticIncrement}}
 \end{figure}
 
Another aspect of relevance when introducing hybrid analyses as replacements for regular 3DVar analyses relates to how
balance gets affected. In its 3DVar configuration, GSI has the capability of applying a tangent linear normal mode 
constraint (TLNMC) to the increment (see Kleist et al. 2009b). The constraint can be applied to either part of the increment (essentially
either of the two terms in eq. \ref{eq:HybInc}, or both; see Kleist 2012). Figure \ref{fig:BalanceEval} shows two illustrations of 
the result of balancing the increment in various configurations of GSI. The panel on the left shows the total cost function 
during the iterations of the GSI minimization when using: traditional 3DVar without TLNMC (black curve); 
traditional 3DVar with TLNMC (red curve); hybrid 3DVar with TLNMC applied only to the climatological part of the increment (green curve); and 
hybrid 3DVar with TLNMC applied to the full increment (blue curve). The behavior is typical of adding constraints to the analysis, that is,
with balance, the cost settles a little higher than when no constraint is applied. The hybrid minimization tends to reduce
the cost when compared to the climatological-balanced configuration; this is particularly noticeable in the first outer minimization (first 
100 iterations; compare the green and blue curves with the red curve). This is an indication that the hybrid minimization 
recovers the fit to the observations that deteriorates somewhat when the constraint is added to traditional 3DVar. 
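
The two hybrid options can be written schematically in terms of (\ref{eq:HybInc}). The symbol ${\bf C}$, denoting the linear 
TLNMC operator, and the subscripts ``clim'' and ``full'' are notation introduced here for illustration; this is a sketch of where 
the operator acts, not of how GSI implements it.
\[
  \delta {\bf z}_{\rm clim} = \beta_c \, {\bf C} \, \delta {\bf x} + \beta_e \sum_{m=1}^{M} {\bf \alpha}_m \circ \delta {\bf x}^e_m \, ,
  \qquad
  \delta {\bf z}_{\rm full} = {\bf C} \left[ \beta_c \, \delta {\bf x} + \beta_e \sum_{m=1}^{M} {\bf \alpha}_m \circ \delta {\bf x}^e_m \right] \, .
\]
In the first form the ensemble term bypasses ${\bf C}$ altogether, so any imbalance it carries passes directly into the 
increment; in the second form both contributions are filtered by the constraint.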

\begin{figure}[ht]
\begin{center}
  \includegraphics[trim=0 0 0 0,clip,height=0.23\paperheight,width=0.45\textwidth]{Figs/Rel1/balcheck_cost_20120407_00z.png}
  \includegraphics[trim=0 0 0 0,clip,height=0.23\paperheight,width=0.45\textwidth]{Figs/Rel1/balcheck_incmassdivspc_20120407_00z.png}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{The panel on the left shows the total cost function as it changes during the iterations of the GSI minimization; all 
cases are calculated for the same synoptic time but GSI is configured as follows: climatological (non-hybrid) 3DVar without balance 
constraint (black curve); (non-hybrid) 3DVar with TLNMC balance constraint (red curve); hybrid 3DVar with balance constraint applied 
only to the climatological part of the increment (green curve); and hybrid 3DVar with balance constraint applied to the full increment (blue curve). 
The panel on the right shows the integrated mass-wind divergence spectra of the analysis
increment as a function of wave number for the same four configurations; color scheme of curves is as in panel on the left.
\label{fig:BalanceEval}}
\end{figure}

The real measure of improved balance is displayed in the right panel of Fig. \ref{fig:BalanceEval}, where 
the spectrum of the vertically integrated mass-wind divergence 
increment is shown for the same four configurations. The color scheme is preserved, and the curves show
clearly that TLNMC brings considerable improvement in balance when applied to traditional 3DVar (compare black and red curves).
It is also clear from the figure that applying TLNMC only to the climatological part of the increment when hybrid 3DVar is used is
rather troublesome (green curve). This is natural since there is nothing to guarantee the ensemble contribution to the increment,
through its background error covariance matrix ${\bf B}_e$, to be balanced in any way; TLNMC must be applied to the 
full increment (blue curve) for balance to be acceptable in a hybrid configuration. Even this latter case 
is not perfect, since some power that would best be reduced still remains in the spectrum at large wave numbers. 
As pointed out by Kleist (2012; see Figure 4.2 on page 108, in that work), this is a consequence of the dual-resolution aspect 
of the hybrid analysis and some aliasing of the winds; in the example shown here, the ensemble is generated at half the 
resolution of the climatological background error covariance (that is, a 1-degree ensemble, for a 0.5-degree analysis).
It is possible to use scale-dependent weights to reduce some of the aliasing (see Kleist 2012, Fig. 4.4, in that work).
In the case of GEOS hybrid ADAS, the default is to apply TLNMC to the full increment.

 %%%%%%%%%%%%%
 \subsubsection{Cycling hybrid analysis}
 %%%%%%%%%%%%%
 
 Examination of a series of hybrid GSI analyses indicates that, at least in GEOS, hybridizing the background error covariance 
 matrix seems to result in a slightly better-behaved reduction of the gradient norm than when using traditional 3DVar. That is, the hybrid 
 minimizations seem better conditioned than purely climatological ones.
 It has been typical of GEOS analyses in its 5.7 series to display abnormal loss of orthogonality of the gradients during
 the double conjugate minimization procedure of GSI (e.g., see El Akkraoui et al. 2013, for proper behavior of gradient norms,
 and references therein). Figure \ref{fig:GradNorms} shows a comparison of convergence behavior for four cases (different
 synoptic times). The control (non-hybrid; red) analyses have some trouble converging past iteration 220, or so, 
 during the second outer-loop, and even during the first outer loop in some instances (see case in the bottom right panel). 
 In contrast, the hybrid (cycling) analyses covering the same period show rather acceptable convergence behavior (green curves). 
 Though the figure shows only a few cases to illustrate the matter, we find numerous examples when hybrid analyses have no 
 problem converging while their traditional 3DVar non-hybrid counterparts have trouble.
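
 For reference, the gradient whose norm is being monitored follows directly from (\ref{eq:HybCost}). The shorthand 
 ${\bf B} = \beta_c {\bf B}_c + \beta_e {\bf B}_e$ and the iteration index $k$ are introduced here for illustration, and 
 ${\bf B}$ and ${\bf R}$ are taken to be symmetric. GSI works with a preconditioned form of the problem, so the norm it 
 reports may be measured in a different metric than the one implied below.
 \[
   \nabla J(\delta {\bf z}) = {\bf B}^{-1} \delta {\bf z} - {\bf H}^T {\bf R}^{-1} ( {\bf d} - {\bf H} \delta {\bf z} ) \, ,
   \qquad
   \nabla J = 0 \;\; \Rightarrow \;\;
   \delta {\bf z}^a = {\bf B} {\bf H}^T ( {\bf H} {\bf B} {\bf H}^T + {\bf R} )^{-1} {\bf d} \, .
 \]
 A convergence diagnostic of the kind discussed above is then the ratio $\| \nabla J_k \| / \| \nabla J_0 \|$ at 
 iteration $k$, which should decrease toward zero as the minimization approaches $\delta {\bf z}^a$.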
 
 \begin{figure}[ht]
 \begin{center}
   \includegraphics[trim=0 0 0 0,clip,height=0.15\paperheight,width=0.4\textwidth]{Figs/Rel1/ana_grad_20120403_06z.png}
   \includegraphics[trim=0 0 0 0,clip,height=0.15\paperheight,width=0.4\textwidth]{Figs/Rel1/ana_grad_20120409_12z.png}
   
   \includegraphics[trim=0 0 0 0,clip,height=0.15\paperheight,width=0.4\textwidth]{Figs/Rel1/ana_grad_20120412_06z.png}
   \includegraphics[trim=0 0 0 0,clip,height=0.15\paperheight,width=0.4\textwidth]{Figs/Rel1/ana_grad_20120416_06z.png}
 \end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Reduction of ratio of the gradient norm to its initial value for a control (non-hybrid) experiment (red) and a 
hybrid experiment (green). The four cases correspond to analysis at 06 UTC on 3 April 2012 (top left), 12 UTC on 9 April 2012 
(top right), 06 UTC on 12 April 2012 (bottom left), and 06 UTC on 16 April 2012 (bottom right).  
\label{fig:GradNorms}}
\end{figure}
 
%%%%%%%%%%%%%
\subsubsection{Evaluation with respect to observations}
%%%%%%%%%%%%%
 
Evaluation of results from hybrid ADAS experiments involves familiar tools and diagnostics: observation-minus-analysis (OMA),
observation-minus-background (OMB) and observation-minus-forecast (OMF) residual statistics, comparison of monthly means with 
corresponding means from other numerical weather prediction (NWP) centers, and forecast skill scores. Additionally, 
ensemble-related diagnostics can be 
used to monitor the performance of the ensemble itself. These include monthly means of the ensemble mean analyses and/or 
backgrounds, OMA, OMB and OMF residual statistics for the mean and ensemble members, and also the time evolution of
ensemble spread. Rank histograms (of, say, OMB residuals) are sometimes used, though we have found these to be rather difficult
to interpret given the uncertainties associated with the observations and the ensemble itself (see Hamill 2001); therefore,
we refrain from discussing them here.
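
For concreteness, the residuals and the two statistics most often shown can be written as follows. The superscripts $b$, $a$ 
and $f$ (background, analysis and forecast states), the residual symbol ${\bf r}$ and the sample size $N$ are notation introduced 
here; whether the standard deviation is normalized by $N-1$ or by $N$ is an assumption of this sketch.
\[
  {\bf r}^{\rm OMB} = {\bf y} - {\bf h}({\bf x}^b) \, , \qquad
  {\bf r}^{\rm OMA} = {\bf y} - {\bf h}({\bf x}^a) \, , \qquad
  {\bf r}^{\rm OMF} = {\bf y} - {\bf h}({\bf x}^f) \, ,
\]
\[
  \bar{r} = \frac{1}{N} \sum_{i=1}^{N} r_i \, , \qquad
  \sigma_r = \left[ \frac{1}{N-1} \sum_{i=1}^{N} ( r_i - \bar{r} )^2 \right]^{1/2} \, ,
\]
where the sums run over the $N$ residuals of a given observation type falling in a given region, layer and period.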

\begin{figure}[ht]
\begin{center}
  {\includegraphics[trim=80 0 180 0,clip,height=0.15\paperheight,width=0.25\textwidth]{Figs/Rel1/hyA05_omf0_uwndraob_meancmp_nh.pdf}}
  {\includegraphics[trim=80 0 180 0,clip,height=0.15\paperheight,width=0.25\textwidth]{Figs/Rel1/hyA05_omf0_uwndraob_meancmp_tropics.pdf}}
  {\includegraphics[trim=80 0 180 0,clip,height=0.15\paperheight,width=0.25\textwidth]{Figs/Rel1/hyA05_omf0_uwndraob_meancmp_sh.pdf}} \\
  {\includegraphics[trim=80 0 180 0,clip,height=0.15\paperheight,width=0.25\textwidth]{Figs/Rel1/hyA05_omf0_vtmpraob_meancmp_nh.pdf}}
  {\includegraphics[trim=80 0 180 0,clip,height=0.15\paperheight,width=0.25\textwidth]{Figs/Rel1/hyA05_omf0_vtmpraob_meancmp_tropics.pdf}}
  {\includegraphics[trim=80 0 180 0,clip,height=0.15\paperheight,width=0.25\textwidth]{Figs/Rel1/hyA05_omf0_vtmpraob_meancmp_sh.pdf}} 
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Regionally-averaged, monthly mean of radiosonde OMB residuals of zonal wind (top) and temperature (bottom)
for three experiments: control (blue), EnKF-based hybrid (red), and filter-free hybrid (green), shown for: Northern Hemisphere 
(left column), tropics (center column), and Southern Hemisphere (right column).
\label{fig:OMBbias}}
\end{center}
\end{figure}
 
The remaining illustrations in this section summarize results and comparisons from three experiments covering the month of April 2012. 
The abbreviation and a brief explanation of each experiment follow: 
\begin{itemize}
 \item Control (CTL): traditional 3DVar, similar to what is used by GMAO Operations, though experiments here are at 0.5-degree resolution.

 \item Hybrid (HY5): Hybrid ADAS using  50\% climatological and 50\% ensemble background error covariance contributions, with an ensemble ADAS 
                     generating analyses with the EnKF. The ensemble is generated at half the resolution of the hybrid analysis, that is,
                     at 1 degree.

 \item Hybrid (HYA): similar to HY5, but using the filter-free ensemble scheme, in which the EnKF is bypassed and an ensemble of analyses is 
                     generated by inflating the central analysis with scaled NMC-like perturbations. As above, the ensemble is at
                     1 degree.
\end{itemize}
 
Figure \ref{fig:OMBbias} shows vertical profiles of monthly averaged zonal wind (top) and temperature (bottom) radiosonde OMB 
residuals over three regions of the globe, namely, Northern Hemisphere (NH; left), tropics (center), and Southern Hemisphere 
(SH; right).  Two hybrid experiments, one using the EnKF (HY5, red) and another using the filter-free scheme (HYA, green),
are compared to the traditional 3DVar control experiment (CTL, blue).  The only noticeable differences are in the tropics and 
SH for zonal winds, where the hybrid experiments show reduced biases with respect to the control, with the EnKF and simplified 
(filter-free) schemes being rather comparable. Results for temperature remain rather neutral. 
Examination of the standard deviation of the OMB residuals for both 
winds and temperature indicates negligible differences among all three experiments (not shown).
 
It is also possible to examine the impact of observations on the analysis following Todling (2013). This is an
observation-space approach that uses the inverse of the observation error variances to define a measure for evaluating 
the contribution of various  observing systems to the cycling assimilation. Fig. \ref{fig:ObImpThreeExps} displays 
impact results for the three experiments under consideration: control (black), EnKF-based hybrid (cyan), and filter-free-based 
hybrid (magenta). All three experiments show  aircraft, radiosondes, and Aqua AIRS to be the dominating observing 
systems in GEOS ADAS, regardless of the underlying analysis procedure. These observing systems tend to display smaller impact 
when the cycling analysis is based on a hybrid approach as compared to traditional 3DVar --- the hybrid strategies seem to rely slightly
more on these observing systems than does traditional 3DVar.
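
A measure consistent with this description is sketched below. It is written here as an assumption, not as the exact 
expression of Todling (2013); the symbols $\delta e$, ${\bf r}^{\rm OMA}$ and ${\bf r}^{\rm OMB}$ are introduced for illustration.
\[
  \delta e = ( {\bf r}^{\rm OMA} )^T {\bf R}^{-1} \, {\bf r}^{\rm OMA} - ( {\bf r}^{\rm OMB} )^T {\bf R}^{-1} \, {\bf r}^{\rm OMB} \, ,
\]
where ${\bf r}^{\rm OMA}$ and ${\bf r}^{\rm OMB}$ are the observation-minus-analysis and observation-minus-background residuals. 
When ${\bf R}$ is diagonal, $\delta e$ is a sum of contributions from individual observations, which can be grouped by 
observing system. With this sign convention, a negative contribution means the analysis fits those observations more closely 
than the background does.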

\begin{figure}[ht]
\begin{center}
  \rotatebox{-1}{\includegraphics[trim=0 0 0 0,clip,scale=0.5,angle=+1]{Figs/Rel1/hyA05_imp_nogps.png}}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Observation impact on the analysis for three 3DVar experiments: control, non-hybrid (black bars); hybrid using EnKF (cyan bars); and hybrid using the simplified, filter-free approach (magenta bars). In addition to the observation types 
shown, all experiments use GPS radio occultation, but its results are not shown here because of a glitch in the output files that save
the corresponding information (GPS impacts are of the magnitude of those of radiosondes, and are comparable among 
the different analysis approaches).
 \label{fig:ObImpThreeExps}}
\end{center}
\end{figure}
 
Figure \ref{fig:OMFtwentyfourSTDV} shows vertical profiles of standard deviations, calculated over the month of April 2012, for 
zonal wind radiosonde OMF residuals of the 24-hour forecasts. Though rather small, the benefit of using a hybrid assimilation strategy shows in both the tropics and 
the Southern Hemisphere. Here again, the difference between the EnKF-based system and that using the filter-free configuration is
very small, with some advantage shown by the latter in the SH.
 
\begin{figure}[ht]
\begin{center}
 \rotatebox{-1}{\includegraphics[trim=0 0 0 0,clip,height=0.13\paperheight,width=0.30\textwidth,angle=-89]{Figs/Rel1/hyA05_omf24_uwndraob_rmscmp_nh}}
 \rotatebox{-1}{\includegraphics[trim=0 0 0 0,clip,height=0.13\paperheight,width=0.30\textwidth,angle=-89]{Figs/Rel1/hyA05_omf24_uwndraob_rmscmp_tropics}}
 \rotatebox{-1}{\includegraphics[trim=0 0 0 0,clip,height=0.13\paperheight,width=0.30\textwidth,angle=-89]{Figs/Rel1/hyA05_omf24_uwndraob_rmscmp_sh}}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Similar to Fig. \ref{fig:OMBbias}, but for standard deviation. Only zonal winds are shown since temperature
results are neutral.
 \label{fig:OMFtwentyfourSTDV}}
\end{center}
\end{figure}

 %%%%%%%%%%%%%
 \subsubsection{Evaluation with respect to independent analysis}
 %%%%%%%%%%%%%

GMAO routinely compares its monthly mean analyses with those from other NWP centers. 
Figure \ref{fig:Umonthlymean} shows the differences of the April 2012 zonally-averaged 
zonal wind from our experiments with ECMWF operational analysis. Panels in the figure are differences for  
the control (CTL, top left), the filter-free hybrid scheme (HYA, top right), and the EnKF-based hybrid (HY5, bottom left).
When comparing with the control difference, both hybrid procedures reduce the difference with ECMWF considerably, 
especially in the tropics. The bottom-right panel shows the monthly mean of the ensemble mean EnKF analysis (from HY5) difference
with ECMWF analysis.  Comparison of this result with, say, that in the bottom-left panel, serves to illustrate the behavior and
reliability of the underlying EnKF ensemble analyses, though in the presence of re-centering it serves mainly as a sanity 
check to show that inflation averages away.

 \begin{figure}[ht]
 \begin{center}
 \includegraphics[trim=20 20 20 0,clip,height=0.3\paperheight,width=0.9\textwidth]{Figs/Rel1/monthly_mean.pdf}
 \end{center}
 \captionsetup{margin=10pt,font=small,labelfont=bf}
 \caption{April 2012 monthly mean of zonally-averaged zonal wind analysis differences with ECMWF operational analysis from
 four different ADAS scenarios:  control, traditional 3DVar (top left); filter-free hybrid 3DVar (top right); 
 EnKF-based hybrid 3DVar (bottom left); and EnKF ensemble mean (bottom right). 
  \label{fig:Umonthlymean}}
 \end{figure}
 
 %%%%%%%%%%%%%
 \subsubsection{Evaluation with respect to self analysis}
 %%%%%%%%%%%%%
 
Lastly, we show some results when comparing forecasts from each of the three experiments with their own respective analyses.
Figure \ref{fig:UF24RMSvsAna} displays the zonally-averaged wind RMS error of the 24-hour forecast, as a function of pressure, 
for three regions of interest. Results are for the three experiments under consideration: the control (blue), the EnKF-based hybrid (HY5, red), 
and the filter-free hybrid (HYA, green). Both hybrid strategies yield the same improvement in 
RMS in the Northern and Southern Hemispheres, but result in some deterioration in the tropical mid-troposphere, with the filter-free
procedure being less damaging than the EnKF. This behavior is somewhat contradictory to that seen when examining both the 
monthly mean analyses and mean OMB radiosonde residuals.  This remains an issue to tackle in a future release of the GEOS Hybrid ADAS;
see Sec. \ref{sec:FutureRel}.
 
 \begin{figure}[ht]
 \begin{center}
    \includegraphics[trim=40 0 50 15,clip,height=0.2\paperheight,width=0.9\textwidth]{Figs/Rel1/hyA05_3way_stats_uwnd_rmscmp_24_z_APR.png}
 \end{center}
 \captionsetup{margin=10pt,font=small,labelfont=bf}
 \caption{Twenty-four hour forecast RMS error, with respect to self-analysis, of regionally-averaged zonal winds for the three 
experiments under consideration: control (blue), EnKF-based hybrid (red), and filter-free hybrid (green); Northern Hemisphere 
(left), tropics (center), and Southern Hemisphere (right).
  \label{fig:UF24RMSvsAna}}
 \end{figure}
 
In many ways, the ultimate measure of success boils down to the 500 hPa geopotential height anomaly correlations. Self-analysis 
evaluation results appear in Fig. \ref{fig:AnoCorr} for 5-day forecasts in both the Northern (top-left) and Southern Hemispheres
(top-right).  Curves for the control experiment are in blue, those for the EnKF-based hybrid are in red, and those for the
filter-free strategy are in green. The corresponding statistical significance curves appear in the bottom panels. 
The NH scores are essentially neutral, but those in the SH show a statistically significant benefit
(the bottom-right panel shows the red and green curves outside and above the significance boxes). Both hybrid strategies bring comparable 
and non-negligible improvements up to 5 days in their forecasts.

 \begin{figure}[ht]
 \begin{center}
    \includegraphics[trim=0 0 0 0,height=0.3\paperheight,width=0.45\textwidth]{Figs/Rel1/hyA05_3way_stats_hght_corcmp_NHE_500_APR.png}
    \includegraphics[trim=0 0 0 0,height=0.3\paperheight,width=0.45\textwidth]{Figs/Rel1/hyA05_3way_stats_hght_corcmp_SHE_500_APR.png}
 \end{center}
 \captionsetup{margin=10pt,font=small,labelfont=bf}
 \caption{Anomaly correlation of the 500 hPa height of 5-day forecasts (top) verified with respect to own analysis, and shown for 
Northern (left) and Southern (right) Hemispheres for the three experiments under consideration: the control (blue), 
EnKF-based hybrid (red), and filter-free hybrid (green). Significance plots appear beneath the anomaly correlations, with significance
boxes colored according to experiment designation; results are statistically significant when a curve appears outside, and above, 
the corresponding box.
  \label{fig:AnoCorr}}
 \end{figure}
 
This concludes our brief description of the present state of the science in this release of the GEOS Hybrid ADAS. 
As further progress is made and other results become available this part of the document will be updated accordingly.  

 %%%%%%%%%%%%%
 \subsection{Default Configuration of Initial Release}
 %%%%%%%%%%%%%
 
The more we test with the approximate ensemble scheme, when the EnKF is bypassed and an ensemble of analyses is created by
simply adding scaled NMC-like perturbations to the central analysis, the more we seem convinced it provides an equally 
reliable strategy for hybrid data assimilation. However, this initial release of the GMAO hybrid ensemble-variational 
capability is done ``by-the-book''. That is, we are releasing a system configured in a way that is consistent with what NCEP,
and other centers, are presently doing. At the time of this writing, a fully consistent implementation of the filter-free scheme  
can only be done for the so-called lat-lon version of the AGCM, whose initial conditions can be regridded from high to low resolution 
to create the initial conditions needed for a dual-resolution formulation of the simplified scheme. When the AGCM uses its 
cubed-sphere version of the hydrodynamics, other approximations are required in order to implement the filter-free scheme 
(used to obtain the results discussed above). To avoid these extra approximations, the authors prefer not to release the
filter-free scheme as the default configuration of GEOS hybrid until the so-called cube-to-cube regrid utility
is fully capable of handling initial conditions of the cubed-sphere AGCM. When the system is properly tooled, we will work with
the GMAO science board and re-consider the default choice of ensemble generation scheme in the assimilation system.

The remaining part of this document gives an overall idea of how GEOS ADAS implements its hybrid-variational strategy, 
giving special attention to how the atmospheric ensemble is created for each assimilation cycle. The document tries to
serve as a User Guide providing specific details about setting up experiments, scripts, and controlling environment variables.

%....................................................................
\section{Overall Design}
%        --------------
\label{sec:Design}

\subsection{General Description}

This section presumes the reader has some familiarity with GEOS ADAS and its running mechanism. This is not meant to
be an overview for how to run the GMAO data assimilation system. For that we refer readers to the GMAO intranet
website\footnote{{\it https://gmao.gsfc.nasa.gov/intranet/personnel/rtodling/dasdev/GEOSDAS-UserGuide.htm}}.

A schematic of the hybrid ADAS implementation appears in Fig. \ref{fig:GMAOhybSchematic}. It shows two
data assimilation systems running in parallel, drawn as two grey-shaded blocks. The top one corresponds to
the traditional GMAO, IAU-based, 3DVar (also left panel of Fig. \ref{fig:IAUschematics}); in each cycle, given observations 
and background fields, the GSI 3DVar generates an analysis increment taken in by the GEOS AGCM as an IAU-forcing term
during its integration to produce the next cycle background fields. The bottom grey-shaded block corresponds to the
ensemble ADAS, which takes in observations and an ensemble of background fields
to calculate an ensemble of observation-minus-background (OMB) residuals. These are fed into the ensemble analysis 
procedure (e.g., EnKF), which in turn generates 
an ensemble of IAU-forcing terms used by the ensemble of initialized GEOS AGCM integrations to produce the
ensemble of backgrounds for the next cycle. These two systems can run completely independently from each 
other. The top, deterministic system is what is presently run by GMAO operations (and developers). In its {\it hybrid} configuration, 
in addition to observations and a set of background fields, the so-called central ADAS analysis also requires an ensemble of
background fields to form the required ensemble background error covariance matrix. This ensemble of backgrounds is 
provided by the ensemble ADAS (bottom grey-shaded block). This one-way feedback, when the ensemble ADAS feeds into
the hybrid (central) ADAS, is represented in the figure by the red bracket and the red, upward-pointing, arrow. 
A two-way feedback is introduced when the ensemble of analyses generated by the ensemble ADAS is re-centered around 
the hybrid (central) GSI analysis; this is represented in the figure by the red, downward-pointing, arrow going 
from the GSI analysis box to the red rectangle in the ensemble ADAS. Removal of the two-way feedback decouples 
the two ADAS systems; that is, when the red-colored shapes are absent, the figure illustrates
two decoupled deterministic and ensemble assimilation systems working in parallel. 
Our present implementation has the two-way feedback in its default configuration.

\begin{figure}[ht]
\begin{center}
\includegraphics[trim=0 0 0 0,clip,scale=0.5]{Figs/geoshybscheme.pdf}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Schematic of the GMAO Hybrid Ensemble-Variational Data Assimilation System. The two grey-shaded blocks
represent the central (variational) ADAS (top) and the ensemble ADAS (bottom). Both are IAU-based systems whose
analyses require observations (Obs) and background fields (Bkg), and whose AGCM integrations require initial conditions (ICs)
and boundary conditions (BCs). See text for complete description of figure.}
\label{fig:GMAOhybSchematic}
\end{figure}

%Most of the description that follows concentrates in describing the GEOS ensemble
%ADAS since the mechanism for experimenting with the central ADAS remains essentially unchanged.
%Specifically, assuming observations and an ensemble set of bacgrounds and model initial conditions to be available
%to allow processing a 6-hour cycle, the ensemble ADAS involves three main steps: (i) the ensemble observer 
%combines unperturbed observations with the ensemble of backgrounds to calculate an ensemble of observation-minus-background 
%residuals; (ii) the ensemble of backgrounds and residuals is fed into the EnKF to produce an updated ensemble
%of analyses which, after re-centering and additive inflation, serve to create an ensemble of
%increments valid at a given synoptic time; lastly,
%(iii) an ensemble of 12-hour model forecasts is launched when the model is forced with the analysis 
%increments (turned tendencies) during the first 6 hours of the IAU-based forecasts
%(the assimilation, or correction, period), and a final 6-hour unforced integration produces
%a new ensemble of background fields for the next cycle.
%This rough description hides the complexity of each step, what follows gives 
%a step-by-step picture of the whole ensemble ADAS procedure.

The implementation of the ensemble ADAS capability within GEOS ADAS is rather non-intrusive in the sense that only
very minimal changes have been made to existing scripts and procedures. Indeed, only the main job script driving GEOS
ADAS ({\tt g5das.j}) and the so-called {\tt analyzer} have been changed. In these scripts, only a handful 
of statements have been added to hook up the ensemble ADAS. The ensemble ADAS cycle is controlled independently by 
a job script named {\tt atm\_ens.j} -- Fig. \ref{fig:EADASflowchart} shows a flowchart of the procedures in the ensemble ADAS and
a step-by-step description is given later in this section. As mentioned above, the only difference between the hybrid ADAS and 
the traditional ADAS is that the former needs to see the ensemble of background fields forming part of its background error 
covariance matrix. This corresponds to a very minor change to the {\tt analyzer} that simply needs to know that a hybrid analysis is 
running and the location where to find the ensemble members; the root of this information is set in the main ADAS driving script, 
{\tt g5das.j}. The average user needs to be aware of only the required settings in the latter script.
Specifically, recent versions of GEOS ADAS have the following three environment variables added to {\tt g5das.j}:
\begin{quote}
  setenv HYBRIDGSI     /dev/null \\
  setenv STAGE4HYBGSI  /dev/null \\
  setenv RSTSTAGE4AENS /dev/null
\end{quote}
The usual default settings keep the ADAS running in non-hybrid mode ({\tt /dev/null} tells the scripts
to ignore the hybrid features). When running a hybrid experiment, the user is required to provide the location of 
the ensemble members, the location where to stage the resulting central analysis so the ensemble ADAS can 
be re-centered, and the location of the central ADAS initial conditions. For example, the following are typical settings 
when running in hybrid mode:
\begin{quote}
  setenv HYBRIDGSI     \$FVHOME/atmens \\
  setenv STAGE4HYBGSI  \$HYBRIDGSI/central \\  
  setenv RSTSTAGE4AENS \$HYBRIDGSI/RST
\end{quote}
The variables are set with all required information being found in subdirectories of {\tt \$FVHOME}. The environment variable 
{\tt HYBRIDGSI} tells the {\tt analyzer} that the ensemble of backgrounds is under {\tt \$FVHOME/atmens};
the environment variable {\tt STAGE4HYBGSI} tells the {\tt analyzer} to place a copy of its resulting
central analysis and related satellite biases output under {\tt \$FVHOME/atmens/central};
and the environment variable {\tt RSTSTAGE4AENS} tells {\tt g5das.j} to
place a copy of its initial restarts under {\tt \$FVHOME/atmens/RST}. Basically, the first two of these variables define 
the two-way feedback, while the third relates to the practicalities of cycling.

By default the GEOS ADAS driving script, {\tt g5das.j}, cycles one whole day of assimilation per job submitted to the batch system. 
In this first release of the hybrid capability, the central ADAS must run only a 6-hour cycle. For that, users must edit 
the {\tt g5das.j} script and add the following environment variable settings:
\begin{quote}
  setenv NSEGS 1 \\
  setenv NSTEP 1 
\end{quote}
This tells the scripts to stop after one 6-hour cycle; this is the mode in which our operational system runs, since in real time
it can only start the analysis after observations have become available. In hybrid mode, a new cycle can only begin after an ensemble
of background fields is available. As discussed above, the {\tt atm\_ens.j} job script is responsible for the generation of these
fields. 

\begin{figure}
\begin{center}
%\includegraphics[trim=60 0 75 0,clip,height=0.55\paperheight,width=1.\textwidth]{Figs/endas_fluxschematic.pdf}
\includegraphics[scale=0.85]{Figs/endas_fluxschematic.pdf}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Flowchart showing the sequence of events in GEOS Ensemble ADAS ({\tt atm\_ens.j}) and its connection to the (central) 
hybrid ADAS ({\tt g5das.j}). Double-dashed, marbled, boxes indicate alternative applications not normally called by default procedure.
\label{fig:EADASflowchart}}
\end{figure}

In its simplest mode of scheduling, the ensemble ADAS job is submitted at the end of the central ADAS job script and, conversely, the 
central ADAS script is submitted at the end of the ensemble ADAS script. This mode of cycling the hybrid system does not exploit possible
parallelism between the central and ensemble assimilation systems. In actuality, the ensemble ADAS does not need to wait for the 
whole central ADAS to finish, and vice-versa. The moments to synchronize the two systems are right before the central analysis begins, 
when one must have the required ensemble of backgrounds available; and right before the ensemble ADAS needs to re-center its member 
analyses about the hybrid GSI analysis which must then be available. Looking at Fig. \ref{fig:GMAOhybSchematic}, synchronization between 
the two systems must happen when the two red arrows hit the boxes they point to. 
At the time of this writing, a smarter scheduling approach is being tested to exploit higher levels of parallelism; 
further details appear in Sec. \ref{sec:Scheduler}.
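
The two synchronization points just described can be illustrated with a toy sketch. The Python fragment below is not part of 
GEOS ADAS: the function, event, and message names are hypothetical, each thread merely stands in for one of the two job scripts, 
and the ensemble of backgrounds left by the previous cycle is assumed to be in place when the cycle starts.
\begin{center}
\small{
\begin{verbatim}
import threading


def run_hybrid_cycle():
    """Toy model of one hybrid cycle; returns the order of events."""
    # Hypothetical flags standing in for the two red arrows in the schematic.
    ens_bkg_ready = threading.Event()      # ensemble of backgrounds in place
    central_ana_ready = threading.Event()  # hybrid GSI analysis staged
    log, lock = [], threading.Lock()

    def record(event):
        with lock:
            log.append(event)

    def central_adas():
        ens_bkg_ready.wait()               # sync 1: before central analysis
        record("central: hybrid analysis")
        central_ana_ready.set()
        record("central: AGCM integration")

    def ensemble_adas():
        record("ensemble: observers and ensemble analysis")
        central_ana_ready.wait()           # sync 2: before re-centering
        record("ensemble: re-centering")
        record("ensemble: AGCM integrations")

    ens_bkg_ready.set()   # assumed: backgrounds left by the previous cycle
    jobs = [threading.Thread(target=central_adas),
            threading.Thread(target=ensemble_adas)]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return log


if __name__ == "__main__":
    for event in run_hybrid_cycle():
        print(event)
\end{verbatim}
}
\end{center}
In this sketch the ensemble observers and ensemble analysis are free to overlap with the central analysis, whereas the 
re-centering always waits for the central analysis; all other orderings between the two threads are left unconstrained.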

A closer look at the sequence of events in the driving script of the ensemble ADAS appears in Fig. \ref{fig:EADASflowchart}. 
Let us walk through the main steps in the {\tt atm\_ens.j} script while following this figure.
The first thing the ensemble job script does is to look for the starting date of the cycle. For that, it consults a 
copy of the restarts saved by the central ADAS under the directory {\tt \$RSTSTAGE4AENS}
(see Sec. \ref{sec:hyEADASConf} for specific configuration instructions). The script then goes on (almost) 
sequentially doing the following:
\begin{enumerate}
\item Generating perturbations for additive inflation.
\item Running the ensemble of observers.
\item Running the ensemble analysis.
\item Post-processing the ensemble of analyses.
\item Creating the ensemble of IAU-forcing terms.
\item Running the ensemble of initialized model forecasts.
\item Post-processing the ensemble of forecasts.
\item Triggering the central ADAS.
\item Archiving the ensemble ADAS output.
\end{enumerate}
These steps make up the sequence of events in the default settings of the ensemble ADAS. The flowchart in Fig. \ref{fig:EADASflowchart}
provides a graphical display of these steps. Notice, however, that the double-dashed, marbled, boxes in the chart correspond to 
additional features normally not triggered by default, and consequently only discussed later on, in Sec. \ref{sec:AddFeatures}.
Section \ref{sec:Conventions}, on conventions, provides another view of how the 
ensemble ADAS is implemented.  Various specific details related to the steps laid out above are discussed in the remaining sub-sections
of this section. Further information about each of the scripts encountered below appears in the Appendix. 

\subsection{Generating Perturbations for Additive Inflation} 

Under the conventional mode of running the ensemble ADAS, it is necessary to obtain random perturbations to use 
as additive inflating 
factors applied to each ensemble analysis member. A database of NMC-like, 48-minus-24-hour forecast differences has been generated from
a little over one year of forecasts from the GEOS-5.7 series. The only regularly available set of forecasts from GEOS-5.7 consists of those issued 
from the 00 UTC analyses; therefore, only these are used as inflating perturbations\footnote{Though some forecasts are available from 
12 UTC, they are not frequent enough to be conveniently used by the procedure that randomly selects perturbations.}. 

As we will see in Sec. \ref{sec:hyEADASConf}, most features of the ensemble ADAS are triggered by the presence of given resource files 
in the directory defined by the environment variable {\tt ATMENSETC}, typically set to {\tt \$FVHOME/run/atmens}. 
Generation of NMC-like perturbations is triggered in the main ensemble ADAS driver script by the following statements:
\begin{center}
\small{
\begin{verbatim}
  if ( -e $ATMENSETC/nmcperts.rc ) then
     if ( $DO_ATM_ENS || $RUN_PERTS ) then
        setperts.csh  ${EXPID} $nmem $anymd $anhms $TIMEINC $AENSADDINFLOC \
                      |& tee -a $FVWORK/setperts.log &
        if ($status) then
            echo "Main: failed in setperts.csh, aborting."
            exit(1)
        endif
     endif
  endif
\end{verbatim}
}
\end{center}
A check verifies the presence of the file {\tt \$ATMENSETC/nmcperts.rc}. This is a resource file with information related to the
database of NMC-like perturbations: start and end dates of the database; location of the database; and whether or not seasonality
is to be taken into account when retrieving perturbations randomly from the database (as in the NCEP operational hybrid system; 
J. S. Whitaker, personal communication).

The c-shell script {\tt setperts.csh} retrieves as many perturbations from the database as there are ensemble members.
It then calculates the mean of the retrieved perturbations and generates a new set of perturbations with the mean removed. 
This is done largely to avoid
introducing biases in the current analysis since perturbations in the database are for a period likely unrelated to the 
synoptic time of interest. The bias-removal processing can be rather time consuming given that perturbations are at 0.25-degree resolution;
the cost of I/O is non-trivial. There are presently two procedures for removing the mean, which simply use two different sets of programs (see
Sec. \ref{subsec:ensstats}); the most recent (default) is the less demanding one.  Note that the execution of {\tt setperts.csh} is placed 
in the background to allow the main script to go on to its next task\footnote{This is the only procedure within the ensemble ADAS
that is run as a background job; there are specific reasons for this. In general, we discourage running processes 
as background jobs.}.
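
Schematically, the mean removal performed by {\tt setperts.csh} amounts to the following, where the symbols are notation 
introduced here for illustration only (they are not names used in the scripts): ${\bf p}_i$ is the $i$-th perturbation 
retrieved from the database and $N$ is the number of ensemble members,
\begin{equation}
 \bar{\bf p} = \frac{1}{N} \sum_{i=1}^{N} {\bf p}_i \, , \qquad 
 {\bf p}'_i = {\bf p}_i - \bar{\bf p} \, , \quad i = 1, \ldots, N \, .
\end{equation}
By construction the de-biased perturbations ${\bf p}'_i$ sum to zero, so adding them to the members, with a common scaling, 
leaves the ensemble mean unchanged.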

\subsection{Running the Ensemble of Observers}

While the NMC-like perturbations are being de-biased, the main ensemble ADAS script moves on to run the ensemble of observers. 
Three steps are involved here: (i) retrieval of required inputs; (ii) execution of observer for mean backgrounds;
and finally, (iii) execution of observers for each member of the ensemble. All these steps are handled by the script
{\tt obsvr\_ensemble.csh}, which is invoked by {\tt atm\_ens.j} as shown below:
\begin{center}
\small{
\begin{verbatim}
      zeit_ci.x obsvr
      obsvr_ensemble.csh $OBSCLASS $EXPID $anymd $anhms |& tee -a atm_ens.log
      if( $status ) then
         echo "observer failed"
         exit(1)
      endif
      zeit_co.x obsvr
\end{verbatim}
}
\end{center}
The required inputs for the observers are the observations, the background fields, and satellite bias estimation
coefficients from a previous cycle.
Retrieving observations from the archive is done in much the same way as when running the central ADAS: 
the {\tt acquire\_obs} is eventually called from within the setup script {\tt setobsvr.csh} which in turn is called from
within {\tt obsvr\_ensemble.csh}; {\tt setobsvr.csh} also has the responsibility to obtain the satellite bias correction coefficients
(see file {\tt satbias.acq} usually placed under {\tt \$ATMENSETC}). The background files are obtained from the present location of
the ensemble (specified through the environment variable {\tt \$ATMENSLOC}; see Sec. \ref{subsec:atmensjob}). 
Once all observations, auxiliary files, and resource files
are available, the observer can be executed for the ensemble mean background\footnote{This assumes the mean of the backgrounds to be 
available; see the post-GCM step below.}. This step simply entails running GSI with specially selected options for
its parameter set, as defined in the resource file {\tt obs1gsi\_mean.rc}; no minimization is triggered in this case.  
The objective here is to calculate observation-minus-background (OMB)
residuals for the mean (written out as so-called ``diag files'') and to write out the observations passing quality control. 
The selected observations are then taken in by each of the individual observer runs, one for each member of the ensemble; 
this is the final step taking place within {\tt obsvr\_ensemble.csh}. The GSI parameter settings controlling this step are specified 
in the resource file {\tt obs1gsi\_member.rc}; again, no minimization is triggered. By construction, all observers see exactly the same 
set of observations as used by the mean observer. In the end, each observer produces its own set of ``diag files'' with the 
corresponding OMB residuals. The member observers run in parallel to one another; the level of parallelism is discussed later 
in this document.

\subsection{Running the Ensemble Analysis}

\subsubsection{Atmospheric Analysis: meteorology}

With the mean and member OMB residuals available, the {\tt atm\_ens.j} script can now 
invoke the script controlling the atmospheric ensemble analysis. The corresponding statement in the driving
script is as follows: 
\begin{center}
\small{
\begin{verbatim}
      zeit_ci.x eana
      atmos_eana.csh $EXPID $anymd $anhms |& tee -a atm_ens.log
      if( $status) then
         echo "eana failed"
         exit(1)
      endif
      zeit_co.x eana
\end{verbatim}
}
\end{center}
Depending on the settings (i.e., the resource files present under {\tt \$ATMENSETC}), this procedure is capable of calling any
one of several ensemble analysis schemes (see Sec. \ref{subsec:AnaSchemeOpts}). In this introductory 
discussion, we assume the default settings are used and thus the atmospheric EnKF analysis is called. 
The ensemble of OMB residuals is used to 
update the ensemble of backgrounds and create an ensemble of analyzed fields.

\subsubsection{Atmospheric Analysis: aerosols}

The present version of GMAO's operational 3DVar GEOS DAS incorporates a three-hourly analysis of aerosol optical 
depth (AOD), based on GOCART background fields and a suitably modified version of the Physical-space Statistical 
Analysis System [PSAS; da Silva, personal comm.; see also, Cohn et al. (1998)]. A plan is in place to extend the EnKF used for 
the hybrid implementation discussed here to accommodate an ensemble of aerosol background fields and produce an updated 
set of aerosol fields. In our initial implementation of the GMAO hybrid 3DVar systems, however, all members of the GCM ensemble 
(see what follows) see the same set of AOD analyzed fields produced by the central DAS. The script {\tt atmos\_eaod.csh} is 
invoked by the driving atmospheric ensemble DAS script to make the central AOD analysis available to all members of the ensemble
(see also contents of file {\tt aod4aens.acq}). 
\begin{center}
\small{
\begin{verbatim}
      zeit_ci.x eaod
      atmos_eaod.csh $EXPID $anymd $anhms 030000 2 |& tee -a atm_ens.log
      if( $status) then
         echo "eaod failed"
         exit(1)
      endif
      zeit_co.x eaod
\end{verbatim}
}
\end{center}
Since the meteorology is different in each GCM member, the GOCART aerosol fields become different as time progresses in the 
integration of the members. The script above provides the entry point to what will eventually produce an ensemble of AOD analyses.

\subsection{Post-processing the Ensemble of Analyses}

In the default configuration, the next step to take place is re-centering of the ensemble around the central hybrid analysis, and 
application of additive inflation. This assumes the NMC-like perturbations needed for the additive inflation procedure are ready to
be used. Recall that these are processed by {\tt setperts.csh}, which has been running as a background job while the
ensemble of observers and the EnKF were running. At this point, there is a need to synchronize the generation of the perturbations
with the main driving script. This takes place by calling the {\tt jobmonitor.csh} procedure in main:
\begin{center}
\small{
\begin{verbatim}
      set ah          = `echo ${anhms} | cut -c1-2`
      set ayyyymmddhh = ${anymd}${ah}
      jobmonitor.csh 1 setperts.csh $FVWORK $ayyyymmddhh
\end{verbatim}
}
\end{center}
The job-monitor script makes sure the {\tt setperts.csh} script has finished successfully. It
works like a barrier call in an MPI program, serving to synchronize all running processes.
This synchronization requirement is illustrated in Fig. \ref{fig:EADASflowchart}.
When all is complete, the post-analysis procedure can then be called:
\begin{center}
\small{
\begin{verbatim}
      zeit_ci.x post_eana
      post_eana.csh $EXPID $anymd $anhms |& tee -a atm_ens.log
      if( $status) then
         echo "post_eana failed"
         exit(1)
      endif
      zeit_co.x post_eana
\end{verbatim}
}
\end{center}
Under default settings, this calculates the mean of the updated ensemble members. 
Complementary, more advanced settings are available that instruct the scripts to calculate observation-minus-analysis (OMA) 
residuals and, possibly, observation impacts on the ensemble mean analysis as in Todling (2013; see Sec. \ref{subsec:ObSpaceObsImp}). 
After the mean analysis is available, the post-processing continues on to re-center 
the analyzed ensemble members around the hybrid GSI analysis. Re-centering amounts to removal of the ensemble mean and addition 
of the central analysis to each member of the ensemble. During this procedure, additive inflation is also applied by scaling the 
de-biased NMC-like perturbations generated from {\tt setperts.csh}.  Section \ref{subsec:Recenter} discusses in 
greater detail what really happens under the covers of this step\footnote{For example, when necessary, this step remaps the 
central analysis to the topography of each member; and furthermore, this step applies vertical blending to maintain the
stratosphere of the members as close as possible to that of the central GSI analysis.}.
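
A schematic way to write the combined effect of re-centering and additive inflation, leaving aside the remapping and vertical 
blending mentioned in the footnote, is given below. The notation is assumed here for illustration and may differ from that of 
Sec. \ref{subsec:Recenter}: ${\bf x}^a_i$ is the $i$-th member analysis, $\bar{\bf x}^a$ the ensemble mean analysis, 
${\bf x}^a_c$ the central (hybrid GSI) analysis, ${\bf p}'_i$ the $i$-th de-biased NMC-like perturbation, and $\gamma$ a 
scaling coefficient for the additive inflation,
\begin{equation}
 \tilde{\bf x}^a_i = {\bf x}^a_c + \left( {\bf x}^a_i - \bar{\bf x}^a \right) + \gamma \, {\bf p}'_i \, .
\end{equation}
Because both the member deviations and the de-biased perturbations average to zero over the ensemble, the mean of the 
re-centered and inflated members $\tilde{\bf x}^a_i$ coincides with the central analysis in this idealized form.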
 
\subsection{Creating the Ensemble of IAU-Forcing Terms}

When all member-analyses are available, the step to create the necessary AGCM restart is called by the main {\tt atm\_ens.j} script:
\begin{center}
\small{
\begin{verbatim}
      zeit_ci.x ens2gcm
      atmos_ens2gcm.csh $EXPID $anymd $anhms |& tee -a atm_ens.log
      if( $status) then
         echo "ens2gcm failed"
         exit(1)
      endif
      zeit_co.x ens2gcm
\end{verbatim}
}
\end{center}
This creates the corresponding IAU forcing for each member of the ensemble. 
The script {\tt atmos\_ens2gcm.csh} is a wrapper controlling how the ``make-iau'' program operates.
When running the so-called lat-lon (regular-grid) finite-volume hydrodynamics, the program {\tt makeiau.x} 
is called, whereas experiments using the cubed-sphere finite-volume hydrodynamics call {\tt mkiau.x} instead\footnote{ 
At the time of writing, both the observers and the ensemble analysis operate on a regular grid; this is similar to what is 
presently done when calculating analysis (hybrid or otherwise) with GSI --- the backgrounds are on a regular grid.}. 
When running the regular grid AGCM we create IAU-forcing terms on a regular grid, while these are created on the cubed-grid when 
the cubed-sphere hydrodynamics is used. Either way, users should not
need to be aware of this difference other than making sure the proper settings are located under {\tt \$ATMENSETC} 
(see Sec. \ref{subsec:gcmRCconfig}). The resulting increments are not quite simply the difference between each analysis 
and its corresponding background; they also incorporate a mass-wind divergence adjustment. 
Just as with the member observers, the member IAU-forcing terms are generated in parallel to one another; the level
of parallelism is discussed later on in this document.
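
In schematic form, and under the usual IAU assumption that the increment is applied as a constant tendency over the 
assimilation window, the forcing for member $i$ can be written as below. The symbols are illustrative notation rather than 
quantities named in the scripts: ${\bf x}^a_i$ and ${\bf x}^b_i$ are the analysis (after post-processing) and the background 
of member $i$, $\delta{\bf x}^{\rm adj}_i$ stands for the mass-wind divergence adjustment, represented here for simplicity 
as an additive term, and $\tau$ is the length of the IAU period (6 hours for a 6-hour cycle),
\begin{equation}
 \delta{\bf x}_i = {\bf x}^a_i - {\bf x}^b_i + \delta{\bf x}^{\rm adj}_i \, , \qquad
 {\bf f}_i = \frac{\delta{\bf x}_i}{\tau} \, .
\end{equation}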


\subsection{Running the Ensemble of Initialized Model Forecasts}

With the ensemble of IAU forcing terms available, an ensemble of AGCM integrations can be launched to 
perform the initialized assimilation of each of the members and create the background fields required for 
the next cycle of the ensemble ADAS; these background fields are also required by the hybrid (central) analysis.
The ensemble of model forecasts is controlled by the script {\tt gcm\_ensemble.csh}, called by the main script {\tt atm\_ens.j} as seen below:
\begin{center}
\small{
\begin{verbatim}
       @ tfcst_hh  = 2 * $TIMEINC / 60
       zeit_ci.x gcm_ens
       gcm_ensemble.csh $EXPID $nymdb $nhmsb $tfcst_hh $ens_nlons $ens_nlats \
                        |& tee -a atm_ens.log
       if( $status) then
          echo "gcm_ensemble failed"
          exit(1)
       endif
       zeit_co.x gcm_ens
\end{verbatim}
}
\end{center}
By construction, the AGCM integrations cover the 6-hour IAU period plus the 6-hour background generation
period. Controlling the length of integration of the members is rather simple 
and can be done similarly to how it is done in the central ADAS (see Sec. \ref{subsec:gcmRCconfig}).
Note this does not involve changing the command line to {\tt gcm\_ensemble.csh} shown above; the
time argument {\tt \$tfcst\_hh} passed to the script is the {\it minimal} length of the forecast needed for
an assimilation frequency of {\tt \$TIMEINC} minutes, 12 hours for a 6-hour cycle. 
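
As a worked instance of the expression used in the call above, write $T$ for the value of {\tt \$TIMEINC} in minutes and $t_{\rm fcst}$ for the resulting forecast length in hours (both symbols are introduced here for illustration only). Then
\begin{equation}
 t_{\rm fcst} = \frac{2\,T}{60} \, , \qquad
 T = 360 \;\; \Rightarrow \;\; t_{\rm fcst} = \frac{2 \times 360}{60} = 12 \, ,
\end{equation}
that is, the 6-hour IAU period plus the 6-hour background generation period.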


With successful completion of the ensemble of AGCM integrations, a crucial step takes place in the main
ensemble ADAS driver script at this point: the original (initial) ensemble is put to the side for archiving purposes; and the newly
generated ensemble is moved out of the work directory and placed where the original (initial) ensemble was.
The following statements in {\tt atm\_ens.j} perform this action:
\begin{center}
\begin{verbatim}
    /bin/mv $ATMENSLOC/atmens    $ATMENSLOC/atmens4arch.${nymdb}_${hhb}
    /bin/mv $FVWORK/updated_ens  $ATMENSLOC/atmens
\end{verbatim}
\end{center}
The important thing to realize in this step is that it avoids doing any copy of files whatsoever; copying 
would take a considerable amount of time and be utterly inefficient --- each member involves about twenty files among
backgrounds and AGCM restarts.

\subsection{Post-processing Ensemble of Forecasts}

Once the ensemble of AGCM integrations is complete, everything needed to start the central ADAS is available.
But, before the main ADAS script is launched, the ensemble ADAS post-processes the ensemble of background 
fields\footnote{In principle, there is no need to wait for this post-processing step to finish before launching the
central ADAS and its corresponding hybrid GSI. However, this first release of the hybrid system does not take
advantage of this level of parallelism due to some inefficiencies in the post-processing that will be tackled in future releases; 
see Sec. \ref{sec:FutureRel}.}.
The AGCM post-processing step is called next:
\begin{center}
\small{
\begin{verbatim}
      zeit_ci.x post_egcm
      post_egcm.csh $EXPID $nymdb $nhmsb $TIMEINC $FVHOME/atmens
      zeit_co.x post_egcm
\end{verbatim}
}
\end{center}
This entails calculating the ensemble mean of the background fields, as well as any off-line diagnostics
related to the ensemble, such as spread and RMS (see Secs. \ref{subsec:ensstats} and \ref{subsec:ensspread}). 
Remember that the ensemble mean background is needed by the mean observer when the subsequent ensemble ADAS 
cycle begins; thus, at a minimum, the ensemble mean background must be made available. 


\subsection{Triggering the Central ADAS}

The ensemble ADAS driving script has now reached the stage when the central ADAS job can be launched. 
The statements below show how this is handled in {\tt atm\_ens.j}: 
\begin{center}
\small{
\begin{verbatim}
        cd $FVHOME/run
        if( -e ${EXPID}_scheduler.j ) then
           touch $FVHOME/.DONE_MEM001_atm_ens.${yyyymmddhh}
        else
           $ATMENS_BATCHSUB  g5das.j
        endif
\end{verbatim}
}
\end{center}
Notice there is already a check for the upcoming scheduler, under test phase at the time of this writing. At present,
however, starting the central ADAS simply amounts to submitting {\tt g5das.j} to the batch queue. Figure \ref{fig:EADASflowchart}
illustrates the continuation of the cycle when the central ADAS job is launched.

\subsection{Archiving the Ensemble ADAS Output}

The only task left over now is for the ensemble ADAS driving script to archive the required (and requested) output:
\begin{center}
\begin{verbatim}
  atmens_arch.csh $EXPID $arch_nymd $arch_nhms \
                  |& tee -a atm_ens_arch.${arch_nymd}_${arch_hh}z.log
\end{verbatim}
\end{center}
As one can imagine, archiving output from the ensemble is non-trivial. Typical files are somewhat large and the ensemble 
multiplies the total number of files to save rather dramatically. The present archiving procedure is not as efficient as 
perhaps one would hope. The archiving works by defining classes of files to be handled together and stacked in a single tar-file. Further 
details are found in Sec. \ref{subsec:enARCH}. At a minimum, the ensemble of backgrounds, valid at the synoptic hour, should be
saved. These allow re-running the central GSI analysis (in a replay-like mode) without having to recreate the ensemble itself
(see Sec. \ref{subsec:ReplayADAS}); these are also required by the adjoint-based observation impact machinery (see Sec.
\ref{subsec:StSpaceObsImp}).

This completes the introductory description of the design and implementation of the GEOS Hybrid EVADAS. The sections that follow provide
considerably more detail of each of the steps laid out above. Though we try to be as comprehensive as possible, readers must
realize this system is in its infancy and is expected to continue to evolve rather substantially as weaknesses and inefficiencies
are uncovered by the initial parallel operational experiment and eventually by users' own feedback.

%....................................................................
\section{Repository Access and Installation Instructions}
%        -----------------------------------------------
\label{sec:Install}

The following is a brief step-by-step instruction for how to access and compile the source code of
GEOS Hybrid EVADAS.
\begin{description}

\item[Checkout:] cvs co -r TAG MODULE, where TAG and MODULE define a
particular version of interest. For example, 
\begin{quote}
   cvs co -r EnADAS-5\_13\_5 GEOSadas-5\_13
\end{quote}

\item[Compiling:] After a fresh checkout, compilation can be 
accomplished by using the script {\tt parallel\_build.csh} residing
under the GEOSadas/src directory. This script will prompt the user with simple
questions; the defaults usually suffice.

\item[Central ADAS Setup:] After compilation completes, all relevant scripts and executables
will be installed in the {\tt bin} directory, usually, under GEOSadas/Linux/bin
(assuming a Linux machine architecture). The program {\tt fvsetup}
provides a way to configure an experiment to run the usual ADAS. This must be run
before attempting to set up the hybrid (and ensemble) component(s) for the experiment of interest.

\item[Ensemble ADAS Setup:] At the time of this writing, we have only a very rough setup script
to help prepare the parameters and resource files needed to run the ensemble ADAS: {\tt
setup\_atmens.pl}. This script is installed in the {\tt bin} directory of the build.
The following shows this setup usage:
{\small
\begin{verbatim}          
NAME
     setup_atmens.pl - setup resources to allow running Hybrid ADAS
          
SYNOPSIS

     setup_atmens.pl [...options...] scheme
                                     expid
                                     aim
                                     ajm
                                     ogrid
          
DESCRIPTION


     The following parameters are required 

     scheme   enkf or engsi
     expid    experiment name, e.g., u000_c72
     aim      number of x-grid points in Atmos GCM
     ajm      number of y-grid points in Atmos GCM
     ogrid    c or f, for low- or high-resolution Ocean GCM


OPTIONS

     -expdir       experiment location (default: /discover/nobackup/user)
     -atmens       location of ensemble members (default: FVHOME/atmens)
     -h            prints this usage notice

EXAMPLE COMMAND LINE

     setup_atmens.pl enkf u000_c72 288 181 c

NECESSARY ENVIRONMENT

OPTIONAL ENVIRONMENT

AUTHOR

     Ricardo Todling (Ricardo.Todling@nasa.gov), NASA/GSFC/GMAO
     Last modified: 28Feb2014      by: R. Todling

\end{verbatim}
} % small

\end{description}

%....................................................................
\section{Hybrid EVADAS Configuration}
%        ---------------------------
\label{sec:hyEADASConf}

%....................................................................
\subsection{Ensemble ADAS Driving Script}
%           -----------------------------
\label{subsec:atmensjob}

As we have seen, the driving script of GEOS ensemble ADAS is called {\tt atm\_ens.j}. 
This is the equivalent counterpart of the (hybrid) ADAS script {\tt g5das.j}, but controlling 
the atmospheric ensemble assimilation instead. Though there are similarities between
these two scripts, their inner parts are substantially distinct. 
The {\tt atm\_ens.j} script handles about ten steps, from running the observer to
archiving output from the ensemble ADAS cycle. All the steps are explicitly laid
out, unlike in the ADAS ({\tt g5das.j}) script, which
essentially calls the legacy {\tt fvpsas} driver (or its alternative 4DVar
driver {\tt g54var}), where all steps are hidden from the user.
The peril of leaving the steps explicit in the primary job script is that 
it may give the impression that users are free to swap steps or modify operations
at will. We discourage the average user from re-arranging operations
in the driver. 

As in the case of the central (hybrid) ADAS, once the batch script starts running,
the work takes place in a temporary directory. However, unlike the central ADAS,
the work directory for the ensemble is not a floating (TMPDIR-like) directory, randomly
created each and every time the job script is submitted.  The environment variable 
specifying the work directory is called {\tt FVWORK}, just as in the central ADAS 
script, but in the case of {\tt atm\_ens.j} it is defined as follows:
\begin{verbatim}
 setenv FVWORK /discover/nobackup/$user/enswork.$EXPID
\end{verbatim}
The reason for fixing the work directory relates to the eventual need to 
re-submit the job script so it completes unfinished processes that might have
failed the first time. The ensemble ADAS handles an incredibly large amount
of work and processes. Sometimes, largely due to machine glitches, batch system
issues, disk misbehavior, time-outs and more, the job may terminate unsuccessfully
before the cycle ends. Fortunately, in the large majority
of such cases, our design only requires the user to
re-submit {\tt atm\_ens.j} so it picks up where it left off and completes 
the remaining tasks. At times, when things are really at odds with the 
computing environment, one cycle might stop more than once. Still, in these
cases, users should simply verify that the reason for halting is indeed a 
glitch in the system, and re-submit the job once again. This ability of
the driving script to pick up where it left off is the reason behind having
a fixed (non-floating) work directory.  Note that sometimes in this document we
refer to {\tt \$FVWORK} as {\tt \$ENSWORK} since this is the name of the work directory
used by the internal scripts of the ensemble ADAS.

In a healthy termination, the fixed work directory is removed by the script. 
Therefore, one indicator of whether things have worked successfully
or not is the absence or presence, respectively, of the work directory {\tt enswork.\$EXPID}
in the user scratch (``nobackup'') area after {\tt atm\_ens.j} no longer shows in
the batch queue.

Developers of future features of the ensemble ADAS will find it useful
to know that there are certain environment variables in the {\tt atm\_ens.j}
job script that come in handy when working modifications into the system.
These variables are defined near the top of the file {\tt atm\_ens.j}
and are as follows:
\begin{verbatim}
  setenv  DO_ATM_ENS    1
         #    The following set specific pieces separately
         #    Note: FVWORK better be defined by hand 
         #    ---------------------------------------------
         setenv  RUN_PERTS      0  # ens of perturbations
         setenv  RUN_OBVSR      0  # ens observers
         setenv  RUN_EAANA      0  # ens analysis
         setenv  RUN_PEANA      0  # ana post-proc
         setenv  RUN_ENS2GCM    0  # IAU restarts
         setenv  RUN_AENSFCST   0  # ens GCMs & post-proc
         setenv  RUN_ARCHATMENS 0  # archiving
\end{verbatim}
Their names are almost self-explanatory. With
the above definition of the variable {\tt DO\_ATM\_ENS}, the
script goes over all the required ensemble ADAS steps, regardless 
of how the other environment variables (those declared right after {\tt DO\_ATM\_ENS}) are set.

If a user wants the job script to simply 
stop after completion of the ensemble analysis (e.g., EnKF), the
variables above can be redefined as in:
\begin{verbatim}
  setenv  DO_ATM_ENS    0
         #    The following set specific pieces separately
         #    Note: FVWORK better be defined by hand 
         #    ---------------------------------------------
         setenv  RUN_PERTS      1
         setenv  RUN_OBVSR      1
         setenv  RUN_EAANA      1
         setenv  RUN_PEANA      0
         setenv  RUN_ENS2GCM    0
         setenv  RUN_AENSFCST   0
         setenv  RUN_ARCHATMENS 0
\end{verbatim}
This will force the driving script {\tt atm\_ens.j} to stop at the
desired place. The temporary work area remains available for the
user to work with if necessary. Alternatively, the user can simply run the next 
step in the sequence to observe its behavior more closely, for example. 
It would be feasible to activate {\tt RUN\_PEANA} and
re-submit the driver so the next step in the sequence 
takes place; however, it would be a mistake to 
activate, say, the variable {\tt RUN\_AENSFCST}, which controls
the forecast, before having gone through the steps prior to that.
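
A minimal sketch of such a follow-up configuration is given below. It assumes, which is not confirmed here, that the {\tt RUN\_*} flags act independently of one another, so that the steps already completed in the previous submission can be switched off; whether the driver would otherwise skip completed steps on its own should be checked against {\tt atm\_ens.j} itself.
\begin{verbatim}
  setenv  DO_ATM_ENS    0
         #    Hypothetical follow-up: run only the analysis post-processing
         #    (assumes perturbations, observers and analysis are already done)
         #    ---------------------------------------------
         setenv  RUN_PERTS      0
         setenv  RUN_OBVSR      0
         setenv  RUN_EAANA      0
         setenv  RUN_PEANA      1
         setenv  RUN_ENS2GCM    0
         setenv  RUN_AENSFCST   0
         setenv  RUN_ARCHATMENS 0
\end{verbatim}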

%....................................................................
\subsection{Environment Configuration}
%           --------------------------
\label{subsec:EnvConfig}

Configuration of the ensemble ADAS takes place in the form of resource files and shell environment variables. This section 
covers the latter; specific resource-file configuration is treated separately in follow-up sections. All environment 
variables related to the ensemble ADAS are defined in the file {\tt AtmEnsConfig.csh}, which is installed under 
the {\tt etc} directory of the build; after the {\it ensemble} setup, a copy of this file will reside under
the directory {\tt \$FVHOME/run/atmens}. This file compartmentalizes environment variable definitions related to 
the various steps of the ensemble ADAS with general, globally applicable variables defined atop. A typical configuration
of these variables is shown below.

{\small
\begin{verbatim}
# common to all
# -------------
setenv ATMENSLOC  $FVHOME            # locations of recycled files for 
                                     #   atmospheric ensemble
setenv ATMENSETC  $FVHOME/run/atmens # location of the ensemble-related 
                                     #   resource files
setenv RSTSTAGE4AENS  $ATMENSLOC/atmens/RST # location of AGCM restarts
                                            #   provided by central ADAS
setenv ATMENS_VERBOSE 1              # this can be put instead around each 
                                     #   script call in atm_ens.j
setenv JOBGEN_NCPUS_PER_NODE 8       # controls general setting for jobgen script
setenv JOBMONITOR_MAXSLEEP_MIN 60    # maximum time (minutes) to wait for parallel 
                                     #   job completion
setenv ENSPARALLEL 1          # 0 - does serial ensemble
                              # 1 - non-concurrent parallel ensemble
                              # 2 - concurrent (w/ central ADAS) parallel ensemble
#
setenv OBSCLASS "ncep_1bamua_bufr,ncep_1bamub_bufr,ncep_1bhrs2_bufr,..."
\end{verbatim}
} % small

The meaning of each of these environment settings is explained in the comment following each declaration.
The environment variable defining the observation classes to be used in the ensemble ADAS is similar to that 
of the central ADAS, and it does not have to be defined in the {\tt AtmEnsConfig.csh} script unless, for whatever reason, the 
observation classes used by the ensemble analysis differ from those used in the central ADAS. By default, this variable is
not present in {\tt AtmEnsConfig.csh}, and the ensemble analysis takes the definition of classes from {\tt FVDAS\_Run\_Config}.

Some of the variables shown above require a little more explanation. The variable {\tt ATMENSLOC} defines where 
the underlying scripts expect to find the members of the ensemble: the members reside in {\tt \$ATMENSLOC/atmens}, which, 
with {\tt ATMENSLOC} set to {\tt \$FVHOME} as above, is {\tt \$FVHOME/atmens}. The location of the resource files related to the 
ensemble is separate from that of the regular
ADAS: the latter is mainly {\tt \$FVHOME/run}, whereas the former is defined through the variable {\tt ATMENSETC}, and 
it is typically set to {\tt \$FVHOME/run/atmens}, as shown above\footnote{The GMAO Operational Group usually separates the
location of restarts (and other large files) from the basic resource files and driving scripts; the former are usually 
kept in the scratch (``nobackup'') area, and the latter are usually kept under a subdirectory of {\tt \$HOME}. 
A similar separation is allowed when running the hybrid ensemble system.}.

The variable {\tt ENSPARALLEL} defines the type of parallelism to be exploited by the ensemble ADAS scripts. A
value of zero means that everything runs sequentially; this has been useful during the initial phase of implementation of the 
ensemble ADAS, and it sometimes helps debugging, but it is only reasonable for a very small ensemble. A value of one 
is the recommended mode of running at the time of this writing: it exploits the maximum level of parallelism within
the ensemble ADAS itself, but it does not exploit possible parallelism between the ensemble ADAS and the central,
variational ADAS. This higher level of parallelism is obtained by setting the variable to 2, which 
assumes the {\it scheduler} is being used to control how the central and the ensemble ADAS scripts are synchronized (see below).

Lastly, the other important variable to notice in this block is {\tt JOBMONITOR\_MAXSLEEP\_MIN}, which sets
the maximum time the job monitoring program will allow any set of jobs to execute. This is a critical variable and 
it is somewhat dependent on the behavior of the computing systems (and size of the ensemble).
In the example setting above, the monitoring job is told to allow one full hour for major processes to complete.
Section \ref{subsec:jobmonitor} discusses the intricacies of this variable more closely.

Following the sequence of events in the ensemble ADAS cycle, the next set of environment parameters relates to the retrieval and
preparation of the ensemble of NMC-like perturbations necessary for the additive inflation procedure. In the example below
these are set to run on the {\tt general\_small} NCCS (discover) PBS queue using only a single CPU, and using a maximum 
wall-clock time of 1 hour.  \\
\begin{verbatim}
# Environment setting to prepare NMC-like perturbations 
# -----------------------------------------------------
setenv PERTS_QNAME general_small
setenv PERTS_WALLCLOCK 1:00:00
setenv PERTS_NCPUS 1
\end{verbatim}
As mentioned earlier, there are two procedures to calculate and remove the mean of perturbations
--- both procedures are parallelized, but involve considerably different I/O loads. Section 
\ref{subsec:ensstats} presents a detailed discussion of these procedures.

Continuing down the flow of events shown in Fig. \ref{fig:EADASflowchart}, the next set of environment variables controls 
how the observers are run. 
The settings below illustrate an acceptable configuration to run the observers using 1-degree backgrounds: 
the wall-clock time (rather inflated), number of CPUs, and {\tt mpirun}-related variables require no explanation; 
the variable {\tt AENS\_OBSVR\_DSTJOB} is optional and requires some attention.  After running the mean observer as an individual 
(32-CPU) batch job, when it comes time for the observer driving script to work on the individual member observers, 
an undefined (or absent) variable {\tt AENS\_OBSVR\_DSTJOB} results in submission of as many
simultaneous 32-CPU batch jobs as there are members in the ensemble; on the other hand, when this 
variable is defined as in the example below, the total number of batch jobs is the total number of members
divided by {\tt AENS\_OBSVR\_DSTJOB}: for a 32-member ensemble the number of batch jobs is $32 / 4 = 8$, and since each observer 
requires $32$ CPUs, each of the $8$ batch jobs will require $32 \times 4 = 128$ CPUs. \\
\begin{verbatim}
# Environment setting to run the GSI observers
# --------------------------------------------
setenv OBSVR_WALLCLOCK 0:45:00
setenv ENSGSI_NCPUS 32
setenv MPIRUN_ENSANA  "mpirun -np $ENSGSI_NCPUS GSIsa.x"
setenv AENS_OBSVR_DSTJOB 4
\end{verbatim}
In this latter case, when multiple observers are packed into a single batch job, the {\tt mpirun} command line is automatically
replaced with a properly defined {\tt mpiexec} command line. The partitioning of the work among batch jobs is handled by 
the {\tt job\_distributor.csh} script installed in the {\tt bin} directory of the build.
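
The packing arithmetic above can be written compactly. As a sketch, let $N$ be the number of ensemble members, $n$ the number of CPUs per observer ({\tt ENSGSI\_NCPUS}), and $D$ the value of {\tt AENS\_OBSVR\_DSTJOB}. The symbols $N$, $n$, $D$, $N_{\rm jobs}$ and $n_{\rm job}$ are introduced here for illustration only, and $N$ is assumed to be a multiple of $D$ (how a remainder would be handled is not covered here). Then
\begin{equation}
 N_{\rm jobs} = \frac{N}{D} \, , \qquad n_{\rm job} = n \times D \, ,
\end{equation}
so that $N = 32$, $n = 32$ and $D = 4$ give $N_{\rm jobs} = 8$ and $n_{\rm job} = 128$. When the variable is left undefined, each member is its own batch job, which corresponds to $D = 1$ in these expressions: $N_{\rm jobs} = 32$ and $n_{\rm job} = 32$.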

The EnKF is the step to be executed after the observers complete their task successfully. This is just a single MPI job that 
needs to be submitted to the batch system, thus its settings are rather simple. The illustration below refers to the requirements
for running the 1-degree resolution case.\\ 
\begin{verbatim}
# Environment setting to run the EnKF analysis
# --------------------------------------------
setenv ATMENKF_WALLCLOCK 0:30:00
setenv AENKF_NCPUS 96
setenv MPIRUN_ATMENKF "mpirun -np $AENKF_NCPUS enkf.x"
\end{verbatim}

Once the ensemble analysis completes, the ensemble needs to be re-centered and inflated. The environment variables controlling 
this part are shown below.  \\
{\small 
\begin{verbatim}
# Environment setting to re-center and inflate analyses
# -----------------------------------------------------
setenv AENS_ADDINFLATION 1         # apply additive inflation to each analysis member
setenv AENSADDINFLOC addperts      # location for additive perturbations
                                   #    (path relative to FVWORK)
setenv ADDINF_FACTOR 0.25          # additive inflation parameter
setenv RECENTER_QNAME general_small
setenv RECENTER_WALLCLOCK 0:15:00
setenv ENSRECENTER_NCPUS 4
setenv AENS_RECENTER_DST 4
\end{verbatim}
} % small
The first one, {\tt AENS\_ADDINFLATION}, tells the ensemble ADAS that inflation is to be applied
to the analysis. As we learned, the default is to use NMC-like perturbations for that. The location, inside the work area,
where these perturbations can be found is specified by defining {\tt AENSADDINFLOC}\footnote{In retrospect, this 
variable should have been hidden from the user. This will be revisited in a future release.}. The factor used to scale the
perturbations while adding them to the member analyses is specified by {\tt ADDINF\_FACTOR}\footnote{Eventually, this variable
will either be moved to a resource file or disappear completely when considering possible adaptive procedures to inflate
the members.}. The other variables above are self-explanatory given their similarity with variables already discussed.

Generating IAU restarts from the available ensemble of analyses and backgrounds requires running as many executable instances
as members in the ensemble. Just as with some of the steps above, this can be done by either submitting as many batch jobs
as members (by leaving out the environment variable {\tt AENS\_IAU\_DSTJOB} from the list of defined variables), or by setting
the variable {\tt AENS\_IAU\_DSTJOB} as in the example below, where, for a 32-member ensemble, $8$ batch jobs each requesting $24\times 4 = 96$ CPUs handle 
the calculation. \\
\begin{verbatim}
# Environment setting to create IAU forcing terms from each member
# ----------------------------------------------------------------
setenv AENS_IAU_DSTJOB 4
setenv IAU_WALLCLOCK 0:10:00
setenv ENSIAU_NCPUS 24
setenv MPIRUN_ENSIAU    "mpirun -np $ENSIAU_NCPUS $IAUX"
\end{verbatim}

Running the ensemble of AGCMs is controlled by similar environment variables as those above.
In the example below we chose, again, to have $8$ batch jobs each controlling the execution of $4$ simultaneous
AGCM runs, within a job script requiring $4 \times 48 = 192$ CPUs. \\
\begin{verbatim}
# Environment setting to run the ensemble of AGCM
# -----------------------------------------------
setenv AENS_GCM_DSTJOB 4
setenv AGCM_WALLCLOCK 1:00:00
setenv ENSGCM_NCPUS 48
setenv MPIRUN_ENSGCM  "mpirun -np $ENSGCM_NCPUS GEOSgcm.x"
\end{verbatim}
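
Collecting the three packed configurations illustrated above for a 32-member ensemble gives the summary below. The last column is simple arithmetic rather than a statement about how the batch system schedules the jobs: it is the number of CPUs in use if all $8$ batch jobs of a step happen to run at the same time, and it equals the number of members times the CPUs per member, regardless of the packing factor.
\begin{center}
\begin{tabular}{lcccc}
\hline
Step & CPUs/member & Members/job & CPUs/job & CPUs, 8 jobs at once \\
\hline
Observers   & 32 & 4 & $32 \times 4 = 128$ & $8 \times 128 = 1024$ \\
IAU forcing & 24 & 4 & $24 \times 4 = 96$  & $8 \times 96 = 768$ \\
AGCM        & 48 & 4 & $48 \times 4 = 192$ & $8 \times 192 = 1536$ \\
\hline
\end{tabular}
\end{center}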

The next step in the sequence shown in Fig. \ref{fig:EADASflowchart} is the post-processing of the ensemble of AGCM output. 
This is handled within the computing resources used by the main calling program, i.e., {\tt atm\_ens.j}. There are
presently no environment variables to exploit parallelism in this process. As hinted earlier,
this is presently being revisited.

Finally, the archiving procedure is controlled by the options below. \\
\begin{verbatim}
# Environment setting to archive results from an ensemble cycle
# -------------------------------------------------------------
setenv ENSARCH_FIELDS "eana,ebkg,stat"
setenv ENSARCH_WALLCLOCK 2:00:00
setenv ARCHLOC /archive/u/$user
\end{verbatim}
At the moment, the archiving mechanism is rather (NCCS-) discover-centric. 
Jobs handled by this procedure are automatically submitted to the {\it datamove} batch queue (to be generalized).
More important, however, is the method of defining the collections of files to be archived.
The example above addresses three collections: {\it eana}, which refers to the ensemble of analyses;
{\it ebkg}, which refers to the ensemble of background files (required for running hybrid GSI); and {\it stat}, which 
refers to the ensemble mean, RMS, and other statistics. This particular choice of collections is not enough
to restart the ensemble in case something is corrupted. More information about the archiving is available in the next section. 

\noindent {\it Remarks:}
\begin{itemize}
  \item It is fundamental that the {\tt mpirun} calls defined through some of the environment variables
        above never get replaced with calls to the ADAS script {\tt esma\_mpirun}. 
  \item In the authors' experience, the best throughput on the NCCS machines for the ensemble ADAS in the default 
        32-member 1-degree configuration is obtained when running each of the 32 observers and corresponding model integrations 
        as individual job scripts, rather than having them piled up into large PBS jobs. That is, best throughput 
        is obtained when leaving out the variables {\tt AENS\_OBSVR\_DSTJOB} and {\tt AENS\_GCM\_DSTJOB} from the configuration file 
        {\tt AtmEnsConfig.csh}.        
\end{itemize}


%....................................................................
\subsection{Configuration of 3DVar-Hybrid}
%           -----------------------------
\label{subsec:gsiRCconfig}

In addition to the environment variable {\tt HYBRIDGSI}, necessary to inform the {\tt analyzer}
about the location of the ensemble members, the GSI analysis must be told it is actually
to run in hybrid mode. This is accomplished by setting the parameter {\tt l\_hyb\_ens}, in the namelist 
{\tt HYBRID\_ENSEMBLE} within the resource file {\tt gsi.rc.tmpl}, to {\it true}. 
The default GMAO hybrid ADAS configuration uses a $32$-member ensemble generated at half
the resolution of the analysis, which is presently $0.5$ degree. The typical setting 
for the relevant parameters in the resource file is illustrated below. 

\vspace{0.5in}
{\small
\begin{verbatim}
 &HYBRID_ENSEMBLE
   l_hyb_ens=.true.,n_ens=32,beta1_inv=0.50,generate_ens=.false.,
   uv_hyb_ens=.true.,s_ens_h=800.,s_ens_v=-0.5,
   jcap_ens=126,nlat_ens=181,nlon_ens=288,aniso_a_en=.false.,
   jcap_ens_test=126,
   oz_univ_static=.true.,
   readin_localization=.true.,
   readin_beta=.true.,
   use_gfs_ens=.false.,
 /
\end{verbatim}
} % small
The resolution of the ensemble members is explicitly declared here, as well as the number of members
being used. An important feature of GSI hybrid relates to the possibility to provide different localization 
scales for different levels. This is controlled by the parameter {\tt readin\_localization}. Setting this parameter
to {\it true} forces GSI to look for these scales in a file named {\tt hybens\_info} (viz., 
{\tt gmao\_global\_hybens\_info.x288y181l72.rc} in {\tt \$FVHOME/run} directory).
The default localization scales have been tuned for the present resolution configuration; they might need revision
as resolutions are changed in future releases.  Another parameter to be aware of specifies the 
weights between the contribution from the ensemble error covariance and its climatological counterpart. The
setting {\tt beta1\_inv=0.5} appearing above suggests the contributions from these terms to be evenly divided. 
However, attention must be given to the parameter {\tt readin\_beta}. When this parameter is set to {\it true}, 
as in the example here, {\tt beta1\_inv} is ignored and GSI expects to
read vertically-varying $\beta_c$ and $\beta_e$ parameters from a file named {\tt hybens\_betainfo} (viz., file
{\tt gmao\_global\_hybens\_betainfo.l72.rc} in the {\tt \$FVHOME/run} directory). Figure \ref{fig:BetaLocCoeffs}
shows our current horizontal localization scales and ``$\beta$'' weights as a function of the vertical levels
of the analysis. These settings give equal weights to each background error covariance term up to $20$ hPa. Above this
level, a transition zone exists where the weights slowly change so that above $5$ hPa full weight is given to 
the climatological background error covariance matrix (similar in nature to Clayton et al. 2012; see Fig. 7 in that work).
The other parameters appearing in the resource file (namelist) above can basically be ignored by most users. 
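
Schematically, and using the notation of the cost function in Sec. \ref{sec:HEnDA}, the vertical dependence just described can be summarized as follows. This is only a sketch: it assumes that the two weights add up to one at every level, consistent with the even split implied by {\tt beta1\_inv=0.5}, and it leaves the shape of the transition unspecified, since the actual values are those read from {\tt hybens\_betainfo}. With $p$ denoting pressure,
\begin{equation}
 \beta_e(p) = \left\{ \begin{array}{ll}
   0.5 & p \geq 20 \mbox{ hPa} \\
   \mbox{decreasing from } 0.5 \mbox{ to } 0 & 5 \mbox{ hPa} < p < 20 \mbox{ hPa} \\
   0   & p \leq 5 \mbox{ hPa}
 \end{array} \right. , \qquad \beta_c(p) = 1 - \beta_e(p) \, .
\end{equation}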

\begin{figure}[ht]
\begin{center}
%\includegraphics[trim=0  90 60 130,clip,scale=0.30]{Figs/Rel1/hrzloc_scales.pdf}
%\includegraphics[trim=50 90 0  160,clip,scale=0.35]{Figs/Rel1/beta_vertical.png}
\includegraphics[scale=0.25]{Figs/Rel1/hrzloc_scales.pdf}
\includegraphics[trim=50 90 0  160,scale=0.35]{Figs/Rel1/beta_vertical.png}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Horizontal localization scales (km; left) and vertical weights (dimensionless; right) given to climatological (blue) and ensemble (red) 
background error covariances, both as a function of pressure levels in the analysis. 
\label{fig:BetaLocCoeffs}}
\end{figure}

Lastly, another important hybrid GSI configuration parameter is {\tt hybens\_inmc\_option}, which is set in 
the {\tt STRONGOPTS} namelist inside {\tt gsi.rc.tmpl}. The present choice defaults to $2$ which amounts to
using the TLNMC of Kleist et al. (2009b; 2012) to balance both climatological and ensemble contributions of the minimization
to the analysis increment.

Nothing else is required to trigger the hybrid GSI in GEOS Hybrid EVADAS. What remains now relates to setting up
the actual ensemble component of the ADAS.

%....................................................................
\subsection{Configuration of the Ensemble ADAS}
%           ----------------------------------
\label{subsec:ensENSconfig}

After setup of an experiment (see Sec. \ref{sec:Install}), all resource files related to the atmospheric ensemble will be found 
under {\tt \$FVHOME/run/atmens}. The current files are as follows:
\begin{description}
\item[AGCM.rc.tmpl]              - usual AGCM resource file, but controlling ensemble forecasts
\item[aod4aens.acq]              - location/template-name of central aerosol analysis files
\item[AtmEnsConfig.csh]          - sets all relevant Env Vars for the atmos-ensemble scripts
\item[atmens\_incenergy.rc.tmpl]  - info related to calculation of ensemble spread in energy units
\item[atmens\_storage.arc]        - sets template names for archiving ensemble output
\item[atmos\_enkf.nml.tmpl]       - namelist used by EnKF executable
\item[CAP.rc.tmpl]               - usual CAP resource file, but controlling ensemble forecasts
\item[GEOS\_ChemGridComp.rc]     - required if the configuration of the GCM-chem component differs from that of the central ADAS
\item[GSI\_GridComp\_ensfinal.rc.tmpl] - GSI-resource, controlling final ensemble of observers (for OMA)
\item[GSI\_GridComp.rc.tmpl]      - GSI-resource, controlling ensemble of observers
\item[HISTAENS.rc.tmpl]          - AGCM history for ensemble forecasts (backgrounds)
\item[mkiau.rc.tmpl]              - controls generation of IAU-tendency files used in ensemble of GCM
\item[nmcperts.rc]               - specify information related to database of NMC-like perturbations
\item[mp\_stats.rc]               - defines configuration for how mean, RMS, and energy-based spread are calculated
\item[mp\_stats\_perts.rc]         - defines configuration for how to remove mean of NMC-like perturbations
\item[obs1gsi\_mean.rc]           - controls GSI observer mean  
\item[obs1gsi\_member.rc]         - controls GSI observer members
\item[post\_egcm.rc]             - defines GCM output collections for which ensemble means are calculated
\item[satbias.acq]               - location of satellite bias coefficient files
\end{description}
At present, the user must edit some of these files to make sure all choices are appropriate for the experiment being prepared. 
Typically, resolution, model time-step, and MPI distribution configuration must be set in files {\tt CAP.rc.tmpl}, 
{\tt AGCM.rc.tmpl}, {\tt GSI\_GridComp.rc.tmpl}, {\tt obs1gsi\_mean.rc} and {\tt obs1gsi\_member.rc}.  
The file {\tt satbias.acq} must be edited to have the proper experiment name.

Additional features of the ensemble ADAS can be triggered with additional resource files. These files are normally
not copied to {\tt \$FVHOME/run/atmens} during the ensemble setup procedure, but are available in the installation of 
the ADAS, under the build directory {\tt etc}. A list of these extra files follows here:
\begin{description}
  \item[atmens\_asens.acq]     - required when calculating analysis sensitivities with hybrid GSI
  \item[atmens\_replay.acq]    - required when replaying hybrid GSI 
  \item[atmens\_rst\_regrid.rc] - allows user to keep minimal set of restarts (this is not completely
                                  functional in the present release due to changes recently made to 
                                  the regrid utility, and due to lack of cube-to-cube capability within
                                  the regrid procedure)
  \item[central\_ana.rc]       - allows centering ensemble analysis around desired (central) analysis; useful
                                 when running ensemble-only DAS experiments
  \item[easyana.rc] - triggers filter-free scheme ({\tt atmos\_enkf.nml.tmpl} must not be present)
\end{description}
Of all the files above, the only ones that, when required, must be placed in {\tt \$FVHOME/run}, rather
than under {\tt \$FVHOME/run/atmens}, are the two acquire files: {\tt atmens\_asens.acq} and {\tt atmens\_replay.acq}.
Readers will find a little more information about some of the files above as they continue reading.

%....................................................................
\subsection{Analysis Scheme Options}
%           -----------------------
\label{subsec:AnaSchemeOpts}

The present GEOS Hybrid EVADAS implements three possibilities for generating an ensemble of analyses. The first follows
Whitaker et al. (2008) and implements a square-root-type ensemble Kalman filter (EnKF); this is the version currently
operational at NCEP. The second procedure creates an ensemble of analyses by simply perturbing the central analysis
with adequately scaled NMC-like perturbations, generated from pre-computed 48-minus-24-hour forecast differences.
The third creates an ensemble of GSI analyses, thus providing the environment to run an 
Ensemble of Data Assimilation (EDA) systems within GEOS 
(analogous to ECMWF's EDA; Lars Isaksen, personal communication). Of these options, 
only the first two have been extensively examined thus far. The presence of certain resource files
controls which of the options is active. Their specific configuration follows below.

\begin{itemize}
\item EnKF (default): for this option to be triggered the following resource files must be present 
      in the {\tt \$FVHOME/run/atmens} directory:
      \begin{enumerate}
         \item {\tt obs1gsi\_mean.rc}  - controls parameters for the observer mean.
         \item {\tt obs1gsi\_member.rc} - controls parameters for each observer member.
         \item {\tt atmos\_enkf.nml.tmpl} - controls options of atmospheric EnKF itself. 
      \end{enumerate}
      The presence of these resource files triggers a call to {\tt obvsr\_ensemble.csh} and the sequence of events
      related to it as described earlier. The two observer resource files are set to not trigger the GSI minimization part. 
      Furthermore, they tell GSI to write out its set of diagnostic files containing corresponding OMB residuals.
      The member observers are told not to read the observations from scratch, but rather to 
      read the observations that passed quality control when running the mean observer; this is why the member observers
      must wait for the mean observer to complete its task.  The presence of {\tt atmos\_enkf.nml.tmpl} triggers the EnKF, 
      which follows the observers as shown in the flowchart in Fig. \ref{fig:EADASflowchart}.
      The EnKF normally uses background error localization parameters from the central (hybrid) GSI. It is 
      possible to override those by placing an alternative resource file {\tt gmao\_global\_hybens\_locinfo.l72.rc}
      under {\tt \$ATMENSETC}. This might be useful depending on considerations related to resolution of the members versus
      that of the central GSI.
\item Filter-free Ensemble Scheme.  This simplified ensemble generation procedure  is triggered by the following
      resource files:
      \begin{enumerate}
         \item {\tt easyana.rc} - trigger for the filter-free scheme; it must be edited to specify the resolution of  
                                  the central analysis and that of the members to be created.
         \item {\tt atmens\_rst\_regrid.rc} - required only when using the lat-lon AGCM and running the filter-free scheme
                                in dual AGCM resolution. 
      \end{enumerate}
      This procedure does not require running the observers and it amounts to a rather simple way of generating an ensemble,
      as explained earlier.
      At the time of this writing, when the cubed-sphere model is used, the filter-free scheme works best when
      the ensemble of AGCMs is at the same resolution as the central AGCM; a dual-resolution ensemble is possible, but requires very
      careful management of the AGCM initial conditions. With the lat-lon AGCM hydrodynamics, a dual-resolution ensemble 
      configuration is possible, but users must be prepared to set up the regridding capability.
      When experimenting with the filter-free procedure, be sure to remove the files related to triggering the observers and 
      EnKF from the directory defined by {\tt \$ATMENSETC}, to avoid the conflict of triggering two schemes simultaneously.
\item Ensemble of GSIs. This requires the following resource files to be placed in the {\tt \$ATMENSETC} directory:
      \begin{enumerate}
         \item {\tt gsi\_mean.rc}   - set to run only the mean observer (similar to {\tt obs1gsi\_mean.rc}); no minimization takes place. 
                                      This is done so all member analyses (next) use the same quality-controlled set of observations.
         \item {\tt gsi\_member.rc} - controls how GSI analyzes each set of member backgrounds.
                                      The minimization options here can be set just as for
                                      the central analysis (except for being hybrid).
      \end{enumerate}
      To properly trigger the ensemble of GSIs, which turns the system into an Ensemble of Data Assimilation Systems,
      the resource files related to either the EnKF or the filter-free scheme must not be present in 
      directory {\tt \$ATMENSETC}. Users should be aware that, although this provides GEOS ADAS with the capability to 
      perform EDA, it is still a largely immature option that needs particular attention when it comes to perturbing the observations, 
      perturbing sea-surface temperature and other atmospheric model parameters, and using GSI effectively. 
\end{itemize}
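
The presence rules listed above can be summarized in a few lines of Python. The sketch below is not the logic used by 
the ensemble scripts (which are written in C-shell); the function name {\tt active\_scheme} and the decision to treat any 
leftover file from another scheme as an error are assumptions made for illustration. The optional file 
{\tt atmens\_rst\_regrid.rc} is deliberately left out of the rules.
\begin{verbatim}
# Files whose joint presence triggers each scheme, as listed in the text.
SCHEMES = {
    "enkf": {"obs1gsi_mean.rc", "obs1gsi_member.rc", "atmos_enkf.nml.tmpl"},
    "filter-free": {"easyana.rc"},
    "eda": {"gsi_mean.rc", "gsi_member.rc"},
}

def active_scheme(present):
    """Guess which analysis scheme a set of resource files would trigger.

    Hypothetical helper: returns the scheme name, or None when no scheme
    is fully configured, and raises ValueError when files belonging to
    more than one scheme are found together.
    """
    present = set(present)
    complete = [name for name, files in SCHEMES.items() if files <= present]
    if not complete:
        return None
    if len(complete) > 1:
        raise ValueError("conflicting schemes: " + ", ".join(complete))
    chosen = complete[0]
    stray = sorted(
        f
        for name, files in SCHEMES.items()
        if name != chosen
        for f in files & present
    )
    if stray:
        raise ValueError("remove before running: " + ", ".join(stray))
    return chosen

print(active_scheme(["easyana.rc", "AGCM.rc.tmpl", "CAP.rc.tmpl"]))
\end{verbatim}
In practice the argument would be the listing of the directory defined by {\tt \$ATMENSETC}, e.g., the result of 
{\tt os.listdir}.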

%....................................................................
\subsection{Atmospheric AGCM Configuration}
%           ------------------------------
\label{subsec:gcmRCconfig}

Setting up the ensemble of AGCM members is much like setting up the AGCM for the usual ADAS.
The main difference is that the AGCM resource files are found under the ensemble-related resource 
directory defined by {\tt \$ATMENSETC}. Indeed, the files {\tt CAP.rc.tmpl} and {\tt AGCM.rc.tmpl} are 
near copies of those setting the central ADAS, under {\tt \$FVHOME/run}, with the exception of resolution
and MPI distribution parameters that must be set properly. As mentioned earlier, output of the AGCM members
is controlled by {\tt HISTAENS.rc.tmpl}, which is a very trimmed down version of {\tt HISTORY.rc.tmpl}.
Some less commonly used files may be placed in the resource directory of the ensemble ADAS
for alternative AGCM settings:
\begin{description}
  \item[AGCM.BOOTSTRAP.rc.tmpl] -- It is possible to bootstrap the AGCM restart files related to the
        ensemble ADAS. This works in exactly the same way as when bootstrapping  restarts in the traditional
        GMAO 3DVar system. Just place the bootstrap resource file under the directory holding all
        resource files related to the ensemble ADAS, under {\tt \$ATMENSETC}. Once the AGCM creates the 
        output set of checkpoint restarts corresponding to the initially missing restarts, the scripts 
        stop using the bootstrap resource file and default back to the usual {\tt AGCM.rc.tmpl} file.
  \item[GEOS\_ChemGridComp.rc] -- Presently, we do not run the ensemble ADAS with the GEOS aerosol analysis
        (GAAS); consequently, a differently configured file {\tt GEOS\_ChemGridComp.rc} is placed under
        {\tt \$ATMENSETC} to turn off the aerosol analysis (GAAS) component of the AGCM.
        Notice, however, that we still run the ensemble ADAS with GOCART.
  \item[CAP\_hh.rc.tmpl] -- This essentially allows turning the ensemble of AGCM integrations, issued from 
        hour {\tt hh}, e.g., 21, into an ensemble of forecasts whose length is specified inside this particular cap file. 
        Similarly, it is possible to choose extra output history from certain particular integration times by specifying
        files named like {\tt HISTAENS\_hh.rc.tmpl}.
\end{description}

%....................................................................
\subsection{Archiving output from ensemble ADAS}
%           -----------------------------------
\label{subsec:enARCH}

Archiving the output of the ensemble is controlled by the file {\tt atmens\_storage.arc}  
and by the ensemble ADAS environment variable 
{\tt ENSARCH\_FIELDS}. The former works just like any typical archiving resource file, and
specifies template names for each type of file to be archived. For example,
the file holding text output from an EnKF run is templated as
{\tt \%s.atm\_enkf.log.\%y4\%m2\%d2\_\%h2z.txt}, and this is as it appears 
in {\tt atmens\_storage.arc}, preceded by the directory location 
where files are supposed to be placed in the archive. 
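
To make the template convention concrete, the following Python sketch expands the EnKF log template quoted above for a 
given experiment and date. It assumes that {\tt \%s} stands for the experiment name and that {\tt \%y4}, {\tt \%m2}, {\tt \%d2} 
and {\tt \%h2} stand for the four-digit year and the two-digit month, day and hour; the function {\tt expand\_template} is 
hypothetical and is not part of the archiving scripts.
\begin{verbatim}
from datetime import datetime

def expand_template(template, expid, when):
    """Expand the date and experiment tokens of an archiving template.

    Assumes %s is the experiment name and %y4, %m2, %d2, %h2 are the
    year, month, day and hour of the file.
    """
    tokens = {
        "%s": expid,
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template

name = expand_template(
    "%s.atm_enkf.log.%y4%m2%d2_%h2z.txt", "EXPID", datetime(2012, 4, 1, 0)
)
print(name)  # EXPID.atm_enkf.log.20120401_00z.txt
\end{verbatim}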

The environment variable {\tt ENSARCH\_FIELDS} defines collections 
of files to be packed together before archiving takes place. 
At the time of this writing, the following collections exist:
\begin{description} 
 \item[eaer] -- set of aerosol background files, usually, abkg.eta files
 \item[eana] -- set of analysis files, usually, ana.eta files
 \item[ebkg] -- set of background files, usually, bkg.eta/sfc files
 \item[eoi0] -- set of observation impacts on mean ensemble analysis
 \item[eprg] -- set of prognostic files, usually, prog.eta files
 \item[erst] -- set of AGCM restarts, usually, the files with ``bin'' suffix
 \item[stat] -- set of ensemble diagnostic files: mean, RMS, and spread
\end{description} 
The collections are specified in {\tt ENSARCH\_FIELDS} as a comma-separated list.
A given collection determines a given type of file to be placed in a tar-file created
before the usual {\tt archive} script is launched. The archiving procedure can be cumbersome at times since it depends 
on the behavior of the NCCS machines. When the machines are having problems, it tends to be considerably time consuming 
to build up the tar-files carrying the collections. Indeed, this is why the default setting of file collections 
is not the minimal set necessary to reproduce, or restart, an ensemble ADAS cycle.  On that note, the collections needed to 
reproduce or restart the ensemble are ``ebkg'', ``erst'', and ``stat'' (see Sec. \ref{subsec:ReproducingEADAS}):
the first stores background files; the second stores restart files. 
Both save {\it all members} of the ensemble, which amounts to a {\it rather large number of rather large files} to copy 
into corresponding tar-balls. The last collection stores the statistics, of which only the mean is needed for the purposes 
under consideration. In the future we might consider ``shaving'' some of the files being stored. Furthermore, 
some of the initial condition files (fields) presently used to start the ensemble of AGCM forecasts could simply be 
bootstrapped in every cycle. This is the approach taken at NCEP, where the physics is cold-started in 
each cycle, and it is something we are looking into to help reduce storage requirements.
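
A quick way to check whether a given setting of {\tt ENSARCH\_FIELDS} archives enough to reproduce or restart a cycle 
is sketched below in Python. The collection names are those listed above; the function names {\tt parse\_collections} and 
{\tt can\_restart} are hypothetical and only illustrate the rule stated in the text.
\begin{verbatim}
KNOWN = ("eaer", "eana", "ebkg", "eoi0", "eprg", "erst", "stat")
RESTART_SET = {"ebkg", "erst", "stat"}

def parse_collections(value):
    """Split a comma-separated ENSARCH_FIELDS value and validate names."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in KNOWN]
    if unknown:
        raise ValueError("unknown collection(s): " + ", ".join(unknown))
    return names

def can_restart(value):
    """True when all collections needed to reproduce a cycle are listed."""
    return RESTART_SET <= set(parse_collections(value))

print(can_restart("eana,ebkg,erst,stat"))
print(can_restart("eana,eprg"))
\end{verbatim}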

%....................................................................
\section{Scheduler}
%        ---------
\label{sec:Scheduler}

As mentioned earlier, the present way of running the hybrid system is started by the user submitting 
the usual  {\tt g5das.j} script to the batch system, and letting it
submit the ensemble ADAS job script, {\tt atm\_ens.j}. Under the {\it scheduler}, the user submits instead
a job script called {\tt EXPID\_scheduler.j}, where {\tt EXPID} stands for the user experiment name. This job is responsible for controlling how the central and 
the ensemble ADAS executions are coordinated. This mode of running is aimed at maximizing
efficiency and exploiting most of the parallelism existing between the central ADAS and 
the ensemble ADAS (see Fig. \ref{fig:GMAOhybSchematic}). Recall that these two systems are coupled to each 
other in two ways: (i) the central ADAS GSI analysis requires the ensemble of background states; and (ii) the
ensemble ADAS requires the central GSI analysis to re-center its analysis members about\footnote{Currently, another
link exists since we use the satellite bias estimates of the central analysis to start the same-synoptic-time
ensemble analysis. This can be relaxed by having the ensemble of observers use the previous-cycle
satellite bias estimates instead.}. This means that,
as soon as the central GSI analysis is finished the ensemble ADAS can begin its task. In other words,
the ensemble ADAS can run concurrently with the 12-hour (or longer) initialized forecast of the 
central ADAS. Once these two components are running in parallel, the scheduler is responsible
for synchronizing the two systems. A new central analysis can only start once the ensemble ADAS has 
produced the necessary ensemble of background fields.
At the time of this writing only a preliminary version of the scheduler exists, therefore we say no 
more here until its completion and later revision of this manuscript.

In this mode of running, when the GSI hybrid and the ensemble (EnKF) analyses are concurrent, the satellite 
biases needed by the ensemble analysis must come from a previous cycle, thus being the same as used by GSI. 
In this case, the {\tt satbias.acq} file under {\tt \$FVHOME/run/atmens} must point to the directory defined
by the environment variable {\tt RSTSTAGE4AENS}. An example of such {\tt satbias.acq} is
{\small
\begin{verbatim}
/discover/nobackup/USER/EXPID/atmens/RST/EXPID.ana_satbias_rst.%y4%m2%d2_%h2z.txt =>
            EXPID.ana.satbias.%y4%m2%d2_%h2z.txt
/discover/nobackup/USER/EXPID/atmens/RST/EXPID.ana_satbang_rst.%y4%m2%d2_%h2z.txt => 
            EXPID.ana.satbang.%y4%m2%d2_%h2z.txt
\end{verbatim}
}
for the case when {\tt RSTSTAGE4AENS} is set to {\tt /discover/nobackup/USER/EXPID/atmens/RST}.

If the job needs to be re-started from the beginning, say for the cycle on 1776070415, simply remove the 
work directories {\tt fvwork\_EXPID\_1776070415}, {\tt enswork\_EXPID\_1776070415}, and resubmit the scheduler
with the refresh option, that is,
\begin{verbatim}
   $ATMENS_BATCHSUB -v refreshdate=1776070415 EXPID_scheduler.j
\end{verbatim}

Following the framework developed to control completion status of various procedures for the EnADAS, the 
scheduler has its own set of hidden variables. The following is a list of relevant variables used in
{\tt edas\_scheduler.csh}:
\begin{description} 
\item[.DONE\_MEM001\_rstcp.1776070415]: \\ 
   signals successful copy of relevant restart files from {\tt recycle} directory to directory
 defined by {\tt RSTSTAGE4AENS}
\item[.DONE\_MEM001\_analyzer.1776070418]: \\
     signals successful completion of central (hybrid) analysis
\item[.DONE\_MEM001\_atm\_ens\_eana.1776070415]: \\
     signals successful completion of ensemble analysis (presently, EnKF)
\item[.DONE\_MEM001\_atm\_ens.1776070415]: \\
     signals successful completion of EnADAS
\item[.DONE\_MEM001\_ddas.1776070415]: \\ 
     signals successful completion of central ADAS
\end{description} 
By construction, all the hidden files appear under the {\tt FVHOME} directory.
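
When diagnosing a stalled cycle it can help to list which of these hidden files are present. The Python sketch below 
does so under the sole assumption that the files follow the naming pattern shown in the list above, i.e., 
{\tt .DONE\_MEM001\_<step>.<yyyymmddhh>}; the function {\tt completed\_steps} is hypothetical and not part of the scheduler.
\begin{verbatim}
import os
import re

# Pattern inferred from the hidden file names listed above.
MARKER = re.compile(r"^\.DONE_MEM001_(?P<step>.+)\.(?P<date>\d{10})$")

def completed_steps(fvhome):
    """Map each date tag (yyyymmddhh) to the set of steps flagged as done."""
    done = {}
    for name in os.listdir(fvhome):
        match = MARKER.match(name)
        if match:
            done.setdefault(match.group("date"), set()).add(match.group("step"))
    return done

if __name__ == "__main__":
    fvhome = os.environ.get("FVHOME", ".")
    for date, steps in sorted(completed_steps(fvhome).items()):
        print(date, ", ".join(sorted(steps)))
\end{verbatim}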

An {\it important} note concerns stopping a cycling experiment. One easy way to momentarily interrupt a cycling experiment is to rename the file holding the batch script. For example, in a purely deterministic ADAS case, the user can prevent the cycling from continuing by renaming the script {\tt g5das.j} to something else, e.g., {\tt g5das.j.hold}. This way, when the ongoing job tries to re-submit itself, it will not find the batch job and it will come to a halt. Similarly, in hybrid mode, the cycle can be interrupted by renaming the driving script, but in this case we must remember there are two scripts at play. The central ADAS does not re-submit itself ({\tt g5das.j}); instead, it submits the ensemble ADAS script {\tt atm\_ens.j}, and cycling amounts to {\tt atm\_ens.j} submitting {\tt g5das.j}. Therefore it is the script lined up to be next that must be renamed. If the central ADAS is the one running, then {\tt atm\_ens.j} must be renamed; if the ensemble is what is running, then 
{\tt g5das.j} must be renamed. This is all fine without the use of the scheduler. However, when the scheduler is used to control the batch job submission, the user must {\it never rename the scheduler script}, {\tt EXPID\_scheduler.j}, for the purposes of interrupting the cycle. In this case, it is still the scripts {\tt g5das.j} and {\tt atm\_ens.j} that must be renamed. Still, even here, the {\it user must exercise caution}. Since {\tt atm\_ens.j} is called twice by the scheduler, this job script can only be renamed after the second call has been made, i.e., after the ensemble of AGCM forecasts has been launched.

%....................................................................
%\section{Event Log}
%        ---------
%\label{sec:EventLog}

%....................................................................
\section{General Sanity Checks Recommended to Users}
%        ------------------------------------------
\label{sec:Things}

Just as with GSI, the log file of the EnKF echoes out a table of observation-minus-background (OMB)
and observation-minus-analysis (OMA) residuals. One expects that a reasonably well configured 
system will have the GSI tables and those from the EnKF looking rather comparable, particularly if the 
data thinning strategies of the central analysis and of the ensemble analysis are kept the same.

The following is an example of the OMB ($J_o$-observation fit) table from the central hybrid GSI analysis:
% hy05f at 00 UTC on 1Apr2012 ana.log 
\begin{verbatim}
    Observation Type           Nobs                        Jo        Jo/n
surface pressure              58993    7.7543508538588003E+03       0.131
temperature                  100202    1.4151903304701959E+05       1.412
wind                         425416    4.1748592693636852E+05       0.981
moisture                      14011    8.4253060120348000E+03       0.601
ozone                         17149    3.1765657511636487E+04       1.852
gps                           54419    8.7688179856585470E+04       1.611
radiance                    2487638    4.0790375455413770E+05       0.164
                               Nobs                        Jo        Jo/n
           Jo Global        3157828    1.1025422087716414E+06       0.349
\end{verbatim}
and below is an example of the corresponding table from the EnKF: 
% hy05f at 00 UTC on 1Apr2012 atm_ens.log 
\begin{verbatim}
    Observation Type           Nobs                        Jo        Jo/n
    surface pressure          58966    8.9479736328125000E+03       0.152
         temperature         100154    1.4063432812500000E+05       1.404
                wind         425398    4.2579856250000000E+05       1.001
            moisture          14004    7.3794936523437500E+03       0.527
               ozone          18875    3.1167117187500000E+04       1.651
                 gps          54451    8.5983710937500000E+04       1.579
            radiance        2492015    4.3177943750000000E+05       0.173
           Jo Global        3163863    1.1316906250000000E+06       0.358
\end{verbatim}
Both of these are for the analysis at 00 UTC on 1 April 2012. The hybrid analysis uses
an ensemble that has already been spun up.  We see, for example, that the EnKF is 
not taking precipitation observations. This is by construction: we choose not to 
take this data type for now since the EnKF requires a better handle on this type of information.
For most data-types, the difference in observation count ranges from a few dozen to a few hundred at most. 
This is largely attributed to quality control decisions and the difference between using instantaneous
0.5-degree resolution backgrounds in the hybrid GSI analysis and 1-degree ensemble mean backgrounds 
in the EnKF analysis case.
The most noticeable difference between the two tables above is in the radiance 
observation counts.  The EnKF ends up taking 4377 more radiance observations than 
the hybrid analysis. This is attributed to the fact that, in our implementation, the 
satellite bias correction coefficients used by the EnKF come from the central hybrid analysis.
That is, the EnKF only executes after the central analysis has finished its work, since the central
analysis is needed for re-centering the EnKF analysis. Though we could use the satellite bias estimates
from the previous cycle, as the central hybrid GSI analysis does, we take advantage of the 
availability of the current estimates from the central ADAS when running the EnKF analysis.
Most important, when comparing the two tables above, is the similarity
between the observation fits scaled by the number of observations ($J_o/n$, last column). 
It is difficult to establish a rule of thumb for how these numbers should compare, other
than wanting them to be close. That the EnKF scaled fits are so close to those
from the central analysis is rather remarkable given that the EnKF observer works 
from an ensemble mean state, which is not necessarily physical.
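
The numbers quoted in this comparison can be recomputed directly from the two OMB tables. The Python sketch below copies 
the counts and fits from the tables above and recomputes the global totals, the scaled fits $J_o/n$, and the difference in 
radiance counts; the variable and function names are illustrative only.
\begin{verbatim}
# (observation type, Nobs, Jo) copied from the two OMB tables above.
gsi_omb = [
    ("surface pressure", 58993, 7.7543508538588003e03),
    ("temperature", 100202, 1.4151903304701959e05),
    ("wind", 425416, 4.1748592693636852e05),
    ("moisture", 14011, 8.4253060120348000e03),
    ("ozone", 17149, 3.1765657511636487e04),
    ("gps", 54419, 8.7688179856585470e04),
    ("radiance", 2487638, 4.0790375455413770e05),
]
enkf_omb = [
    ("surface pressure", 58966, 8.9479736328125000e03),
    ("temperature", 100154, 1.4063432812500000e05),
    ("wind", 425398, 4.2579856250000000e05),
    ("moisture", 14004, 7.3794936523437500e03),
    ("ozone", 18875, 3.1167117187500000e04),
    ("gps", 54451, 8.5983710937500000e04),
    ("radiance", 2492015, 4.3177943750000000e05),
]

def totals(rows):
    """Return (total Nobs, total Jo, global Jo/n) for a list of rows."""
    nobs = sum(n for _, n, _ in rows)
    jo = sum(j for _, _, j in rows)
    return nobs, jo, jo / nobs

gsi_n, gsi_jo, gsi_jon = totals(gsi_omb)
enkf_n, enkf_jo, enkf_jon = totals(enkf_omb)
radiance_diff = enkf_omb[-1][1] - gsi_omb[-1][1]

for (name, n_g, j_g), (_, n_e, j_e) in zip(gsi_omb, enkf_omb):
    print(f"{name:18s} GSI Jo/n={j_g / n_g:.3f}  EnKF Jo/n={j_e / n_e:.3f}")
print(f"global             GSI Jo/n={gsi_jon:.3f}  EnKF Jo/n={enkf_jon:.3f}")
print("extra radiance observations in EnKF:", radiance_diff)
\end{verbatim}
The totals reproduce the ``Jo Global'' rows of both tables, and the radiance difference is the 4377 observations 
mentioned in the text.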

Similarly, the observation fits to the analysis from the central GSI look as below:
% hy05f at 00 UTC on 1Apr2012 ana.log 
\begin{verbatim}
    Observation Type           Nobs                        Jo        Jo/n
surface pressure              59046    4.9852106626854447E+03       0.084
temperature                  100203    7.7757153869908128E+04       0.776
wind                         429436    2.5802116043830608E+05       0.601
moisture                      14011    4.9607887271425416E+03       0.354
ozone                         17149    8.9490519057384627E+03       0.522
gps                           55560    5.0622628146044233E+04       0.911
radiance                    2569945    3.3927362893704593E+05       0.132
                               Nobs                        Jo        Jo/n
           Jo Global        3245350    7.4456962268687086E+05       0.229
\end{verbatim}
with the equivalent (a posteriori fits) table from the EnKF looking as:
% hy05f at 00 UTC on 1Apr2012 atm_ens.log 
\begin{verbatim}
    Observation Type           Nobs                        Jo        Jo/n
    surface pressure          58966    6.0409160156250000E+03       0.102
         temperature         100154    9.5538234375000000E+04       0.954
                wind         425398    3.1510571875000000E+05       0.741
            moisture          14004    5.6834106445312500E+03       0.406
               ozone          18875    1.9777376953125000E+04       1.048
                 gps          54451    5.1486160156250000E+04       0.946
            radiance        2492015    3.8114400000000000E+05       0.153
           Jo Global        3163863    8.7477581250000000E+05       0.276
\end{verbatim}
Again, the two tables are very comparable. Indeed the reduction in $J_o/n$ for both analyses
is very similar, though the central hybrid analysis tends to fit the observations slightly more
closely. Notice also that, by construction, the prior and posterior observation counts for the 
EnKF analysis do not change (compare table above with two tables up), whereas the same does
not hold for the hybrid GSI analysis due to its multiple outer loops and the quality control
permission to allow more observations to come into play as the fits to the guess improve.

Another illustration for the same synoptic time is given below, where
the global fits to the background are shown for the central analysis (same as the total 
shown in the first table above), the mean observer, and the members of a 32-member ensemble of observers. 
Comparing with the count from the EnKF fits to the prior, we
see the EnKF takes in only slightly fewer observations than the mean observer;
this is due to a consistency check within the EnKF software.

\vspace{.5in}
{\small
\begin{verbatim}
Central analysis Jo Global        3157828    1.1025422087716414E+06       0.349

obs_ensmean      Jo Global        3167079    1.1400739327238461E+06       0.360
obs_mem001       Jo Global        3123256    1.4146527504929921E+06       0.453
obs_mem002       Jo Global        3128285    1.3512821520852332E+06       0.432
obs_mem003       Jo Global        3131361    1.3420369419540870E+06       0.429
obs_mem004       Jo Global        3125948    1.3649793228603806E+06       0.437
obs_mem005       Jo Global        3121210    1.3938713409115009E+06       0.447
obs_mem006       Jo Global        3131823    1.3535254403115206E+06       0.432
obs_mem007       Jo Global        3125040    1.4356407879127199E+06       0.459
obs_mem008       Jo Global        3122812    1.3893385599306291E+06       0.445
obs_mem009       Jo Global        3132624    1.3556867770441249E+06       0.433
obs_mem010       Jo Global        3122710    1.3602509368970357E+06       0.436
obs_mem011       Jo Global        3126436    1.3569395372827167E+06       0.434
obs_mem012       Jo Global        3126090    1.3789860104918096E+06       0.441
obs_mem013       Jo Global        3132534    1.3629284646751210E+06       0.435
obs_mem014       Jo Global        3125473    1.4240120201202775E+06       0.456
obs_mem015       Jo Global        3124538    1.3686023277420180E+06       0.438
obs_mem016       Jo Global        3125313    1.3625097896574179E+06       0.436
obs_mem017       Jo Global        3121705    1.4513418843554165E+06       0.465
obs_mem018       Jo Global        3124688    1.3476942535456968E+06       0.431
obs_mem019       Jo Global        3129546    1.3462618226212750E+06       0.430
obs_mem020       Jo Global        3132273    1.3767084034834011E+06       0.440
obs_mem021       Jo Global        3132648    1.3697893025347698E+06       0.437
obs_mem022       Jo Global        3126265    1.3684473016587161E+06       0.438
obs_mem023       Jo Global        3126355    1.3960081508113015E+06       0.447
obs_mem024       Jo Global        3127770    1.3681285698646517E+06       0.437
obs_mem025       Jo Global        3122046    1.4015666052547472E+06       0.449
obs_mem026       Jo Global        3120662    1.4080196645814902E+06       0.451
obs_mem027       Jo Global        3124172    1.3628864136518587E+06       0.436
obs_mem028       Jo Global        3126496    1.3329366092685640E+06       0.426
obs_mem029       Jo Global        3132788    1.3599484934718781E+06       0.434
obs_mem030       Jo Global        3113985    1.4292567105894105E+06       0.459
obs_mem031       Jo Global        3117175    1.3533330934446647E+06       0.434
obs_mem032       Jo Global        3123969    1.3742827905094966E+06       0.440
\end{verbatim}
} % small
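The table above can be condensed with a few lines of Python. The sketch below copies the global counts and fits of 
the mean observer and of the 32 member observers and compares the member average with the mean observer; the variable 
names are illustrative only.
\begin{verbatim}
# Global (Nobs, Jo) fits to the background, copied from the table above.
mean_observer = (3167079, 1.1400739327238461e06)
members = [
    (3123256, 1.4146527504929921e06), (3128285, 1.3512821520852332e06),
    (3131361, 1.3420369419540870e06), (3125948, 1.3649793228603806e06),
    (3121210, 1.3938713409115009e06), (3131823, 1.3535254403115206e06),
    (3125040, 1.4356407879127199e06), (3122812, 1.3893385599306291e06),
    (3132624, 1.3556867770441249e06), (3122710, 1.3602509368970357e06),
    (3126436, 1.3569395372827167e06), (3126090, 1.3789860104918096e06),
    (3132534, 1.3629284646751210e06), (3125473, 1.4240120201202775e06),
    (3124538, 1.3686023277420180e06), (3125313, 1.3625097896574179e06),
    (3121705, 1.4513418843554165e06), (3124688, 1.3476942535456968e06),
    (3129546, 1.3462618226212750e06), (3132273, 1.3767084034834011e06),
    (3132648, 1.3697893025347698e06), (3126265, 1.3684473016587161e06),
    (3126355, 1.3960081508113015e06), (3127770, 1.3681285698646517e06),
    (3122046, 1.4015666052547472e06), (3120662, 1.4080196645814902e06),
    (3124172, 1.3628864136518587e06), (3126496, 1.3329366092685640e06),
    (3132788, 1.3599484934718781e06), (3113985, 1.4292567105894105e06),
    (3117175, 1.3533330934446647e06), (3123969, 1.3742827905094966e06),
]

member_count_max = max(n for n, _ in members)
member_jo_mean = sum(jo for _, jo in members) / len(members)

print("number of members    :", len(members))
print("largest member count :", member_count_max)
print("mean-observer count  :", mean_observer[0])
print(f"average member Jo    : {member_jo_mean:.4e}")
print(f"mean-observer Jo     : {mean_observer[1]:.4e}")
\end{verbatim}
Every member uses fewer observations than the mean observer, and the average of the member fits, about 
$1.377 \times 10^{6}$, exceeds the mean-observer fit of about $1.140 \times 10^{6}$.
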
Though the observations taken in by the individual observers are the same
as those taken in by the mean observer, each observer still applies a check
that tosses away observations not sufficiently close to its
respective set of backgrounds. As it turns out, a given observer individually
allows slightly fewer observations to be used than the mean observer does; another way to say this is that
each member background provides a slightly worse fit to the observations
than the ensemble mean does (compare also the $J_o/n$ columns). This is reasonable
since there is likely some noise in each set of member backgrounds. In fact, a theoretical argument for the linear 
case (and when the observation set is kept constant) can be made to better explain these findings. If we 
denote by ${\bar J}_o$ the observer mean fit, it is easy to show that
\begin{equation}
   < J_{o;m} > = {\bar J}_o  + 
        \frac{1}{M} \sum_{m=1}^M \sum_{i=1}^p 
         ( {\bf x}_m - < {\bf x}_m > )^T {\bf H}_i^T {\bf R}_i^{-1} {\bf H}_i ( {\bf x}_m - < {\bf x}_m > ) \, ,
\label{eq:JoMean}
\end{equation}
where $p$ is the total number of observations, $M$ is the total number of ensemble members, $<\bullet>$ denotes the 
ensemble average operator and $J_{o;m}$ stands for the observation fit (cost) evaluated for the $m$-th member
of the ensemble. Noticing that the second term on the right is non-negative, we should always expect the observation 
fit to the mean ensemble backgrounds to be no larger than the average of the ensemble of observer fits to the observations. This is
precisely what we see in the comparison above. Neglecting the difference in the observation count, we see
the observation fit to the mean to be roughly $1.14 \times 10^{6}$ and the average of the observers' observation fits
to be roughly $1.38 \times 10^{6}$. It is useful to notice that application of the trace operation to 
(\ref{eq:JoMean}) leads to the alternative expression
\begin{eqnarray}
<J_{o;m}> & = & {\bar J}_o + \frac{M-1}{M} Tr \left({\bf B}_e \sum_{i=1}^p {\bf H}_i^T {\bf R}_i^{-1} {\bf H}_i\right) 
  \nonumber \\
   &  \stackrel{M \rightarrow \infty}{\longrightarrow} & 
               {\bar J}_o +  Tr \left({\bf B}_e \sum_{i=1}^p {\bf H}_i^T {\bf R}_i^{-1} {\bf H}_i\right) 
\end{eqnarray}
which states that the averaged observers fit is always biased with respect to the observer fit to the mean, regardless
of the size of the ensemble. This remains so even when the ensemble background error covariance matrix ${\bf B}_e$ becomes
an accurate representation of the true background error covariance matrix. 
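
The identity (\ref{eq:JoMean}) can be checked numerically. The following is a minimal sketch in Python and is not part of the GEOS scripts. It assumes a linear observation operator, a diagonal ${\bf R}$, the same observation set for every member, and a cost written without the factor of one half; the names {\tt jo} and {\tt check\_identity} are hypothetical. The cross term drops out because the member deviations from the ensemble mean sum to zero, which leaves only the non-negative spread term.
{\small
\begin{verbatim}
import random

def jo(x, y, H, r):
    """Observation cost sum_i (y_i - H_i x)^2 / r_i (diagonal R assumed)."""
    total = 0.0
    for y_i, H_i, r_i in zip(y, H, r):
        hx = sum(h * x_j for h, x_j in zip(H_i, x))
        total += (y_i - hx) ** 2 / r_i
    return total

def check_identity(M=32, n=5, p=8, seed=0):
    """Compare <J_{o;m}> with Jbar_o plus the ensemble spread term."""
    rng = random.Random(seed)
    # Synthetic linear operator, error variances, observations, and members
    H = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    r = [rng.uniform(0.5, 2.0) for _ in range(p)]
    y = [rng.gauss(0.0, 1.0) for _ in range(p)]
    members = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(M)]
    xbar = [sum(x[j] for x in members) / M for j in range(n)]

    mean_of_fits = sum(jo(x, y, H, r) for x in members) / M
    fit_of_mean = jo(xbar, y, H, r)

    # (1/M) sum_m sum_i (H_i e_m)^2 / r_i, with e_m = x_m - xbar
    spread = 0.0
    for x in members:
        e = [x_j - m_j for x_j, m_j in zip(x, xbar)]
        for H_i, r_i in zip(H, r):
            spread += sum(h * e_j for h, e_j in zip(H_i, e)) ** 2 / r_i
    spread /= M
    return mean_of_fits, fit_of_mean + spread, spread
\end{verbatim}
} % small
A call to {\tt check\_identity()} returns the averaged member fit, the right-hand side of (\ref{eq:JoMean}), and the spread term. The first two agree to round-off, and the third is positive for any ensemble with non-zero spread in observation space.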

Another way to corroborate desirable behavior from the hybrid system is to examine 
the observation counts and fits as the assimilation progresses. A comparison of the 
hybrid analysis with a control (traditional 3DVar) experiment at initial time 
(00 UTC on 1 April 2012) shows:
% comp x0005 vs hy05f
\begin{verbatim}
Standard Analysis
           Jo Global        3157828    1.1025422087716414E+06       0.349
           Jo Global        3220086    7.6636349426764739E+05       0.238
           Jo Global        3242795    7.5913357050380018E+05       0.234
Hybrid Analysis:
           Jo Global        3157828    1.1025422087716414E+06       0.349
           Jo Global        3223267    7.5266204694107187E+05       0.234
           Jo Global        3245350    7.4456962268687086E+05       0.229
\end{verbatim}
where we see that initially both the standard and hybrid analyses begin from the same 
background (notice the identical initial data counts and observation fits). As the minimization
progresses, the standard analysis ends up taking in fewer observations than the hybrid analysis; this
is the ideal behavior. At this initial stage it would be acceptable, though not ideal, for the
hybrid analysis to take in slightly less data. However, as time progresses this would not be
acceptable. Indeed, in the experiments used for the present illustration,
a few cycles away from the initial cycle, at 00 UTC on 2 April 2012, the situation is rather
clear, as shown below:
\begin{verbatim}
Standard Analysis
           Jo Global        3101111    1.1045415299101665E+06       0.356
           Jo Global        3161479    7.6260731657937751E+05       0.241
           Jo Global        3184423    7.5556159453427303E+05       0.237
Hybrid Analysis:
           Jo Global        3109514    1.0803129941124925E+06       0.347
           Jo Global        3170399    7.4634125953013694E+05       0.235
           Jo Global        3190465    7.3919985174608813E+05       0.232
\end{verbatim}
Before the start of the minimization, the hybrid analysis already comes in with more observations than 
the control (standard) cycle. More impressive is the fact that, even with the increased number of observations, 
the global observation fits (second column of numbers) are smaller than those of the 
control analysis. Ideally we want more observations, smaller fits, and smaller $J_o/n$; this
is exactly what we see happening with the hybrid analysis in the case illustrated above.

%....................................................................
\section{Auxiliary Programs}
%        ------------------
\label{sec:AuxProgs}

%....................................................................
\subsection{Additive Inflating Perturbations}
%           --------------------------------
\label{subsec:AddPerts}

Among the set of peripheral components necessary to maintain the ensemble of analyses
is the procedure allowing for generation of random perturbations used for additive inflation.
Presently, there are two ways in which GEOS can generate these fields:
\begin{description}
\item[GSI internally-generated perturbations.] It is possible to re-configure the 
      GSI-observer mean to tell it to write out as many randomly generated perturbations as members
      of the ensemble. This can be done by modifying the resource file {\tt obs1gsi\_mean.rc}.
      Perturbations generated in this way are normally distributed, zero-mean, with error
      covariance structures derived from the GSI climatological background error covariance matrix, 
      that is, ${\cal N}({\bf 0}, {\bf B}_s)$.
      We have experimented using these perturbations to inflate the ensemble and have found
      the ensemble to collapse rather quickly in this case. In other words, it seems these
      random perturbations are simply noise that the model integrations quickly dissipate.

\item[NMC-like perturbations.] Alternatively, we follow the idea of J. S. Whitaker and what is 
      implemented operationally at NCEP, and generate perturbations for additive inflation 
      by selecting randomly from a year-long database of NMC-like perturbations. At GMAO, the
      database has been constructed from GEOS 5.7, year-long, forecasts. These perturbations
      are identical to those used to tune the GSI climatological background error 
      covariance matrix presently used in our traditional 3DVar, that is, the perturbations are 
      formed from differences between 48- and 24-hour forecasts of the relevant fields 
      (zonal and meridional winds, virtual temperature, 
      specific humidity, surface pressure, and ozone). The location of the database is
      found (and specified) in the resource file {\tt nmcperts.rc}.
\end{description}
By default, the ensemble ADAS randomly selects as many perturbation files from the database 
as there are members.  The selection is season-aware, in the sense
that perturbations are chosen for the season associated with the experiment's current
analysis time.  The NMC-like
perturbations are rather skillful, as we have found particularly when testing with the
filter-free approach. At any given analysis time, the ensemble ADAS scripts calculate 
the mean of the selected perturbations and create a new set of perturbations with the mean 
removed. Recall that perturbations in the database are for a specific year, and likely not 
related to the time of the analysis cycle of any one experiment. In some instances, depending on the procedure
used to calculate and remove the perturbation mean, the new set of perturbations might
be written out to file at the resolution of the ensemble, rather than at their
original resolution of 0.25 degrees. The presence of the resource file 
{\tt mp\_stats\_pert.rc} in the directory defined by {\tt ATMENSETC} allows
for the perturbations to be converted to low resolution (see Sec. \ref{subsec:ensstats}). 
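
In symbols, and assuming the mean is removed with a simple arithmetic average over the selected files, the de-biasing step can be written as follows. Here ${\bf p}_m$ denotes the $m$-th perturbation drawn from the database and $\tilde{\bf p}_m$ its mean-removed counterpart; both symbols are introduced for illustration only.
\begin{equation}
   \tilde{\bf p}_m = {\bf p}_m - \frac{1}{M} \sum_{k=1}^M {\bf p}_k \, , 
   \qquad m = 1, \ldots, M \, ,
\end{equation}
so that $\sum_{m=1}^M \tilde{\bf p}_m = {\bf 0}$, and adding the scaled perturbations to the members leaves the ensemble mean unchanged.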
 

%....................................................................
\subsection{Ensemble Re-centering and Inflation}
%           -----------------------------------
\label{subsec:Recenter}

Another fundamental component of the ensemble ADAS is the part handling
re-centering. This initial implementation of the ensemble ADAS only handles
the usual meteorological fields: winds, temperature, specific humidity,
surface pressure, and ozone. These fields are present in both the background
and analysis fields of each member of the ensemble. Those familiar with
GEOS ADAS will know that the files carrying these fields form the basis of 
the so-called {\it dyn-vector}. The re-centering program is based on the
module {\tt m\_dyn.f90} of the {\tt GMAO\_hermes} library. This is 
convenient since it allows for use of the various dyn-capabilities, such
as automatic interpolation and remapping due to topography changes.
As with all other dyn-based programs, the so-called {\tt dyn\_recenter.x}
is command-line driven and its specific usage is shown in Fig. \ref{fig:dynrecenter}.

\begin{figure}
{\small
\begin{center}
\begin{verbatim} 
      -------------------------------------
      dyn_recenter - recenter ensemble mean
      -------------------------------------
 
 
 Usage: 
 
   dyn_recenter.x [options] 
                  x_e(i) x_m x_a [-o output file]
 
 where [options]
 
 -h              Help (optional)
 -g5             Treats files as GEOS-5 files
 -damp           Apply damp to levels above 5mb
 -noremap        Force no-remap whatsoever
 -remap2central  Remap member and ensmean to central
                 (Default: remap central and ensmean to member
 -a       factor Multiplicative factor for inflating perturbations
 -inflate fname  filename containing inflating perturbations
 -verbose        echoes general information
 -o       fname  filename of resulting recentered fields
                 (CAUTION, default: overwrite x_e(i) file)
 
  Required inputs:
   x_e(i)  - filename of field to be recenter 
   x_m     - original mean
   x_a     - new mean around which member gets recentered
 
  Remarks: 
   1. This program is used in context of ensemble DAS to recenter 
      dyn-vector around desired mean. That is, assuming the ensemble
      mean is x_m, and a desired center mean is x_a, this program 
      reads multiple members x_e of an ensemble of dyn-vectors 
      and calculates: x_e(i) = x_e(i)) - x_m + x_a, for ensemble 
      member i. 
   2. There are a million ways to write a more efficient code
      for this - indeed one might need to do this using ESMF
      to better handle high-resolution fields.
 
\end{verbatim} 
\end{center}
} % small
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Command line for the recentering program, which is used not only to recenter the ensemble analyses about the central hybrid analysis, 
but also to apply additive inflation, vertical blending, and remapping. 
\label{fig:dynrecenter}}
\end{figure} 

In its most basic form, the program requires an input file containing fields from a given member, another containing
the ensemble mean, and a third file containing the analysis to recenter about, typically coming from the hybrid (central) ADAS. 
Without any other input, the re-centering program will overwrite the input ensemble member file. 
In the dual-resolution configuration of the hybrid, any given ensemble member and its corresponding ensemble mean are at lower
resolution than the hybrid analysis to center about. The re-centering program takes care of automatically interpolating the central
analysis to the resolution of the member. This must be done with care for the change in topography. By default, the program 
remaps the central high-resolution analysis to the topography of the member analysis. Flags exist to either turn remapping off, or
remap in the other direction.
 
This program is also responsible for handling the additive inflation procedure. With properly specified inputs, the program
{\tt dyn\_recenter.x} will also add a scaled perturbation field, properly interpolated (if necessary), to the member.
Furthermore, at present, the ensemble ADAS runs with vertical blending applied to its members, so that, between $20$ and $5$ hPa, any
member analysis is smoothly merged into the central analysis. That is, above $5$ hPa, the ensemble has no variance. This is
consistent with what we discussed in Sec. \ref{subsec:gsiRCconfig}, and our present choice of vertically-varying 
weights given to both the climatological and ensemble background error covariances
in the hybrid GSI, namely, with GSI relying solely on its climatological background errors in the high atmosphere.
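
The arithmetic of recentering with additive inflation can be sketched in a few lines of Python. This is an illustration only and not the algorithm coded in {\tt dyn\_recenter.x}: fields are flattened into plain lists, interpolation, remapping, and vertical blending are left out, and the function names ({\tt ensemble\_mean}, {\tt recenter}) are hypothetical. The sketch also assumes that the perturbation mean is removed before the scaled perturbations are added.
{\small
\begin{verbatim}
def ensemble_mean(fields):
    """Arithmetic mean over a list of equally sized fields."""
    count = len(fields)
    return [sum(f[j] for f in fields) / count for j in range(len(fields[0]))]

def recenter(members, x_a, perts=None, factor=0.0):
    """Return x_e(i) - x_m + x_a + factor * p(i) for every member i."""
    x_m = ensemble_mean(members)
    if perts is not None:
        # Remove the perturbation mean so inflation does not shift the center
        p_mean = ensemble_mean(perts)
        perts = [[p_j - q_j for p_j, q_j in zip(p, p_mean)] for p in perts]
    out = []
    for i, x_e in enumerate(members):
        new = [e - m + a for e, m, a in zip(x_e, x_m, x_a)]
        if perts is not None:
            new = [v + factor * p_j for v, p_j in zip(new, perts[i])]
        out.append(new)
    return out
\end{verbatim}
} % small
By construction, the mean of the returned members equals {\tt x\_a} whether or not perturbations are supplied, since the perturbations have zero mean once de-biased.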

%....................................................................
\subsection{Ensemble Mean and RMS}
%           ---------------------
\label{subsec:ensstats}

Use of an EnKF-based ensemble assimilation strategy requires, at the very least, calculation of the ensemble mean.
This is needed to form quantities such as ${\bf h}({\bf x}_m) - {\bf h}({\bar {\bf x}})$.
In practice, these differences come from the difference between the OMB formed by the individual
observer members and the OMB formed by the mean observer. This is why the ensemble ADAS schematic shown in Fig. \ref{fig:EADASflowchart} 
displays a box corresponding to the {\it observer mean}. The other place where the ensemble mean is required is during the 
re-centering step, when it is actually the mean of the ensemble of analyses that is required (see previous subsection).

In GEOS ensemble ADAS there are at least two ways to calculate the required ensemble mean. One uses the program
{\tt GFIO\_mean.x}, found in the {\tt bin} directory of a GEOS build. Another relies on the more recent 
program {\tt mp\_stats.x}, which has been introduced to address efficiency matters. 
The new program is MPI-ESMF-based.
However, the main motivation for having an alternative to {\tt GFIO\_mean.x} is to be more efficient when 
calculating ensemble RMS error and energy-based spread diagnostics.  With {\tt GFIO\_mean.x}, the amount of I/O 
required to calculate RMS error and energy-based spread is rather large. A double pass through the data is required 
to calculate RMS error: the mean must be calculated first, written out to disk, and then subtracted from each member 
during a second call to {\tt GFIO\_mean.x}. Another pass through the data is required 
when calculating the energy-based spread (using the program {\tt pertenergy.x}), and a final call to {\tt GFIO\_mean.x} needs 
to be made to average the energy-based measures.  This cascade of steps is avoided when using {\tt mp\_stats.x}. This program
calculates all the diagnostics, i.e., mean, RMS (or standard deviation) error, and spread, in a single call and, 
using recursive updates, it needs to read through the members of the ensemble only once. 
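
To illustrate how a single pass through the members can deliver both mean and standard deviation, the Python sketch below uses Welford's recursive update. The recursion actually coded in {\tt mp\_stats.x} is not reproduced here; Welford's form, the $M-1$ normalization, and the name {\tt recursive\_stats} are assumptions made for illustration.
{\small
\begin{verbatim}
def recursive_stats(fields):
    """One pass over the members: running mean and standard deviation."""
    count = 0
    mean = None
    m2 = None
    for field in fields:
        count += 1
        if mean is None:
            mean = [0.0] * len(field)
            m2 = [0.0] * len(field)
        for j, value in enumerate(field):
            delta = value - mean[j]
            mean[j] += delta / count              # update running mean
            m2[j] += delta * (value - mean[j])    # update sum of squared deviations
    if count == 0:
        raise ValueError("no fields given")
    if count == 1:
        return mean, [0.0] * len(mean)
    # Unbiased (M-1) normalization is an assumption of this sketch
    stdv = [(s / (count - 1)) ** 0.5 for s in m2]
    return mean, stdv
\end{verbatim}
} % small
Each member is read once and then discarded, so memory use is independent of the ensemble size; the two-pass alternative needs the mean on disk before any deviation can be formed.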

Though the procedures based on {\tt GFIO\_mean.x} are still available in the scripts running the ensemble ADAS, the default is 
now to use {\tt mp\_stats.x}. The command-line usage of this new program appears in Fig. 
\ref{fig:CommandMPSTATS}. A closer look shows that {\tt mp\_stats.x} can also be used to calculate monthly averages 
and second moments beyond its ensemble ADAS applicability; averages can be calculated either regularly or recursively, at the
user's discretion.
\begin{figure}
{\tiny
\begin{verbatim} 
Usage: mp_stat.x [options] files
 
options:
 
-o     FILE       specify output filename
-alpha NUMBER     specify multiplicative coeff to scale mean 
                    before adding result to each file read in
                    (see -tmpl)
-date  NYMD NHMS  date/time of output file(s)
                    (when absent use date of last file read)
-ene   FILE       spefify output file containing energy-based
                    measure (NOTE: this triggers the calculation)
-etmpl ENEFTMPL   template for individual energy estimates for each member
                    (e.g., -etmpl myenergy.%y4%m2%d2_%h2z)     
-nonrecene        de-activate recursive calculation of energy measure
                     (NOTE: this will sweep through the data twice)
-rms              calculate rms
-stdv  FILE       provide filename of output stdv
-umean FILE       provide available estimate of mean
                    (only used in non-recursive calcuation
                     of standard deviations)
-tmpl  FNAMETMPL  specify filename template of output files
                    NOTE: do not provide filename extension
                          (nc4 will be appended to name)
                    (e.g., -tmpl myfiles.%y4%m2%d2_%h2z)     
-vars  LIST       where LIST is a list of variable separate
 

Example usage:
 
 1. Obtain mean:
    mp_stats.x -o mean.nc4 mem0*/hy05a.bkg.eta.20120410_00z.nc4
 
    1a. calculating monthly means (i.e., files from diff times)
        can be done by specifying date of output file, e.g.,   
    mp_stats.x -o apr_mean.nc4 -date 20120401 0 mem0*/hy05a.bkg.eta.201204*z.nc4
 
 2. Recursively obtain rms subtracting user-specified mean from original fields:
    mp_stats.x -o rms.nc4 -rms mem0*/hy05a.bkg.eta.20120410_00z.nc4
 
 3. Recursively obtain stdv subtracting user-specified mean from original fields:
    mp_stats.x -o mean.nc4 -stdv stdv.nc4 mem0*/hy05a.bkg.eta.20120410_00z.nc4
 
 4. non-recursive stdv calc can be triggered by:
    mp_stats.x -o stdv.nc4 -usrmean mean.nc4 -rms -stdv NONE mem0*/hy05a.bkg.eta.20120410_00z.nc4
    in this case, the file mean.nc4 is an input such as that obtained w/ (1)
 
 5. non-recursive calc of energy-based error:
    mp_stats.x -usrmean mean.nc4 -ene ene.nc4 mem0*/hy05a.bkg.eta.20120410_00z.nc4
    in this case, the file mean.nc4 is an input such as that obtained w/ (1)
 
 6. removing mean from samples and writing out anomalies:
    mp_stats.x -tmpl anomaly.%y4%m2%d2_%h2z -alpha -1.0 -date 19990101 0 hy05a.ana.eta.*
 
 7. calculate energy-measure wrt to mean and write out individual member energy-measure estimates:
    mp_stats.x -nonrecene -ene mean_ene.nc4 -etmpl energy.%y4%m2%d2_%h2z mem0*/hy05a.bkg.eta.20120410_00z.nc4
       where: mean_ene.nc4   is ouput containing mean energy
           templated-files are ouput files containing each member's energy
 
 8. calculate energy-measure wrt central analysis and write out individual member energy-measure estimates:
    mp_stats.x -nonrecene -etmpl energy.%y4%m2%d2_%h2z -usrmean central_ana.nc4 -date 20120410 0 mem0*/hy05a.bkg.eta.20120410_00z.nc4
       where: central_ana.nc4   is input central field
           templated-files     are ouput files containing each member's energy wrt to central
 
 9. removing mean of energy fields:
    mp_stats.x -alpha -1.0 -tmpl energy_anomaly.%y4%m2%d2_%h2z -usrmean mean_energy.nc4 energy.20120410_00z.*.nc4
 
\end{verbatim} 
} % small
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Command line for the {\tt mp\_stats.x} program. In its most basic use, this program performs calculations similar
to those done by {\tt GFIO\_mean.x}, with the main difference being that {\tt mp\_stats.x} is MPI-ESMF-based. Another
fundamental difference is that {\tt mp\_stats.x} allows a single pass through the data when calculating RMS error
and other diagnostics. Furthermore, this program is oriented towards the ensemble ADAS and its diagnostic requirements.}
\label{fig:CommandMPSTATS}
\end{figure}

Indeed, in Sec. \ref{subsec:ensENSconfig}, we have come across the resource files {\tt mp\_stats.rc} and {\tt mp\_stats\_perts.rc} 
associated with running this program in different circumstances. The contents of these resource files control MPI distribution
and resolution options. The two files just mentioned differ essentially in the way resolution is treated. The first one, 
related to calculation of the required ensemble statistics, has the resolution of both its inputs and outputs set to the resolution of
the ensemble members; the second one, used to de-bias the NMC-like perturbations, has its input set to the 0.5-degree resolution of 
the perturbations, and its output set to the 1-degree, default, resolution of the members.

%....................................................................
\subsection{Energy-based Ensemble Spread}
%           ----------------------------
\label{subsec:ensspread}

As mentioned above, the automatic diagnostics produced from the ensemble of analyses and backgrounds include
measures of the ensemble spread. These provide guidance on the reliability of the ensemble. Both root-mean-square
error (with respect to the mean of the ensemble) and an energy-based RMS error measure are available as diagnostics. 
As seen in the previous subsection, the default program used to calculate these diagnostics is {\tt mp\_stats.x}.
The following defines the energy-based ensemble spread as calculated within this program (similar to {\tt pertenergy.x}):
\begin{equation}
   e = \sum_{m=1}^M {\bf e}_m^T {\bf E} {\bf e}_m \, ,
\end{equation}
where the error vectors are ${\bf e}_m = {\bf x}_m - {\bar{\bf x}}$, for each member $m$, and the matrix ${\bf E}$ 
is taken as a linearized form of the total energy operator. That is, an energy-based deviation of each ensemble 
member from the mean can be evaluated using either of the following expressions, corresponding to ${\bf E} = {\bf T}_t$ or 
${\bf E} = {\bf T}_v$ (see Lewis et al. 2001; Errico et al. 2007): 
\begin{eqnarray}
\label{eqTotalEnergy}
 e_{t}  & \equiv & {\bf e}_m^T {\bf T}_{t} {\bf e}_m  = \tfrac{1}{2} \sum_{i,j,k} \Delta H_{i,j} \Delta \sigma_{i,j,k}  
                                                    \left[ u^{\prime}_1 u^{\prime}_2 +
                                                           v^{\prime}_1 v^{\prime}_2 +
                                             \frac{c_p}{T_r} T^{\prime}_1 T^{\prime}_2 +
                                             \frac{R T_r}{p^2_r} p^{\prime}_{s1} p^{\prime}_{s2} 
                                            \right]_{i,j,k} \, ,  \label{eqTotalEEnergy}
                                             \\
 e_{v}  & \equiv & {\bf e}_m^T {\bf T}_{v} {\bf e}_m  = \tfrac{1}{2} \sum_{i,j,k} \Delta H_{i,j} \Delta z_{i,j,k}  
                                                    \left[ u^{\prime}_1 u^{\prime}_2 +
                                                           v^{\prime}_1 v^{\prime}_2 +
                                             \frac{c_p}{T_r} T^{\prime}_1 T^{\prime}_2 +
                                             \frac{R T_r}{p^2_r} p^{\prime}_{s1} p^{\prime}_{s2} 
                                            \right]_{i,j,k} \, ,     \label{eqTotalVEnergy}
\end{eqnarray}
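where $\Delta H_{i,j}$ is a horizontal grid-box weight, and the primed quantities with subscripts 1 and 2 denote the two 
perturbation fields entering the norm (both equal to the member deviation ${\bf e}_m$ in the quadratic forms above). 
The distinction between the two norms is in
how they weigh the fields in the vertical, with $\Delta \sigma_{i,j,k}$ and $\Delta z_{i,j,k}$ being fractional
weights defined, respectively, as: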
\begin{eqnarray}
\label{eqVertWeights}
 \Delta \sigma_{i,j,k} & = & \frac{\Delta p_{i,j,k}} {p_{s\, , i,j} - p_t} \label{eqEweight} \,  , \\
 \Delta z_{i,j,k}  & = &  \frac{\Delta \ln p_{i,j,k}} {\ln p_{s\, , i,j} - \ln p_t} \label{eqVweight} \, .
\end{eqnarray}
In these weights, $p_s$ is the surface pressure and $p_t$ is the model-top pressure.
The physical scaling coefficients $c_p= 1004.6$ J kg$^{-1}$ K$^{-1}$, $R = 287.04$ J kg$^{-1}$ K$^{-1}$, 
$T_r = 280$ K,  and $p_{r} = 1000$ hPa are, respectively, the specific heat at constant pressure, the gas
constant of dry air, and a reference temperature and pressure.
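
The effect of the two vertical weightings can be seen with a short Python sketch for a single column. It is illustrative only and not taken from {\tt mp\_stats.x} or {\tt pertenergy.x}: it assumes $\Delta p$ is the layer thickness between consecutive layer edges, sets the horizontal weight $\Delta H$ to one, takes both perturbation fields equal, and uses hypothetical names ({\tt vertical\_weights}, {\tt column\_energy}).
{\small
\begin{verbatim}
import math

CP = 1004.6     # specific heat at constant pressure, J kg^-1 K^-1
RGAS = 287.04   # gas constant of dry air, J kg^-1 K^-1
TREF = 280.0    # reference temperature, K
PREF = 1000.0   # reference pressure, hPa

def vertical_weights(p_edges):
    """Fractional weights (delta_sigma, delta_z) from layer-edge pressures.

    p_edges runs from the model top p_t to the surface p_s (same units).
    """
    p_t, p_s = p_edges[0], p_edges[-1]
    pairs = list(zip(p_edges[:-1], p_edges[1:]))
    dsig = [(lower - upper) / (p_s - p_t) for upper, lower in pairs]
    dz = [math.log(lower / upper) / math.log(p_s / p_t) for upper, lower in pairs]
    return dsig, dz

def column_energy(weights, u, v, t, ps):
    """0.5 * sum_k w_k [u^2 + v^2 + (cp/Tr) T^2 + (R Tr/pr^2) ps^2]; ps in hPa."""
    ps_term = RGAS * TREF / PREF ** 2 * ps ** 2
    total = 0.0
    for w, u_k, v_k, t_k in zip(weights, u, v, t):
        total += w * (u_k ** 2 + v_k ** 2 + CP / TREF * t_k ** 2 + ps_term)
    return 0.5 * total
\end{verbatim}
} % small
Both sets of weights sum to one over the column. For layer edges at 0.01, 1, 10, 100, 500, and 1000 hPa, the top layer receives about 0.1\% of the total weight under $\Delta \sigma$ but 40\% under $\Delta z$, which shows how much more the second norm emphasizes the upper atmosphere.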

\begin{figure}[ht]
\begin{center}
\includegraphics[scale=0.3]{Figs/eneweights.pdf}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{The fractional vertical weights $\Delta \sigma$ (thin curve) and $\Delta z$ (thick curve) used for
calculating the ET- and EV-norms, respectively. The dotted vertical line indicates the model levels. Both
are calculated at a point where $p_s = 1000$ hPa; the model top pressure is $0.01$ hPa. (Similar to Fig. 1
of Errico et al. 2007).
 \label{figNormGridWeight}}
\end{figure}



%....................................................................
\subsection{Job Generation Script}
%           ---------------------
\label{subsec:jobgen}

Inside the machinery of the ensemble there is a procedure called {\tt jobgen.pl} that is used to 
generate job scripts to be launched within the ensemble ADAS cycle. These correspond to the various
jobs submitted to the batch system while the main {\tt atm\_ens.j} driver executes. The 
command-line usage for {\tt jobgen.pl} is shown in Fig.  \ref{fig:JobGen}.
Normally, users should not have to be concerned with this procedure. It should also be noted that
a more general and flexible version of {\tt jobgen.pl} is planned and will eventually replace
the one in this initial release.
\begin{figure}
{\small
\begin{verbatim}
NAME
     jobgen - Generate PBS job script
          
SYNOPSIS

     jobgen [...options...] jobname
                            gid
                            pbs_wallclk
                            command
                            gotodir
                            whocalled
                            file2touch
                            failedmsg
          
DESCRIPTION


     The following parameters are required 

     jobname      name of job script to be created (will be appended with j extension)
     gid          group ID job will run under
     pbs_wallclk  wall clock time for job
     command      e.g., mpirun -np $ENSGSI_NCPUS GSIsa.x
     gotodir      location to cd to (where all input files reside)
     whocalled   name of calling script (e.g., obsvr_ensemble)
     file2touch  name of file to be touched indicating a successful execution
     failedmsg   message to be issued in case of failed execution, between quote marks

OPTIONS

     -egress       specify file to watch for completion of job (e.g., EGRESS for AGCM)
     -expid        experiment name
     -q            specify pbs queue (e.g., datamove when archiving)
     -h            prints this usage notice

NECESSARY ENVIRONMENT

  JOBGEN_NCPUS           number of CPUS
  JOBGEN_NCPUS_PER_NODE  number of CPUS per node

OPTIONAL ENVIRONMENT

  FVROOT          location of build's bin
  ARCH            machine architecture, such as, Linux, AIX, etc
  FVHOME          location of alternative binaries

\end{verbatim}
} % small
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Job generation script command-line usage.\label{fig:JobGen}}
\end{figure}

%....................................................................
\subsection{Job Monitor Script}
%           ------------------
\label{subsec:jobmonitor}

Another very important script aiding the ensemble ADAS is
{\tt jobmonitor.csh} (see command-line usage in Appendix). 
As its name implies, this script monitors
each and every parallel process launched within the ensemble ADAS, so that
proper synchronization can take place. This is illustrated in Fig. \ref{fig:EADASflowchart}, 
which shows four synchronization moments (boxes) during the ensemble ADAS cycle. The first 
makes sure the ensemble analysis (EnKF) does not start until all observers have completed
their task. The second makes sure the NMC-like perturbations are available before 
they are needed by the re-centering procedure. The third halts the start of the ensemble of 
forecasts until all IAU-forcing terms are available. Lastly, the fourth makes
sure post-processing of the ensemble of forecasts starts only 
after all forecasts have completed successfully.

As discussed briefly earlier, the one important environment variable controlling the
monitoring capability is {\tt JOBMONITOR\_MAXSLEEP\_MIN}. This allows the user to specify
the maximum amount of time (in minutes) that any particular set of parallel procedures 
can take to complete. For example, setting it to $60$ tells the monitoring script 
that all observers are expected to complete within one hour; similarly, this is also the
maximum time allowed for all forecasts to complete.
This is a tunable parameter and, unfortunately, rather dependent
on how readily jobs get through the batch queue. It is possible at times to have, say, the
forecasts sit in the batch queue, waiting to execute, for longer than the time allowed by
{\tt JOBMONITOR\_MAXSLEEP\_MIN}. In that case, even though the main ensemble ADAS script, {\tt atm\_ens.j},
may still be running and have time left to run, it will quit since the processes 
being monitored will have run out of time. The extreme choice is to set {\tt JOBMONITOR\_MAXSLEEP\_MIN}
to the full time allowed for {\tt atm\_ens.j} to run. However, this
is not a very good choice since the job may wait in vain, especially if something legitimately
fails during, say, an observer or a forecast execution. Users should experiment with this environment
variable and change it according to the status of the NCCS machines. 

%In a typical application, with the ensemble running at roughly $1^o$-resolution, the most costly part to ensemble 
%ADAS run relates to the multiple 12-hour AGCM integrations.  Each one of these takes no more than ten minutes to complete, 
%and in an ideal world where all member AGCM integrations are done 
%concurrently, ten would be a reasonable value for the environment variable {\tt JOBMONITOR\_MAXSLEEP\_MIN}. In reality, 
%not all AGCM integrations run simultaneously. At times, depending on the configuration of the AGCM integrations (see below), 
%the jobs related to the ensemble of AGCM's might wait in the batch queue for quite sometime before being executed. In this case,
%the control job, and its monitoring counterpart, maybe counting time while nothing is actually happening. This is when
%the need for setting a value for {\tt JOBMONITOR\_MAXSLEEP\_MIN} that is considerably higher than expected comes into play.


%....................................................................
\section{Additional Features}
\label{sec:AddFeatures}

%....................................................................
\subsection{Replaying the Hybrid ADAS}
%           -------------------------
\label{subsec:ReplayADAS}

There are a number of reasons to have a replay capability in place. The simplest one is to have a safety net: in case
analysis output files are lost or corrupted we need to be able to re-generate them by re-running the analysis for the
particular cycle in question. Another, more practical, reason is the fact that not all tests and experiments
with GEOS ADAS should require a full re-generation of the ensemble. That is to say, there are many instances
in which tests and experiments with the system have nothing to do with the ensemble and are expected to change results only
mildly. These cases can rely on an already-existing ensemble of backgrounds, such as those created from an
operational run, and save the user from the burden of having to run the entire ensemble-variational ADAS.
Indeed, this is the recommended mode for most developers to experiment with; only when they are satisfied with their
tests in non-hybrid mode do we suggest that they run a complete experiment.

Follow the steps below to run hybrid (central) ADAS experiments that simply rely on an already existing ensemble of backgrounds:
\begin{description}
   \item[Existing resource files.]  
         The first thing to set properly are the resource files controlling the forward and adjoint GSI runs of your
         experiment, namely, {\tt gsi.rc.tmpl} and {\tt gsi\_sens.rc.tmpl}, respectively. These files should be set to 
         run in hybrid mode. You should be careful to set up proper namelist parameters to ensure the correct resolution of 
         the ensemble, proper balance constraints, and other related parameters.  Refer back to Sec. \ref{subsec:gsiRCconfig}, 
         for the forward hybrid GSI settings, and read on to see how to set up the hybrid adjoint GSI (Sec. \ref{subsec:gsiRCconfig}).

   \item[Additional resource file.]  
         You must tell the scripts where to grab the existing ensemble of backgrounds from. For that, an acquire resource 
         file named {\tt atmens\_replay.acq} must be placed in the {\tt \$FVHOME/run} directory of the experiment. 
         All this file needs to have is a single line informing the analysis sensitivity scripts about the location of 
         the tar-ball containing the ensemble of backgrounds. For example,
         the line below shows typical content for this resource file: \\
         {\small
           /archive/u/USER/EXP/atmens/Y\%y4/M\%m2/EXP.atmens\_ebkg.\%y4\%m2\%d2\_\%h2z.tar
         } \\
         As usual, if the experiment name of your run is not the same as that of the run holding the ensemble, the 
         naming can be redirected in the resource file to match your experiment name. For example, when the 
         experiment OEXP comes from user OPS, and your (USER) experiment is named EXP, the acquire resource 
         file should look like:
         {\small
           /archive/u/OPS/OEXP/atmens/Y\%y4/M\%m2/OEXP.atmens\_ebkg.\%y4\%m2\%d2\_\%h2z.tar\\ 
                 $=>$ EXP.atmens\_ebkg.\%y4\%m2\%d2\_\%h2z.tar
         } \\
         where the line above was broken up for ease of reading -- it must be a single line in the resource file.
         The tar-ball will be unfolded by the internal mechanisms of the analysis driver.

  \item[Additional environment variable.] 
        Recall that when the full ensemble-variational system is set to run coupled,
        the environment variable {\tt HYBRIDGSI} is used in the main hybrid ADAS job script, {\tt g5das.j}, to
        tell the scripts where to find the members of the ensemble. In that case, this variable was set to 
        {\tt \$FVHOME/atmens}. When replaying, this variable should instead be set to {\tt \$FVWORK/atmens}. 
        This tells the analysis script to unfold the ensemble tar-ball brought into the {\tt \$FVWORK} directory 
        (above) into a subdirectory named {\tt atmens}. At the end of the PBS job, the ensemble members go away
        together with everything else in {\tt \$FVWORK}.

\end{description}
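
For quick reference, the replay-specific pieces described above amount to the following. The values are those quoted in the text; the exact placement of the {\tt setenv} line within {\tt g5das.j} is left to the user.
\begin{verbatim}
  # in g5das.j (replay from an existing ensemble)
  setenv HYBRIDGSI $FVWORK/atmens

  # resource file that must exist: $FVHOME/run/atmens_replay.acq
\end{verbatim}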

%....................................................................
\subsection{State-space Observation Impact}
%           ------------------------------
\label{subsec:StSpaceObsImp}

Observation impact on the forecast can be calculated using the state-space approach of Langland and Baker (2004).
This requires the availability of a model adjoint and  an adjoint of the analysis system. As briefly described in 
Sec. \ref{sec:HEnDA}, GMAO has two versions of atmospheric model adjoint codes available: one for its lat-lon hydrodynamics,
and another for its cubed-sphere core.  GMAO also has an adjoint of GSI (Tr\'emolet 2007; Tr\'emolet 2008). 
Routine calculation of observation impact on the 24-hour forecasts, using the lat-lon model adjoint and the adjoint of
GSI in its traditional 3DVar mode, is part of our operational suite. In principle, it should be rather simple to calculate 
observation impact when using the {\it hybrid} GSI. After all, it only involves the ability to reconstruct the hybrid background error 
covariance matrix during the adjoint minimization.  Assuming the ensemble members from the forward run have been saved, 
this should be a straightforward operation.

Unfortunately, this is not quite as simple. The issue relates to an implementation detail. There are currently multiple
options for conjugate-gradient (CG) minimization strategies in GSI. The adjoint GSI is only coded for the options
within the so-called square-root-${\bf B}$ preconditioning minimization strategies (i.e., standard CG, Lanczos-based CG, 
and Quasi-Newton).  On the other hand, the hybrid capability is only implemented for the so-called 
${\bf B}$-preconditioning minimization strategies (i.e., double-CG and bi-CG). 
Essentially, the GSI adjoint requires availability of a square-root operator decomposition of
each term specifying its full (climatological plus ensemble) background error covariance matrix. 
A square-root decomposition operator is available in the code when the background
error covariance matrix is purely climatological (${\bf B} = {\bf B}_c$ only), but not when this matrix is hybridized with 
an ensemble component, ${\bf B}_e$. 
Fortunately, it is possible to approximate the adjoint when using the bi-CG minimization of 
El Akkraoui et al. (2013)\footnote{The same should be feasible to implement for the default forward double-CG 
option of Derber and Rosati (1989); but this is not an option yet.}. This case only requires access to the 
non-decomposed background error covariance (climatological plus ensemble) operators. Since it is not yet clear to the authors
how to handle sequential backward updates of the gradient in a multi-outer-loop bi-CG minimization, an approximation is 
made that consists of using only a single outer loop to run 
the adjoint bi-CG hybrid GSI. We should point out that approximating the adjoint GSI is not 
really a big issue. GMAO has been using an approximate adjoint in its operational observation impact suite for quite 
some time: the forward GSI uses the non-linear double-CG minimization with two outer-loops and a total of 
250 iterations (100+150), whereas the backward GSI uses the linearized square-root-${\bf B}$ preconditioned standard CG minimization 
with two outer-loops and a total of 200 iterations (100+100).
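
For reference, the distinction between the two families of minimization strategies can be sketched as follows. This is only a schematic using the notation of Sec. \ref{sec:HEnDA}; the operator ${\bf L}$ and the control vector ${\bf v}$ are introduced here for illustration and do not correspond to specific variables in the code. With the hybrid background error covariance written as
\begin{equation}
  {\bf B} = \beta_c {\bf B}_c + \beta_e {\bf B}_e \, ,
\end{equation}
the square-root-${\bf B}$ preconditioning assumes a factorization ${\bf B} = {\bf L} {\bf L}^T$ and minimizes with respect to ${\bf v}$, where $\delta {\bf z} = {\bf L} {\bf v}$:
\begin{equation}
 J({\bf v}) = \frac{1}{2} {\bf v}^T {\bf v} + 
              \frac{1}{2} ( {\bf d} - {\bf H} {\bf L} {\bf v} )^T {\bf R}^{-1} 
                          ( {\bf d} - {\bf H} {\bf L} {\bf v} ) \, .
\end{equation}
The ${\bf B}$-preconditioned strategies, in contrast, only require the ability to apply ${\bf B}$ itself to a vector, which is why no factorization of the hybrid matrix is needed in the bi-CG case.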
  
The way to trigger the adjoint hybrid GSI involves the following settings:
\begin{description}
 \item[Existing resource file.] In order for the adjoint GSI to run in hybrid mode it is necessary to
       make sure the namelist {\tt HYBRID\_ENSEMBLE} in the resource file {\tt gsi\_sens.rc.tmpl} is set
       in the exact same way as in the resource file controlling the forward hybrid GSI, namely {\tt gsi.rc.tmpl}.  
       Furthermore, the minimization strategy for the adjoint must be set to the bi-CG, which amounts to replacing the
       entry {\tt lsqrtb=.true.} with {\tt lbicg=.true.} in the file {\tt gsi\_sens.rc.tmpl}. 

 \item[Additional resource file.] 
       Similarly to when replaying the hybrid ADAS from an existing ensemble, an acquire resource file is 
       required to tell the scripts where to grab the ensemble of backgrounds from. In this case, the acquire resource 
       file is named {\tt atmens\_asens.acq}, and must be placed in the {\tt \$FVHOME/run} directory of the experiment. 
       The contents of this file are set in complete analogy to how the replay sets its file (see Sec. \ref{subsec:ReplayADAS}).
       
 \item[Additional environment variables.]  
       The approximation of forcing only a single outer loop minimization to take place when
       running the adjoint GSI is controlled by setting the environment variable {\tt USRMITER} in the
       adjoint sensitivity job script {\tt g5asens.j}. 
       Furthermore, in analogy to when replaying the hybrid ADAS, an environment variable named
       {\tt HYBRIDGSI} must be set in this same job script. As before, this variable should point to the
       working area (where the tar-ball with backgrounds will be brought into and unfolded). That is,
       the {\tt g5asens.j} script should have the following extra entries:
       \begin{verbatim} 
          setenv HYBRIDGSI $FVWORK/atmens 
          setenv USRMITER 1
       \end{verbatim} 

\end{description}
The ability to run the hybrid adjoint GSI analysis assumes a forward hybrid ADAS experiment has been run and has saved
the ensemble of backgrounds -- by now we know this is accomplished by having the collection ``ebkg'' specified as part 
of the definition of the environment variable {\tt ENSARCH\_FIELDS} while running the forward ADAS. 
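
As an illustration only, and assuming {\tt ENSARCH\_FIELDS} takes a comma-separated list of collection names (the format and the file in which the variable is set are assumptions here and should be checked against the experiment setup), the setting might look like:
\begin{verbatim}
  setenv ENSARCH_FIELDS "ebkg,erst,stat"
\end{verbatim}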

%....................................................................
\subsection{Reproducing the Ensemble ADAS}
%           -----------------------------
\label{subsec:ReproducingEADAS}

It is easy to foresee situations when users will need to reproduce a cycle of the ensemble ADAS. The reasons are 
analogous to sometimes needing to reproduce (hybrid) central ADAS cycles: lost files; corrupted files; missed output; and
others. In the ensemble case, this can only be done when the original experiment has saved its minimal set of collections 
to allow for reproducibility.  As briefly mentioned before, the minimal set of output collections that allow for reproducibility 
is the following:
\begin{description}
   \item[rndperts.dates] -- file containing dates of NMC-like perturbations taken from the database when the original 
                            cycle ran.  These files are automatically (by default) archived during
                            the experiment (or are found under {\tt \$FVHOME/atmens} before making it to the archive).
   \item[ebkg] -- collection holding the ensemble of backgrounds.
   \item[erst] -- collection holding the ensemble of AGCM restarts.
   \item[stat] -- collection holding the ensemble statistics, of which only the ensemble mean is required for reproducibility 
                  purposes.
\end{description}
For example, to reproduce the ensemble ADAS analysis for 00 UTC on 28 December 2012, the user must have the following files
available:
\begin{verbatim}
 yourexp.rndperts.dates.20121228_00z.txt
 yourexp.atmens_ebkg.20121227_21z.tar
 yourexp.atmens_erst.20121227_21z.tar
 yourexp.atmens_stat.20121227_21z.tar
\end{verbatim}
where {\tt yourexp} represents the user experiment name, for the sake of argument.
Assuming all defaults are being used, the contents of the tar-balls should be placed and organized inside the directory
{\tt \$FVHOME/atmens}. The file type ``rndperts.dates'' should be placed in the top directory {\tt \$FVHOME/atmens}, all members 
from the collections ``ebkg'' and ``erst'' should be placed in subdirectories of this directory, with names identical to 
those in the tar-balls (``mem001'', ``mem002'', etc), and finally, the ``ensmean'' subdirectory should be extracted from the 
collection ``stat'' and placed as a subdirectory of {\tt \$FVHOME/atmens}. 

Next, create the directory defined through the environment variable {\tt RSTSTAGE4AENS}, usually {\tt \$FVHOME/atmens/RST},
and place a copy of the file {\tt yourexp.rst.lcv.20121227\_21z.bin} inside it. This tells the main ensemble ADAS script when the
integration of the ensemble begins (remember, this file holds the valid time-stamp of the AGCM restarts). 
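
Under the default settings, and for the example date above, the resulting layout would look roughly as sketched below. Only the entries named in the text are shown, and the contents of the member subdirectories are omitted.
\begin{verbatim}
 $FVHOME/atmens/
    yourexp.rndperts.dates.20121228_00z.txt
    mem001/        (from ebkg and erst)
    mem002/        (from ebkg and erst)
    ...
    ensmean/       (from stat)
    RST/
       yourexp.rst.lcv.20121227_21z.bin
\end{verbatim}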

The last thing to do is, obviously, to submit the driving script {\tt atm\_ens.j} to the batch system.

%....................................................................
\subsection{Observation-space Observation Impact}
%           ------------------------------------
\label{subsec:ObSpaceObsImp}

In addition to observation impacts calculated with the state-space approach of Langland and Baker (2004), as briefly 
discussed in Sec. \ref{subsec:StSpaceObsImp}, observation impacts can also be calculated directly in observation space following 
the approach of Todling (2013). In particular, observation {\it impact on the analysis} can be calculated on the fly, 
within the ensemble ADAS cycle; this is triggered by the presence of the following resource file:
\begin{description}
  \item[GSI\_GridComp\_ensfinal.rc.tmpl.] To create this file, simply copy the file {\tt GSI\_GridComp.rc.tmpl} to 
        this new name, placing it under the {\tt \$ATMENSETC} directory. Edit the new file and replace
        the template name {\tt \%s.bkg.eta.\%y4\%m2\%d2\_\%h2z.>>>NCSUFFIX<<<} with \\
        {\tt \%s.ana.eta.\%y4\%m2\%d2\_\%h2z.>>>NCSUFFIX<<<}.
\end{description}
For now, this capability can only be exercised by strategies relying on the observer, such as the EnKF. The presence
of the resource file above, together with the file {\tt obs1gsi\_member.rc}, triggers an extra call to the observer, controlled by
the script {\tt obsvr\_ensfinal.csh}. This is illustrated in Fig. \ref{fig:EADASflowchart} by the double-dashed, marbled, box called right
after the ensemble analysis controlling script. This extra observer call operates on the mean analysis. A point to note relates to the 
difference between the frequency of backgrounds used in the observer, which is set to 3 hours, and the frequency of analyses, which is set to 6 hours.
That is, the observers follow a first-guess at appropriate time (FGAT; e.g., see Massart et al. 2010, and references therein) 
strategy, while the EnKF analysis is valid only at the synoptic hours, i.e., the EnKF is a filter whose solution is valid only at 
a given time. 
To get updates for the backgrounds over the two times around the central time, the {\tt obsvr\_ensfinal.csh} script proceeds according
to the  3DVar formulation; since the increment does not evolve within the assimilation time window, updates can be obtained by 
simply adding the synoptic-hour increment to the two backgrounds bracketing the synoptic time. Once the update of the off-synoptic time
background fields is complete, the observer can be called to produce the so-called observation-minus-analysis (OMA) residuals. 
Only the mean backgrounds are updated this way, thus producing ensemble mean OMA only. Observation impacts on the mean analysis can 
be calculated as in
\begin{equation}
   \delta e = [ {\bf y} - {\bf h}({\bar {\bf x}}^a) ]^T {\bf R}^{-1} [ {\bf y} - {\bf h}({\bar {\bf x}}^a) ]
            - [ {\bf y} - {\bf h}({\bar {\bf x}}^b) ]^T {\bf R}^{-1} [ {\bf y} - {\bf h}({\bar {\bf x}}^b) ] 
\end{equation}
which can be broken up into various individual observation types. The ultimate impacts calculation is done in the program
{\tt odsstats}, when called with proper arguments. The results are placed into the Observation Data Stream (ODS; da Silva and Redder 
1995) format.  To have these files stored to the archive, the collection ``eoi0'' should  be added to the environment variable
{\tt ENSARCH\_FIELDS}. We should mention that calculating the degrees-of-freedom for signal diagnostic of Lupu et al. (2011; and
references therein) is also possible, requiring only a minor script change, since {\tt odsstats} is already capable of doing this 
calculation.
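
The update of the off-synoptic mean backgrounds and the breakdown of impacts by observation type, both described above, can be written compactly as follows. The symbols $t_0$, $\delta {\bar {\bf x}}$, and the index $k$ are introduced here for illustration only. With $t_0$ denoting the synoptic time, the mean increment is $\delta {\bar {\bf x}} = {\bar {\bf x}}^a(t_0) - {\bar {\bf x}}^b(t_0)$, and the off-synoptic mean backgrounds are updated as
\begin{equation}
  {\bar {\bf x}}^a(t_0 \pm 3\,{\rm h}) = {\bar {\bf x}}^b(t_0 \pm 3\,{\rm h}) + \delta {\bar {\bf x}} \, .
\end{equation}
Assuming ${\bf R}$ is block-diagonal with respect to a partition of the observations into types $k = 1, \ldots, K$, the impact splits into a sum of per-type contributions,
\begin{equation}
  \delta e = \sum_{k=1}^{K} \delta e_k \, , \qquad
  \delta e_k = [ {\bf y}_k - {\bf h}_k({\bar {\bf x}}^a) ]^T {\bf R}_k^{-1} [ {\bf y}_k - {\bf h}_k({\bar {\bf x}}^a) ]
             - [ {\bf y}_k - {\bf h}_k({\bar {\bf x}}^b) ]^T {\bf R}_k^{-1} [ {\bf y}_k - {\bf h}_k({\bar {\bf x}}^b) ] \, .
\end{equation}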

A preliminary implementation of generating observation {\it impact on the mean ensemble mid-range forecasts} is presently being 
worked into the machinery of ensemble ADAS. More on this will appear in future releases of the software and of this document.

%....................................................................
\subsection{Experimenting with the Ensemble-Only ADAS}
%           -----------------------------------------
\label{subsec:PureEnsADAS}

Some of us are bound to want to experiment with the GEOS ensemble-only ADAS capability. 
Referring back to Fig. \ref{fig:EADASflowchart}, we see that
one of the very last things done in the flowchart is to submit the central hybrid ADAS job script {\tt g5das.j}. Clearly,
it does not have to be this way. And indeed, it is just as simple for the driving job script of the ensemble ADAS,
{\tt atm\_ens.j}, to submit itself. This can be done by changing the definition of two environment variables
set at the top part of {\tt atm\_ens.j}, namely, the variables {\tt ENSONLY\_BEG} and {\tt ENSONLY\_END}. By default, 
the script sets them as follows:
\begin{verbatim}
  # To trigger ensemble-only set dates to anything but 0 (as yyyymmddhh)
  #setenv ENSONLY_BEG 2011111221
  #setenv ENSONLY_END 2011120321
  setenv ENSONLY_BEG 0
  setenv ENSONLY_END 0
\end{verbatim}
As the comment says, replacing the zero-settings with actual begin and end dates tells the script not to invoke the
hybrid ADAS {\tt g5das.j} but instead, to submit itself, at the end of each cycle. The very first time, before
cycling begins, the user will have to place the ``rst.lcv'' restart corresponding to the initial date and time of the cycle, 
in the directory defined by {\tt RSTSTAGE4AENS}, usually {\tt \$FVHOME/atmens/RST}. That is, if the dates above are
used as begin and end dates of the ensemble-only cycle, then the file {\tt expid.rst.lcv.20111112\_21z.bin} must
be in this directory (``expid'' being the user's experiment name).
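
With the dates used in the comment lines above, an ensemble-only configuration would read as sketched below. The dates are just the illustrative ones from the script comments, and the restart location assumes the usual setting of {\tt RSTSTAGE4AENS}.
\begin{verbatim}
  # in atm_ens.j
  setenv ENSONLY_BEG 2011111221
  setenv ENSONLY_END 2011120321

  # file that must be in place before the first cycle:
  #   $FVHOME/atmens/RST/expid.rst.lcv.20111112_21z.bin
\end{verbatim}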

Now, before starting the ensemble-only ADAS cycle one more factor needs to be considered. We have seen in Fig.
\ref{fig:GMAOhybSchematic} that two pieces couple the ensemble and hybrid ADAS schemes: the ensemble of backgrounds
that feed into the hybrid; and the central analysis that feeds into the ensemble.
When running in ensemble-only mode, the non-zero setting of variables {\tt ENSONLY\_BEG} and {\tt ENSONLY\_END} automatically
eliminates the former coupling.  The latter coupling, however, must be considered carefully. 
If nothing else is done, the default settings of the ensemble ADAS scripts will look for the central
analysis, and its corresponding satellite bias correction coefficient files, under {\tt \$FVHOME/atmens/central}.
Since the central ADAS is turned off, the files will be missing and everything will come to a halt.
The solution is to consider a part of the main ensemble ADAS script, {\tt atm\_ens.j}, that was bypassed in Sec. \ref{sec:Design}
when we provided a step-by-step description of what takes place in this driver. This is the part that reads as below.

\begin{center}
\small{
\begin{verbatim}
  if ( -e $ATMENSETC/central_ana.rc ) then
    if ( $DO_ATM_ENS ) then
     if ( ! -e $FVWORK/.DONE_MEM001_GETCENTRAL.$yyyymmddhh) then
      if(! -d $STAGE4HYBGSI ) mkdir -p $STAGE4HYBGSI
      set spool = "-s $FVWORK/spool"
      jobgen.pl \
             -q datamove \
             getcentral          \
             $GID                \
             $OBSVR_WALLCLOCK    \
             "acquire -v -strict -rc $ATMENSETC/central_ana.rc
                      -d $STAGE4HYBGSI $spool -ssh $anymd $anhms 060000 1" \
             $STAGE4HYBGSI       \
             $myname             \
             $FVWORK/.DONE_MEM001_GETCENTRAL.$yyyymmddhh \
             "Main job script Failed for Get Central Analysis"

             if ( -e getcentral.j ) then
                if ( $ATMENS_BATCHSUB == "sbatch" ) then
                   $ATMENS_BATCHSUB  -W getcentral.j
                else
                   $ATMENS_BATCHSUB  -W block=true getcentral.j
                endif
                touch .SUBMITTED
             else
                echo " $myname: Failed for Get Central Analysis, Aborting ... "
                touch $FVWORK/.FAILED
                exit(1)
             endif
      endif
    endif
  endif
\end{verbatim}
}
\end{center}

As usual, the presence of a resource file in the {\tt \$ATMENSETC} directory triggers a particular behavior in 
the cycle. In this case, the user must provide a resource file named {\tt central\_ana.rc} that contains the location 
of an existing set of analyses and bias correction files that can be used by the ensemble-only ADAS cycle.
A typical example of its content is:
{\small
\begin{verbatim}
/archive/u/dao_ops/e572p5_fp/ana/Y%y4/M%m2/e572p5_fp.ana.satbang.%y4%m2%d2_%h2z.txt
    => hy11a.ana.satbang.%y4%m2%d2_%h2z.txt
/archive/u/dao_ops/e572p5_fp/ana/Y%y4/M%m2/e572p5_fp.ana.satbias.%y4%m2%d2_%h2z.txt 
    => hy11a.ana.satbias.%y4%m2%d2_%h2z.txt
\end{verbatim}
} % small
where files from the operational forward processing experiment run with GEOS 5.7 are being fed into the
user experiment (named ``hy11a''; notice lines are broken up for readability purposes only). 
The ensemble ADAS script will still be missing the analysis file needed for
re-centering. To complete the settings, the environment variable {\tt DONORECENTER} should be set to 1 (on) in
the {\tt AtmEnsConfig.csh} configuration file. This way, no re-centering will be done, and the script will
not look for the analysis file.

In summary, to run an ensemble-only ADAS:
\begin{enumerate}
  \item Edit {\tt atm\_ens.j}, and set the begin and end dates parameters {\tt ENSONLY\_BEG} and {\tt ENSONLY\_END} 
        to desirable, non-zero,  dates.
  \item Consider what to do about re-centering, and define the contents of the resource file {\tt central\_ana.rc}
        accordingly. Depending on your choice, remember to check the environment variable {\tt DONORECENTER} in
        the configuration settings of the ensemble.
  \item For now, make sure the resource file {\tt central\_ana.rc} points to existing satellite bias coefficient files
        from another (OPS) experiment.
\end{enumerate}
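
For instance, when no re-centering is desired, the corresponding entry in {\tt AtmEnsConfig.csh} would be along the lines of the fragment below. This is a sketch; the {\tt setenv} form is assumed from the csh syntax of the configuration file.
\begin{verbatim}
  setenv DONORECENTER 1
\end{verbatim}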

Ensemble purists still might dislike the fact that satellite bias correction coefficients are being brought into the 
ensemble-only ADAS from outside. This can  be remedied when running the EnKF. The code has triggers to do the
satellite bias estimation on its own and not have to rely on external information. However, we do not presently have
a knob to allow the EnKF to recycle its own bias estimates; one will be added in a follow-up release.

%....................................................................
\subsection{Spinning up the Ensemble ADAS}
%           -----------------------------
\label{subsec:SpinUpEnsADAS}

There are times when spinning up the members of the ensemble ADAS will be necessary. This can be done rather simply
by essentially running in ensemble-only mode, as just seen, but with a couple of small changes. This is the
case when re-centering about an existing analysis (hybrid or not) is desirable. We can essentially follow 
Sec. \ref{subsec:PureEnsADAS}, except that the environment variable {\tt DONORECENTER} should be left out of 
{\tt AtmEnsConfig.csh} and the resource file {\tt central\_ana.rc} should now grab existing analyses from the same
place it grabs the satellite bias correction coefficient files. That is, this resource file should now be as in: 

{\small
\begin{verbatim}
/archive/u/dao_ops/e572p5_fp/ana/Y%y4/M%m2/e572p5_fp.ana.eta.%y4%m2%d2_%h2z.nc4    
    => hy11a.ana.eta.%y4%m2%d2_%h2z.nc4
/archive/u/dao_ops/e572p5_fp/ana/Y%y4/M%m2/e572p5_fp.ana.satbang.%y4%m2%d2_%h2z.txt 
    => hy11a.ana.satbang.%y4%m2%d2_%h2z.txt
/archive/u/dao_ops/e572p5_fp/ana/Y%y4/M%m2/e572p5_fp.ana.satbias.%y4%m2%d2_%h2z.txt 
    => hy11a.ana.satbias.%y4%m2%d2_%h2z.txt
\end{verbatim}
} % small

The spin up can run for as long as desired, provided analyses and satellite bias correction files are available for
the period of interest.

%....................................................................
\subsection{Generating Climatological-like Background Error Covariance from Ensemble}
%           -----------------------------
\label{subsec:BerrorFromEnsemble}

The main assumption behind (certainly, 3D) hybrid assimilation procedures is that the underlying ensemble provides a reasonable approximation to 
the required background error covariance. Under this assumption, it is conceivable to derive a climatological-like 
background error covariance from the ensemble using the same algorithm employed to derive the climatological background error
covariance with the NMC-method. The traditional method proposes to parameterize the background error covariance using differences of 24- and 48-hour 
forecasts (REF HERE). The justification for why such differences relate to background errors has always been a challenging part of the 
procedure, though this does not prevent it from being widely used. Here, an actual ensemble of background errors is available, 
represented as differences between the members of the ensemble at a particular time and the ensemble mean; this corresponds to how ensemble filters
derive their corresponding (implicit), time-varying, background error covariances. The same error covariance parameterization procedure 
used in the NMC-method can be applied by replacing the 24-hour forecasts with the ensemble mean (at a given time), and the corresponding 48-hour forecasts with the members of the ensemble. Typically, NMC-method derived covariances rely on a year's worth of 24- and 48-hour forecasts; thus,
replacing these forecasts with members of an ensemble amounts to a considerable reduction in the number of samples used in the parameterization
procedure. Nonetheless, as long as the procedure can be made to converge, it is possible to derive a parameterized background error 
covariance based on the ensemble.
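
Schematically, and using symbols introduced here only for illustration ($M$ for the ensemble size, ${\bf x}_i^b$ for the $i$-th member background, and ${\bf x}^{f}_{24}$, ${\bf x}^{f}_{48}$ for 24- and 48-hour forecasts valid at the same time), the substitution amounts to replacing the NMC-method samples
\begin{equation}
  {\bf e}^{\rm NMC} = {\bf x}^{f}_{48} - {\bf x}^{f}_{24}
\end{equation}
with the ensemble perturbations
\begin{equation}
  {\bf e}_i = {\bf x}_i^b - {\bar {\bf x}}^b \, , \qquad
  {\bar {\bf x}}^b = \frac{1}{M} \sum_{i=1}^{M} {\bf x}_i^b \, , \qquad i = 1, \ldots, M \, ,
\end{equation}
and feeding the latter to the same parameterization procedure.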

In the GEOS EnADAS the script named {\it atmens\_berror.csh} is responsible for generating a ``static''-like background error covariance from
the members of the ensemble. The script can be used offline (see the unit tester {\it ut\_atmens\_berror.j}), or it can be invoked online 
together with the central hybrid analysis. In the latter case, the generation of the ensemble-based parameterized background error covariance
is called before the hybrid analysis begins, giving the user the option of choosing this ensemble-generated ${\bf B}$ parameterization instead of the climatologically-derived one (based on the traditional NMC-method). 

An example of the fields derived to formulate parameterized background error covariances appears in Fig. \ref{fig:EnsembleBasedBerror}.

\begin{figure}[ht]
\begin{center}
\includegraphics[trim=40 40 10 0,clip,height=0.25\paperheight,width=0.45\textwidth]{Figs/Rel2/Berror_ps_current_revbeta.png}
\includegraphics[trim=40 40 10 0,clip,height=0.25\paperheight,width=0.45\textwidth]{Figs/Rel2/Berror_t_current_revbeta.png}
\includegraphics[trim=40 40 10 0,clip,height=0.25\paperheight,width=0.45\textwidth]{Figs/Rel2/Berror_sf_current_revbeta.png}
\includegraphics[trim=40 40 10 0,clip,height=0.25\paperheight,width=0.45\textwidth]{Figs/Rel2/Berror_vp_current_revbeta.png}
\end{center}
\captionsetup{margin=10pt,font=small,labelfont=bf}
\caption{Three versions of standard deviation fields used in a parameterized background error covariance formulation: traditional NMC-method 
(black), ensemble-derived, and corresponding hybrid using parameters defining the GEOS EnADAS hybrid 3DVar.
 \label{fig:EnsembleBasedBerror}}
\end{figure}


%....................................................................
%\section{Data Flow}
%        ---------
%\label{sec:DataFlow}

%....................................................................
\section{Conventions}
%        -----------
\label{sec:Conventions}

Each process launched by GEOS ensemble ADAS indicates its successful termination by touching a properly named 
hidden file in the work-directory of the ongoing execution. An example of the hidden files present near completion of 
integration of a 3-member ensemble experiment is given below.
\begin{verbatim}
 .DONE_ENSMEAN_obsvr_ensemble.csh.2012032600
 .DONE_MEM001_obsvr_ensemble.csh.2012032600
 .DONE_MEM002_obsvr_ensemble.csh.2012032600
 .DONE_MEM003_obsvr_ensemble.csh.2012032600
 .DONE_acquire_ensperts.csh.2012032600
 .DONE_MEM001_ACQUIRE_ENSPERTS.2012032600
 .DONE_obsvr_ensemble.csh.2012032600    
 .DONE_PERTMEAN.2012032600
 .DONE_ENKFX_atmos_enkf.csh.2012032600
 .DONE_atmos_enkf.csh.2012032600
 .DONE_atmos_eana.csh.2012032600
 .DONE_MEM001_setperts.csh.2012032600
 .DONE_MEM001_PERTDIFF.2012032600
 .DONE_MEM002_PERTDIFF.2012032600
 .DONE_MEM003_PERTDIFF.2012032600
 .DONE_MEM002_atmens_recenter.csh.2012032600
 .DONE_MEM001_atmens_recenter.csh.2012032600
 .DONE_MEM003_atmens_recenter.csh.2012032600
 .DONE_atmens_recenter.csh_ana.eta_ana.eta.2012032600
 .DONE_post_eana.csh.2012032600
 .DONE_MEM001_atmos_ens2gcm.csh.2012032600
 .DONE_MEM002_atmos_ens2gcm.csh.2012032600
 .DONE_MEM003_atmos_ens2gcm.csh.2012032600
 .DONE_atmos_ens2gcm.csh.2012032612
 .DONE_MEM001_gcm_ensemble.csh.2012032609
 .DONE_MEM002_gcm_ensemble.csh.2012032609
 .DONE_MEM003_gcm_ensemble.csh.2012032609
 .DONE_ENSFCST
\end{verbatim}
This more or less tells the sequence of events in the run. For example, the run begins by launching 
the acquire of NMC-like perturbations to be used as additive inflations. The observers run concurrently to
this. The mean observer runs first and indicates its successful termination by touching the zero-length file
\begin{verbatim} 
.DONE_ENSMEAN_obsvr_ensemble.csh.2012032600  
\end{verbatim}
Following the mean observer, the member observers run
concurrently with each other. As they terminate successfully, they touch corresponding hidden files 
indicating the ensemble member number, e.g., the third member touches
\begin{verbatim}
.DONE_MEM003_obsvr_ensemble.csh.2012032600
\end{verbatim}
For consistency, a time stamp is also appended to the touched hidden file.
When all observers are finished, the hidden file ``.DONE\_obsvr\_ensemble.csh.2012032600'', with the calling program 
name, appears. When all NMC-like perturbations have been retrieved, the file 
\begin{verbatim}
.DONE_MEM001_ACQUIRE_ENSPERTS.2012032600
\end{verbatim}
indicates their availability to the experiment. The file ``.DONE\_PERTMEAN.2012032600'' indicates the successful
removal of the mean from the NMC-like perturbations. Analogously, when all processes related to the ensemble analysis 
complete, the file ``.DONE\_atmos\_eana.csh.2012032600'' appears. The availability of both the mean-free NMC-like
perturbations for additive inflation and the ensemble analyses allows the re-centering to take place, with
successful termination indicated by the file ``.DONE\_atmens\_recenter.csh\_ana.eta\_ana.eta.2012032600''. 
This is followed by the completion of the post-analysis, which involves collection of statistics from the 
ensemble, and indicated by 
\begin{verbatim}
.DONE_post_eana.csh.2012032600 
\end{verbatim}
Once re-centering is finished, the IAU increments are created for each member by the procedure connecting 
the ensemble analyses with the AGCM, 
\begin{verbatim}
.DONE_atmos_ens2gcm.csh.2012032612
\end{verbatim}
Finally, the ensemble of backgrounds 
for the next cycle is created by running instances of the AGCM, for each member. Each member of the forecast indicates 
its own successful termination, and when all are done, the file
``.DONE\_ENSFCST'' indicates the successful generation of all new members. At this point, the main job script will 
take care of submitting the central ADAS cycling script, and launching the archiving job to save requested output from
the ensemble ADAS.  This brief description is sequential but by now the reader will be familiar with the 
parallelism exploited within this process. 

The namings of the hidden files are standardized. That is, their names follow the template:
\begin{verbatim}
  .DONE_CALLING_PROGRAM.YYYYMMDDHH
\end{verbatim}
where {\tt CALLING\_PROGRAM} and {\tt YYYYMMDDHH} specify the calling program and the current date 
and time of the cycle (or background). The other convention is that the hidden files must, in the majority of
cases, be created at the level of the {\tt ENSWORK} directory. There are some exceptions, but those
are outside the scope of the present document. {\it Users wanting to implement extra features must 
follow the conventions}.
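
As a sketch of what following the conventions might look like in a user-added csh script (the script name {\tt my\_feature.csh} is hypothetical; {\tt \$ENSWORK} and {\tt \$yyyymmddhh} are assumed to be defined by the caller, as in the existing scripts):
\begin{verbatim}
  set myname = my_feature.csh
  # skip the work if this step already completed for this date and time
  if ( -e $ENSWORK/.DONE_${myname}.$yyyymmddhh ) then
     echo " ${myname}: already done"
     exit(0)
  endif
  # ... do the actual work here ...
  touch $ENSWORK/.DONE_${myname}.$yyyymmddhh
\end{verbatim}
Checking for the hidden file before doing any work mirrors the guard used in the main driver and is what allows a re-submitted cycle to pick up from where it left off.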

%....................................................................
\section{Handling Crashes}
%        ----------------

All systems are subject to sporadic failures; the ensemble component of the ADAS is no different. Unfortunately,
since this is still in its infancy and has only been experimented with in development mode, crashes happen
more often than perhaps one would expect. Fortunately, however, most crashes simply require re-submitting 
the driving script. For now, we only discuss issues related to reviving a crashed execution when the 
scheduler is not running, that is to say, when the {\tt g5das.j} submits the driving ensemble script
{\tt atm\_ens.j}, which in turn submits {\tt g5das.j}, when the cycle goes well.

%\noindent{\it Tips to revive most crashed cases}

At the time of this writing the NASA Center for Climate Simulation (NCCS) PBS systems are under the
influence of rather frequent generation of so-called {\it ghost-jobs}, when a job script is submitted to
PBS, but it bounces back to the user with its contents wiped out, as if the user had submitted an 
empty shell-script to PBS. It turns out, because of the rather large number of jobs a single cycle of the 
ensemble ADAS can spawn, these {\it ghost jobs} happen more often when running ensemble ADAS than when running regular 
experiments. In these instances, it might be hard to revive the experiment.

%\noindent{\it Tips to revive slightly more severe crashes}

%When say the ensemble of AGCMs or observers fail it is possible (though rarely needed) to submit by
%hand the corresponding failed job. For example, say, member 14 of AGCM integration has failed 
%and the machine refuses to run it when the {\tt atm\_ens.j} job script is re-submitted blindly. This can 
%happen when NCCS-discover is generating ghost-jobs
%for whatever reason. The user can simply submit the offending job by hand, in this, under the \$ENSWORK directory
%the user will find a job-script named {\tt agcm\_mem014.j} which can be submitted by hand. Once this job completes,
%the user should be able to re-submit the main driving ensemble script {\tt atm\_ens.j} to pick from where it left.
%Special attention must be given to when running in the configuration of multiple-work-multiple-jobs. In this case,
%the user will have to edit the offending job script, in the example above the script {\tt agcm\_mem014.j}, and 
%replace the command line using ``mpiexec'' with a valid ``mpirun'' command line. This is a simple replacement of
%the command line and it works in all cases. The example below illustrates this case.

%Look for the job script agcm\_mem014.j found under {\tt \$ENSWORK/mem014}, whose contents are shown below:
%\begin{verbatim}
%#!/bin/csh -xvf
%#SBATCH --account=g0613
%#SBATCH --job-name=agcm_mem014
%#SBATCH --time=1:00:00
%#SBATCH --partition=nccs1
%#SBATCH --ntasks=48
%#SBATCH --ntasks-per-node=8
%#PBS -N agcm_mem014
%#PBS -l select=6:ncpus=8:mpiprocs=8
%#PBS -l walltime=1:00:00
%#PBS -S /bin/csh
%#PBS -V
%#PBS -j eo
%#PBS -q nccs1
% setenv FVROOT /discover/swdev/$user/g580/GEOSadas/Linux
% source $FVROOT/bin/g5_modules
% set path = ( . $FVROOT/bin $path )
%
% cd /discover/nobackup/$user/enswork.HY05C/mem014
% /bin/rm .SUBMITTED
% touch .RUNNING
% if ( -e EGRESS ) then
%    /bin/rm EGRESS
% endif
% mpiexec -machinefile /discover/nobackup/$user/enswork.HY05C/agcm_machfile0.1 -np 48 \
%          GEOSgcm.x |& tee -a /discover/nobackup/$user/enswork.HY05C/agcm_mem014.log
% if ( ! -e EGRESS ) then
%      echo " gcm_ensemble.csh: Atmos AGCM Failed, Aborting ... "
%      exit(1)
% endif
% update_ens.csh hy05c 014 bkg /discover/nobackup/$user/enswork.HY05C/mem014 NULL nc4
% /bin/rm .RUNNING
% touch /discover/nobackup/$user/enswork.HY05C/.DONE_MEM014_gcm_ensemble.csh.2012050421
%\end{verbatim}
%and replace the ``mpiexec line'' with the following line: \\
%\begin{verbatim}
% mpirun -np 48 GEOSgcm.x \
% |\& tee -a /discover/nobackup/\$user/enswork.HY05C/agcm\_mem014.log
%\end{verbatim}
%and submit the resulting job by hand. You can add the mpirun option {\it -perhost} if you which to.
%If this is the only offending job, when it is complete it will create the necessary information (the 
%file in the ``touch'' command line), from which point you can resubmit the ensemble ADAS control job {\tt atm\_ens.j}.

%\noindent{\it When all else fails}

When various attempts to revive a crashed cycle fail, the last resort is simply to remove
the {\tt \$ENSWORK} directory and re-submit the ensemble job control script, {\tt atm\_ens.j}, from the start. 
Before doing so, make sure the original ensemble for the present cycle of interest remains intact.
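
The last-resort procedure above can be sketched as follows. This is only an illustrative Python sketch, not 
part of the released c-shell scripts: the function name {\tt reset\_enswork} and its notion of an ``intact'' 
ensemble (every {\tt memNNN} directory exists and is non-empty) are assumptions made here for illustration. 
Re-submission of {\tt atm\_ens.j} is deliberately left to the user.
\begin{verbatim}
import os
import shutil


def reset_enswork(enswork, atmens, nmem):
    """Remove the ensemble work area, but only if the original
    ensemble under `atmens` looks intact (hypothetical check)."""
    incomplete = []
    for n in range(1, nmem + 1):
        memdir = os.path.join(atmens, "mem{:03d}".format(n))
        # A member counts as intact if its directory exists and holds files.
        if not os.path.isdir(memdir) or not os.listdir(memdir):
            incomplete.append(memdir)
    if incomplete:
        # Refuse to touch the work area when the ensemble is suspect.
        raise RuntimeError("ensemble incomplete: {}".format(incomplete))
    if os.path.isdir(enswork):
        shutil.rmtree(enswork)
    return True
\end{verbatim}
A call such as {\tt reset\_enswork(enswork, atmens, 32)} would be followed by a manual re-submission of the 
ensemble job control script.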

%....................................................................
\section{Frequently Asked Questions}
%        --------------------------

\begin{enumerate}
  \item {\it What's the easiest way to create a set of ensemble members from scratch?} \\
        After multiple trials, with multiple schemes, we have come to find that a viable ensemble can be created by simply
        reproducing the initial conditions at the time of interest to create as many members as desired. This applies to both
        the AGCM restart files and to the background fields. For example, follow these steps:
        \begin{description}
           \item[Inside \$FVHOME/atmens:] Create as many directories for the members as needed, for example, for 32 members,
              create the directory {\tt \$FVHOME/atmens} and do:
              \begin{verbatim}
              cd $FVHOME/atmens
              @ n = 0
              @ nmem = 32
              while ($n < $nmem) 
                 @ n++
                 set memtag  = `echo $n |awk '{printf "%03d", $1}'` 
                 mkdir mem$memtag
              end
              \end{verbatim}
           \item[Model restarts:] Generate model restarts for desired ensemble resolution\footnote{Use the {\tt regrid.pl} utility 
               to convert files from GMAO operational runs.} and place them in, say, a subdirectory of {\tt \$FVHOME/atmens}
               named {\tt rst}. Now, link the binary restarts of the AGCM into the member directories by doing:
              \begin{verbatim}
              cd $FVHOME/atmens
              foreach dir (`ls -d mem0*`)
                 cd $dir
                 ln -sf ../rst/*.bin .
                 cd -
              end
              \end{verbatim}
           \item[Background files:] Convert background files from start-up experiment (say, operational run) to desired
              ensemble resolution. For example, assuming you have the so-called ``bkg.eta'' files under {\tt \$FVHOME/atmens/rst}
              as described above for the binary restarts, you can convert each of the background restarts by doing:
              \begin{verbatim}
                  $DASBIN/dyn2dyn.x -g5 -res c \
                  -o hy11a.bkg.eta.20121226_03z.nc4 \
                  e572p5_fp.bkg09_eta_rst.20121226_03z.nc4
              \end{verbatim}
              where {\tt \$DASBIN} is the location of an ADAS build {\tt bin} directory, and the example takes a background file
              from the GEOS-5.7 operational series to 1-degree resolution. You should not need to worry about the
              resolution of the surface background files - the observers take care of resolution discrepancies on the fly.
              After converting the backgrounds, you can then do a similar ``foreach'' loop as shown above to link 
              all ``bkg.eta'' and ``bkg.sfc'' from {\tt \$FVHOME/atmens/rst} into the directories of the individual members.
           \item[Ensemble mean files:] Since all background files are identical, you can use the same set of 
              resolution-converted background files to feed a directory named {\tt ensmean}, under 
              {\tt \$FVHOME/atmens}.
        \end{description}
         You can check the usage of the scripts {\tt gen\_ensbkg.csh} and {\tt gen\_ensrst.csh} since they basically do
         what has just been described (see Appendix).
    
  \item {\it What do I submit next?} \\
        Sometimes, when a cycling job stops, one might have difficulty knowing what to submit next: either the central
        ADAS script {\tt g5das.j} or the ensemble ADAS script {\tt atm\_ens.j}. The first thing to check is the run directory, to 
        determine which PBS output file was written last.
        If that does not help, the next thing to check is the presence of the directory defined by {\tt RSTSTAGE4AENS}\footnote{Usually 
        set to {\tt \$FVHOME/atmens/RST}.}; if it is present, chances are the ensemble ADAS was running when the job stopped, 
        and what likely needs to be submitted next is the ensemble ADAS script. Another possibility is to check for the presence of 
        work area directory {\tt \$ENSWORK}. If this directory is present, then most certainly the ensemble ADAS was running when 
        the job stopped and {\tt atm\_ens.j} should be re-submitted. 
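
        The directory checks just described can be sketched as follows. This is a minimal Python illustration, 
        not part of the released scripts: the function name {\tt next\_script\_to\_submit} is hypothetical, and 
        falling back to {\tt g5das.j} when neither directory is present is an assumption made here, not something 
        the system guarantees.
\begin{verbatim}
import os


def next_script_to_submit(rststage_dir, enswork_dir):
    """Guess which job script to re-submit after a cycle stops.

    Returns a (script, reason) pair based only on directory presence.
    """
    # The work area is the stronger indicator of a running ensemble ADAS.
    if os.path.isdir(enswork_dir):
        return ("atm_ens.j", "ensemble work area is present")
    # The restart staging directory is a weaker hint.
    if os.path.isdir(rststage_dir):
        return ("atm_ens.j", "restart staging directory is present")
    # Assumption: no trace of the ensemble ADAS means the central ADAS is next.
    return ("g5das.j", "no trace of the ensemble ADAS")
\end{verbatim}
        Here {\tt rststage\_dir} stands for the directory defined by {\tt RSTSTAGE4AENS} and {\tt enswork\_dir} for 
        {\tt \$ENSWORK}; checking the latest PBS output file remains the first step.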

  \item {\it How do I add new output streams to the ensemble of AGCMs?} \\
        The file {\tt \$ATMENSETC/HISTAENS.rc.tmpl} controls the history of each ensemble member dealt with by the AGCM. In principle, 
        nothing forbids this history to be as complex and complete as that of the AGCM when running the central ADAS, that is, the 
        {\tt HISTORY.rc.tmpl} file under {\tt \$FVHOME/run}. However, each data stream added to the ensemble members results in 
        increased wall-clock time when running each member of the ensemble. Therefore, though the capability is there, the 
        flexibility to get the extra output stream saved (archived) is somewhat hidden. Not only will the user need to edit 
        {\tt \$ATMENSETC/HISTAENS.rc.tmpl} but there will also be need to edit the update and archiving scripts: 
        {\tt update\_ens.csh} and {\tt atmens\_arch.csh}, to add logic for handling the new data stream. Finally, as long 
        as the same mechanism for archiving output is used, the environment variable {\tt ENSARCH\_FIELDS} will 
        need to be edited by changing its default setting in the configuration script {\tt \$ATMENSETC/AtmEnsConfig.csh}.
        Bear in mind that not only will the members of the ensemble AGCM forecasts run slower, but the archiving mechanism will 
        also carry a heavier load when new output streams are added.

  \item {\it Can I rename the ensemble ADAS script, {\tt atm\_ens.j}?} \\
        Yes. Just like the main ADAS script {\tt g5das.j} can be renamed at user's will, so can the script driving the 
        ensemble -- if this is done, both these scripts must be edited and changed accordingly since they contain their own 
        names and each other's name. Renaming of the main ADAS script is usually done during the (fv)setup procedure. However,
        renaming of {\tt atm\_ens.j} must be done by hand. In the future, a smarter setup procedure may help with this.
        
  \item {\it Why aren't the observers run in conjunction with the AGCM integrations?} \\
        GEOS AGCM and the GSI observer are hooked through an ESMF gridded-component so that it is conceivable to invoke the GSI
        observer while running the AGCM (this has been developed for the GEOS 4DVar system) -- call it the online observer. 
        This question is thus rather pertinent, since the online observer would provide for a more efficient way of running
        the ensemble of observers in the ensemble ADAS. Unfortunately, details related to how GSI does its quality control based
        on a certain snap-shot of the background fields are such that an off-line (regular) observer always ends up
        taking in more observations than its online counterpart. We may revisit this at some point in the future, but for the
        time being the most effective way of maximizing usage of the observations is by running off-line observers as currently
        done in the ensemble ADAS (and in our experimental 4DVar, for that matter). 

  \item {\it What happens inside the work directory?} \\
        As with the regular ADAS, the ensemble ADAS works within its own reserved work area. Many things happen inside this
        area. Assuming the default options are being exercised, the most important sequence of events follows the flow
        diagram shown in Fig. \ref{fig:EADASflowchart} and proceeds as follows: 
         \begin{enumerate}
            \item Links to the NMC-like perturbations are created within a subdirectory
                  named {\tt addperts} created inside the work directory --- a subdirectory of this, named {\tt tmperts}, is used while removal
                  of the mean from these perturbations is taking place --- once this process completes, the perturbations used in the 
                  additive inflation will reside under {\tt addperts}. 
            \item A directory {\tt ensmean} is created inside the work area and observations are brought from the archive into 
                  this directory; links are also created pointing to the mean background files normally sitting under {\tt \$FVHOME/atmens/ensmean};
                  when the observer mean is completed, directories for each member are created inside the work area {\tt \$ENSWORK}, and the 
                  post-quality-control observation files generated by the mean observer are linked inside each member directory together with 
                  the corresponding backgrounds found under {\tt \$FVHOME/atmens}.
            \item Once the observers are finished, all observer GSI diagnostics output files and corresponding background files 
                  are directly linked inside the main work area directory: the EnKF runs here, and analysis files are originally
                  written out in this directory. 
            \item Completion of the ensemble analysis triggers the move of the analysis files from the work area into a subdirectory 
                  of this area named {\tt updated\_ens}. Inside this directory, each analysis member is placed in its own subdirectory, such as,
                  {\it mem001, mem002, etc}; links are then created back to the original member directories under the work area --- note,
                  at this stage we have directories named {\it mem001, mem002, etc}, under the work area, as well as under 
                  {\tt updated\_ens}, though these are physically distinct directories.
            \item The ensemble mean analysis calculation can now take place and the resulting mean analysis file is placed under
                  the subdirectory {\tt updated\_ens/ensmean}; similarly, second-order statistics are placed under 
                  {\tt updated\_ens/ensrms}.
            \item With availability of the ensemble mean analysis, the member analyses re-centering and inflation can take place inside
                  {\tt \$ENSWORK/updated\_ens}. Ultimately, the EnKF analyses are overwritten. Notice that due to the links created earlier,
                  the original member directories under {\tt \$ENSWORK} see the re-centered and inflated updated analysis files.
            \item Creation of the IAU-forcing terms for each member now takes place inside the member directories under {\tt \$ENSWORK}.
            \item Links to the AGCM restart files are created from their original location {\tt \$FVHOME/atmens} into the work area 
                  member directories inside {\tt \$ENSWORK}. Edited resource files and links to boundary condition files are also placed inside
                  each member directory, and the AGCM ensemble is then integrated forward.
            \item The output of each member AGCM integration is moved to the corresponding location under {\tt \$ENSWORK/updated\_ens}. This
                  directory now has a completely new ensemble with information needed for the next analysis cycle. 
            \item At this point, the main script swaps the old ensemble with the new in the original location.
                  That is, as discussed before, this is what happens in the main driver: \\
                  {\small
                      /bin/mv \$ATMENSLOC/atmens     \$ATMENSLOC/atmens4arch.\$\{nymdb\}\_\$\{hhb\} \\
                      /bin/mv \$FVWORK/updated\_ens  \$ATMENSLOC/atmens
                  }
            \item Post-processing of the output from the AGCM takes place; mean and other statistics from the members of the
                  ensemble are calculated.
            \item The main job script driving the ensemble ADAS can now launch the hybrid ADAS script, as well as the archiving 
                  script that works to permanently store the members from the {\it previous cycle} 
                  (under {\tt \$ATMENSLOC/atmens4arch.\$\{nymdb\}\_\$\{hhb\}}).
         \end{enumerate}
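
        The swap of the old ensemble with the new one, performed by the two {\tt /bin/mv} commands in the list above, 
        can be sketched in Python as follows. This is an illustration only: the function name {\tt swap\_ensembles} 
        and the two safety checks are assumptions added here, and are not claimed to be in the main driver.
\begin{verbatim}
import os
import shutil


def swap_ensembles(atmensloc, work, nymdb, hhb):
    """Set the old ensemble aside for archiving and install the new one."""
    old = os.path.join(atmensloc, "atmens")
    arch = os.path.join(atmensloc, "atmens4arch.{}_{}".format(nymdb, hhb))
    new = os.path.join(work, "updated_ens")
    # Hypothetical safety checks, not necessarily done by the driver script.
    if not os.path.isdir(new):
        raise RuntimeError("no updated ensemble found: " + new)
    if os.path.exists(arch):
        raise RuntimeError("archive staging area already exists: " + arch)
    shutil.move(old, arch)  # mirrors the first /bin/mv
    shutil.move(new, old)   # mirrors the second /bin/mv
    return arch
\end{verbatim}
        The returned path is the directory the archiving script would then pick up for the previous cycle.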

  \item {\it What else should I watch out for?}
        \begin{itemize}
           \item When creating an initial ensemble from scratch, and placing it under {\tt \$HYBRIDGSI} (i.e.,
                 {\tt \$FVHOME/run/atmens}), remember to touch a hidden file named {\tt .no\_archiving}
                 inside this directory. This is required to prevent the archiving procedure of the central ADAS
                 from looking inside this directory for files to be archived. Remember that each subdirectory 
                 of {\tt \$HYBRIDGSI}, holding each member of the ensemble, will have files with typical ADAS names, 
                 for example, files fitting a template of the type {\tt \%s.bkg.eta.\%y4\%m2\%d2\_\%h2z.nc4} will be 
                 under each member directory. Once the archiving procedure sees these files, it will work to place them 
                 in the archive, possibly overwriting whatever the ADAS has placed there. The presence of 
                 the hidden file {\tt .no\_archiving} in the top directory of a chain of directories is enough
                 for the archiving procedure to ignore the directory and its subdirectories.
                 
           \item Unlike when running the (hybrid) ADAS, the scripts running the {\it ensemble} ADAS never copy their resource
                 files into the working area. Therefore, if you decide to make changes to any of the resource files 
                 under the experiment directory defined by {\tt ATMENSETC} while the ensemble job is running, the 
                 changes will be instantly picked up by the run. The exception is changes made to {\tt AtmEnsConfig.csh}. 
                 We advise strongly against making such changes while the job is running, unless you really understand
                 the potential consequences.
        \end{itemize}

\end{enumerate}
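
The role of the hidden file {\tt .no\_archiving}, discussed in the last question above, can be illustrated with 
the short Python sketch below. It is a toy model under stated assumptions: the helper names 
{\tt mark\_no\_archiving} and {\tt is\_skipped} are hypothetical, and the upward directory walk only mimics 
the documented behavior (a marker in a top directory shields all of its subdirectories); it is not the logic 
of the actual archiving procedure.
\begin{verbatim}
import os

MARKER = ".no_archiving"


def mark_no_archiving(topdir):
    """Create the empty hidden marker file (equivalent to `touch`)."""
    marker = os.path.join(topdir, MARKER)
    with open(marker, "a"):
        pass
    return marker


def is_skipped(path, root):
    """True if `path`, or any ancestor up to `root`, holds the marker."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    while True:
        if os.path.exists(os.path.join(path, MARKER)):
            return True
        parent = os.path.dirname(path)
        # Stop at the experiment root or at the file-system root.
        if path == root or parent == path:
            return False
        path = parent
\end{verbatim}
For instance, after marking the directory holding the initial ensemble, every member subdirectory 
beneath it would be reported as skipped by this toy check.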

%....................................................................
\section{Future Releases}
%        ---------------
\label{sec:FutureRel}

%\begin{quotation}
%   {\it Whereof one cannot speak, thereof one must be silent.} \\
%   {\tiny Tractatus Logico-Philosophicus, L. Wittgenstein, 1918.}
%\end{quotation}

The worth of a software system is in its flexibility and friendliness to its users, and clarity to its developers.
This initial release of the Ensemble ADAS and its ability to link up with the regular ADAS to form a 
hybrid ensemble-variational data assimilation system has only been tested by these two writers. Its real
test starts now, with its release to our GMAO colleagues. We expect to receive feedback from 
those of you coming across still lingering deficiencies, and (hopefully) minor mistakes.
We will work with users to come up with improved future releases. At this time we recognize 
a few weaknesses and plan to address most of them during the next few months. A known list of
work to be done follows.

\begin{description}
  \item[Scripts.] Polyglot programmers will find coding in c-shell to be limited and even annoying. Though we recognize 
                  the power of modern languages such as Perl and Python, we feel c-shell provides the clarity that
                  other programming languages lack. However, as new flexibilities and options are added to the
                  scripts we will be looking into promoting some of them to more modern (perhaps object-oriented) languages.
  \item[Scheduler.] As mentioned in different stages of this manuscript, the initial release of our hybrid ensemble-variational
                    system does not exploit parallelism between the ensemble ADAS and the hybrid (central) ADAS. This is
                    one of our priorities. A complete and functional version of the prototype Scheduler introduced here 
                    will be matured and released.
  \item[Event Log.] Another priority to work on relates to the development of an Event Log mechanism that looks inside
                    the work area and is capable of telling the user what processes are running at any given time of the
                    ensemble ADAS integration. Indeed, such an Event Log is expected to give
                    proper hints into what might have failed in cases when the run stops.
  \item[Archiving.] The archiving mechanism tends to be rather time consuming. In the next couple of months we plan to
                    have another look at the mechanism we presently use. Perhaps joint work with some of our colleagues will 
                    lead us into more efficient ways of storing the massive amount of information presently generated
                    by the ensemble, not to mention the ability to handle the potential information presently not 
                    put out by the ensemble.
  \item[AGCM Initialization.] One contributor to the large volume of data that needs to be handled  by the archiving
                    mechanism is the AGCM ensemble and its corresponding restart files (initial conditions). We plan to
                    test a version of the ensemble ADAS that essentially bootstraps the physics each cycle. This has the
                    potential to reduce dramatically the number of restarts to be carried along by each member. As 
                    cited above, the ensemble used in the hybrid version of NCEP 3DVar follows such a bootstrapping 
                    mechanism, thus giving us encouragement to pursue a similar strategy in the GEOS ensemble ADAS. Needless
                    to say, the central (hybrid) ADAS restarts will continue to be handled as usual.
  \item[Environment Variables to Revisit.]  The environment variable {\tt AENSADDINFLOC} controls where the perturbations
                    used in the additive inflation procedure are to be placed. It is presently set in the ensemble configuration
                    file {\tt AtmEnsConfig.csh}, giving the user freedom to change it at will. In reality, this is more
                    of an internal variable that only the ensemble procedure should have control over. Future releases
                    will have this revisited.
  \item[Efficiency of the post-GCM step.]  Presently, the post-processing script handling calculation of the statistics of 
                    the outputs of the ensemble of AGCM treats each output type, at each time, sequentially. Treating the
                    output in parallel is rather straightforward given all the tools we have in place; this will be
                    tackled soon as well.
  \item[Ensemble forecasting capability.] As mentioned in Sec. \ref{subsec:gcmRCconfig}, the machinery in our ensemble scripts 
                    already supports launching ensembles of mid-range forecasts. This can be easily accomplished while running 
                    the ensemble ADAS.  It might be helpful to also have an independent procedure capable of launching ensembles of 
                    forecasts without the need to hold the cycle (or even to run the ensemble ADAS). At the time of this writing, 
                    a procedure is being tested for this purpose --- part of its functionality is illustrated by the double-dashed,
                    marbled, box in Fig. \ref{fig:EADASflowchart} referring to the script {\tt aens\_fcst.j}.
                    This is not being released in this first release since it does not fall under the GMAO priority list.
  \item[Observation-space observation impact on the forecast.] It is rather simple to join the ensemble forecasting capability
                    just mentioned with the ability to calculate observation-space-based observation impacts on mid-range 
                    forecasts. This is illustrated by the script called {\tt aens\_obimp.j}, appearing in the double-dashed, marbled,
                    box beneath the ensemble of forecasts box in Fig. \ref{fig:EADASflowchart}, and is also planned to be part of 
                    the next release.
  \item[EnKF Cycled Satellite Bias Estimates.] As mentioned in Sec. \ref{subsec:PureEnsADAS}, the EnKF software of 
                     J. S. Whitaker is capable of estimating satellite biases. However, our scripts are not yet
                     enabled to cycle these estimates properly. A knob will be added to handle this feature 
                     and permit fully independent experimentation with ensemble-only strategies.
  \item[Science and Software.] Lastly, but most importantly, are some open science questions emanating from our
                     experiments that still need answers and work. Among these are the apparently neutral improvement in 
                     temperature observation residual statistics when comparing hybrid with traditional experiments;
                     the noticed deterioration of the tropical temperature short-range forecasts in the low-to-mid troposphere
                     (up to 36 hours); and the fact that we have only tried to keep results in the stratosphere from
                     changing beyond what they originally are with traditional 3DVar -- an effort needs to be made to see 
                     the possibility of using the ensemble to improve upon the stratosphere. Future improvements are expected 
                     to come not only from addressing these issues, but also from upgrades to the software itself.
                     Among these, having so-called cube-to-cube regrid capability to handle restarts should permit better 
                     testing our filter-free approach in a dual-resolution scenario; scale-selective weights should allow
                     reduction of wind aliasing and better balance properties in the hybrid GSI; and implementation of 
                     the hybrid capability for the square-root-${\bf B}$ preconditioning conjugate gradient minimization 
                     strategies in GSI to enable various GSI adjoint options and allow for comparison of the Hessian
                     between traditional and hybrid 3DVar experiments. This is only a small list of 
                     what we believe possible to address in the short term. 
               

\end{description}

%.........................................................................
\vspace{1in}
\centerline{\huge\bf Acknowledgments}

\addcontentsline{toc}{section}{Acknowledgments}

\vspace{0.3in}

The authors are thankful to David F. Parrish, Daryl Kleist, Russ Treadon, and John Derber, from NOAA/NCEP, for the multiple discussions
during the period of implementation of the hybrid components of the GEOS ADAS, particularly regarding GSI- and
EnKF-related settings. The authors are thankful to Jeffrey S. Whitaker, from NOAA/ESRL, for a number of discussions throughout the course
of implementation of his EnKF software in our system. Thanks are also due in advance to our GMAO colleagues for their patience
in going over this document, using our hybrid and ensemble assimilation systems, coping with possible inflexibilities in some of 
the software, and for giving us their feedback. Finally, we express our gratitude to Michele M. Rienecker for her 
patience and encouragement during development of the GMAO hybrid data assimilation system. 

%.........................................................................

\newpage

\centerline{\huge\bf Glossary}

\addcontentsline{toc}{section}{Glossary}

\begin{description}

\item{}central analysis (or DAS) -- refers to the main hybrid-variational part of the assimilation cycle; this
                                    is the part most users are familiar with when running the DAS (except that
                                    an ensemble of background fields is normally not required); this is the part
                                    controlled by the script {\tt g5das.j}.
\item{}(A)DAS -- (atmospheric) data assimilation system.
\item{}3/4DVar -- three/four-dimensional variational (analysis).
\item{}build -- indicates version of GEOS ADAS that has been compiled, installed, and is ready to provide 
                executables for running the ADAS. 
\item{}EDAS -- Ensemble Data Assimilation System.
\item{}EnKF -- Ensemble Kalman Filter.
\item{}EVDAS -- Ensemble-Variational Data Assimilation System.
\item{}ESRL -- NOAA Earth System Research Laboratory.
\item{}GCM -- General Circulation Model (and Global Climate Model).
\item{}GMAO -- Global Modeling and Assimilation Office.
\item{}GEOS -- Goddard Earth Observing System.
\item{}ghost jobs -- term used to refer to jobs that bounce right back out of NCCS's PBS without doing
                     any work and looking as though an empty shell has been submitted instead of an actual 
                     shell instruction script.
\item{}GSI -- Gridpoint Statistical Interpolation (analysis system).
\item{}NASA --  National Aeronautics and Space Administration.
\item{}NCEP --  National Centers for Environmental Prediction.
\item{}NCCS --  NASA Center for Climate Simulation.
\item{}NOAA --  National Oceanic and Atmospheric Administration. 
\item{}OPS  --  Refers to GMAO operations and its operational experiments.
\item{}PBS  --  Portable Batch System.
\item{}TLNMC -- Tangent Linear Normal Mode Constraint.

\end{description}
%.........................................................................

\newpage

\centerline{\huge\bf References}

\addcontentsline{toc}{section}{References}

\begin{description}

\item{}Bloom, S. C., L. L. Takacs, A. M. da Silva, and D. Ledvina, 1996:
       Data assimilation using incremental analysis updates. {\it Mon. Wea. Rev.}, {\bf 124},
       1256-1271.

\item{}Charron, M., G. Pellerin, L. Spacek, P. L. Houtekamer,  N. Gagnon, H. L. Mitchell, and L. Michelin,
       2010: Toward random sampling of model error in the Canadian Ensemble Prediction System.
       {\it Mon. Wea. Rev.}, {\bf 138}, 1877-1901. 

\item{}Chou, M.-D., and M. J. Suarez, 1999: A solar radiation parameterization for atmospheric
       studies. NASA Tech. Memo., 104606, Vol. 15, 40 pp.

\item{}Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2012: Operational implementation of a hybrid 
       ensemble/4D-Var global data assimilation system at the Met Office.
       {\it Q. J. R. Meteorol. Soc.}, {\bf 139}, 1445-1461. doi: 10.1002/qj.2054.

\item{}Cohn, S. E., A. da Silva, J. Guo, M. Sienkiewicz, and D. Lamich, 1998: Assessing the Effects of Data Selection 
       with the DAO Physical-Space Statistical Analysis System. {\it Mon. Wea. Rev.}, {\bf 126}, 2913-2926. 

\item{}Colarco, P., A. da Silva, M. Chin and T. Diehl, 2010: Online simulations of global aerosol distributions 
       in the NASA GEOS-4 model and comparisons to satellite and ground-based aerosol optical depth. 
       {\it J. Geophys. Res.}, {\bf 115}, D14207, doi:10.1029/2009JD012820. 

\item{}Collins, N., G. Theurich, C. Deluca, M. Suarez, A. Trayanov, V. Balaji,
       P. Li, W. Yang, C. Hill, and A. da Silva, 2005. Design and implementation of
       components in the earth system modeling framework. {\it Intl. J. High Perform.
       Comput. Appl.}, {\bf 19}(3), 341-350.

\item{}da Silva, A. M. and C. Redder, 1995: Documentation of the GEOS/DAS ODS. NASA/DAO Office Note, 95-01, 37 pp.

\item{}Derber, J. C., and A. Rosati, 1989: A global oceanic data assimilation technique. 
       {\it J. Phys. Oceanogr.}, {\bf 19}, 1333-1347.

\item{}Derber, J. C., and W.-S. Wu, 1998: The use of TOVS cloud-cleared radiances in the NCEP SSI
       analysis system. {\it Mon. Wea. Rev.}, {\bf 126}, 2287-2299.

\item{}El Akkraoui, A., Y. Tr\'emolet, and R. Todling, 2013: Preconditioning of variational data assimilation 
       and the use of a bi-conjugate gradient method. {\it Q. J. R. Meteorol. Soc.}, 
       {\bf 139}, 731-741.

\item{}Errico, R. M.,  R. Gelaro, E. Novakovskaia, and R. Todling, 2007: General characteristics
       of stratospheric singular vectors. {\it Meteorologische Zeitschrift}, {\bf 16}, 621-634.

%\item{}Gelaro, R., R. H. Langland, S. Pellerin, and R. Todling, 2010: The THORPEX observation impact inter-comparison 
%       experiment. {\it Mon. Wea. Rev.}, {\bf 138}, 4009-4025.

\item{}Giering, R., T. Kaminski, R. Todling, R. Errico, R. Gelaro, and N. Winslow, 2005: Generating tangent 
       linear and adjoint versions of NASA/GMAO's Fortran-90 global weather forecast model.
       In H. M. B\"ucker, G. Corliss, P. Hovland, U. Naumann, and B. Norris, editors, Automatic 
       Differentiation: Applications, Theory, and Implementations, volume 50 of Lecture 
       Notes in Computational Science and Engineering, pages 275-284. Springer, New York, NY.

\item{}Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. 
       {\it Mon. Wea. Rev.}, {\bf 129}, 550-560.

\item{}Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman
       filter-3D variational analysis scheme. {\it Mon. Wea. Rev.}, {\bf 128}, 2905-2919.

\item{}Hamill, T. M. and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in 
       ensemble data assimilation: A comparison of different approaches. {\it Mon. Wea. Rev.}, {\bf 133},
       3132-3147.      
  

%\item{}Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, 
%       and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: 
%       Results with real observations. {\it Mon. Wea. Rev.}, {\bf 133}, 604–620.


\item{}Kleespies, T.J., van Delst, P., McMillin, L.M., and J. Derber, 2004: Atmospheric 
       transmittance of an absorbing gas. 6. OPTRAN status report and introduction to the
       NESDIS/NCEP Community Radiative Transfer Model. {\it Appl. Opt.}, {\bf 43}, 3103-3109.

\item{}Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, W.-S. Wu, and
       S. Lord, 2009a: Introduction of the GSI into the NCEP's Global
       Data Assimilation System. {\it Wea. Forecasting}, {\bf 24}, 1691-1705.

\item{}Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, R. M. Errico, and 
       R. Yang, 2009b: Improving incremental balance in the GSI 3DVAR analysis system.
       {\it Mon. Wea. Rev.}, {\bf 137}, 1046-1060. 

\item{}Kleist, D. T., 2012: An evaluation of hybrid variational-ensemble 
       data assimilation for the NCEP GFS. {\it Ph.D. Thesis}, University
       of Maryland, 163 pp. 
       [Available online at: http://www.emc.ncep.noaa.gov/gmb/wd20dk/docs/phd/DarylKleist\_PhDThesis\_Revised.pdf]

\item{}Koster, R. D., M. J. Suarez, A. Ducharne, M. Stieglitz, and P. Kumar, 2000: A catchment-based 
       approach to modeling land surface processes in a GCM, Part I: model structure. 
       {\it J. Geophys. Res.}, {\bf 105}(D20), 24809-24822.

\item{}Langland, R. H., and N. L. Baker, 2004: Estimation of observation
        impact using the NRL atmospheric variational data assimilation adjoint
        system.  {\it Tellus}, {\bf 56A}, 189-201.

\item{}Lewis, J., K. D. Raeder, and R. M. Errico, 2001: Vapor flux associated with 
       return flow over the Gulf of Mexico: A sensitivity study using adjoint modeling. 
       {\it Tellus}, {\bf 53A}, 74-93.

\item{}Lin, S.-J., 2004: A vertically Lagrangian finite-volume dynamical core for general circulation models.
       {\it Mon. Wea. Rev.}, {\bf 132}, 2293-2307.

\item{}Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for
       NWP--A comparison with 4D-Var. {\it Q. J. R. Meteorol. Soc.}, {\bf 129}, 3183-3203.
 
\item{}Lupu, C., P. Gauthier, and S. Laroche, 2011: Evaluation of the impact of observations on analyses in 
       3D- and 4D-Var based on information content. {\it Mon. Wea. Rev.}, {\bf 139}, 726-737.

\item{}Massart, S., B. Pajot, A. Piacentini, and O. Pannekoucke, 2010: On the merits of using a 
        3D-FGAT assimilation scheme with an outer loop for atmospheric situations governed by transport. 
        {\it Mon. Wea. Rev.}, {\bf 138}, 4509-4522.

\item{}Moorthi, S., and M. J. Suarez, 1992: Relaxed Arakawa-Schubert: A parameterization of moist
       convection for general-circulation models. {\it Mon. Wea. Rev.}, {\bf 120}, 978-1002.

\item{}Rienecker, M. M., M. J. Suarez, R. Todling, J. Bacmeister, L. Takacs,
       H.-C. Liu, W. Gu, M. Sienkiewicz, R. D. Koster, R. Gelaro, and I. Stajner, 2008: 
       The GEOS-5 Data Assimilation System - Documentation of Versions 5.0.1, 5.1.0, and 5.2.0.
       NASA, TM 104606, Vol. 27, 118 pp.

\item{}Stieglitz, M., A. Ducharne, R. D. Koster, and M. J. Suarez, 2001: The impact of detailed snow physics
       on the simulation of snow cover and subsurface thermodynamics at continental scales. 
       {\it J. Hydrometeor.}, {\bf 2}, 228-242.

\item{}Tr\' emolet, Y., 2007: First-order and higher-order approximations of observation impact. 
       {\it Meteorologische Zeitschrift}, {\bf 16}, 693-694.

\item{}Tr\' emolet, Y., 2008: Computation of observation sensitivity and observation impact in
       incremental variational data assimilation. {\it Tellus}, {\bf 60A}, 964-978.

\item{}Todling, R., 2013: Comparing two approaches for assessing observation impact. 
       {\it Mon. Wea. Rev.}, {\bf 141}, 1484-1505.

\item{}Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008:
       Ensemble data assimilation with the NCEP Global Forecast System. 
       {\it Mon. Wea. Rev.}, {\bf 136}, 463-482.

\item{}Wang, X., C. Snyder, and T. M. Hamill, 2007: On the theoretical 
       equivalence of differently proposed ensemble/3D-Var hybrid analysis 
       schemes. {\it Mon. Wea. Rev.}, {\bf 135}, 222-227.

\item{}Wu, W.-S., R. J. Purser, and D. F. Parrish, 2002: Three-dimensional
       variational analysis with spatially inhomogeneous covariances. 
       {\it Mon. Wea. Rev.}, {\bf 130}, 2905-2916.

\end{description}

%..........................................................................

\newpage

\addcontentsline{toc}{section}{Revision History}

%\vspace*{\fill}

\centerline{\huge\bf Revision History}

\bigskip

\bigskip

\begin{center}
\begin{tabular}{|l|l|l|}\hline
{\bf Version } & {\bf Version} & {\bf Pages Affected/}   \\
{\bf Number  } & {\bf Date}    & {\bf Extent of Changes} \\
\hline
\hline
Version 1.00 & May, 2013     & Initial Documentation \\
\hline
\end{tabular}
\end{center}

\vspace*{\fill}

%\appendix

\addcontentsline{toc}{part}{APPENDIX}

\centerline{\huge\bf APPENDIX}

%-------------------- USAGE FROM HERE DOWN --------------------------

\bigskip

This appendix provides the command-line usage for all scripts behind the 
GEOS ensemble atmospheric data assimilation system. 

\input usage.txt

%-------------------- USAGE COMPLETE --------------------------
 
!EOI
