
 _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
 _/                                                        _/
 _/           A  BEGINNER'S  GUIDE  TO  USING  THE    	   _/
 _/                                                        _/
 _/                   N A S A / N C A R                    _/
 _/                         Joint                          _/
 _/                     Finite  Volume                     _/
 _/                General Circulation Model               _/
 _/                                                        _/
 _/                      1  February  2002                 _/
 _/                                                        _/
 _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
 _/                                                        _/
 _/                                                        _/
 _/                The current Version # is                _/
 _/                                                        _/
 _/                      $VER = 1.3                        _/
 _/                                                        _/
 _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/




0. Statement of Purpose

This tutorial is a companion to, not a substitute for, the README file.
The tutorial is more helpful to the beginning user, and gives more
explicit procedures for rescuing crashed model runs.
The README file includes more advanced topics of interest to developers.

Examples of computer commands are shown in this tutorial preceded 
by the prompt ">".


1. Where to run the model

This released version of FVGCM could be running on the following
platforms, SGI OK2, IBM SP3, and PC Linux. At present local users
at DAO could run the model on NAS production machines (e.g.,
kalnay, jimpf0, or jimpf1) or NCCS machines (e.g., daley, mintz).
The home directory of DAO users physically resides on jimpf0 but
is mirrored by all of the above machines.  The model 
output is sent to the mass storage system (MSS) on helios1.


For security, login remotely to your NAS account using Secure SHell
> ssh yourid@kalnay.nas.nasa.gov
A password that will satisfy all of the NAS machines must have
at least 8 characters, with at least one capitalized, and at least
one numeric or special character.
Once you are logged into one NAS machine, you can rlogin to the others
without being prompted again for your password.

To ensure that background jobs will keep running after you log out,
your home directory must have a .logout file containing the command:
nohup
This means "NO Hang-UP".


2. Directory tree for model runs

Our convention is to install the model fvgcm-$VER into directory
$HOME/fvgcm-$VER, also known as the $SourceDir or "top" directory 
for this model.

Underneath the top directory is a case directory, $HOME/fvgcm-$VER/$CASE.
The "case" of the model refers to the name of a particular run.
We usually use a code name which describes the grid resolution,
such as "b55" meaning 2x2.5 degree horizontal grid and 55 
vertical layers.  Other case names might be "amip" or "test".

When the model is run, the job will be submitted from the case
directory.  The reason for these conventions is that the model
resolution is incorporated in the Fortran code as include files.
Thus each resolution case requires a new compilation of the model.

It is also important to set up a directory to hold the initial 
conditions and boundary conditions.  This is $FVData. For
DAO users, $FVData is pointed to /share/fvccm.


3. I.C. and B.C. 

The model requires that the user provide the following 
initial conditions and boundary conditions:

file content	file type    data format	example

water vapor 	zonal mean   direct access 	RandelH2O_2.bin
ozone       	zonal mean   netcdf format	o3.amip2_uars_fub_2deg.nc
SST & sea ice	map grid     netcdf format	sstsice_144x91.nc
surface data	map grid     netcdf format	surf.data_144x91

At this time we are working to adopt conventions for the names and
formats of these files in FVData.  Our goal is to always
include the number of horizontal grid elements in the filename
(ie: 144x91 for a 2x2.5 degree model).  There is also a dichotomy
between the use of netcdf format, an NCAR convention, versus direct
access GrADS files, a DAO convention.  For now you need to put the
files into the formats above, because this is what the code expects.
You may name them as you wish, but make sure these names are listed 
correctly in the User Configuration section of the job script.

SST data may be either daily, weekly, or monthly.
Run scripts and model will automatically recognize and use each time format.

4. Getting the latest model version

At present, the source code is maintained in a CVS repository on
hera.gsfc.nasa.gov. DAO Users could check out codes directly
via cvs commands. For non-local users, a UNIX tar file could be obtained
via email. Please contact Dr. Shian-Jiann Lin at lin@dao.gsfc.nasa.gov.



5. Compiling the model

This is also known as "making" the model.  The script configure
is used to choose a right Make.macros in different platforms,
and the script make.j will
automatically build the model executable using an approach that is
best for the given computing environment (example: parallel or not).
Thus it is very important to submit the script on the same machine 
where you will run the model, because often the operating systems
of the various NAS machines have different version numbers. 

> configure
> qsub -q @machine make.j


6. Installing the executable and run scripts

> make install

This step generates the three job scripts monthly.j.tmpl, weekly.j.tmpl, 
and fvgcm.j.  They are used, respectively, to run the model for a month
at a time, a week at a time, or more general cases.

Since version 0.9.8, the installation step will also prompt the user
interactively to bring over copies of the initial and boundary conditions 
that go into the $FVData directory.


7. Restart files

The model is iterative in time.  At the end of any given time step, 
a snapshot of the model state is written as "restart files".
In order to start the model from scratch, you need restart files
created by spinup.  In order to restart the model after a crash,
you need to supply the last intact set of restart files.  

The current restart files reside in the directory $HOME/fvgcm-$VER/$CASE, 
have generic names, and are continuously rewritten.  Therefore when you 
get the first set of restart files from spinup, you should put a backup 
copy in the $HOME/fvgcm-$VER directory where they won't be overwritten.

The generic names and contents of the restart files are as follows:

d_rst		dynamics restart
p_rst		physics  restart
lsm_rst		land-surface model restart
lsm_hst		land-surface model history
lsm.rpointer	two-line pointer file

When the model stops, either intentionally or due to a crash, you need
to know to what time step the restart files correspond, because they
represent the last complete iteration.  There are two ways of finding out.

Method one is to list the contents of lsm.rpointer, which gives the
date of the restart file and the date of the history file, which is
ALWAYS one month earlier.  If the simulation did not run over a month
or longer, no history file is written.
Thus the contents of lsm.rpointer will look 
either like this		or like this
./lsmr_19930101_00000		./lsmr_19960316_00000                  
./lsmh_1992-12.nc		history_file_not_open

Method two is to use the "restart date" utility, called rst_date,
to read the date of the d_rst file.  This will appear in 
$HOME/fvgcm-$VER/util/rst_date after you compile the model.
> cd util
> ./rst_date $HOME/fvgcm-$VER/$CASE/d_rst


8. How the template script controls the model

Each of the scripts, monthly.j.tmpl, weekly.j.tmpl, or fvgcm.j,
is a "template" for repeated iteration of the model.  The user 
copies and renames the template, edits the User Configuration to 
run a particular case, and submits it to the run queue (more on 
queue selection later).  The model will complete the specified set
of iterations, and then return to the template file to automatically
prepare a new job script for submission.  This will repeat until
the model reaches the end time specified in the script. 

A parameter "MaxCount" has been added to the script which increases 
the number of iterations run in a single job.  
The motivation for this parameter is to maximize the amount of time 
that the job spends in the queue before it has to be resubmitted 
and wait again for a slot.  In order to use MaxCount effectively, 
you need to know approximately how much real time each iteration 
takes to run (called "walltime"), versus how much time you are 
allowed in the queue per job.  When in doubt, leave MaxCount = 1.


9. Editing the User Configuration

Now you have all the pieces in place and you are ready to run the model.
To set the model parameters for your particular case, you need to edit
the User Configuration section of the model script.  This is the only
section of the script that the general user should ever change, on
pain of death (just joking).

Always copy the template and save the original before making changes.

Below we provide an example of the User Configuration in the script 
monthly.j.tmpl on the left side of the page, with parameter 
explanations on the right side (after #).  
All of these parameters are explained in the README file, where they 
are grouped according to whether they control the system configuration 
or the model configuration.  Here we present the parameters as they 
appear in the script, and comment only on those that the general user
needs to know.  By UNIX convention, 1 means TRUE and 0 means FALSE.

#
# ##############################
# #  Start User Configuration  #
# ##############################
#
# ... System Configuration
#
 limit stacksize   4000000              # KB
 setenv MP_SLAVE_STACKSIZE 100000000    # bytes

 setenv NCPUS      32 			#
 setenv VER        fvgcm-0.9.8		# An example of an older version
 setenv CASE       amip			# This case was named "amip"
#
 set SourceDir   = ${HOME}/${VER}	#
 set FVData      = ${HOME}/FVData	#
#
 set WorkDir     = ${SCRATCH1}		# Choose 1or 2 based on disk free
#
# Postprocessing:
#
#     Interpolate diagnostics output from eta to pressure 
#     coordinates, with an option to save the interpolated
#     dataset (at pressure levels) instead of the original
#     dataset (at eta levels). SavePres=1 implies that the
#     eta dataset will not be archived to the MSS.
#
 set PostProc    = 1            	# You probably want post-processing.
 set SavePres    = 1            	# 
#
 set SaveToMSS   = 1            	# You want Yes. 
 set MSSHost     = helios1		# Currently the only MSS option
#
 set SpinUp      = 0            	# Is this a spinup run?
 set Benchmark   = 0            	# 
 set CleanSilo   = 1            	# You want Yes.  This means that after
					# output data transfers to MSS, the
					# Silo area of disk is erased.
#
 set MaxCount    = 3			# See "How the script controls the model"
#
# ... Model Configuration
#
 set Days   = (31 28 31 30 31 30 31 31 30 31 30 31)
 set Months = (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
 
 set Year   = YEAR           		# model  year, example: 1992
 set Mon    = MONTH          		# model month, example: 02

 set Year   = `echo ${Year} | awk '{printf "%4.4d", $1}' -`
 set Mon    = `echo ${Mon}  | awk '{printf "%2.2d", $1}' -`
 set mmsave = ${Mon}
 set days   = ${Days[${Mon}]}
#
 set count  = 1
 while ( ${count} <= ${MaxCount} )      # time loop (monthly)
#
 cat >! ${SourceDir}/${CASE}/ccmrun.namelist << EOF
#
 &INPUT
  JOB      = 'fvgcm_${CASE}',		#
  NCPATH   = '${FVData}',		#
  SSTDATA  = 'sstsice_144x91.ieee64.nc',# this single file contains both
					# Sea Surface Temperatures and Sea ICE
					# wherever temp < 1.8 C
  SSTCYC   = .false.,			# SST CYCLE:
					# .false. = using real timeseries data
					# .true.  = using climatology average
#
  OZNDATA  = 'o3.amip2_uars_fub_2deg.nc',# ozone climatology
  H2ODATA  = 'RandelH2O_2.bin',		# water vapor climatology
  SRFDATA  = 'surf.data_144x91',	# surface data
#
  NYMDB    = 19780131,			# Beginning YMD = Year/Month/Day
					# set to end of spinup
  NHMSB    = 000000,			# Beginning HMS = Hour/Min/Sec
  NYMDE    = 19960401,			# Ending    YMD = Year/Month/Day
					# for entire experiment run
  NHMSE    = 000000,			# Ending    HMS = Hour/Min/Sec
  NDAY     = ${days},			#
  PDT      = 1800  ,			#
  MDT      = 1800  ,			#
  NDOUT    = 060000 ,			#
  NGOUT    = 240000 ,			#
      nsplit = 5,			#
      zstat = .true.  ,			#
      ccm3  = .true.  ,			#
      rayf = .true.   ,			# Rayleigh Friction
      iuhs = 80 ,			#
      iuic = 81 ,			#
      iout = 82 ,			#
      izou = 99 ,			#
      nsrest = $Restart ,		#
      diag = .true.,			#
 &END
EOF

#
# Flags to drive DCA and GWD, added by Bowen 06/16/99
#
cat >! ${SourceDir}/${CASE}/ccmflags.namelist << EOF
 &FLAGS
      dcaf = .true. ,			# Dry Convection Adjustment
					# needed near mesopause
					# default=.true. (was off for amip)
      nlvdry = 5    ,			# number DCA levels
      gwdf = .true. ,			# Gravity Wave Drag
					# default=.true. 
 &END
EOF#
# For lsm 
#
 set locpnr     = .
 set rest_pfile = lsm.rpointer
#
# ############################
# #  End User Configuration  #
# ############################
#


10. Queue selection 

First check the queue status to see which of the production machines
(kalnay, jimpf0, jimpf1) is least busy.  (The Users Forum is talking
about consolidating the separate machine queues).

> qstat @machine

In practice, there are separate queues for jobs that request 
8, 16, 24, or 32 cpu.  There are also unwritten conventions as to
who is allowed on which queues at what times.  Check to make sure
you have permission from the powers that be to use the amount of
resources that your job requests.  The resource request of your job
appears in the PBS commands at the top of your script, and again
in the User Configuration.  Here are sample parameter settings for
running a 16 cpu or 32 cpu job:

# ------------------------------	# ------------------------------
#PBS -l ncpus=16			#PBS -l ncpus=32
#PBS -l walltime=18:00:00		#PBS -l walltime=12:00:00
#PBS -l mem=4gb				#PBS -l mem=8gb
#PBS -S /bin/csh			#PBS -S /bin/csh
#PBS -m be				#PBS -m be
#PBS -V					#PBS -V
#PBS -j eo				#PBS -j eo
# ------------------------------	# ------------------------------

also, in User Configuration:		also, in User Configuration:
 setenv NCPUS      16 			 setenv NCPUS      32

(Remember to adjust MaxCount as needed if you change queues.)

The user should also choose between Scratch areas 1 and 2 
(specified inside the User Configuration) on the production machine 
based on available space reported by the command "disk free"
> df 


11. Submit your job to the run queue

> cd $HOME/fvgcm-$VER/$CASE
> qsub -q @machine monthly.j.MM 

If you change your mind about the submission, 
get the job [id#] from qstat and 
> qdel [id#]


12. Sample job output

Once a given iteration of the model has run successfully, the output data
are transfered from the $Silo area of the production machine to the MSS
machine helios1.  The file names, dates, and sizes follow a pattern that
will become familiar to you, enabling you to tell at a glance if the
output data set is complete.

For example, a one month run of the amip case over Dec1992 will generate
the following output files:

> rlogin helios1
> ls -al 
     28933668 May 26 07:38 amip_d19930101.000000
   1074056256 May 26 07:39 amip_diag_prs_19921201.000000-19930101.000000
         3989 May 26 07:39 amip_diag_prs_19921201.000000-19930101.000000.ctl
     34646976 May 26 07:38 amip_diag_prs_tm_19921201.000000-19930101.000000
         3992 May 26 07:38 amip_diag_prs_tm_19921201.000000-19930101.000000.ctl
      2725816 May 26 07:38 amip_p19930101.000000
   2177692960 May 26 07:38 amip_rout_19921201.000000-19930101.000000
       204600 May 26 07:38 amip_zavg_19921201.000000-19930101.000000
          162 May 26 07:38 lsm.rpointer
      8354292 May 26 07:38 lsmh_1992-12.nc
     11647528 May 26 07:38 lsmr_19930101_00000

All of the file names begin with the case name, which in this example 
is "amip".  Some of the output files are snapshots taken at the end 
of the time period, while others are averages which are labeled by the 
entire time frame they cover.  The diagnostic outputs are GrADS files 
which are accompanied by control files that enable the user to look at 
them with the Grid Analysis and Display System program package.

The snapshot output files correspond to the restart files that are 
written to the directory $HOME/fvgcm-$VER/$CASE:

amip_d19930101.000000 == d_rst
amip_p19930101.000000 == p_rst
lsmr_19930101_0000    == lsm_rst
lsmh_1992-12.nc	      == lsm_hst
lsm.rpointer	      == lsm.rpointer

In general, the output files 
CASE_diag_prs_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss
and CASE_rout_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss
will vary in size depending on the number of days in the month, 
whereas the others are always the same size.  Because these two files 
are written in direct access format, with no spacers between records, 
you should be able to verify that the file size scales exactly as 
the number of days in the month.  

The file 
CASE_rout_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss 
contains instantaneous model output at 6-hour time intervals
on the "eta levels" of the vertical grid.  The file
CASE_diag_prs_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss
contains daily averages of model output on pressure levels, and
CASE_diag_prs_tm_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss are monthly means.


13. Monitoring your job as it runs

> ps -u yourid
This command will show any running processes.

While the job is running, you will continually see it in the queue.
> qstat @machine
A sample response could be:

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
10404.kalnay     amip_m.12        yourid           37:57:40 R q32_8gb 

The script will generate Email to your NAS account each time it 
begins or ends another job.  This is governed by a PBS command 
near the top of the script:
#PBS -m be
A job that finished without incident will have Exit_status = 0

You will see output files from the model appearing on the MSS machine 
helios1 under directory /silo4/yourid/fvgcm-$VER/$CASE, and you can 
check that they have the correct names, dates, and sizes.

Each job and data transfer that you run will create a log file in your
directory $HOME/fvgcm-$VER/$CASE, with names (for monthly iterations)
CASE_m.MM.eXXXX and CASE_xfr.MM.eXXXX,
where MM is the month and XXXX is the job ID.
Perusing the contents of the log file will give you some idea how the 
model is running, by looking at the maximum and minimum values of 
variables such as pressure, temperature, wind, and specific humidity.  


14. Restarting the model to run more iterations

Suppose you have run one year of simulations, the results look good, 
and you want to run another year.  You can restart the model with
these four steps:
 
(1) Check that the correct restart files are located in the case directory.
(2) Edit the script parameters Year and Mon to reflect the current time.
(3) Edit the script parameters NYMDE and NHMSE to the new end time.
(4) Submit the script to the queue.

DO NOT change the beginning time; that remains set to the end of the 
spinup time period.  Optionally, you may want to recheck the amount 
of free disk space and change the scratch area.  


15. "My job has fallen and it can't get up"; How to tell what went wrong

When you are first starting a model run, the most likely cause of a crash
is some mistake in the setting of the model parameters and data files.
Once the model has been running for a few iterations, the most likely
cause of a crash is a problem with the NAS computing environment.
That environment could best be described with words like "changeable",
"volatile", "unstable", "aggravating", or the ever popular "$#*@^!"

Serious problems on NAS machines are announced by Email to 
dao-users@nas.nasa.gov; you should get on this mailing list.  
If you can login to one of the NAS machines, recent news regarding
system status is available online via the command
> nasstat

The best source of information on a job crash is the error message
you will find at the end of the most recent log file.


16. How to restart your model after a crash

Remember that the model is iterative.  You can always restart from the 
last good iteration.  You just have to figure out when that was and 
make sure the appropriate restart files are in your case subdirectory.  
Then edit the current time and end time (not the beginning time!) 
in the job script and resubmit.

If you have to copy restart files from MSS back to the front end or
production machine, use ftp, because the use of rcp is frowned upon 
at NAS and ftp is easier than writing a transfer script.  

Sometimes when the production machine is rebooted, the job restarts 
itself with no problem, but the output is no longer transfered to MSS.
Again, move it by ftp.  If the model continues to dump the output 
locally, change the end time in the template to the end of the current
iteration.  That will stop the model; when you start it again the
transfer process will reset correctly.


17. Post Processing

In the past, the "post" processing was done after the model script 
had run, but it is now incorporated as an option in the job script.  
This processing refers to the interpolation to pressure coordinates 
and the creation of time-mean diagnostics.
The program $HOME/fvgcm-$VER/util/diagpp.F is run on files like 

amip_diag_prs_19921201.000000-19930101.000000
to create files like
amip_diag_prs_tm_19921201.000000-19930101.000000

Alternatively, time-mean files can be created from the _rout_ 
monthly outputs by running George Lai's program reftm.F
(this does NOT include pressure interpolation?)

> reftm CASE_rout_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss
creates the GrADS file
        CASE_rout_tm_YYYYMMDD.hhmmss-YYYYMMDD.hhmmss


18. Sharing your results

The NAS default security is full protection from group or others.
In order to let members of your group read and execute your files,
you must edit the files .cshrc and .login to add the command 
umask 027
In addition, you need to change the permissions mode of your output
files on helios1, as well the permission mode of the directory tree
leading to those files.
helios1.yourid> cd fvgcm-$VER/$CASE/$YEAR
helios1.yourid> chmod g+r *
helios1.yourid> cd ..
helios1.yourid> chmod g+r $YEAR
helios1.yourid> cd ..
helios1.yourid> chmod g+r $CASE


19. Need more help?

Email the model development team at fvccm@dao.gsfc.nasa.gov
