TUTORIAL: Using the I/O API

Introduction

The Models-3/EDSS I/O API is intended to provide an easy-to-learn, easy-to-use interface to data files for developers of models and model-related tools. There are only a few key ideas you need to know about:

- The I/O API is a selective, direct-access interface to the data. You tell the system which variables, dates, and times you want, and it works out record numbers and similar bookkeeping for itself. You don't have to read the data in consecutive order, or write it in order either; you just ask for what you want, and the I/O API finds it for you (although there are moderate performance penalties for writing data out of order).
- The files are self-describing: the file headers contain all the dimensioning and descriptive information needed for the data in them.

There are versions of the I/O API callable from both Fortran and C. This document describes the Fortran interface; the C interface is very similar. The major differences between the two are that Fortran LOGICAL functions returning .TRUE. or .FALSE. correspond to C functions returning 1 or 0; Fortran ".EXT" include files correspond one-to-one to C ".h" include files; and, although the C calls look much like the Fortran calls, file descriptions are passed via pointers to data structures typedef'd in fdesc3.h instead of through the COMMON blocks in FDESC3.EXT.

There are nine routines you'll need to know: INIT3() and SHUT3() to start up and shut down the I/O API; OPEN3() to open files; DESC3() to get file descriptions; READ3(), INTERP3(), XTRACT3(), and DDTVAR3() to access data values; and WRITE3() to store data to files. Additionally, M3EXIT() is a useful utility routine that works with the I/O API: it writes an exit (or error) message to the program log, calls SHUT3(), and then terminates the program with a user-supplied exit status (0 for success, nonzero for failure).

There are three INCLUDE files you'll have to worry about; each has extensive in-line documentation describing how it is used. PARMS3.EXT contains the dimensioning parameters and the "magic number" parameters used to control the operation of various routines in the I/O API. FDESC3.EXT has the COMMON blocks that hold file descriptions (more about that later); it needs PARMS3.EXT for its own dimensioning. Finally, IODECL3.EXT has declarations and usage comments for the various functions in the I/O API (it is really a short manual on the I/O API in its own right).

Files, Logical Names, and Physical Names

The I/O API stores and retrieves data using files and virtual files, which have (optionally) multiple time steps of multiple layers of multiple variables. Files are formatted internally so that they are machine- and network-independent: you can FTP them freely across a wide variety of machines, or read files NFS-mounted from other machines. (This is unlike Fortran unformatted files, whose internal formats are vendor-specific, so that they don't FTP or NFS-mount very well.) Each file has an internal description, consisting of the file type, the grid and coordinate descriptions, and a set of descriptions for the file's variables, i.e., their names, units specifications, and text descriptions. Each variable has a set of layers and a sequence of time steps (quite possibly only one layer for some kinds of data, or only one time step for time-independent data). In dealing with the files, we refer to files and variables by name, to layers by number (from 1 to the number of layers in the file), and to dates and times according to conventions described later.

Rather than forcing the programmer and program user to deal with hard-coded file names or hard-coded unit numbers, the I/O API introduces the concept of logical file names. As a modeler, you can define your own logical names as properties of a program (or even prompt the user for preferred logical names at run time), and then at run time connect the logical names to any "real" file name you want, using the UNIX csh setenv command. Additionally, there are four standard logical names: LOGFILE, SCENFILE, and EXECUTION_ID, which may be used for the program log file, scenario-description file, and execution identifier, and GRIDDESC, for the ASCII grid and coordinate-system database used by utility routine DSCGRID(). For programming purposes, the significant facts are that names should not contain blanks (except at the end: "foo " is OK; "f oo" is not), and are at most 16 characters long.
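As an illustration of the naming rule just stated, here is a minimal sketch of a hypothetical helper, VALID_NAME (not an I/O API routine), that accepts a logical file name or variable name only if it is non-blank, at most 16 characters long (the NAMLEN3 limit), and free of embedded blanks; trailing blanks are allowed, as in "foo ". It is written in Fortran 90 free form, unlike the fixed-form fragments elsewhere in this tutorial.

```fortran
! Hypothetical helper, NOT part of the I/O API: checks a logical file name
! or variable name against the rules stated above.
module name_rules
  implicit none
  integer, parameter :: NAMLEN = 16   ! assumed equal to NAMLEN3 in PARMS3.EXT
contains
  logical function valid_name(nam)
    character(len=*), intent(in) :: nam
    integer :: n
    n = len_trim(nam)                  ! trailing blanks are ignored
    ! non-empty, short enough, and no blank anywhere before the end
    valid_name = n > 0 .and. n <= NAMLEN .and. index(nam(1:n), ' ') == 0
  end function valid_name
end module name_rules
```

With this check, 'foo' and 'foo ' pass, while 'f oo', ' myfile' (leading blank), and a 17-character name fail.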
When you run a program that uses the I/O API, you begin with a sequence of setenv commands that set the values of the program's logical names, much as you begin a (normal) Cray Fortran program with a sequence of ASSIGN commands for its files. For example, if "myprogram" has logical names "foo" and "bar" that you want to connect to files "somedata.mymodel" and "otherdata.whatever" in directory "/tmp/mydir", the script for the program would look something like:
    ...
    setenv foo          /tmp/mydir/somedata.mymodel
    setenv bar          /tmp/mydir/otherdata.whatever
    setenv qux          "/tmp/mydir/volatilestuff.mymodel -v"
    setenv LOGFILE      /tmp/mydir/mymodel.log
    setenv SCENFILE     /tmp/mydir/test17a.description
    setenv EXECUTION_ID TEST17A
    /user/mydir/myprogram
    ...
VOLATILE files are indicated by a trailing " -v" in the setenv value, as for "qux" above; this tells the I/O API to perform disk-sync operations before every input and after every output operation on that file. Such files can be read by other programs while the generating program is still running, and remain readable even if the generating program fails to call SHUT3() or M3EXIT() (or crashes unexpectedly).
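The " -v" convention can be sketched as a small parser. SPLIT_VOLATILE below is hypothetical (the library does this internally, and its actual parsing rules may differ); it assumes the flag is exactly " -v" at the end of the trimmed value and returns the remaining path.

```fortran
! Hypothetical helper, NOT an I/O API routine: separates a setenv value
! such as "/tmp/mydir/volatilestuff.mymodel -v" into a path and a flag.
module volatile_suffix
  implicit none
contains
  subroutine split_volatile(setting, path, is_volatile)
    character(len=*), intent(in)  :: setting
    character(len=*), intent(out) :: path
    logical,          intent(out) :: is_volatile
    integer :: n
    n = len_trim(setting)
    is_volatile = .false.
    if (n >= 3) is_volatile = (setting(n-2:n) == ' -v')
    if (is_volatile) then
      path = adjustl(setting(1:n-3))     ! drop the trailing " -v"
    else
      path = adjustl(setting(1:n))
    end if
  end subroutine split_volatile
end module volatile_suffix
```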

BUFFERED virtual files can be used to provide safe, structured exchange of data -- of "gridded", "boundary", or "custom" types only -- between different modules in the same program. If you setenv the value of a logical name to the value BUFFERED, as given below:

    ...
    setenv qux BUFFERED
    ...
    /user/mydir/myprogram
    ...
then the I/O API will establish in-memory buffers and time indexing for "qux" instead of creating a physical file on disk. One module can then use WRITE3() (see below) to export data for sharing, and other modules use READ3() or INTERP3() to import it. Note that since these routines associate the data with its simulation date and time, the system will notice the error (and warn the user) if you attempt to get and use data before it has been produced. Note also that by changing the setenv in the script between "BUFFERED" and a physical file name, you can switch between efficient data sharing among modules and high-resolution instrumentation of the shared data, without changing the underlying program at all.
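The key property described here, that a read of a time step nobody has written yet is reported as an error, can be illustrated with a toy in-memory store keyed by simulation date and time. This is a hypothetical sketch holding one REAL value per time step, not how the I/O API actually implements BUFFERED files; TOY_WRITE and TOY_READ are illustrative names, not library routines.

```fortran
! Toy model, NOT the I/O API's implementation: one REAL value per time
! step, stored in memory and keyed by (JDATE, JTIME).
module toy_buffer
  implicit none
  integer, parameter :: MAXSTEPS = 100
  integer :: nstored = 0
  integer :: dates(MAXSTEPS), times(MAXSTEPS)
  real    :: vals(MAXSTEPS)
contains
  integer function find_step(jdate, jtime)    ! 0 if not yet written
    integer, intent(in) :: jdate, jtime
    integer :: i
    find_step = 0
    do i = 1, nstored
      if (dates(i) == jdate .and. times(i) == jtime) then
        find_step = i
        return
      end if
    end do
  end function find_step

  logical function toy_write(jdate, jtime, val)
    integer, intent(in) :: jdate, jtime
    real,    intent(in) :: val
    integer :: i
    i = find_step(jdate, jtime)
    if (i == 0) then                           ! new time step
      if (nstored == MAXSTEPS) then
        toy_write = .false.                    ! buffer full
        return
      end if
      nstored = nstored + 1
      i = nstored
      dates(i) = jdate
      times(i) = jtime
    end if
    vals(i) = val                              ! store or overwrite
    toy_write = .true.
  end function toy_write

  logical function toy_read(jdate, jtime, val)
    integer, intent(in)  :: jdate, jtime
    real,    intent(out) :: val
    integer :: i
    i = find_step(jdate, jtime)
    toy_read = i > 0                           ! .FALSE.: not produced yet
    val = 0.0
    if (toy_read) val = vals(i)
  end function toy_read
end module toy_buffer
```

A consumer that calls TOY_READ for 1987035:120000 before any producer has called TOY_WRITE for that date and time gets .FALSE. and should treat it as an error, which mirrors the behavior described above.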

COUPLING-MODE virtual files can be used to provide PVM-based data exchange between cooperating programs, using exactly the same, unchanged I/O API programming interface and the name-based direct-access semantics it provides, with one extra scheduling condition: a request for data that has not yet been written puts the requester to sleep until the data become available (at which time the requester is awakened and given the requested data). Which files are disk-based and which are COUPLING-MODE virtual files is also decided by setenv commands at program launch, with a value of the form "virtual <communications-channel-name>":

    ...
    setenv  zok  "virtual CHEM_CONC_3D_G3"
    /user/mydir/myprogram
    ...
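To make the kinds of setenv value concrete, here is a hypothetical classifier (not part of the I/O API, whose own parsing may differ, for example in case sensitivity) that distinguishes "BUFFERED", "virtual <channel>", and an ordinary path, and extracts the channel name in the coupling-mode case.

```fortran
! Hypothetical helper, NOT an I/O API routine: classifies the value of a
! logical name. Comparisons are exact-case (an assumption).
module name_kind
  implicit none
  integer, parameter :: PHYSICAL_FILE = 1, BUFFERED_FILE = 2, COUPLED_FILE = 3
contains
  integer function classify(setting, channel)
    character(len=*), intent(in)  :: setting
    character(len=*), intent(out) :: channel
    character(len=len(setting)) :: s
    s = adjustl(setting)
    channel = ' '
    if (s == 'BUFFERED') then
      classify = BUFFERED_FILE
    else if (index(s, 'virtual ') == 1) then
      classify = COUPLED_FILE
      channel = adjustl(s(9:))            ! text after "virtual "
    else
      classify = PHYSICAL_FILE
    end if
  end function classify
end module name_kind
```

For the script above, "virtual CHEM_CONC_3D_G3" would be classified as a coupling-mode file on channel CHEM_CONC_3D_G3.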

Except for INIT3(), all of the I/O API routines are LOGICAL functions returning TRUE for success and FALSE for failure.

There are a number of dimensioning parameters and "magic number" token values for the I/O API. Throughout the I/O API, names (logical file names, variable names, and units) are character strings with maximum length NAMLEN3 = 16; descriptions are either one or MXDESC3 = 60 lines of length at most MXDLIN3 = 80. The I/O API currently supports up to MXFILE3 = 50 open files, each with up to MXVARS3 = 120 variables.

What Are the File Descriptions and Types of Data Supported

All files manipulated by the I/O API have multiple variables, each possibly having multiple layers. Within a file, all the variables are data arrays with the same dimensions, the same number of layers, and the same structure-type of data, although possibly different basic types (e.g., gridded and boundary variables can't be mixed within the same file, but real and integer variables can). Each file also has a time-step structure shared by all of its variables. There are three kinds of time-step structure supported: time-independent (time step 0), time-stepped (positive time step), and circular-buffer (negative time step; only the two most recent time steps are kept).

There are eight structure-types and three basic types of data supported by the I/O API. The structure-types are associated with file-type parameter values ("magic numbers") CUSTOM3, DCTNRY3, GRDDED3, BNDARY3, IDDATA3, PROFIL3, GRNEST3, and SMATRX3 (of which GRDDED3 and BNDARY3 will account for almost all CTM uses), which are defined in INCLUDE file PARMS3.EXT. The basic types are associated with "magic numbers" M3INT, M3REAL, and M3DBLE, also defined in PARMS3.EXT. Each of these data types supports multiple time steps of multiple layers of multiple user-defined variables. In some cases, there are additional system-defined variables that are part of the data structure (e.g., NUMIDS in the ID-referenced data structure). Where such system-defined variables are present, READ3() and WRITE3() act on entire time steps (all variables) at once; otherwise, they can be used to store or retrieve time steps of individual variables one at a time. There are moderate performance advantages to writing the variables for a time step in the same order in which they appear in the file description, and to writing the time steps in consecutive order; however, this is not required by the I/O API, which permits any access order for both read and write operations.
The eight structure-types are custom (CUSTOM3), dictionary (DCTNRY3), gridded (GRDDED3), boundary (BNDARY3), ID-referenced (IDDATA3), vertical-profile (PROFIL3), grid-nest (GRNEST3), and sparse-matrix (SMATRX3). In sample declarations for time-step records, M3REAL variables are declared as REAL*4, etc., instead of merely REAL, to protect you in case your compiler has a "-r8" flag (or similar), which silently changes all REALs from 4-byte to 8-byte and causes you to be linked accidentally with an incompatible version of the library; if you never use such a flag, don't worry.

The INCLUDE file FDESC3.EXT contains heavily annotated declarations for all the variables in a file description, together with the two COMMON blocks used by the I/O API to store and retrieve file descriptions. DESC3() takes a file and puts its description into the FDESC3 commons; OPEN3() does roughly the reverse when dealing with new or unknown files, taking a description from the FDESC3 commons and either building a new file according to those specifications or performing a consistency check against the description stored in the file's header. A typical call to DESC3() might look like:
        ...
        IF ( .NOT. DESC3( 'myfile' ) ) THEN
        ...(error:  probably the file hasn't been opened yet)
        END IF
Some of the items in a file description, such as the dates and times for file creation and update, and the name of the program which created the file, are maintained automatically by the system. Others describe the variables in the file: the file type (as described above), the number of variables, their names, unit designations, and descriptions, as well as the description of the file as a whole. Still others dimension the data: the number of layers and the grid dimensions (where for ID and profile files, the number of sites is mapped onto the rows dimension; for profile files, the number of vertical levels is mapped onto the columns dimension). Still other parts of the file description specify the geometry of the grid: the map projection used, its projection parameters, and the grid's location and cell-size relative to that map projection; the vertical-grid-coordinate type and the boundary values separating the model layers.

How to Start Up and Shut Down the I/O API

In order for the I/O API to start itself up correctly, and to make sure that files are closed (and their headers updated) correctly, you need to call the INIT3() function at the start of your program and, at the end, either the SHUT3() function (which flushes the headers of, and closes, all files currently open) or else the CLOSE3() function for each file opened. Note that the utility routines M3ERR() and M3EXIT(), when used to shut down a program, will call SHUT3() correctly (as well as write explanatory messages to the log).

INIT3() is an integer function. It returns the unit number to be used for the program's log (if you setenv LOGFILE, the I/O API's log and error messages will be written to this unit; otherwise, they go to standard output, unit 6). INIT3() can be called as many times as you want, to get the unit number for the program log.

SHUT3() is a logical function that returns TRUE if the system successfully flushed all I/O API files to disk and shut itself down, and FALSE if it failed. If it failed, there probably was a hardware problem -- not much you can do about it, but at least you ought to be able to know. It is legal to call SHUT3() and close down all files currently open, and then to call INIT3() again and open new ones. NOTE that utility routine M3EXIT() calls SHUT3() as the final step of its operation (in addition to generating a log-message with the current simulation date-and-time and the indicated message-text). CLOSE3() is a logical function that returns TRUE if the system successfully flushed the indicated file to disk and closed it, and FALSE if it failed.

How to Open (Create) and Get Descriptive Info About Files

Use OPEN3() to open files, whether they already exist or are new. OPEN3() is a logical function that returns TRUE when it succeeds and FALSE when it fails. It also automatically maintains a good deal of audit-trail information in the file header, and automates various logging activities. A couple of additional pieces of audit-trail information require a bit of work from you in setting up standard environment variables, if you want to take advantage of them. If you write a description of your program run in a text file of up to 60 lines of up to 80 characters each, and then setenv SCENFILE to that file before you run the program, OPEN3() will copy the SCENFILE information into the headers of any output files for that program. Also, if you setenv EXECUTION_ID to your own identifier for the program execution, OPEN3() will automate the storage and logging of that identifier. Finally, if you setenv IOAPI_CHECK_HEADERS YES, the I/O API will perform sanity checks on internal file descriptions, checking, for example, that grid-description parameters are in range, or that vertical levels are either systematically increasing or systematically decreasing.

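The arguments to OPEN3() are the logical name of the file, an INTEGER "magic number" indicating the type of open operation, and the caller's name for logging and audit-trail purposes. You can call OPEN3() many times for the same file without hurting anything, as long as you don't first open it read-only and then try to change your mind, or try to open it as a NEW file after it is already open. Names and values for the mode-of-opening magic number are defined in PARMS3.EXT as follows:

    FSREAD3 = 1  read-only access to existing ("old") files
    FSRDWR3 = 2  read/write/update access to existing files
    FSNEW3  = 3  read/write access to new files
    FSUNKN3 = 4  read/write/update access to files that may be new or old ("unknown")
    FSCREA3 = 5  create/truncate: read/write access to a newly (re)created file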

In the last three cases ("new", "unknown", and "create/truncate"), you fill in the file description in the FDESC3.EXT commons to define the structure of the file, and then call OPEN3(). If the file doesn't exist in any of these cases, OPEN3() will use the description to create a new file according to your specifications and open it for read/write access. In the "unknown" case, if the file already exists, OPEN3() will perform a consistency check between your supplied file description and the description found in the file's own header, and will return TRUE (and leave the file open) only if the two are consistent. Sample calls to OPEN3() for an input file 'myfile' and an output file 'my_newfile' might look like the following:
    ...
    IF ( .NOT. OPEN3( 'myfile', FSREAD3, 'my program') ) THEN
    ...(some kind of error happened--deal with it here)
    END IF
    ...
    ...  (First, fill in the file's description for 'my_newfile'.
    ...   Then open it:)
    IF ( .NOT. OPEN3( 'my_newfile', FSNEW3, 'my program' ) ) THEN
    ...(some kind of error happened--deal with it here)
    END IF
There are also three sample programs that demonstrate how to use the I/O API to create various kinds of files -- gridded, boundary, and ID-referenced, with one or multiple layers, and either time-stepped or time-independent.

NOTE: Joan Novak (EPA) and Ed Bilicki (MCNC) have declared as a software standard that modeling programs may not use FSCREA3 as the mode for opening files. FSCREA3 is reserved for use by analysis/data extraction programs only.
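The rule that only the "new", "unknown", and "create/truncate" modes need a filled-in description, together with the standard reserving FSCREA3 for analysis tools, can be captured in two small predicates. This is a hypothetical sketch: the parameter values are assumed to match PARMS3.EXT, and in real code you would INCLUDE PARMS3.EXT rather than redeclare them.

```fortran
! Sketch only. Values assumed to match PARMS3.EXT; do not redeclare them
! in code that also INCLUDEs PARMS3.EXT.
module open_modes
  implicit none
  integer, parameter :: FSREAD3 = 1   ! read-only, existing file
  integer, parameter :: FSRDWR3 = 2   ! read/write/update, existing file
  integer, parameter :: FSNEW3  = 3   ! read/write, new file
  integer, parameter :: FSUNKN3 = 4   ! read/write/update, new or existing
  integer, parameter :: FSCREA3 = 5   ! create/truncate, read/write
contains
  ! Hypothetical: .TRUE. if the caller must fill in FDESC3.EXT first
  logical function needs_description(mode)
    integer, intent(in) :: mode
    needs_description = mode == FSNEW3 .or. mode == FSUNKN3 .or. mode == FSCREA3
  end function needs_description

  ! Hypothetical: .TRUE. if a modeling program may use this mode under
  ! the software standard above (FSCREA3 is for analysis tools only)
  logical function allowed_in_model(mode)
    integer, intent(in) :: mode
    allowed_in_model = mode >= FSREAD3 .and. mode <= FSUNKN3
  end function allowed_in_model
end module open_modes
```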

To get a file's description, use the DESC3() function. When you call DESC3(), it puts the file's complete description into the standard file-description data structures in FDESC3.EXT. Note that the file must have been opened before calling DESC3(). A typical call might look like:

        
    ...
    IF ( .NOT. DESC3( 'myfile' ) ) THEN
    ...(some kind of error happened--deal with it here)
    ELSE
    ...(the FDESC3 commons now contain the file description:
    ... data type, dimensions, starting date&time, timestep, 
    ... list of variables and their descriptions, etc.)
    END IF
    ...

How to Read Data from Files

There are four routines, with varying kinds of selectivity, used to read or otherwise retrieve data from files: READ3(), XTRACT3(), INTERP3(), and DDTVAR3(). All of them are logical functions that return TRUE when they succeed and FALSE when they fail. The first two, READ3() and XTRACT3(), require that the time step you request be on the file: they won't give you data for the half-hour, for example, if the file has hourly data only. Because it performs (and optimizes) the time interpolation for you, INTERP3() is probably the most useful of these. Note that it has a size argument: you tell it how much data you expect, and it checks that against how much data the file says you ought to get, for error-checking purposes. A typical INTERP3() call to read and interpolate the variable HNO3 to 12:30 PM on February 4, 1987 (Julian date 1987035) might look like:
        ...
        CHARACTER*16 FNAME, VNAME
        REAL*4       ARRAY( NCOLS, NROWS, NLAYS )
        ...
        IF ( .NOT. INTERP3( 'myfile', 'HNO3', 1987035, 123000,
     &                   NCOLS*NROWS*NLAYS, ARRAY ) ) THEN
        ...(some kind of error happened--deal with it here)
        END IF
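INTERP3() is assumed here to interpolate linearly in time between the two time steps on the file that bracket the requested time; for an hourly file, 12:30 lies halfway between the 12:00 and 13:00 records, so each gets weight 0.5. The hypothetical INTERP_LINEAR below sketches only that arithmetic and the size check on the caller's buffer; it is not the library's implementation, which also locates and reads the bracketing records.

```fortran
! Hypothetical sketch, NOT the I/O API's INTERP3(). P and Q hold one
! variable at the earlier and later bracketing time steps; ELAPSED is the
! time in seconds from the earlier step to the requested time, and DT is
! the file's time step in seconds.
module time_interp
  implicit none
contains
  logical function interp_linear(rsize, p, q, elapsed, dt, buffer)
    integer, intent(in)  :: rsize, elapsed, dt
    real,    intent(in)  :: p(:), q(:)
    real,    intent(out) :: buffer(:)
    real :: w
    ! mirror INTERP3()'s size argument: caller's expectation must match
    if (rsize /= size(p) .or. size(q) /= size(p) .or. size(buffer) < rsize) then
      interp_linear = .false.
      return
    end if
    if (dt <= 0 .or. elapsed < 0 .or. elapsed > dt) then
      interp_linear = .false.             ! requested time not bracketed
      return
    end if
    w = real(elapsed) / real(dt)          ! weight given to the later step
    buffer(1:rsize) = (1.0 - w)*p + w*q
    interp_linear = .true.
  end function interp_linear
end module time_interp
```

For the 12:30 example on an hourly file, ELAPSED is 1800 and DT is 3600.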
With READ3() and XTRACT3(), you can use the "magic values" ALLVAR3 (= 'ALL', defined in PARMS3.EXT) or ALLAYS3 (= -1, also defined in PARMS3.EXT) as the variable name and/or layer number to read all variables or all layers from the file, respectively. For time-independent files, the date and time arguments are ignored.
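Since READ3() and XTRACT3() only return time steps that are actually on the file, it may help to see the kind of record bookkeeping the I/O API performs for you. The sketch below is hypothetical (RECORD_NUMBER is not a library routine) and handles only time-independent files (time step 0, where the date and time are ignored) and ordinary time-stepped files with a positive HHMMSS time step; it returns 0 when the requested date and time do not fall exactly on a time step at or after the file's starting date and time.

```fortran
! Hypothetical sketch, NOT an I/O API routine. Dates are YYYYDDD; times
! and the time step are HHMMSS (time step >= 0). Spans are assumed short
! enough (under about 68 years) for default 32-bit integer seconds.
module record_lookup
  implicit none
contains
  integer function hms_seconds(hms)           ! HHMMSS -> seconds
    integer, intent(in) :: hms
    hms_seconds = 3600*(hms/10000) + 60*mod(hms/100, 100) + mod(hms, 100)
  end function hms_seconds

  integer function day_count(yyyyddd)         ! days since a fixed epoch
    integer, intent(in) :: yyyyddd
    integer :: y
    y = yyyyddd/1000 - 1                      ! whole years before this one
    day_count = 365*y + y/4 - y/100 + y/400 + mod(yyyyddd, 1000)
  end function day_count

  integer function record_number(sdate, stime, tstep, jdate, jtime)
    integer, intent(in) :: sdate, stime, tstep, jdate, jtime
    integer :: elapsed, step
    if (tstep == 0) then                      ! time-independent file
      record_number = 1
      return
    end if
    step    = hms_seconds(tstep)
    elapsed = 86400*(day_count(jdate) - day_count(sdate)) &
              + hms_seconds(jtime) - hms_seconds(stime)
    if (elapsed < 0 .or. mod(elapsed, step) /= 0) then
      record_number = 0                       ! not on the file
    else
      record_number = elapsed/step + 1
    end if
  end function record_number
end module record_lookup
```

For an hourly file starting at 1987035:000000, a request for 1987035:120000 maps to record 13, while 1987035:123000 maps to 0, which is why READ3() fails for the half-hour and INTERP3() is needed.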

How to Write Data to Files

Use the logical function WRITE3() to write data to files. For gridded, boundary, and custom files, you may write either one time step of one variable at a time, or one entire time step of data at a time (in which case use the "magic value" ALLVAR3 (= 'ALL', defined in PARMS3.EXT) as the variable name). For ID-referenced, profile, and grid-nest files, you must write an entire time step at a time (i.e., the variable name must be ALLVAR3). WRITE3() is affected by the standard environment variable IOAPI_LOG_WRITE (default value "YES"): normally WRITE3() generates a log message for each write operation successfully completed, but if you setenv IOAPI_LOG_WRITE NO, these messages are suppressed. Typical WRITE3() calls to write data for date and time JDATE:JTIME might look like the following:
    ...
    REAL*4       ARRAY ( NCOLS, NROWS, NLAYS )          ! one variable (HNO3)
    REAL*4       ARRAYB( NCOLS, NROWS, NLAYS, NVARS )   ! all variables
    ...
    IF ( .NOT. WRITE3( 'myfile', 'HNO3', JDATE, JTIME, ARRAY ) ) THEN
    ...(some kind of error happened--deal with it here)
    END IF
    IF ( .NOT. WRITE3( 'afile', 'ALL', JDATE, JTIME, ARRAYB ) )  THEN
    ...(some kind of error happened--deal with it here)
    END IF
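The rule for the variable-name argument of WRITE3() can be written as a small predicate. This is a hypothetical sketch, not library code: the file-type values are assumed to match PARMS3.EXT (include the real file in practice), and the check compares against 'ALL', the value of ALLVAR3 given above.

```fortran
! Hypothetical sketch, NOT an I/O API routine. File-type values are
! assumed to match PARMS3.EXT.
module write_rules
  implicit none
  integer, parameter :: CUSTOM3 = -1, GRDDED3 = 1, BNDARY3 = 2, &
                        IDDATA3 = 3, PROFIL3 = 4, GRNEST3 = 5
  character(len=*), parameter :: ALLVAR3 = 'ALL'
contains
  ! .TRUE. if VNAME is an acceptable WRITE3() variable-name argument
  ! for a file of type FTYPE
  logical function write_name_ok(ftype, vname)
    integer,          intent(in) :: ftype
    character(len=*), intent(in) :: vname
    if (vname == ALLVAR3) then
      write_name_ok = .true.             ! whole time step: always allowed here
    else
      ! single-variable writes only for gridded, boundary, custom files
      write_name_ok = ftype == GRDDED3 .or. ftype == BNDARY3 .or. ftype == CUSTOM3
    end if
  end function write_name_ok
end module write_rules
```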

How to Manipulate Dates and Times, and Other Conventions

Throughout the EDSS and Models-3 systems -- and particularly in the I/O API -- dates and times (and time-steps) are stored as integers, using the coding formats
    HHMMSS  = 10000 * hour  +  100 * minutes  +  seconds
    YYYYDDD =  1000 * year  +  day
where the year has 4 digits (1994, say, rather than just 94), and the day is the Julian day number (1, ..., 365 or 366). By convention, dates and times are stored in Greenwich Mean Time. There are two utility programs: juldate, for converting calendar dates to Julian dates, and gregdate, for converting Julian dates to calendar dates and reporting the day of the week. Both of these programs also report whether daylight saving time is in effect for the specified date.

There are also a number of utility routines available for manipulating dates and times. Note that for these routines, time steps may perfectly well be negative; just make sure you keep the parts all positive or all negative. A time step of -33000, for example, means to step three and a half hours into the past. This way of representing dates and times is easy to understand and manipulate when you are watching code in the debugger (you don't have to turn "seconds since Jan. 1, 1970" into something meaningful for your model run, nor do you have to remember whether April has 30 days or 31 when your model run crosses over from April to May). The utility routines for manipulating dates and times are the following:
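Separately from the library's own date utilities (which you should use in real code), the coding conventions above can be illustrated with a few hypothetical helpers: converting a calendar date to YYYYDDD, as the juldate program does, and stepping a date and time forward or backward by an HHMMSS time step that obeys the same-sign rule. The leap-year test assumes the Gregorian calendar.

```fortran
! Hypothetical sketch, NOT the I/O API's date routines.
module date_sketch
  implicit none
contains
  logical function is_leap(year)
    integer, intent(in) :: year
    is_leap = (mod(year, 4) == 0 .and. mod(year, 100) /= 0) .or. mod(year, 400) == 0
  end function is_leap

  integer function days_in_year(year)
    integer, intent(in) :: year
    days_in_year = 365
    if (is_leap(year)) days_in_year = 366
  end function days_in_year

  ! Calendar date -> YYYYDDD, e.g. February 4, 1987 -> 1987035
  integer function yyyyddd(year, month, mday)
    integer, intent(in) :: year, month, mday
    integer, parameter :: before(12) = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]
    integer :: ddd
    ddd = before(month) + mday
    if (month > 2 .and. is_leap(year)) ddd = ddd + 1
    yyyyddd = 1000*year + ddd
  end function yyyyddd

  ! HHMMSS -> seconds; Fortran's / and MOD truncate toward zero, so an
  ! all-negative step such as -33000 gives -12600 (3.5 hours back)
  integer function hms_to_sec(hms)
    integer, intent(in) :: hms
    hms_to_sec = 3600*(hms/10000) + 60*mod(hms/100, 100) + mod(hms, 100)
  end function hms_to_sec

  ! Advance JDATE:JTIME by TSTEP (which may be negative), rolling over
  ! days and years as needed
  subroutine step_time(jdate, jtime, tstep)
    integer, intent(inout) :: jdate, jtime
    integer, intent(in)    :: tstep
    integer :: year, ddd, secs
    year = jdate/1000
    ddd  = mod(jdate, 1000)
    secs = hms_to_sec(jtime) + hms_to_sec(tstep)
    do while (secs >= 86400)                 ! forward across midnight
      secs = secs - 86400
      ddd  = ddd + 1
      if (ddd > days_in_year(year)) then
        year = year + 1
        ddd  = 1
      end if
    end do
    do while (secs < 0)                      ! backward across midnight
      secs = secs + 86400
      ddd  = ddd - 1
      if (ddd < 1) then
        year = year - 1
        ddd  = days_in_year(year)
      end if
    end do
    jdate = 1000*year + ddd
    jtime = 10000*(secs/3600) + 100*mod(secs/60, 60) + mod(secs, 60)
  end subroutine step_time
end module date_sketch
```

For example, stepping 1987365:230000 by 20000 gives 1988001:010000, and stepping 1987035:013000 by -33000 gives 1987034:220000.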

Diagrams showing the relationship of the grid and its layers to the header attributes XORIG3D, YORIG3D, VGLVS3D, etc., are available in PostScript, X bitmap, JPEG, and GIF image formats. Note that layer 1 is the bottom layer of the modeling grid.

Some development work remains to be done: we need to do more work on the definition of grids and map projections, and then to define and write utility routines dealing with map projections, transformation of locations from one map projection to another, and interpolation of data from one grid to another (possibly on different map projections).
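As a hedged illustration of how the grid header attributes are used, the sketch below computes the map-projection coordinates of a grid-cell center. It assumes (check the grid diagrams to confirm) that XORIG3D and YORIG3D give the lower-left corner of the grid and that the FDESC3 cell-size attributes XCELL3D and YCELL3D give the cell widths, all in projection units; CELL_CENTER itself is hypothetical.

```fortran
! Hypothetical sketch, NOT an I/O API routine. Column numbers increase in
! X and row numbers in Y, both starting at 1 (an assumption).
module grid_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)     ! grid parameters assumed REAL*8
contains
  subroutine cell_center(xorig, yorig, xcell, ycell, col, row, x, y)
    real(dp), intent(in)  :: xorig, yorig, xcell, ycell
    integer,  intent(in)  :: col, row
    real(dp), intent(out) :: x, y
    x = xorig + (real(col, dp) - 0.5_dp)*xcell   ! half a cell in from the edge
    y = yorig + (real(row, dp) - 0.5_dp)*ycell
  end subroutine cell_center
end module grid_sketch
```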

