University of Chicago
CENTER FOR INTEGRATING
STATISTICAL AND ENVIRONMENTAL SCIENCE *
* A research center funded by the United States Environmental Protection Agency
Tutorial - ioapiTools
Author: Alexis Zubrow
Last modified 3/1/2005
A. Opening a File for Reading
B. Extracting a Variable
C. Projection and Domain Info
D. Subsetting
E. Modifying Variable Data
F. Modifying Metadata
G. Writing to a File
A. Concatenating
B. Modifying Iometa Metadata
C. Coordinate Conversion
D. Changing between iovar, cdms and Numeric Arrays
6. Cheat Sheet of Functions and Methods
The ioapiTools module is a python toolkit for opening, manipulating and writing ioapi format files. Both CMAQ and SMOKE use the ioapi format for their inputs and outputs. The module provides functions for extracting and manipulating data (individual variables) and for writing to either an ioapi file or a CF-compliant netCDF file with ioapi metadata. For the rest of this document, we will simply refer to these two file formats as "ioapi" and "CF".
This module has three major objects that the general user will likely use. The first is an iofile object, which acts as a file handle and a means (i.e. has methods) to read and write data. The second is an iovar object. This object holds the data of a particular variable in an array. At the same time, it retains the variable's metadata and helps keep this metadata consistent with changes to the data (e.g. subsetting). The third object is iometa, which contains geographic projection information as well as the ioapi metadata needed to write to an ioapi file.
This document is organized under common tasks. Each task begins with some examples. After the examples, each section has an explanation that fills in some of the details. The impatient user can probably get enough from the examples to get up and running without having to read the descriptions. Python's "help" function provides more detailed information on specific objects, methods, and functions.
Python Intro:
http://www.hetland.org/python/instant-python.php
http://www.python.org/doc/current/tut/tut.html
cdms/cdat documentation:
Numeric documentation:
http://www.pfdubois.com/numpy/html2/numpy.html
IOAPI documentation:
http://www.cep.unc.edu/empd/EDSS/ioapi/
Rpy documentation:
http://rpy.sourceforge.net/
- Recommended Reading: Martelli, Alex. 2003. "Python in a Nutshell." O'Reilly & Assoc, Sebastopol, CA
At present, the ioapiTools support reading from and writing to ioapi and CF files. In particular, the variables in these files need to include all four dimensions (time, layer, row or latitude, and col or longitude). For example, the tools can access any of the cctm files produced by CMAQ, but they will fail on reading boundary files. In future versions, the tools will support reading ioapi files with three- and four-dimensional variables.
The following geographic projections are presently supported by the tools:
Lambert Conformal Conic
I use the term CF files to refer to netCDF files that attempt to hold to the standards of the netCDF Climate and Forecast metadata conventions:
In addition, these CF files have two essential variables. The first is a grid-mapping variable named after the specific geographic projection. The grid-mapping variable contains all the projection information. The second variable is "ioapi_meta." This ancillary variable retains all the additional metadata necessary to reconstruct the original ioapi file.
Basic Usage
A. Opening a File for Reading
Example 1:
$ python
>>> import ioapiTools as ioT
>>> f = ioT.open("~/tmp/CCTM_ACONC.D2.001")
Example 2:
>>> g = ioT.open("~/tmp/o3_aconc.nc")
Example 3:
>>> fs = ioT.scan("~/tmp/CCTM_ACONC.D2.0*", "1996-06-24 12:00", \
"1996-06-26 22:00")
First of all, one needs to start the python interpreter's shell. Here, we simply type "python" at the Unix prompt. The python shell's prompt is indicated by ">>>". There are many ways of starting a python shell, but I happen to like using python through emacs or through the idle interface. After starting the shell, the first line indicates how to load the library. Load it with the name "ioT" to reduce the amount of typing. All functions and flags will be elements of this module (e.g. "ioT.open" is the open function under this module, vs. the standard file open function). The first and second examples both return iofile objects, "f" and "g" respectively. The only difference is the type of file; the first example is an ioapi file and the second is a CF file. There is no need to pass type flags because the open function automatically determines the type of file for reading. If the file is neither ioapi nor CF format, an error will be raised.
In the third example, the scan function returns an iofilescan object, "fs". This object is similar to an iofile object except that it can span multiple physical files. In the call sequence, scan searches for all files with the pattern "CCTM_ACONC.D2.0*". It then searches through those files that match this pattern for the specific date range, in this case 1996-6-24 12:00 through and including 1996-6-26 22:00. Future method calls of "fs" will act on this date range independent of the number of files this range spans (see for example Extracting a Variable below). When defining a search string, it is important that all matching files have the same file type, i.e. either all ioapi or all CF.
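The two-stage selection described above (match the file pattern, then keep the files that overlap the requested date range) can be sketched in plain python, independent of ioapiTools. This is only an illustration of the idea, not how scan is implemented: the helper files_in_range and the per-file date coverage in the coverage dictionary are made up for the example.

```python
from datetime import datetime
from fnmatch import fnmatch

# Hypothetical catalogue: file name -> (first time step, last time step).
coverage = {
    "CCTM_ACONC.D2.001": (datetime(1996, 6, 24, 0), datetime(1996, 6, 24, 23)),
    "CCTM_ACONC.D2.002": (datetime(1996, 6, 25, 0), datetime(1996, 6, 25, 23)),
    "CCTM_ACONC.D2.003": (datetime(1996, 6, 26, 0), datetime(1996, 6, 26, 23)),
    "CCTM_ACONC.D2.004": (datetime(1996, 6, 27, 0), datetime(1996, 6, 27, 23)),
    "CCTM_CONC.D2.001": (datetime(1996, 6, 24, 0), datetime(1996, 6, 24, 23)),
}

def files_in_range(pattern, start, end):
    """Return sorted names matching pattern whose coverage overlaps [start, end]."""
    hits = []
    for name, (first, last) in coverage.items():
        # Keep a file only if it matches the pattern and overlaps the
        # inclusive date range.
        if fnmatch(name, pattern) and first <= end and last >= start:
            hits.append(name)
    return sorted(hits)

selected = files_in_range("CCTM_ACONC.D2.0*",
                          datetime(1996, 6, 24, 12),
                          datetime(1996, 6, 26, 22))
print(selected)
```

The end date is treated as inclusive, matching the "through and including" wording above.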
B. Extracting a Variable
Example 1:
>>> f.listvariables()
>>> no2 = f("no2")
>>> co = f("co")
>>> o3 = g("o3")
Example 2:
>>> o3_long = fs("o3")
>>> o3.getTime()
>>> o3_long.getTime()
The first example starts with a method of iofile, listvariables. As its name indicates, it returns a list where each element of the list is a variable name in that file. In the second, third and fourth lines we extract individual variables from the file. The last line is exactly the same as the other extractions except that the variable is coming from a CF file versus the earlier ioapi file. The call sequence is identical, and because the data in the two files are from the same temporal and spatial domain, the variables can be mixed and matched. When making this call, the iofile object is actually calling the extract method, i.e. f("no2") is equivalent to f.extract("no2"). In all these cases, the variable's data and corresponding metadata are stored in the returned iovar object, e.g. "no2".
In the second example, "o3_long" is also an iovar object, but here the data is extracted from multiple files and pieced together into one long array. The next two lines demonstrate the close connection between the data and the metadata. The iovar method, getTime, returns the time axis of the object, showing the difference in temporal range of o3 and o3_long.
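The equivalence of f("no2") and f.extract("no2") is the usual python pattern of a callable object that forwards to a named method. A minimal sketch of that pattern follows; FileHandle is a toy class with an in-memory dictionary, not the real iofile class.

```python
class FileHandle:
    """Toy stand-in for a file handle; not the real iofile class."""

    def __init__(self, variables):
        # Hypothetical in-memory store: variable name -> data.
        self._variables = dict(variables)

    def listvariables(self):
        """Return the variable names held by this handle."""
        return sorted(self._variables)

    def extract(self, name):
        """Return the data stored under the given variable name."""
        return self._variables[name]

    def __call__(self, name):
        # Calling the object forwards to extract().
        return self.extract(name)

f = FileHandle({"no2": [0.01, 0.02], "co": [0.3, 0.4]})
print(f.listvariables())
print(f("no2") == f.extract("no2"))
```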
C. Projection and Domain Info
Example 1:
>>> dir(o3)
>>> print o3.ioM
Example 2:
>>> o3
>>> o3.getTime()
>>> o3.getLevel()
>>> o3.getLatitude()
>>> o3.getLongitude()
>>> o3.getLatitude().getValue()
>>> o3.getLongitude().getBounds()
>>> o3.getTime().asComponentTime()
The first line of Example 1 shows how to list the methods and attributes of an object, in this case o3. This will work for any object or module. This directory call returns a very long list of methods and attributes. The iovar object is based on (inherits from in "object speak") the cdms transient variable object. Most of the methods shown here are inherited from the cdms variable and you will likely only need to access a select few of them. I have added a series of methods, all starting with "IO" that will be discussed below. In addition, I added an attribute ioM, which is an iometa object. The iometa object is the key repository for ioapi and projection metadata. The second line prints an overview of the ioM attributes.
At present, the ioapiTools only support Lambert Conformal Conic projections. The native projection coordinates are called yLat and xLon, and they are measured in meters from the projection center. For example, a coordinate of (-23000, 6000) is 23000 meters west and 6000 meters north of the projection center, in this case -90 E, 40 N. The native coordinate system defines a regular grid in meters. It is important to understand that the cell centers would not form a regular grid if they were transformed into lon, lat coordinates (in degrees). The ioapiTools include functions and methods for converting between yLat, xLon (in meters), lon, lat (in degrees), and row, col (0-based index). For examples, see IOsubset and coordConv below.
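To see why a regular grid in meters is not regular in degrees, here is a rough sketch of the spherical Lambert Conformal Conic equations in plain python. It is not the conversion code used by ioapiTools. The projection center (-90, 40) comes from the text above; the sphere radius and the standard parallels (30 and 60 degrees) are assumptions chosen for illustration, and ll2proj and proj2ll are hypothetical helper names.

```python
import math

R = 6370000.0             # assumed sphere radius in meters
LAT1, LAT2 = 30.0, 60.0   # assumed standard parallels (degrees)
LON0, LAT0 = -90.0, 40.0  # projection center from the text

def _t(lat_deg):
    """tan(pi/4 + lat/2), the building block of the conic equations."""
    return math.tan(math.pi / 4 + math.radians(lat_deg) / 2)

# Cone constant and scale factor for a sphere (northern hemisphere, N > 0).
N = (math.log(math.cos(math.radians(LAT1)) / math.cos(math.radians(LAT2)))
     / math.log(_t(LAT2) / _t(LAT1)))
F = math.cos(math.radians(LAT1)) * _t(LAT1) ** N / N
RHO0 = R * F / _t(LAT0) ** N

def ll2proj(lon, lat):
    """Degrees (lon, lat) -> meters (xLon, yLat) from the projection center."""
    rho = R * F / _t(lat) ** N
    theta = N * math.radians(lon - LON0)
    return rho * math.sin(theta), RHO0 - rho * math.cos(theta)

def proj2ll(x, y):
    """Meters (xLon, yLat) -> degrees (lon, lat)."""
    rho = math.hypot(x, RHO0 - y)
    theta = math.atan2(x, RHO0 - y)
    lat = 2 * math.atan((R * F / rho) ** (1 / N)) - math.pi / 2
    return LON0 + math.degrees(theta / N), math.degrees(lat)

# Two points with the same xLon but different yLat.
lon_a, lat_a = proj2ll(66000.0, -102000.0)
lon_b, lat_b = proj2ll(66000.0, 102000.0)
print(lon_a, lat_a)
print(lon_b, lat_b)
```

The two points share an xLon value, yet their longitudes differ, because lines of constant xLon are not meridians under this projection.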
Example 2 describes the spatial and temporal domains. The first line simply displays the array info (in time, layer, row, col order). The second and third lines return the time and layer axes, respectively. The fourth and fifth lines return the latitude (yLat) and longitude (xLon) axes, respectively. Notice that the spatial axes are in the native coordinate system (i.e. meters from the projection center). The sixth line returns an array of latitude values for the center of each grid cell. In fact, all of the returned axes are objects in themselves and have the getValue method. The seventh line returns an array of longitudinal boundary values for each cell. The final line returns a time axis as seen before, but it uses a method of that axis to return an array of date values in cdtime format. In these tools, time can take one of four different formats: as an index, a string, a cdtime object or an mx.DateTime object.
D. Subsetting
Example 1:
>>> help(o3.IOsubset)
>>> from mx import DateTime as D
>>> start = D.DateTime(1996, 6, 24, 12)
>>> end = start + 10*D.oneHour
>>> lat1 = 37.02
>>> lat2 = 39.276
>>> lon1 = -92.694
>>> lon2 = -90.358
>>> o3_sub = o3.IOsubset([(lon1,lat1), (lon2,lat2)], timeLst = [start,end])
Example 2:
>>> xlon1 = -234000.0
>>> xlon2 = -30000.0
>>> ylat1 = -318000.0
>>> ylat2 = -78000.0
>>> startCd = ioT.dates2Cdtime(start)
>>> endCd = ioT.dates2Cdtime(end)
>>> o3_sub2 = o3(time = (startCd,endCd), latitude=(ylat1, ylat2, "ccb"), \
longitude=(xlon1, xlon2, "ccb"))
Example 3:
>>> o3_sub3 = o3[6:17,:,5:26,3:21]
>>> o3_sub[0,0,0]
>>> o3_sub2[0,0,0]
>>> o3_sub3[0,0,0]
The first line of Example 1 prints help for this particular method. The help function is a general utility that will print any available help for a function, object, method, or module. In general, I have provided more detailed information in the help output than is discussed here in this tutorial.
In the three examples, we are creating three identical subsets. The first example is the most robust method. It allows you to subset based on any of the three spatial coordinate systems: lon lat, xLon yLat, or row col (see Projection and Domain Info for more specifics). It also allows you to subset based on time index or date (string, cdtime, or mx.DateTime object format) and based on either layer index or sigma levels. In this example, we start by setting up the parameters of the subset. The second line imports a new module, DateTime, which is then used to create a particular date. The mx.DateTime objects are particularly useful for performing date arithmetic, for example line four. We then define some spatial parameters, in this case using lon lat in degrees. The last line is the actual subset, returning the iovar object o3_sub.
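The date arithmetic in Example 1 can be mimicked with python's standard datetime module, used here as a stand-in for mx.DateTime so that the sketch runs without extra packages. It also shows how a date maps to a time index, assuming an hourly time axis that starts at 1996-06-24 6:00 (the start of the o3 time axis mentioned in the Modifying Metadata section).

```python
from datetime import datetime, timedelta

# Same dates as Example 1, built with the standard library.
start = datetime(1996, 6, 24, 12)
end = start + timedelta(hours=10)

# The same start date parsed from a string.
parsed = datetime.strptime("1996-06-24 12:00", "%Y-%m-%d %H:%M")

# Assumed hourly time axis beginning at 1996-06-24 6:00.
axis_start = datetime(1996, 6, 24, 6)
first = (start - axis_start) // timedelta(hours=1)
last = (end - axis_start) // timedelta(hours=1)
print(end, first, last)
```

Under that assumption the dates correspond to time indices 6 through 16, which is the range that Example 3 selects with the slice 6:17.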
In Example 2, we again define some spatial coordinates. In this case, they are in terms of yLat and xLon, meters from projection center (see Projection and Domain Info for details). We then convert the start and end dates from DateTime to cdtime format. Finally, we make a call to the iovar object to subset the variable along the time, latitude and longitude axes. This call is using cdms selectors, see the cdat documentation for more specifics on selectors.
In Example 3, we are using array notation to subset the variable. In python, the slicing convention for arrays and lists is somewhat different than in other languages: the slice does not include the upper bound. So, for example, "6:17" means array elements 6 through 16 (it does not include the endpoint 17). In our example, we extract elements 6 through 16 from the first dimension, all of the second dimension, elements 5 through 25 from the third dimension, and elements 3 through 20 from the fourth dimension. These dimensions correspond to time, layer, row, and col respectively. See the Numeric documentation for more specifics. After making the three subsets, one can inspect the data. Here, we show a one-dimensional slice of the data (time 0, layer 0, row 0) to demonstrate the equivalence of the three subsets.
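The half-open slicing rule is easy to check on an ordinary python list. The sketch below uses plain lists rather than iovar objects; the bounds are the ones from Example 3.

```python
# 24 hourly time indices, 0 through 23.
time_steps = list(range(24))

# The upper bound 17 is excluded, so this keeps indices 6 through 16.
subset = time_steps[6:17]
print(subset[0], subset[-1], len(subset))   # 6 16 11

# Number of rows and columns kept by the slices 5:26 and 3:21.
n_rows = len(range(5, 26))
n_cols = len(range(3, 21))
print(n_rows, n_cols)                       # 21 18
```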
In comparing the three examples, Example 3 is the most efficient. Example 1 has the most overhead, which is no real surprise, seeing that it is the most flexible.
E. Modifying Variable Data
Example 1:
>>> o3_tmp = o3 * 2
Example 2:
>>> o3_tmp2 = o3.IOclone()
>>> o3_tmp2[12,0,22,26] = 0.1
Example 3:
>>> o3_tmp3 = o3.IOclone()
>>> for i in range(24):
...     o3_tmp3[i] += i
...
Example 4:
>>> o3_tmp4 = o3.IOclone()
>>> o3_tmp4[12,0,30:] = 2*o3[12,0,30:]
Example 5:
>>> no = f("no")
>>> nox = no + no2
All these examples take advantage of array slicing and array arithmetic. You should look at the Numeric documentation for more information. In the first example, o3_tmp is identical to o3, except that all of its values (each element of the four-dimensional array) are multiplied by 2.
In the second example, we first make a copy of the iovar, so as not to change the values of the original variable. The IOclone method copies all the data and metadata information. We then change the value of a single element of o3_tmp2.
In Example 3, we again make a clone of the variable. We then add the time index, "i", to each time slice of the array.
In Example 4, we copy the array and only change a very specific slice of the data. Here, we are only modifying time 12, layer 0, rows 30 and higher, for all columns. In these four examples, the metadata is identical to o3. To modify the metadata, see the next section.
Finally, in Example 5, we simply add multiple iovar objects. For this to work, the iovar objects need to be the same size and dimension. We will still need to modify the metadata; see the next section.
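The reason for cloning before an in-place change is that a plain assignment only creates a second name for the same object. The sketch below illustrates this with a dictionary and nested lists standing in for an iovar; it uses copy.deepcopy from the standard library as an analogue of IOclone, and the data values are made up.

```python
import copy

# A tiny 2 x 3 nested list standing in for a variable's data.
o3 = {"units": "ppmV", "data": [[0.03, 0.04, 0.05], [0.06, 0.07, 0.08]]}

alias = o3                  # same object: edits through alias change o3
clone = copy.deepcopy(o3)   # independent copy of data and metadata

# Changing the clone leaves the original untouched.
clone["data"][0][1] = 0.1

# Element-wise arithmetic builds new data and leaves the source alone.
doubled = [[2 * v for v in row] for row in o3["data"]]

print(o3["data"][0][1], clone["data"][0][1], doubled[0][1])
```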
F. Modifying Metadata
Example 1:
>>> o3_tmp = o3.IOclone()
>>> o3_tmp.IOchangeDate("1999-08-01 8:00")
>>> o3_tmp.getTime()
Example 2:
>>> o3_tmp2 = o3 * 1000
>>> print o3_tmp2.ioM
>>> o3_tmp2.IOmodVar(vunits="ppbV", desc="Variable O3 -- in ppb")
>>> print o3_tmp2.ioM
In the first example, we first make a clone of o3. At this point o3 and o3_tmp have the same time axis (24 hours starting at 1996-06-24 6:00). We then change the date using the IOchangeDate method. This can take a string, a cdtime, or a DateTime object. This retains the temporal range, 24 hours, but changes the starting point.
In the second example, we want to convert o3 from ppm to ppb. The first line changes the data. The second line shows that the metadata has not changed, specifically the variable units. In the third line, we change the variable units and the variable description.
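Changing the start date while keeping the temporal range amounts to adding one constant offset to every time step. A minimal sketch with the standard datetime module follows; shift_axis is a hypothetical helper, not the IOchangeDate implementation.

```python
from datetime import datetime, timedelta

def shift_axis(times, new_start):
    """Move a time axis to new_start while keeping every step's spacing."""
    offset = new_start - times[0]
    return [t + offset for t in times]

# 24 hourly steps starting at 1996-06-24 6:00, as in the text.
old_axis = [datetime(1996, 6, 24, 6) + timedelta(hours=h) for h in range(24)]
new_axis = shift_axis(old_axis, datetime(1999, 8, 1, 8))
print(new_axis[0], new_axis[-1])
```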
All the metadata values are simply attributes of the iometa and iovar objects. One can easily display these attributes and modify them by hand. One should be very careful in directly modifying the attributes of either the iometa or iovar objects. If they are modified by hand, there is a good chance that they will either be overwritten by subsequent changes to the variable or that they may cause an error in subsetting or writing these variables. It is recommended that you limit your metadata modifications to the methods provided here and to methods of the iometa object (see Advanced Usage below).
G. Writing to a File
Example 1:
>>> h = ioT.open("~/tmp/test1.nc","w")
>>> h.write(o3)
>>> h.write([no2,co])
>>> h.close()
Example 2:
>>> k = ioT.open("~/tmp/test1.ioapi", "w", ioT.iofileFlag)
>>> k.write([o3,no2,co])
>>> k.close()
Example 3:
>>> newM = ioT.combineMeta([o3,no2,co])
>>> l = ioT.open("~/tmp/test2.ioapi","w", ioT.iofileFlag, newM)
>>> l.write(no2)
>>> l.write([co,o3])
>>> l.close()
In Example 1, we open a new file for writing. This returns "h", an iofile object. If the file already exists, it will be removed before anything is written to the new file. By default, the new file is a CF formatted file. In the next line, iovar o3 is written to the file. Then, both no2 and co iovars are written, and finally the file is closed. If we look at the header of this file, for example using "ncdump", we see that there is a good amount of additional information stored in this netCDF file. There is a grid_map variable for projection information and an ioapi_meta variable, which stores information needed to recreate an ioapi file. These two variables basically make up the iometa object. Next, we see each of the axes. Finally, we have the variables. Each of the variables is a function of the physical axes (in contrast to ioapi format seen below) and makes reference to the projection and ioapi_meta variables.
In the second and third examples, we open an ioapi file for writing. In Example 2, we open it for a single write. The reason for this limitation is that an ioapi file needs to be opened with all of its metadata written to the file header. This metadata includes the number and names of the variables in the file. In this case, the file doesn't actually come into existence until the write command. Here, the file's metadata is first set for these three variables. If you tried to make a second write to this file, the method would raise an error. I recommend using ncdump on the file to compare the header information with "test1.nc".
In the third example, we open the ioapi file for multiple writes. The first line extracts some metadata from the three variables that will eventually be written to the file. The second line opens the file for writing. Unlike Example 2, here the file is actually opened and the metadata is written to the header. This metadata includes information on the number of variables that will eventually be written to this file and the names of these variables. The next two lines actually write the variables to the file, and the final line closes the file. If you try to write a variable with a name not in the list of variables, or more than the three variables, the ioapi write will fail.
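The constraint behind Example 3 (the header fixes the set of variable names before any data is written) can be sketched with a toy class. DeclaredWriter is hypothetical and keeps everything in memory; it only illustrates the rule that a write fails for a name that was not declared when the file was opened.

```python
class DeclaredWriter:
    """Toy writer whose header fixes the variable names before any write."""

    def __init__(self, declared_names):
        # In a real ioapi file this list would be written to the header.
        self.declared = list(declared_names)
        self.written = []

    def write(self, name):
        """Record a write, rejecting names missing from the header."""
        if name not in self.declared:
            raise ValueError("variable %r is not declared in the header" % name)
        self.written.append(name)

w = DeclaredWriter(["O3", "NO2", "CO"])
w.write("NO2")
w.write("CO")
w.write("O3")

rejected = False
try:
    w.write("NO")          # not in the declared list
except ValueError as err:
    rejected = True
    print("rejected:", err)
```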
Advanced Usage
In this section, we discuss some advanced usage and concepts in the ioapiTools module. Some of the topics include examples, but the emphasis here is less on the example and more on exploring a particular concept.
A. Concatenating
Example:
>>> f = ioT.open("~/tmp/CCTM_ACONC.D2.001")
>>> g = ioT.open("~/tmp/CCTM_ACONC.D2.002")
>>> o3_1 = f("o3")
>>> o3_2 = g("o3")
>>> o3_conc = ioT.concatenate([o3_1,o3_2])
>>> o3_conc.getTime()
Here, we concatenate the two variables o3_1 and o3_2 along the time axis. The concatenate function will only concatenate variables of equal spatial dimensions (layer, yLat, xLon) and will always concatenate along the time axis. The variables do not need to be continuous; for example, one variable could be hourly data from 1996-6-24 and the second variable could be hourly data from 1996-6-27. However, non-continuous variables need to be stored in CF form to properly retain the time axis. The variables should not have overlapping dates or subsetting may act unpredictably. Other than time, all other metadata is copied from the first variable in the variable list.
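The rules above (join along time, gaps allowed, overlaps not allowed) can be sketched with plain lists. This is not the ioT.concatenate implementation: concat_time and hourly are hypothetical helpers, the data values are placeholders, and the sketch chooses to raise an error on overlap, which the real function may not do.

```python
from datetime import datetime, timedelta

def concat_time(series):
    """Join (times, values) pairs along time; each must start after the last ends."""
    times, values = [], []
    for t, v in series:
        if times and t[0] <= times[-1]:
            raise ValueError("time ranges overlap or are out of order")
        times.extend(t)
        values.extend(v)
    return times, values

def hourly(day_start, value):
    """Build 24 hourly time steps with a constant placeholder value."""
    t = [day_start + timedelta(hours=h) for h in range(24)]
    return t, [value] * 24

day_a = hourly(datetime(1996, 6, 24), 0.03)
day_b = hourly(datetime(1996, 6, 27), 0.05)   # a gap of two days is allowed
times, values = concat_time([day_a, day_b])
print(len(times), times[23], times[24])
```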
B. Modifying Iometa Metadata
Example:
>>> newM = ioT.combineMeta([o3])
>>> ioM = newM[0].copy()
>>> print ioM
>>> ioM.replVar(["O3","NO2","CO"], ["ppmV","ppmV","ppmV"], \
["Variable O3", "Variable NO2", "Variable CO"])
>>> ioM.gridName = "M_test"
>>> print ioM
>>> newM = (ioM, newM[1])
>>> k = ioT.open("~/tmp/test3.ioapi", "w",ioT.iofileFlag, newM)
>>> k.write(no2)
>>> k.write(co)
>>> k.write(o3)
>>> k.close()
There may be times in which one needs to directly modify the iometa object. The safest way is to use the replVar method, which modifies the names, units, and descriptions of the variables, as well as the number of variables. In section "Basic Usage", we describe how to do multiple writes to an ioapi file. This example accomplishes the same task. The advantage of this approach is that we do not have to first extract all the variables that we will eventually write to the file. Here, we call combineMeta to get a tuple of iometa and cdmsmeta objects for only o3 (this can be done directly, but this is a quick approach). We then modify the iometa object using the replVar method. This method takes three lists, each with one entry per variable: the names, units, and descriptions of the variables that will eventually be written to the file. After updating the variable information in the iometa, we directly modify an attribute, in this case the grid name (used by the IOAPI library in the GRIDDESC file). Direct modification of the iometa attributes should be done with caution, because we can easily break the ioapi format. After reconstructing newM as a tuple with the new iometa object and the original cdmsmeta object, we can open and perform multiple writes to the ioapi file. For a description of the cdmsmeta object, see the section on coordinate conversion below.
C. Coordinate Conversion
Example:
>>> ioM = o3.ioM.copy()
>>> cdmsM = ioT.cdmsmeta(o3, ioT.cdmsvarFlag)
>>> coordIn = [(-66000, -102000), (66000, -102000), (66000, 102000)]
>>> coordOut = ioT.coordConv(ioM, coordIn, ioT.proj2llFlag)
>>> coordOut2 = ioT.coordConv(ioM, coordIn, ioT.proj2crFlag, cdmsM)
>>> coordOut3 = ioT.coordConv(ioM, coordOut2, ioT.cr2projFlag, cdmsM)
>>> coordOut4 = ioT.coordConv(ioM, coordOut2, ioT.cr2llFlag, cdmsM)
>>> coordOut5 = ioT.coordConv(ioM, coordOut2, ioT.cr2projFlag, cdmsM, ioT.NWFlag)
One of the major reasons I work with ioapiTools is to more easily utilize multiple types of coordinate systems. The function coordConv is the major tool that provides this functionality (in fact, coordConv is the conversion engine below IOsubset). It facilitates the translation between row col, lat lon (in degrees), and yLat xLon (native projection units). Conversions between lat lon and yLat xLon only require an iometa object, a list of coordinates (x,y order), and the appropriate flag indicating the direction of the conversion. To do conversions to or from row col, we also need to get the specific domain information, cdmsmeta (basically the four axes of the variable).
In this example, we chose three points that sit on the grid cell centers. The first conversion, coordIn to coordOut, is from xLon yLat to lon lat. It is interesting to note that we are going from a regular grid to an irregular grid. When comparing the 2nd and 3rd points, you will note that lon changes even though the xLon value stays the same, which is consistent with a Lambert Conformal Conic projection. The rest of the conversions are to or from col row. In these cases, you also need to provide the domain info, the cdmsmeta object. All but the last of these conversions assume that a row col position, when expressed in lon lat or xLon yLat, refers to the center of the cell. In the final case, we calculate the position at the Northwest (upper left) corner of the grid cell.
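On a regular grid, the conversion between col row and xLon yLat is simple arithmetic, and the cell-center versus corner choice is a half-cell shift. The sketch below is not the coordConv implementation. The cell size (12000 m) and the center of the row 0, col 0 cell are assumptions picked to be consistent with the numbers in the subsetting examples, not values read from a real file, and cr2proj and proj2cr are hypothetical helpers.

```python
import math

CELL = 12000.0            # assumed cell size in meters
X0_CENTER = -270000.0     # assumed xLon of the center of col 0
Y0_CENTER = -378000.0     # assumed yLat of the center of row 0

def cr2proj(col, row, corner="center"):
    """(col, row) -> (xLon, yLat) at the cell center or its NW corner."""
    x = X0_CENTER + col * CELL
    y = Y0_CENTER + row * CELL
    if corner == "NW":
        # Northwest corner: half a cell west and half a cell north.
        x -= CELL / 2
        y += CELL / 2
    return x, y

def proj2cr(x, y):
    """(xLon, yLat) -> (col, row) of the cell that contains the point."""
    # Cell edges lie half a cell away from the cell centers.
    col = int(math.floor((x - X0_CENTER) / CELL + 0.5))
    row = int(math.floor((y - Y0_CENTER) / CELL + 0.5))
    return col, row

col, row = proj2cr(-66000.0, -102000.0)
print(col, row)
print(cr2proj(col, row))
print(cr2proj(col, row, corner="NW"))
```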
D. Changing between iovar, cdms and Numeric Arrays
Example:
>>> o3Array = o3.getValue()
>>> o3Cdms = o3.IO2cdms()
>>> help(ioT.createVariable)
>>> o3_tmp = ioT.createVariable(o3Cdms, o3.ioM)
>>> attributesTmp = o3.attributes
>>> cdmsM = ioT.cdmsmeta(o3, ioT.cdmsvarFlag)
>>> axesTmp = cdmsM.getAxes()
>>> o3_tmp2 = ioT.createVariable(o3Array, o3.ioM, axesTmp, \
"o3", attributesTmp)
At times, you may need to convert the iovar object to a more basic data format. A possible reason is that you are trying to use additional functions in the Numeric and cdat modules that will not accept iovar objects. In our example, the first line converts the iovar o3 to a Numeric array, and the second line converts it to a standard cdms transient variable.
At other times, you may have data in Numeric or cdms data format and want to convert it into an iovar. For example, you might be importing data from a non-ioapi format or you might have used a cdms method of the iovar object which has returned a non-iovar object. You can create an iovar with the createVariable function. If the input data is a cdms object and was derived from an iovar object, then chances are that all the metadata attributes are in order. In this case (as in our example), createVariable only requires the data and the iometa object. If the input data does not carry the appropriate metadata, you may need to pass the specific axes and attributes. Another iovar may be used as a model for the appropriate form of the axes and the attributes.
6. Cheat Sheet of Functions and Methods
The following is a list of the common functions and class methods. It is not exhaustive, but acts as a reference for the general user.
module functions:
binAverager - averages over an axis, e.g. converts hourly data to daily average data
combineMeta - combines metadata from a series of variables
concatenate - concatenates a series of variables
coordConv - converts between different coordinate systems
coordInDomain - tests if coordinate in spatial domain
createVariable - creates an iovar variable
dateInDomain - tests if date in temporal domain
dates2Cdtime - converts dates to cdtime objects
dates2DateTime - converts dates to mx.DateTime objects
layerInDomain - tests if layer in vertical domain
open - opens a file for reading or writing, returns iofile
scan - scans through files for date range, returns iofilescan
cdmsmeta methods:
copy - copies the cdmsmeta object
getAxes - gets all the axes
iofile methods:
close - closes a file
extract - extracts a variable from a file
listvariables - lists the variables in a file
write - writes a variable to a file
iofilescan methods:
close - closes the files in iofilescan
extract - extracts a variable from the files
listvariables - lists the variables in the files
iometa methods:
copy - copies the iometa object
replVar - replaces the variable's metadata in iometa
iovar methods:
getLatitude - returns latitude axis
getLongitude - returns longitude axis
getLevel - returns level or layer axis
getTime - returns time axis
getValue - converts iovar to Numeric array
ioM - iometa object, an attribute
IO2cdms - converts iovar to cdms variable
IOchangeDate - changes the start date of time axis
IOclone - makes a copy of the iovar object
IOequal - tests 2 variables for data equivalence
IOmodVar - modifies the variable's metadata
IOsubset - subsets the variable