This function automates the cleaning and reformatting of Global Surface Summary of the Day (GSOD, https://data.noaa.gov/dataset/global-surface-summary-of-the-day-gsod) station files in "WMO-WBAN-YYYY.op.gz" format that have been downloaded from the United States National Centers for Environmental Information (NCEI) FTP server.
reformat_GSOD(dsn = NULL, file_list = NULL)
| Argument | Description |
|---|---|
| dsn | User-supplied file path to the location of station file data on local disk for reformatting. |
| file_list | User-supplied list of files of station data on local disk for reformatting. |
A data.frame object of weather data, or a comma-separated values (CSV) or GeoPackage (GPKG) file saved to local disk.
For automated downloading and processing, see the get_GSOD function, which provides expanded functionality for automatically downloading and expanding annual GSOD files and cleaning station files.
This function reformats the data into a more usable form and calculates three new elements: saturation vapour pressure (es), actual vapour pressure (ea) and relative humidity (RH). All units are converted to the International System of Units (SI), e.g., Fahrenheit to Celsius and inches to millimetres. Where station elevation values are missing or found to be questionable, alternative elevation measurements are supplied from the Consultative Group for International Agricultural Research's Consortium for Spatial Information (CGIAR-CSI) Shuttle Radar Topography Mission 90 metre (SRTM 90m) digital elevation data, which are based on NASA's original SRTM 90m data.
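The conversions and derived elements described above can be sketched roughly as follows. This is a minimal illustration, not the package's own code: the function names are hypothetical, and the vapour pressure formula is an assumed Tetens-type approximation, since the exact equation used is not stated here.

```r
# Standard unit conversions (hypothetical helper names).
fahrenheit_to_celsius <- function(f) (f - 32) * 5 / 9
inches_to_mm <- function(inches) inches * 25.4

# Vapour pressure (kPa) at a given temperature in degrees C.
# ASSUMPTION: Tetens-type approximation; the package may use a different form.
vapour_pressure <- function(temp_c) {
  0.61078 * exp(17.2694 * temp_c / (temp_c + 237.3))
}

temp_c <- fahrenheit_to_celsius(77)  # mean daily temperature
dewp_c <- fahrenheit_to_celsius(59)  # mean daily dew point

es <- vapour_pressure(temp_c)  # saturation vapour pressure
ea <- vapour_pressure(dewp_c)  # actual vapour pressure
rh <- ea / es * 100            # relative humidity (%)
```

Under this assumption, the actual vapour pressure is the saturation vapour pressure evaluated at the dew point, so relative humidity is bounded by 100% whenever the dew point does not exceed the air temperature.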
The data are summarised by station for each year and include vapour pressure and relative humidity elements calculated from existing data in GSOD.
All missing values in resulting files are represented as NA regardless of which field they occur in.
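As a rough sketch of what a uniform NA representation involves, the example below replaces per-field sentinel values with NA. The column names and the sentinel values (9999.9, 999.9 and 99.99) are assumptions taken from NOAA's GSOD format description, not from this function's source.

```r
# Hypothetical raw values, including sentinel codes for "missing".
raw <- data.frame(
  TEMP  = c(45.3, 9999.9),
  VISIB = c(999.9, 6.2),
  PRCP  = c(0.00, 99.99)
)

# ASSUMPTION: per-field sentinel values as given in NOAA's GSOD format notes.
sentinels <- list(TEMP = 9999.9, VISIB = 999.9, PRCP = 99.99)

# Replace each field's sentinel with NA so that missing data look the same
# regardless of which field they occur in.
for (field in names(sentinels)) {
  is_missing <- raw[[field]] == sentinels[[field]]
  raw[[field]][is_missing] <- NA
}
```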
Only station files in ".op.gz" file format are supported by this function. If you have downloaded the full annual "gsod_YYYY.tar" file you will need to extract the individual station files first to use this function.
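One possible way to extract the station files from an annual archive is base R's `utils::untar()`. The helper name and the paths below are hypothetical; this is a sketch, not part of the package.

```r
# Hypothetical helper: extract an annual GSOD tar archive and return the
# paths of the individual ".op.gz" station files it contains.
extract_station_files <- function(tar_file, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  utils::untar(path.expand(tar_file), exdir = path.expand(out_dir))
  list.files(out_dir, pattern = "\\.op\\.gz$", full.names = TRUE)
}

# Example usage (paths are placeholders):
# files <- extract_station_files("~/GSOD/gsod_1960.tar", "~/GSOD/gsod_1960")
# x <- reformat_GSOD(file_list = files)
```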
The data are returned in a data.frame object that includes the following fields:
Station number (WMO/DATSAV3 number) for the location
Number where applicable--this is the historical "Weather Bureau Air Force Navy" number - with WBAN being the acronym
Unique text identifier
Country in which the station is located
Latitude. *Station dropped in cases where values are < -90 or > 90 degrees or Lat = 0 and Lon = 0* (WGS84)
Longitude. *Station dropped in cases where values are < -180 or > 180 degrees or Lat = 0 and Lon = 0* (WGS84)
Elevation in metres
Elevation in metres corrected for possible errors, derived from the CGIAR-CSI SRTM 90m database (Jarvis et al. 2008)
Date in YYYY-mm-dd format
The year (YYYY)
The month (mm)
The day (dd)
Sequential day of year (not in original GSOD)
Mean daily temperature converted to degrees C to tenths. Missing = NA
Number of observations used in calculating mean daily temperature
Mean daily dew point converted to degrees C to tenths. Missing = NA
Number of observations used in calculating mean daily dew point
Mean sea level pressure in millibars to tenths. Missing = NA
Number of observations used in calculating mean sea level pressure
Mean station pressure for the day in millibars to tenths. Missing = NA
Number of observations used in calculating mean station pressure
Mean visibility for the day converted to kilometres to tenths. Missing = NA
Number of observations used in calculating mean daily visibility
Mean daily wind speed value converted to metres/second to tenths. Missing = NA
Number of observations used in calculating mean daily wind speed
Maximum sustained wind speed reported for the day converted to metres/second to tenths. Missing = NA
Maximum wind gust reported for the day converted to metres/second to tenths. Missing = NA
Maximum temperature reported during the day converted to Celsius to tenths--time of max temp report varies by country and region, so this will sometimes not be the max for the calendar day. Missing = NA
Blank indicates max temp was taken from the explicit max temp report and not from the 'hourly' data. An "*" indicates max temp was derived from the hourly data (i.e., highest hourly or synoptic-reported temperature)
Minimum temperature reported during the day converted to Celsius to tenths--time of min temp report varies by country and region, so this will sometimes not be the min for the calendar day. Missing = NA
Blank indicates min temp was taken from the explicit min temp report and not from the 'hourly' data. An "*" indicates min temp was derived from the hourly data (i.e., lowest hourly or synoptic-reported temperature)
Total precipitation (rain and/or melted snow) reported during the day converted to millimetres to hundredths; will usually not end with the midnight observation, i.e., may include latter part of previous day. A ".00" value indicates no measurable precipitation (includes a trace). Missing = NA. *Note: Many stations do not report '0' on days with no precipitation; therefore, 'NA' will often appear on these days. For example, a station may only report a 6-hour amount for the period during which rain fell.* See FLAGS_PRCP column for source of data.
1 report of 6-hour precipitation amount
Summation of 2 reports of 6-hour precipitation amount
Summation of 3 reports of 6-hour precipitation amount
Summation of 4 reports of 6-hour precipitation amount
1 report of 12-hour precipitation amount
Summation of 2 reports of 12-hour precipitation amount
1 report of 24-hour precipitation amount
Station reported '0' as the amount for the day (e.g., from 6-hour reports), but also reported at least one occurrence of precipitation in hourly observations--this could indicate a trace occurred, but should be considered as incomplete data for the day
Station did not report any precip data for the day and did not report any occurrences of precipitation in its hourly observations--it's still possible that precipitation occurred but was not reported
Snow depth in millimetres to tenths. Missing = NA
Indicator for fog, (1 = yes, 0 = no/not reported) for the occurrence during the day
Indicator for rain or drizzle, (1 = yes, 0 = no/not reported) for the occurrence during the day
Indicator for snow or ice pellets, (1 = yes, 0 = no/not reported) for the occurrence during the day
Indicator for hail, (1 = yes, 0 = no/not reported) for the occurrence during the day
Indicator for thunder, (1 = yes, 0 = no/not reported) for the occurrence during the day
Indicator for tornado or funnel cloud, (1 = yes, 0 = no/not reported) for the occurrence during the day
Mean daily actual vapour pressure
Mean daily saturation vapour pressure
Mean daily relative humidity
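The rule for dropping stations with invalid coordinates, described in the latitude and longitude fields above, could be expressed along these lines. The data and the column names `id`, `lat` and `lon` are hypothetical and serve only to illustrate the filter.

```r
# Hypothetical station records with a mix of valid and invalid coordinates.
stations <- data.frame(
  id  = c("a", "b", "c", "d"),
  lat = c(45.0, 95.0, 0, -33.9),
  lon = c(7.5, 10.0, 0, 200.0)
)

# Keep a station only if latitude and longitude are within range and the
# coordinates are not both exactly zero.
valid <- with(
  stations,
  lat >= -90 & lat <= 90 &
    lon >= -180 & lon <= 180 &
    !(lat == 0 & lon == 0)
)

stations <- stations[valid, ]
```

In this example, station "b" is dropped for an out-of-range latitude, "c" for having both coordinates equal to zero, and "d" for an out-of-range longitude.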
Some of these data are redistributed with this R package. These data originally come from the US NCEI, which states that users of these data should take into account the following: “The following data and products may have conditions placed on their international commercial use. They can be used within the U.S. or for non-commercial international activities without restriction. The non-U.S. data cannot be redistributed for commercial purposes. Re-distribution of these data by others must provide this same notification.”
Jarvis, A., Reuter, H.I., Nelson, A., Guevara, E. (2008) Hole-filled SRTM for the globe Version 4, available from the CGIAR-CSI SRTM 90m Database (http://srtm.csi.cgiar.org).
## Not run: ------------------------------------
#
# # Reformat station data files in local directory
# x <- reformat_GSOD(dsn = "~/tmp")
#
# # Reformat a list of data files
# y <- c("~/GSOD/gsod_1960/200490-99999-1960.op.gz",
#        "~/GSOD/gsod_1961/200490-99999-1961.op.gz")
# x <- reformat_GSOD(file_list = y)
## ---------------------------------------------