This function automates the cleaning and reformatting of Global Surface Summary of the Day (GSOD) station files (https://data.noaa.gov/dataset/global-surface-summary-of-the-day-gsod) in "WMO-WBAN-YYYY.op.gz" format that have been downloaded from the United States National Centers for Environmental Information (NCEI) FTP server.

reformat_GSOD(dsn = NULL, file_list = NULL)

Arguments

dsn

User supplied file path to location of station file data on local disk for reformatting.

file_list

User supplied list of files of station data on local disk for reformatting.

Value

A data.frame object of weather data or a comma-separated value (CSV) or GeoPackage (GPKG) file saved to local disk.

Details

For automated downloading and processing see the get_GSOD function which provides expanded functionality for automatically downloading and expanding annual GSOD files and cleaning station files.

This function reformats the data into a more usable form and calculates three new elements: saturation vapour pressure (es), actual vapour pressure (ea) and relative humidity (RH). All units are converted to International System of Units (SI), e.g., Fahrenheit to Celsius and inches to millimetres. Where a station's elevation is missing or found to be questionable, an alternative elevation value is supplied from the Consultative Group for International Agricultural Research's Consortium for Spatial Information group's (CGIAR-CSI) Shuttle Radar Topography Mission 90 metre (SRTM 90m) digital elevation data, which are based on NASA's original SRTM 90m data.
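As a rough illustration of the conversions and derived elements described above, the following sketch converts Fahrenheit to Celsius and inches to millimetres, then derives es, ea and RH from temperature and dew point. The helper names (f_to_c, in_to_mm, sat_vp) are hypothetical, and the Tetens approximation is an assumption made for this example; the formula actually used by reformat_GSOD is not stated here and may differ.

```r
# Unit conversions; helper names are illustrative only.
f_to_c   <- function(f) round((f - 32) * 5 / 9, 1)     # Fahrenheit -> Celsius
in_to_mm <- function(inches) round(inches * 25.4, 2)   # inches -> millimetres

# Saturation vapour pressure (kPa) at temperature t (degrees C).
# ASSUMPTION: Tetens approximation, chosen for illustration.
sat_vp <- function(t) 0.61078 * exp((17.2694 * t) / (t + 237.3))

temp_c <- f_to_c(77)   # mean daily temperature reported as 77 F
dewp_c <- f_to_c(59)   # mean daily dew point reported as 59 F

es <- sat_vp(temp_c)   # saturation vapour pressure at air temperature
ea <- sat_vp(dewp_c)   # actual vapour pressure = saturation value at dew point
rh <- ea / es * 100    # relative humidity in percent
```

Relative humidity is the ratio of ea to es, so it only requires that both pressures are computed with the same formula and units.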

The data are summarised by station for each year and include vapour pressure and relative humidity elements calculated from existing GSOD data.

All missing values in resulting files are represented as NA regardless of which field they occur in.
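A minimal sketch of this kind of recoding is shown below. It assumes that the raw files mark missing values with numeric sentinel codes such as 9999.9 for temperature and 99.99 for precipitation; the helper sentinel_to_na and the toy data are hypothetical and are not part of the package.

```r
# Toy raw records; 9999.9 and 99.99 are the missing-value codes assumed here.
raw <- data.frame(
  TEMP = c(51.3, 9999.9, 48.0),
  PRCP = c(0.00, 0.12, 99.99)
)

# Replace a sentinel code with NA in a numeric vector (hypothetical helper).
sentinel_to_na <- function(x, code) {
  x[!is.na(x) & x == code] <- NA
  x
}

raw$TEMP <- sentinel_to_na(raw$TEMP, 9999.9)
raw$PRCP <- sentinel_to_na(raw$PRCP, 99.99)
```

Using NA throughout means that base R functions such as mean(x, na.rm = TRUE) handle missing observations consistently across fields.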

Only station files in ".op.gz" file format are supported by this function. If you have downloaded the full annual "gsod_YYYY.tar" file you will need to extract the individual station files first to use this function.
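One possible way to extract the station files from an annual archive is sketched below using utils::untar from base R. The helper extract_station_files and the archive path are hypothetical and only illustrate the step; any tool that unpacks the tar file will do.

```r
# Extract an annual GSOD archive and return the station files it contains.
# extract_station_files() is a hypothetical helper, not part of the package.
extract_station_files <- function(tar_file, exdir = tempfile("gsod_")) {
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  utils::untar(path.expand(tar_file), exdir = exdir)
  # Keep only the per-station ".op.gz" files
  list.files(exdir, pattern = "\\.op\\.gz$", full.names = TRUE,
             recursive = TRUE)
}

# Example path (an assumption); adjust to where the archive was saved.
tar_file <- "~/GSOD/gsod_1960.tar"
if (file.exists(tar_file)) {
  station_files <- extract_station_files(tar_file)
  # The resulting vector can then be passed on, e.g.
  # x <- reformat_GSOD(file_list = station_files)
}
```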

The data are returned in a data.frame object that includes the following fields:

STNID

Station number (WMO/DATSAV3 number) for the location

WBAN

Historical "Weather Bureau Air Force Navy" (WBAN) number, where applicable

STN_NAME

Unique text identifier

CTRY

Country in which the station is located

LAT

Latitude. *Station dropped in cases where values are < -90 or > 90 degrees or Lat = 0 and Lon = 0* (WGS84)

LON

Longitude. *Station dropped in cases where values are < -180 or > 180 degrees or Lat = 0 and Lon = 0* (WGS84)

ELEV_M

Elevation in metres

ELEV_M_SRTM_90m

Elevation in metres corrected for possible errors, derived from the CGIAR-CSI SRTM 90m database (Jarvis et al. 2008)

YEARMODA

Date in YYYY-mm-dd format

YEAR

The year (YYYY)

MONTH

The month (mm)

DAY

The day (dd)

YDAY

Sequential day of year (not in original GSOD)

TEMP

Mean daily temperature converted to degrees C to tenths. Missing = NA

TEMP_CNT

Number of observations used in calculating mean daily temperature

DEWP

Mean daily dew point converted to degrees C to tenths. Missing = NA

DEWP_CNT

Number of observations used in calculating mean daily dew point

SLP

Mean sea level pressure in millibars to tenths. Missing = NA

SLP_CNT

Number of observations used in calculating mean sea level pressure

STP

Mean station pressure for the day in millibars to tenths. Missing = NA

STP_CNT

Number of observations used in calculating mean station pressure

VISIB

Mean visibility for the day converted to kilometres to tenths. Missing = NA

VISIB_CNT

Number of observations used in calculating mean daily visibility

WDSP

Mean daily wind speed value converted to metres/second to tenths. Missing = NA

WDSP_CNT

Number of observations used in calculating mean daily wind speed

MXSPD

Maximum sustained wind speed reported for the day converted to metres/second to tenths. Missing = NA

GUST

Maximum wind gust reported for the day converted to metres/second to tenths. Missing = NA

MAX

Maximum temperature reported during the day converted to Celsius to tenths--time of max temp report varies by country and region, so this will sometimes not be the max for the calendar day. Missing = NA

MAX_FLAG

Blank indicates max temp was taken from the explicit max temp report and not from the 'hourly' data. An "*" indicates max temp was derived from the hourly data (i.e., highest hourly or synoptic-reported temperature)

MIN

Minimum temperature reported during the day converted to Celsius to tenths--time of min temp report varies by country and region, so this will sometimes not be the min for the calendar day. Missing = NA

MIN_FLAG

Blank indicates min temp was taken from the explicit min temp report and not from the 'hourly' data. An "*" indicates min temp was derived from the hourly data (i.e., lowest hourly or synoptic-reported temperature)

PRCP

Total precipitation (rain and/or melted snow) reported during the day converted to millimetres to hundredths; will usually not end with the midnight observation, i.e., may include latter part of previous day. A ".00" value indicates no measurable precipitation (includes a trace). Missing = NA. *Note: Many stations do not report '0' on days with no precipitation; therefore, 'NA' will often appear on these days. For example, a station may only report a 6-hour amount for the period during which rain fell.* See the PRCP_FLAG column for the source of the data.

PRCP_FLAG

Source of the PRCP value, coded as one of the following:

A

1 report of 6-hour precipitation amount

B

Summation of 2 reports of 6-hour precipitation amount

C

Summation of 3 reports of 6-hour precipitation amount

D

Summation of 4 reports of 6-hour precipitation amount

E

1 report of 12-hour precipitation amount

F

Summation of 2 reports of 12-hour precipitation amount

G

1 report of 24-hour precipitation amount

H

Station reported '0' as the amount for the day (e.g., from 6-hour reports), but also reported at least one occurrence of precipitation in hourly observations--this could indicate a trace occurred, but should be considered as incomplete data for the day

I

Station did not report any precip data for the day and did not report any occurrences of precipitation in its hourly observations--it's still possible that precipitation occurred but was not reported

SNDP

Snow depth in millimetres to tenths. Missing = NA

I_FOG

Indicator for fog, (1 = yes, 0 = no/not reported) for the occurrence during the day

I_RAIN_DRIZZLE

Indicator for rain or drizzle, (1 = yes, 0 = no/not reported) for the occurrence during the day

I_SNOW_ICE

Indicator for snow or ice pellets, (1 = yes, 0 = no/not reported) for the occurrence during the day

I_HAIL

Indicator for hail, (1 = yes, 0 = no/not reported) for the occurrence during the day

I_THUNDER

Indicator for thunder, (1 = yes, 0 = no/not reported) for the occurrence during the day

I_TORNADO_FUNNEL

Indicator for tornado or funnel cloud, (1 = yes, 0 = no/not reported) for the occurrence during the day

ea

Mean daily actual vapour pressure

es

Mean daily saturation vapour pressure

RH

Mean daily relative humidity
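The coordinate rule given for LAT and LON above can be sketched as a simple filter. The toy station table and its values are made up for illustration, and the package's own implementation may differ.

```r
# Toy station metadata; identifiers and coordinates are invented.
stations <- data.frame(
  STNID = c("A", "B", "C", "D"),
  LAT   = c(-27.5, 0, 95.2, 51.5),
  LON   = c(153.0, 0, 10.0, -190.0)
)

# Keep stations with in-range coordinates that are not at (0, 0).
valid <- with(stations,
  LAT >= -90 & LAT <= 90 &
  LON >= -180 & LON <= 180 &
  !(LAT == 0 & LON == 0)
)

stations <- stations[valid, ]
```

Here station B is dropped for sitting at (0, 0), C for an out-of-range latitude and D for an out-of-range longitude, leaving only station A.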

Note

Some of these data are redistributed with this R package. These data originally come from the US NCEI, which states that users of these data should take into account the following: “The following data and products may have conditions placed on their international commercial use. They can be used within the U.S. or for non-commercial international activities without restriction. The non-U.S. data cannot be redistributed for commercial purposes. Re-distribution of these data by others must provide this same notification.”

References

Jarvis, A., Reuter, H.I., Nelson, A., Guevara, E. (2008) Hole-filled SRTM for the globe Version 4, available from the CGIAR-CSI SRTM 90m Database, http://srtm.csi.cgiar.org

See also

get_GSOD

Examples

## Not run: ------------------------------------
#
# # Reformat station data files in local directory
# x <- reformat_GSOD(dsn = "~/tmp")
#
# # Reformat a list of data files
# y <- c("~/GSOD/gsod_1960/200490-99999-1960.op.gz",
#        "~/GSOD/gsod_1961/200490-99999-1961.op.gz")
# x <- reformat_GSOD(file_list = y)
## ---------------------------------------------