


Read Met Office Unified Model (TM) (hereafter MetUM) netCDF files and
extract the variables in varlist.
MetUM = read_MetUM_forcing(files, varlist)
DESCRIPTION:
Given a cell array of file names, extract the variables in varlist into
a struct. Field names are the variable names given.
INPUT:
files - cell array of file names (full paths to the files)
varlist [optional] - list of variable names to extract. If omitted, all
variables are extracted.
OUTPUT:
MetUM - MATLAB struct with the data from the variables specified in
varlist.
EXAMPLE USAGE:
varlist = {'x', 'y', 't_1', 'sh', 'x-wind', 'y-wind', 'rh', 'sh', ...
'lh', 'solar', 'longwave'};
files = {'/tmp/sn_2011010100_s00.nc', '/tmp/sn_2011010106_s00.nc'};
MetUM = read_MetUM_forcing(files, varlist);
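Hyphens are stripped from variable names before they are used as struct field names, so a variable requested as 'x-wind' comes back as MetUM.xwind. The following is a minimal sketch of that mapping and of the output layout. It uses a hand-built stand-in struct rather than real model output, and the array sizes and time stamp are invented for illustration.

```matlab
% Variable names as they might appear in the netCDF file.
varlist = {'x-wind', 'y-wind', 'rh'};

% Same substitution the function applies: drop hyphens so that each name
% is a valid MATLAB field name.
safenames = regexprep(varlist, '-', '');

% Stand-in for the returned struct (sizes and dates are hypothetical).
MetUM = struct();
for k = 1:numel(safenames)
    MetUM.(safenames{k}).data = zeros(4, 3, 6);   % x by y by time
    MetUM.(safenames{k}).time = repmat('2011-01-01 00:00:00', 6, 1);
end

disp(fieldnames(MetUM))
```

Each data variable therefore carries its own .data and .time, and the wind components are addressed as MetUM.xwind and MetUM.ywind.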
NOTE:
The last 4 times are dropped from each file because the Met Office
Unified Model is a forecast model with four hours of forecast in these
PP files.
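How many samples are kept from each file depends on the sampling interval of the variable. The sketch below follows the same idea as the function (keep six hours' worth: 6 samples if hourly, 12 if half-hourly). It is only an illustration: it uses the built-in round in place of roundn, and the time vector is invented.

```matlab
% Hypothetical half-hourly time stamps covering 10 hours (in days).
t0 = datenum(2011, 1, 1, 0, 0, 0);
times = t0 + (0:20)' * (30 / (24 * 60));

% Mean sampling interval in whole minutes.
interval = mean(round(diff(times) * 24 * 60));

if abs(60 - interval) < abs(30 - interval)
    nh = 6;     % hourly: 6 samples cover 6 hours
elseif abs(30 - interval) < abs(60 - interval)
    nh = 12;    % half-hourly: 12 samples cover 6 hours
else
    error('Unsupported time sampling interval.')
end

% Never ask for more samples than the file holds.
nh = min(nh, numel(times));

% Keep only the first nh samples; the rest would be discarded.
kept = times(1:nh);
```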
Author(s):
Pierre Cazenave (Plymouth Marine Laboratory)
Revision history:
2013-08-29 First version.
2013-09-02 Amend the way the 3 and 4D variables are appended to one
another. The assumption now is time is the last dimension and arrays
are appended with time.
2013-09-06 Trim the last 4 time samples from all variables (these are
the forecast results which we don't want/need for forcing the model). I
suppose at some point, given the patchy temporal coverage of the data
(i.e. the Met Office FTP server doesn't have all files in a usable
state), it might be better to use the forecast data to partially fill
in gaps from missing files. However, given this forcing is hourly and I
was previously using four times daily forcing, I'm not that fussed.
2013-09-12 Add support for extracting the surface pressure level from
the 4D temperature variable (temp_2).
2013-10-23 Fix the way time is handled. Previously a time variable had
to be specified in varlist. Now, each data variable's time is returned
as an array within the MetUM.(variable) struct, giving
MetUM.(variable).time and MetUM.(variable).data. This means if each
data variable uses a different time sampling, that can be accounted for
later (by interpolating to a common time series with interp3, for
example). Currently the code extracts the first 6 hours' worth of data.
The assumption there is that the Met Office do 4 runs per day, so 6
hours of data from each run gives you a day's worth.
==========================================================================
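Because each variable carries its own time array, two variables with different sampling can be brought onto a common time base after reading. The following is a minimal sketch of one way to do that, assuming the data are ordered x by y by time. It uses interp1 along the time dimension, which is a choice made for this illustration and not necessarily what the author used; the sizes and values are invented.

```matlab
% Hypothetical hourly variable on a 2 x 2 grid with 6 time steps. The
% value at every grid point equals the hour, which makes the result easy
% to check.
t_hourly = datenum(2011, 1, 1, 0:5, 0, 0)';
data = repmat(reshape(0:5, 1, 1, 6), [2, 2, 1]);

% Target half-hourly time base over the same window (11 samples).
t_common = linspace(t_hourly(1), t_hourly(end), 11)';

% interp1 interpolates down the first dimension, so move time to the
% front, interpolate, then move it back to the end.
tmp = permute(data, [3, 1, 2]);
tmp = interp1(t_hourly, tmp, t_common, 'linear');
data_common = permute(tmp, [2, 3, 1]);
```

With real output, t_hourly would come from datenum(MetUM.(variable).time) and data from MetUM.(variable).data.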


function MetUM = read_MetUM_forcing(files, varlist)
% Read Met Office Unified Model (TM) (hereafter MetUM) netCDF files and
% extract the variables in varlist.
%
% MetUM = read_MetUM_forcing(files, varlist)
%
% DESCRIPTION:
%   Given a cell array of file names, extract the variables in varlist into
%   a struct. Field names are the variable names given.
%
% INPUT:
%   files - cell array of file names (full paths to the files)
%   varlist [optional] - list of variable names to extract. If omitted, all
%   variables are extracted.
%
% OUTPUT:
%   MetUM - MATLAB struct with the data from the variables specified in
%   varlist.
%
% EXAMPLE USAGE:
%   varlist = {'x', 'y', 't_1', 'sh', 'x-wind', 'y-wind', 'rh', 'sh', ...
%       'lh', 'solar', 'longwave'};
%   files = {'/tmp/sn_2011010100_s00.nc', '/tmp/sn_2011010106_s00.nc'};
%   MetUM = read_MetUM_forcing(files, varlist);
%
% NOTE:
%   The last 4 times are dropped from each file because the Met Office
%   Unified Model is a forecast model with four hours of forecast in these
%   PP files.
%
% Author(s):
%   Pierre Cazenave (Plymouth Marine Laboratory)
%
% Revision history:
%   2013-08-29 First version.
%   2013-09-02 Amend the way the 3 and 4D variables are appended to one
%   another. The assumption now is time is the last dimension and arrays
%   are appended with time.
%   2013-09-06 Trim the last 4 time samples from all variables (these are
%   the forecast results which we don't want/need for forcing the model). I
%   suppose at some point, given the patchy temporal coverage of the data
%   (i.e. the Met Office FTP server doesn't have all files in a usable
%   state), it might be better to use the forecast data to partially fill
%   in gaps from missing files. However, given this forcing is hourly and I
%   was previously using four times daily forcing, I'm not that fussed.
%   2013-09-12 Add support for extracting the surface pressure level from
%   the 4D temperature variable (temp_2).
%   2013-10-23 Fix the way time is handled. Previously a time variable had
%   to be specified in varlist. Now, each data variable's time is returned
%   as an array within the MetUM.(variable) struct, giving
%   MetUM.(variable).time and MetUM.(variable).data. This means if each
%   data variable uses a different time sampling, that can be accounted for
%   later (by interpolating to a common time series with interp3, for
%   example). Currently the code extracts the first 6 hours' worth of data.
%   The assumption there is that the Met Office do 4 runs per day, so 6
%   hours of data from each run gives you a day's worth.
%
%==========================================================================

subname = 'read_MetUM_forcing';

global ftbverbose
if ftbverbose
    fprintf('\nbegin : %s \n', subname)
end

assert(iscell(files), 'List of files provided must be a cell array.')

% Find the approximate surface pressure level for the 4D temperature data.
nc = netcdf.open(files{1}, 'NOWRITE');
[~, numvars, ~, ~] = netcdf.inq(nc);
levelidx = [];
for f = 1:numvars
    [varname, ~, ~, ~] = netcdf.inqVar(nc, f - 1);
    if strcmp(varname, 'p') % p = pressure levels in the temp_2 variable.
        varid = netcdf.inqVarID(nc, varname);
        tmpdata = netcdf.getVar(nc, varid, 'double');
        % Find the index for the level closest to the surface. The
        % documentation at:
        %   http://badc.nerc.ac.uk/data/um/Met_Office_NAE_Output.pdf
        % suggests 980hPa is the surface.
        [~, levelidx] = min(abs(tmpdata - 980));
    end
end
% Release the handle used for the pressure level lookup.
netcdf.close(nc)

% If that failed, use a best guess of the 5th index (based on my checking a
% bunch of random files where the 1000mbar value falls in the p index).
if isempty(levelidx)
    levelidx = 5;
end

MetUM = struct();

for f = 1:length(files)

    % Set the number of time steps to extract to default to 6. It should be
    % checked for each file anyway (assuming there's a time variable being
    % requested).
    nh = 6;

    if ftbverbose
        % Don't display the full path if it's really long.
        if length(files{f}) > 80
            [~, fname, fext] = fileparts(files{f});
            dispname = [fname, fext];
        else
            dispname = files{f};
        end
        fprintf('File %i of %i (%s)... ', f, length(files), dispname)
    end

    nc = netcdf.open(files{f}, 'NOWRITE');

    % Query the netCDF file to find the variable names. If the name matches
    % one in the list we've been given (or if we haven't been given any
    % particular variables), save it in the output struct.
    [~, numvars, ~, ~] = netcdf.inq(nc);

    for ii = 1:numvars
        % Find the name of the current variable
        [varname, ~, ~, ~] = netcdf.inqVar(nc, ii - 1);

        % Test nargin first so varlist is never touched when it was not
        % supplied.
        if nargin == 1 || ismember(varname, varlist)
            varid = netcdf.inqVarID(nc, varname);

            % Some variables contain illegal (in MATLAB) characters. Remove
            % them here.
            safename = regexprep(varname, '-', '');

            % Append the data on the assumption the last dimension is time.
            % Don't append data with only 2 dimensions as it's probably
            % longitude or latitude data. The time variable ('t') is
            % turned into a list of time stamps.
            tmpdata = squeeze(netcdf.getVar(nc, varid, 'double'));
            nn = ndims(tmpdata);

            if isfield(MetUM, safename)
                switch varname
                    case {'x', 'y', 'x_1', 'y_1', 'longitude', 'latitude', 'lsm'}
                        continue
                    case {'t', 't_1', 't_2', 't_3', 't_4', 't_5', 't_6', 't_7', 't_8'}
                        % Ignore time variables.
                        continue
                    otherwise
                        try
                            % Extract the time for this variable.
                            temptime = fix_time(nc, varid);

                            % Find how many indices to extract to get at
                            % least 6 hours of data.
                            interval = mean(roundn(diff(datenum(temptime)) * 24 * 60, 0));
                            if abs(60 - interval) < abs(30 - interval)
                                % Hourly
                                nh = 6;
                            elseif abs(30 - interval) < abs(60 - interval)
                                % Half-hourly
                                nh = 12;
                            else
                                error('Unsupported time sampling interval (support hourly and half-hourly sampling).')
                            end
                            % Check we don't try and get more data than we
                            % have.
                            if nh > size(temptime, 1)
                                nh = size(temptime, 1);
                            end

                            MetUM.(safename).time = [MetUM.(safename).time; temptime(1:nh, :)];

                            % Append along last dimension.
                            if nn == 3
                                MetUM.(safename).data = cat(nn, MetUM.(safename).data, tmpdata(:, :, 1:nh));
                            else
                                % We're flattening from 4D to 3D here, so
                                % nn - 1.
                                MetUM.(safename).data = cat(nn - 1, MetUM.(safename).data, squeeze(tmpdata(:, :, levelidx, 1:nh)));
                            end
                        catch err
                            fprintf('\n')
                            warning('Couldn''t append %s to the existing field from file %s.', safename, files{f})
                            warning('%s\n', err.message)
                        end

                end
            else % first time around
                switch varname
                    case {'x', 'y', 'x_1', 'y_1', 'longitude', 'latitude', 'lsm'}
                        MetUM.(safename).data = tmpdata;
                    case {'t', 't_1', 't_2', 't_3', 't_4', 't_5', 't_6', 't_7', 't_8'}
                        % Ignore time variables.
                        continue
                    otherwise
                        % This is data.

                        % Extract the time for this variable.
                        temptime = fix_time(nc, varid);

                        % Find how many indices to extract to get at least
                        % 6 hours of data.
                        interval = mean(roundn(diff(datenum(temptime)) * 24 * 60, 0));
                        if abs(60 - interval) < abs(30 - interval)
                            % Hourly
                            nh = 6;
                        elseif abs(30 - interval) < abs(60 - interval)
                            % Half-hourly
                            nh = 12;
                        else
                            error('Unsupported time sampling interval (support hourly and half-hourly sampling).')
                        end
                        % Check we don't try and get more data than we
                        % have.
                        if nh > size(temptime, 1)
                            nh = size(temptime, 1);
                        end

                        MetUM.(safename).time = temptime(1:nh, :);

                        if nn == 3
                            MetUM.(safename).data = tmpdata(:, :, 1:nh);
                        else
                            % Assume temperature at pressure levels.
                            % Extract the near-surface pressure level
                            % found above (levelidx).
                            MetUM.(safename).data = squeeze(tmpdata(:, :, levelidx, 1:nh));
                        end
                end
            end
        end
    end

    % Close the file now that all requested variables have been read.
    netcdf.close(nc)

    if ftbverbose
        fprintf('done.\n')
    end

end

% Squeeze out singleton dimensions from each variable's data array. Loop
% over the field names (length(MetUM) is always 1 for a scalar struct).
fields = fieldnames(MetUM);
for i = 1:length(fields)
    MetUM.(fields{i}).data = squeeze(MetUM.(fields{i}).data);
end

if ftbverbose
    fprintf('end   : %s \n', subname)
end

function fixedtime = fix_time(nc, varid)
% Little helper function to get the time data for the current variable.
%
% INPUT:
%   nc : netCDF file handle
%   varid : current variable ID
%
% OUTPUT:
%   fixedtime : date strings for the current variable (Gregorian date)

% Extract the time array for this variable's time dimension.
[numdims, ~, ~, ~] = netcdf.inq(nc);
dimnames = cell(numdims, 1);
for jj = 1:numdims
    [dimname, ~] = netcdf.inqDim(nc, jj - 1);
    dimnames{jj} = dimname;
end

% Find the dimensions of this variable.
[~, ~, dimids, ~] = netcdf.inqVar(nc, varid);
% We presume the time variable starts with a t.
ttidx = strncmpi(dimnames(dimids + 1), 't', length('t'));
ttvarid = netcdf.inqVarID(nc, dimnames{dimids(ttidx) + 1});
% There are issues around precision here, so
% convert tt to minutes and then back to fractions
% of a day.
tt = netcdf.getVar(nc, ttvarid, 'double');
tt = roundn(tt * 24 * 60, -1) / 24 / 60;

[~, ~, ~, tVarAtts] = netcdf.inqVar(nc, ttvarid);

% Look for the time_origin attribute, which gives the reference date.
for j = 1:tVarAtts
    timeatt = netcdf.inqAttName(nc, ttvarid, j - 1);
    if strcmpi(timeatt, 'time_origin')
        timeval = netcdf.getAtt(nc, ttvarid, timeatt);
    end
end
mt = datenum(timeval, 'dd-mmm-yyyy:HH:MM:SS');

fixedtime = datestr(mt + tt, 'yyyy-mm-dd HH:MM:SS');
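The pressure-level selection near the top of the listing can be illustrated on its own. The following is a minimal sketch, assuming a made-up set of pressure levels and a 4D array ordered x, y, level, time; none of the values are taken from a real file.

```matlab
% Hypothetical pressure levels in hPa.
p = [1000; 950; 925; 850; 700; 500];

% Index of the level closest to 980 hPa, as in the function.
[~, levelidx] = min(abs(p - 980));

% Hypothetical 4D temperature: 3 x 2 grid, 6 levels, 4 time steps.
temp4d = rand(3, 2, numel(p), 4);

% Take that level for every time step and drop the singleton dimension,
% leaving an x by y by time array.
temp3d = squeeze(temp4d(:, :, levelidx, :));
```

With these levels, 1000 hPa is the nearest to 980 hPa, so the first level is selected.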