Intermediate processing steps of quality-checking of Antarctic Circumnavigation Expedition (ACE) cruise track data.

***** Dataset abstract *****

The Antarctic Circumnavigation Expedition (ACE), undertaken in the austral summer of 2016/2017, recorded the cruise track using two independent geo-location instruments: one using the GLObal NAvigation Satellite System (hereafter referred to as GLONASS) and another primarily using the Global Positioning System (GPS; hereafter referred to as the Trimble GPS). Daily log files were recorded in real time from both instruments during the expedition and added to MySQL database tables. Following the expedition, quality-checking work was undertaken to provide a one-second-resolution set of positions for the cruise track. This dataset presents the intermediate files that were produced during the quality-checking; it can therefore be used to check the processing steps that were undertaken, but it should not be used as a final source of the cruise track data. Both the original raw data files and the final quality-checked cruise track can be found in related datasets.

***** Original data collection *****

The data files from which this intermediate dataset arises can be found in the datasets by Thomas and Pina Estany (2019; DOI 10.5281/zenodo.3260616 and DOI 10.5281/zenodo.3368403). During the expedition, both instruments experienced problems: the Trimble GPS was turned off manually at times and its track also shows strange deviations; GLONASS logging was interrupted at times when the software on the Windows computer from which the data were logged crashed. These incidents left gaps and other unexplained features in the data.

***** Data processing *****

Code used to produce these intermediate sets of files, as well as the final output dataset, can be found in the packages science-cruise-data-management v0.1.0 (Pina Estany and Thomas, 2019; DOI 10.5281/zenodo.3360649) and science-data-utils v0.1.0 (Thomas and Pina Estany, 2019; DOI 10.5281/zenodo.3268428).

– Add data from the log files to the database –

Within the package science-cruise-data-management v0.1.0, the script importcruisetrack.py parses the NMEA strings in the log files and adds them to MySQL database tables.

– Export from database to daily csv files –

An export command was used to output the data from the database into daily files for each instrument (ace_INSTRUMENT_YYYY-MM-DD.csv).

The following steps were undertaken using cruise_track_data_processing.py within science-data-utils v0.1.0.

– Flag data points according to the following criteria: a) speed, b) distance between consecutive data points, c) acceleration –

SeaDataNet Measurand Qualifier Flags (NERC Vocabulary Server version 2.0 (NVS2.0)) were used to flag all data points, distinguishing quality-checked data from bad data. Each point was assessed against the following criteria:

a) speed of the vessel: the distance between consecutive data points was calculated first, and from this the speed of the vessel. The vessel normally travelled no faster than 15 knots, although its speed over ground could be higher when aided by a tailwind or currents. To calculate the maximum speed that would be permitted as a “good value”, the interquartile range of the speeds in the dataset (for each instrument individually) was determined after first discarding periods where the speed fell outside the range 2.5 <= speed (knots) < 100. This removed periods where the vessel was stationary in port or supposedly moving at speeds that were not physically possible.

b) distance between consecutive data points: a minimum distance was set between points where the ship should have been moving, as it was discovered that there were periods where the ship was moving according to records (Walton and Thomas, 2018) but appeared stationary in the data. This minimum distance was 0.019 m, the maximum distance that there could be between two points at the Equator if the latitude and longitude were measured to six decimal places. Where 0 < distance (m) < 0.019 and the ship was not in port, the point was considered a “probably bad value”.

c) acceleration of the vessel: physical limitations meant that the vessel could not accelerate at more than 1 m s^-2, so any acceleration > 1 m s^-2 was considered a “probably bad value”.

See the quality-checking section for more information about how these criteria were defined.
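As an illustration of how these criteria can be derived from consecutive fixes, the following is a minimal sketch, not the actual implementation: the function and column names (haversine_m, add_track_criteria, max_good_speed, date_time, latitude, longitude) and the use of pandas are assumptions made here for illustration. The real code is in cruise_track_data_processing.py within science-data-utils v0.1.0.

import math
import pandas as pd

MS_TO_KNOTS = 1.943844  # metres per second to knots

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two positions.
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def add_track_criteria(df):
    # df: one row per fix; date_time parsed as datetime, plus latitude and longitude.
    df = df.sort_values("date_time").reset_index(drop=True)
    seconds = df["date_time"].diff().dt.total_seconds()
    df["distance_m"] = [float("nan")] + [
        haversine_m(df.latitude[i - 1], df.longitude[i - 1],
                    df.latitude[i], df.longitude[i])
        for i in range(1, len(df))
    ]
    df["speed_knots"] = df["distance_m"] / seconds * MS_TO_KNOTS
    # Acceleration in m s^-2, from the change in speed between fixes.
    df["acceleration_ms2"] = (df["speed_knots"] / MS_TO_KNOTS).diff() / seconds
    return df

def max_good_speed(df):
    # Maximum speed permitted as a "good value": Q3 + 1.5 x IQR, computed
    # after first discarding speeds outside 2.5 <= speed (knots) < 100.
    plausible = df.loc[(df["speed_knots"] >= 2.5)
                       & (df["speed_knots"] < 100), "speed_knots"]
    q1, q3 = plausible.quantile([0.25, 0.75])
    return q3 + 1.5 * (q3 - q1)

# Example usage with a hypothetical daily file:
# df = pd.read_csv("ace_glonass_2017-01-01.csv", parse_dates=["date_time"])
# df = add_track_criteria(df)
# threshold = max_good_speed(df)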
– Manually identify incorrect positions from visual inspection of the track –

Each daily file (ace_INSTRUMENT_YYYY-MM-DD.csv) resulting from the previous step was loaded into QGIS (version 2.18.28 running on Debian 9.x), individually or two at a time, and the cruise track was inspected at a scale of 1:10,000. Where there were obvious deviations from what would be considered a relatively straight track, the start and end times of these deviations were noted in the files ace_INSTRUMENT_manual_position_errors.csv. No such deviations were recorded for the GLONASS instrument. An example of what was recognised as a deviation can be seen in the file ace_cruise_track_trimble_example_deviation_2016-12-24.png. Data points that fell within these time periods were then flagged as bad data according to the SeaDataNet Measurand Qualifier Flags (NERC Vocabulary Server version 2.0 (NVS2.0)).

– Output intermediate files which contain the flags –

A daily file for each instrument was output containing the full set of data points and quality flags.

– Combine and prioritise data from the two sources –

Data were prioritised using a combination of the source (instrument) and data quality. GLONASS data were selected over Trimble data as this instrument was considered more reliable during the expedition. For more information about how this was done, see the quality-checking section.

***** Quality-checking *****

Quality-checking of the cruise track data was considered necessary because deviations in the track had been identified visually. In addition, we wanted to ensure that, by using the two instruments, we provided an accurate track.

– Data flagging –

Data points were flagged as part of the processing steps in order to identify bad data:

1 = good value
2 = probably good value
3 = probably bad value
9 = missing value

Quality flags were applied according to the following criteria:

a) speed of the vessel: where speed (knots) <= upper_quartile + 1.5 x interquartile_range, speed is a “good value”;
b) distance between consecutive data points: where distance > 0.019 m, distance is a “good value”;
c) acceleration of the vessel: where acceleration <= 1 m s^-2, acceleration is a “good value”;
d) visual inspection: any areas of the cruise track which were identified visually and manually as unrealistic deviations were flagged as bad data.

These per-criterion flags were then combined into an “overall” data quality flag:

- if any of the criteria for a data point were flagged as “probably bad value”, the overall flag was assigned as “probably bad value”;
- if all of the criteria for a data point were flagged as “good value”, the overall flag was assigned as “good value”;
- if one of the calculated criteria values was flagged as “missing value” but the other criteria were flagged as “good value”, the overall flag was assigned as “probably good value”;
- if one or more of the criteria values were flagged as “probably good value” and the other criteria were flagged as “good value”, the overall flag was assigned as “probably good value”.

It is on the basis of the overall flag that the dataset points were prioritised and from which the final dataset results.

– Prioritisation of data points –

Data points that occurred at the same time (with the same seconds value) were compared in order to select the one considered to be of the best quality. Bad data points were never selected for the final dataset, so this results in some gaps in the track. Prioritisation steps were as follows:

- if there was only one data point for a specific time point, and it was recorded by GLONASS with an overall flag of “good value”, it was selected;
- if there was more than one data point to choose from and all were of good quality, the GLONASS point was selected; otherwise, if the GLONASS data point was only probably good or probably bad and the Trimble data point was of good quality, the Trimble point was selected.
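The sketch below illustrates the flag assignment, the overall-flag combination and a simplified version of the prioritisation described above. The names (flag_point, overall_flag, prioritise) are hypothetical, the in-port exception for the distance criterion is omitted, and the handling of points that are only “probably good” is reduced to its effect (such points are never chosen over a good point); see science-data-utils v0.1.0 for the actual logic.

# SeaDataNet Measurand Qualifier Flag values used in this dataset.
GOOD, PROBABLY_GOOD, PROBABLY_BAD, MISSING = 1, 2, 3, 9

def flag_point(speed_knots, distance_m, acceleration_ms2, max_speed, manually_bad):
    # Per-criterion flags for one data point; None marks a missing value.
    # (The in-port exception for the distance criterion is omitted.)
    def flag(value, is_good):
        if value is None:
            return MISSING
        return GOOD if is_good(value) else PROBABLY_BAD
    return {
        "speed": flag(speed_knots, lambda v: v <= max_speed),
        "distance": flag(distance_m, lambda v: v > 0.019),
        "acceleration": flag(acceleration_ms2, lambda v: v <= 1.0),
        # Assumed flag value for manually identified deviations.
        "visual": PROBABLY_BAD if manually_bad else GOOD,
    }

def overall_flag(flags):
    # Combine the per-criterion flags into one overall quality flag.
    values = list(flags.values())
    if PROBABLY_BAD in values:
        return PROBABLY_BAD
    if all(v == GOOD for v in values):
        return GOOD
    # Remaining cases: "good value" mixed with "missing value" and/or
    # "probably good value".
    return PROBABLY_GOOD

def prioritise(glonass, trimble):
    # Pick at most one point per timestamp: GLONASS is preferred, and bad
    # points are never selected, which leaves gaps in the final track.
    if glonass is not None and glonass["overall_flag"] == GOOD:
        return glonass
    if trimble is not None and trimble["overall_flag"] == GOOD:
        return trimble
    return None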
***** Further information for interpreting the data and using the dataset *****

***** Dataset contents *****

- ace_INSTRUMENT_YYYY-MM-DD.csv – daily files for each instrument that were output from the database; data file, comma-separated values
- ace_INSTRUMENT_concatenated_YYYY-MM.csv – input files concatenated by month and instrument; data file, comma-separated values
- flagging_data_ace_INSTRUMENT_YYYY-MM-DD.csv – daily output files for each instrument with flagged data points; data file, comma-separated values
- track_data_combined_overall_flags_YYYY-MM.csv – instrument data combined with the overall data flag, one file per month; data file, comma-separated values
- track_data_prioritised_YYYY-MM.csv – prioritised data with the overall data flag, one file per month; data file, comma-separated values
- ace_INSTRUMENT_manual_position_errors.csv – files containing the manually observed errors; metadata, comma-separated values
- in_port.csv – dates when the ship was stationary in port; metadata, comma-separated values
- README.txt – metadata, text file
- data_file_header.txt – metadata, text file

***** Dataset contact *****

Jenny Thomas, Swiss Polar Institute, Switzerland. ORCID: 0000-0002-5986-7026. Email: jenny.thomas@epfl.ch, jt.sciencedata@gmail.com

***** Dataset license *****

This dataset containing intermediate processing files of the ACE cruise track is made available under the Open Data Commons Attribution License (ODC-By) v1.0, whose full text can be found at https://www.opendatacommons.org/licenses/by/1.0/index.html

***** Dataset citation *****

Please cite this dataset as: Thomas, J. and Pina Estany, C. (2019). Intermediate processing steps of quality-checking of Antarctic Circumnavigation Expedition (ACE) cruise track data. (Version 1.0) [Data set]. Zenodo. DOI: 10.5281/zenodo.3471403.