This readme.txt file was generated on 2021-12-39 by Len Thomas

GENERAL INFORMATION

1. Title of Dataset: SAMBAH (Static Acoustic Monitoring of the Baltic Sea Harbour Porpoise) Abundance Estimation.

Dataset accompanies the manuscript:
Estimating the abundance of the critically endangered Baltic Proper harbour porpoise (Phocoena phocoena) population using passive acoustic monitoring.
Mats Amundin, Julia Carlström, Len Thomas, Ida Carlén, Jens Koblitz, Jonas Teilmann, Jakob Tougaard, Nick Tregenza, Daniel Wennerberg, Olli Loisa, Katharina Brundiers, Monika Kosecka, Line Anker Kyhn, Cinthia Tiberi Ljungqvist, Signe Sveegaard, M. Louise Burt, Iwona Pawliczka, Ivar Jussi, Radomil Koza, Bartlomiej Arciszewski, Anders Galatius, Martin Jabbusch, Jussi Laaksonlaita, Sami Lyytinen, Jussi Niemi, Aleksej Šaškov, Jamie MacAuley, Andrew Wright, Anja Gallus, Penina Blankett, Michael Dähne, Alejandro Acevedo-Gutiérrez and Harald Benke
Ecology and Evolution.

2. Author Information

A. Corresponding Author Contact Information
Name: Julia Carlström
Institution: Swedish Museum of Natural History
Address: Stockholm, Sweden
Email: julia.carlstrom@nrm.se

B. SAMBAH Principal Investigator Contact Information
Name: Mats Amundin
Institution: Kolmarden Wildlife Park
Address: Kolmården, Sweden
Email: Mats.Amundin@kolmarden.com

C. Data and Code Curator Contact Information
Name: Len Thomas
Institution: University of St Andrews
Address: St Andrews, UK
Email: len.thomas@st-andrews.ac.uk

3. Date of data collection:
SAMBAH primary survey period: 2011-05-01 to 2013-04-30
Great Belt tracking experiment: 2013-05-27 to 2013-06-22
Tag data collected within the period: 2010-05-19 and 2011-04-09

4. Geographic location of data collection:
SAMBAH primary survey: Baltic Sea (see manuscript Figure 2)
Great Belt tracking experiment: Great Belt, Denmark 55° 27.2' N 10° 50.6' E
Tag data: Tags attached in waters near Korsør and Fjellerup, Denmark.
Animals moved mostly through Inner Danish Waters, occasionally moving into Swedish or Norwegian waters of the Skagerrak and northern Kattegat.

5. Information about funding sources that supported the collection of the data:
The SAMBAH project was funded by the LIFE+ programme of the European Commission (LIFE08 NAT/S/000261), and co-funded by Bundesamt für Naturschutz, Germany (SAMBAH II 5 Vw/52602/2011-Mar 36032/66); Bundesministerium für Umwelt, Naturschutz und Reaktorsicherheit, Germany (COSAMM FKZ 0325238); Carlsbergfondet, Denmark (CF16-0861); European Association of Zoos and Aquaria, The Netherlands; Havs- och Vattenmyndigheten, Sweden; Instytut Meteorologii i Gospodarki Wodnej - Panstwowy Instytut Badawczy, Poland; Japanese Science and Technology Agency-CREST, Japan (7620-7); Kolmårdens Djurpark, Sweden; Maailmann Luonnon Säätiö (WWF) Suomen Rahasto, Finland; Miljøministeriet, Denmark; Miljø- og Fødevareministeriet, Denmark (SN 343/SN-0008); Narodowy Fundusz Ochrony Srodowiska i Gospodarki Wodnej, Poland (561/2009/Wn-50/OP/RE-LF/D); Naturvårdsverket, Sweden; SNAK Ph.D. School, Aarhus University, Denmark (91147/365); Tampereen Särkänniemi Ltd., Finland; Turun ammattikorkeakoulu, Finland; Uniwersytet Gdanski, Poland; Wojewódzki Fundusz Ochrony Srodowiska i Gospodarki Wodnej w Gdansku, Poland; Ympäristöministeriö, Finland.

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data:
Data and code published under Creative Commons Zero waiver (http://creativecommons.org/about/cc0). If you use any part of these data or code, please acknowledge the primary authors using the recommended citation given below.

2. Links to publications that cite or use the data:
Amundin et al. In press. Estimating the abundance of the critically endangered Baltic Proper harbour porpoise (Phocoena phocoena) population using passive acoustic monitoring. Ecology and Evolution.

3. Links to other publicly accessible locations of the data: None.

4. Links/relationships to ancillary data sets: None.

5. Was data derived from another source? No.

6. Recommended citation for this dataset:
Amundin et al. In press. Estimating the abundance of the critically endangered Baltic Proper harbour porpoise (Phocoena phocoena) population using passive acoustic monitoring. Ecology and Evolution.

DATA & FILE OVERVIEW

1. File List:

1.1 Overview of Sweave files:
There are 6 Sweave files (i.e., code files containing a mix of LaTeX and R code). Running ("compiling") these in R using the knitr package, with appropriate other packages installed, will reproduce all of the results given in the paper. Sweave files have the ending .Rnw; we also provide the corresponding .pdf files that are the result of running the .Rnw files. Note that the Sweave files are text files and can be viewed using any text editor.

The 6 Sweave files are as follows:
SAMBAH_1_EncounterRate.Rnw
SAMBAH_2_GreatBeltTrackingDetectionFunction.Rnw
SAMBAH_3_GreatBeltPlaybackDetectionFunction.Rnw
SAMBAH_4_PlaybackDetectionFunction.Rnw
SAMBAH_5_TagAnalysis.Rnw
SAMBAH_6_Density.Rnw

Please look in the corresponding pdf files to find out more about each file. Each file has one or more input files it requires, as listed below.

Note that the run-times of some of the Sweave files are long. In order to save time in re-running them, some files have boolean variables in the R code that cause the code to load results from R data (.RData) files rather than re-analyzing all the data from scratch. Please consult the Sweave files for more details. If a full run is undertaken, then .RData files are produced as output files, and these are listed below. All output files required as input to other Sweave files are included here, so the code can be run efficiently if required.
1.2 Input data files:

General files:
- logo_SAMBAH.jpg - image file used in compiling Sweave documents
- logo_natura2000.jpg - image file used in compiling Sweave documents
- logo_life.jpg - image file used in compiling Sweave documents

SAMBAH_1_EncounterRate:
- n.bymonth.bymin.txt - main input file - contains number of click-positive seconds and number of seconds of monitoring per station and month for each minute of the day (from minute 1 being midnight through to 1440 being 1 minute before midnight)
- coastline.txt - lat, lon positions of coastline for making a map
- SAMBAH_geo.txt - lat, lon and country of each sampling station
- Megametadata_SAMBAH_v6.csv - master meta-information file for each CPOD deployment in the main SAMBAH area. Contains information such as deployment time, CPOD number, etc.
- Diel_phase_start_times_v2.csv - for each station and month, gives the start times of the 4 diel phases

SAMBAH_2_GreatBeltTrackingDetectionFunction:
- GreatBelt_TrialData.csv - outcome of tracking experiment trials - one line per second of each encounter and each CPOD, whether the CPOD detected the porpoise or not
- diel_model.csv - output from SAMBAH_1 - used just for producing a table

SAMBAH_3_GreatBeltPlaybackDetectionFunction:
- playback_GreatBelt.csv - outcome of playbacks at GreatBelt site - one line per burst of 10 clicks played back, with relevant covariate information and how many of the clicks were detected by the CPOD

SAMBAH_4_PlaybackDetectionFunction:
- playback_SAMBAH.csv - outcome of playbacks undertaken during SAMBAH main survey - one line per burst of 10 clicks played back at each station, with relevant covariate information and how many of the clicks were detected by the CPOD
- playback_SAMBAH_covs.csv - file of covariates for all stations and months, regardless of whether a playback was undertaken then or not. This is used in producing predictions of playback effective detection area in all months and stations.
- Diel_phase_start_times_v2.csv - same as used in SAMBAH_1

SAMBAH_5_TagAnalysis:
- TagData.RData - an R Data file containing a list of the 6 tag records. Each record contains meta-information (tag on and off times, etc), plus vectors of length equal to the number of seconds between tag on and tag off, whether a click was detected, etc.
- TagDataDielTimes.csv - for each day a tag was out, contains the start times of the 4 diel phases in that (approximate) area. Used in the supplementary examination of diel behaviour.

SAMBAH_6_Density:
- station_info.csv - information about location, country and diel period start times for each station and month
- n_bymonth_byposition_bydiel.csv - produced by SAMBAH_1
- GreatBeltDetectionFunctionResults.RData - produced by SAMBAH_2
- PlaybackGreatBeltResults.RData - produced by SAMBAH_3
- PlaybackSambahResults.RData - produced by SAMBAH_4
- PlaybackSambahBootResults.RData - produced by SAMBAH_4

1.3 Output results files:

SAMBAH_1_EncounterRate:
- n_bymonth_byposition_bydiel.csv - main output - contains number of click-positive seconds and seconds of monitoring per station per month per diel phase. Used in SAMBAH_6
- diel_model.csv - results of a model of encounter rate vs diel time that is an input to SAMBAH_2

SAMBAH_2_GreatBeltTrackingDetectionFunction:
- GreatBeltDetectionFunctionResults.RData - results file used in SAMBAH_6

SAMBAH_3_GreatBeltPlaybackDetectionFunction:
- PlaybackGreatBeltResults.RData - results file used in SAMBAH_6

SAMBAH_4_PlaybackDetectionFunction:
- PlaybackSambahResults.RData - results file used in SAMBAH_6
- PlaybackSambahBootResults.RData - results file used in SAMBAH_6

SAMBAH_5_TagAnalysis:
- None. (Main output of tag analysis is proportion of time clicking, and this is input to the SAMBAH_6 file code manually.)

SAMBAH_6_Density:
- None. All results printed as tables and figures in the pdf.
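
The input and output lists in sections 1.2 and 1.3 imply an order in which the Sweave files would need to be run if everything were regenerated from scratch. The analysis itself is in R; the short Python sketch below is only an illustration of those dependencies. The DEPENDS_ON dictionary is a hand-written summary of the lists above (an assumption of this sketch, not a file in the package), and the link from SAMBAH_5 to SAMBAH_6 stands for the manually entered proportion of time clicking.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# For each Sweave file, the Sweave files whose outputs it reads.
# Hand-written from sections 1.2 and 1.3; not part of the data package.
DEPENDS_ON = {
    "SAMBAH_1_EncounterRate": set(),
    "SAMBAH_2_GreatBeltTrackingDetectionFunction": {
        "SAMBAH_1_EncounterRate",  # reads diel_model.csv
    },
    "SAMBAH_3_GreatBeltPlaybackDetectionFunction": set(),
    "SAMBAH_4_PlaybackDetectionFunction": set(),
    "SAMBAH_5_TagAnalysis": set(),
    "SAMBAH_6_Density": {
        "SAMBAH_1_EncounterRate",  # n_bymonth_byposition_bydiel.csv
        "SAMBAH_2_GreatBeltTrackingDetectionFunction",  # GreatBeltDetectionFunctionResults.RData
        "SAMBAH_3_GreatBeltPlaybackDetectionFunction",  # PlaybackGreatBeltResults.RData
        "SAMBAH_4_PlaybackDetectionFunction",  # PlaybackSambah*Results.RData
        "SAMBAH_5_TagAnalysis",  # value entered manually in the SAMBAH_6 code
    },
}

# static_order() yields every file after all of the files it depends on.
run_order = list(TopologicalSorter(DEPENDS_ON).static_order())

for name in run_order:
    print(name + ".Rnw")
```

Because all intermediate output files are included in the package, any single Sweave file can also be run on its own without following this order.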

1.4 Other files provided

CalculateEncounterRate_v6.R
R code file that takes raw data (see next two files) from CPOD analysis and outputs n.bymonth.bymin.txt. It is not necessary to run this file to reproduce our results, as n.bymonth.bymin.txt is provided in this package.

detections and environment - validated and cropped - 20141013.txt
This is one of two "raw" data files that is output by the CPOD processing software CPOD.exe and used as input to CalculateEncounterRate_v6.R. It contains one record per minute of survey effort per deployment, and is a large file (approx 17GB). It is provided for future use by researchers, as there is more information here than we used in our analysis.

click details - validated and croppped - 20141013.txt
This is the second of two "raw" data files that is output by the CPOD processing software CPOD.exe and used as input to CalculateEncounterRate_v6.R. It contains one record per click detected and classified as part of a harbour porpoise click train. It is provided for future use by researchers, as there is more information here than we used in our analysis.

2. Relationship between files, if important:
Relationships between files documented in previous section.

3. Additional related data collected that was not included in the current data package:
Raw tag data files produced by ATag software not included, as we do not have explicit permission to distribute these files. Instead, the R data object - TagData.RData is provided, which contains information for each tag on a per-second basis, which is what is needed to reproduce our analyses.

4. Are there multiple versions of the dataset? No

METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:
See paper and Sweave files.

2. Methods for processing the data:
CPOD raw output files were processed as described in the paper to produce the "detections and environment" and "click details" files mentioned above.
These files were validated by the SAMBAH team, and cropped to just the main SAMBAH survey period. Raw tag data files were processed by aggregating to a per-second basis.

3. Instrument- or software-specific information needed to interpret the data:
No particular software is required to view .csv and .txt files. The tag data are provided as an R data file, and so R is required to view it. R is freely available at https://www.r-project.org/
R is also required to re-process our input files and reproduce our results using the Sweave files included.

4. Standards and calibration information, if appropriate: N/A.

5. Environmental/experimental conditions: See paper.

6. Describe any quality-assurance procedures performed on the data: See paper.

7. People involved with sample collection, processing, analysis and/or submission:
Primary field team are among the paper authors.

DATA-SPECIFIC INFORMATION FOR: n.bymonth.bymin.txt

Main input file from SAMBAH survey: contains information about survey effort and click-positive seconds per station and month. One line per minute of the day (1 is midnight, 61 is 1am, 121 is 2am, ...) per monitoring station and month. Tab-delimited, with a header row giving column names.

Columns:
- deployment - deployment code
- year
- month - 1=Jan, 12=Dec
- minute of the day, from 1 to 1440
- effort.secs - number of seconds of monitoring in that minute of the day over the month. Max 1860, which corresponds to complete monitoring over a 31-day month (60 seconds in that minute on each day, times 31 days)
- click.secs - number of monitoring seconds that contained 1 or more clicks

DATA-SPECIFIC INFORMATION FOR: coastline.txt

Series of longitude and latitude waypoints that make up a coastline for the study area. Tab-delimited, with a header row. Each set of rows makes up a polygon; sets of rows are separated by lines with NA on them.
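
As a minimal sketch of how the polygon layout just described might be read outside R, the Python function below splits such a table into separate polygons. It assumes the header names the two columns lon and lat, and that separator rows contain the literal text NA; the function name read_polygons and the coordinates in the sample are made up for illustration.

```python
import csv
import io


def read_polygons(handle):
    """Split a tab-delimited lon/lat table into polygons at NA rows."""
    polygons, current = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["lon"] == "NA" or row["lat"] == "NA":
            # Separator row: close the polygon collected so far.
            if current:
                polygons.append(current)
                current = []
            continue
        current.append((float(row["lon"]), float(row["lat"])))
    if current:
        # The last polygon is not followed by a separator row.
        polygons.append(current)
    return polygons


# Made-up coordinates, for illustration only.
sample = io.StringIO(
    "lon\tlat\n"
    "10.0\t55.0\n"
    "10.5\t55.0\n"
    "10.5\t55.5\n"
    "NA\tNA\n"
    "12.0\t56.0\n"
    "12.5\t56.5\n"
)
polygons = read_polygons(sample)
print(len(polygons), [len(p) for p in polygons])  # 2 [3, 2]
```

To read the real file, the same function could be given a handle opened with open("coastline.txt", newline="").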

Columns:
- lon - longitude
- lat - latitude

DATA-SPECIFIC INFORMATION FOR: SAMBAH_geo.txt

Gives the latitude, longitude and country of each sampling location in the main SAMBAH study area. Tab-delimited, with a header row.

Columns:
- position - code number of sampling location
- lat - latitude
- lon - longitude
- country - country code - see SAMBAH_1 Sweave file for correspondence between code and country names

DATA-SPECIFIC INFORMATION FOR: Megametadata_SAMBAH_v6.csv

Master meta-information file for each CPOD deployment in the main SAMBAH area. Comma-delimited text file, with 3 header rows and a blank row before the data starts. Header row 1 groups columns into general categories, header row 2 gives the name of the column and header row 3 gives information about the data type of the column. Starting on row 5 there is one row per CPOD deployment in the main study area.

Columns used in this analysis are (column numbers in brackets):
- (1) station ID
- (2) deployment ID
- (11) deployment date
- (13) time difference in hours between local time and UTC
- (23) recovery date
- (31) recovery type (planned recovery, lost recovery, etc)
- (40) download name - name of the corresponding download file from the CPOD
- (47) expected days - expected number of days of logging in the file

DATA-SPECIFIC INFORMATION FOR: Diel_phase_start_times_v2.csv

Contains the start times of each diel phase per station and month. Comma-delimited text file, with header row.

Columns:
- station ID
- latitude
- longitude
- year
- month
- doy - day of year
- morn.start - start time of dawn diel phase
- day.start - start time of day diel phase
- eve.start - start time of dusk diel phase
- night.start - start time of night diel phase

DATA-SPECIFIC INFORMATION FOR: GreatBelt_TrialData.csv

Contains results from tracking experiment. There is one line per second during each encounter for each CPOD that was active during that encounter. Comma-delimited text file, with header row.

Columns used in this analysis are:
- id.enc - ID code of the encounter
- click - 0 if no click detected in that second, 1 if detected
- distance - estimated distance from CPOD to porpoise
- date - YYYY-MM-DD format
- time - 24H UTC

DATA-SPECIFIC INFORMATION FOR: playback_GreatBelt.csv

Contains results of playback experiment at Great Belt tracking site. There is one line per playback burst of 10 clicks. Comma-delimited text file, with header row.

Columns used in this analysis are:
- cpod - CPOD ID
- distance - distance from CPOD to transponder
- SL.plan - planned source level
- playback - playback number
- n.detected - number of clicks detected
- n.not.detected - number of clicks not detected

DATA-SPECIFIC INFORMATION FOR: playback_SAMBAH.csv

Contains results of playback experiment at SAMBAH main survey locations. There is one line per playback burst of 10 or 20 clicks. Comma-delimited text file, with header row.

Columns used in this analysis are:
- station - station ID
- year
- month
- distance
- SL.plan - planned source level
- n.detected - number of clicks detected
- n.not.detected - number of clicks not detected
- depth - station depth
- geo - geology type (1-7 - see paper for details)
- SST - sea-surface temperature
- SSsal - sea-surface salinity

See paper for further details about candidate covariates.

DATA-SPECIFIC INFORMATION FOR: playback_SAMBAH_covs.csv

Contains covariate values for all stations and months, regardless of whether a playback was undertaken then or not. Comma-delimited text file, with header row.

Columns used in this analysis are:
- station - station ID
- year
- month
- depth - station depth
- geo - geology type (1-7 - see paper for details)
- SST - sea-surface temperature
- SSsal - sea-surface salinity
- Time (UTC)

DATA-SPECIFIC INFORMATION FOR: TagData.RData

An R Data file containing a list of the 6 tag records - requires the bit package to access the boolean variables (which are stored in bit format to make them compact). Each list element is itself a list, each with the same list elements:
- start.POSIXct - start time of tag dataset in POSIX format
- stop.POSIXct - stop time of tag dataset in POSIX format
- T - number of seconds between start and stop time
- is.click - length T boolean variable - TRUE if a click is detected in that second
- is.shallow - length T boolean variable - TRUE if porpoise is at a depth of 2 m or shallower in that second
- is.on - length T boolean variable - TRUE if acoustic record is on during that second (acoustic data is duty-cycled)
- is.ok.click - length T boolean variable - TRUE if (is.on & !(is.shallow))

DATA-SPECIFIC INFORMATION FOR: TagDataDielTimes.csv

Diel times for dawn, day, evening and night for each day of tag deployments. Comma-delimited text file, with header row.

Columns:
- HP - harbour porpoise number
- Date
- begin.ct - start of dawn (UTC)
- sunrise - start of day (UTC)
- sunset - start of dusk (UTC)
- end.ct - start of night (UTC)

DATA-SPECIFIC INFORMATION FOR: station_info.csv

Information about each sampling station in the main survey, for each month and year of the survey, used in SAMBAH_6. Comma-delimited text file, with header row.

Columns:
- station - station ID
- year
- month (1-12)
- country (1-8)
- name - name corresponding to country
- region - SW or NW
- season - Summer or Winter
- lat - latitude of station
- lon - longitude of station
- prop.morn - proportion of the day that is in the "dawn" diel phase
- prop.day - proportion of the day that is in the "day" diel phase
- prop.eve - proportion of the day that is in the "dusk" diel phase
- prop.night - proportion of the day that is in the "night" diel phase

DATA-SPECIFIC INFORMATION FOR: detections and environment - validated and cropped - 20141013.txt

Contains one record per minute of survey effort per deployment. Tab delimited with header row. The file is produced by the CPOD processing software CPOD.exe as one of the possible outputs. More information about that software is available at the Chelonia web site; currently the user manual is available at https://chelonia.co.uk/downloads/CPOD.pdf

Columns:
- File - Input filename - contains station and deployment e.g., 1001-A 2011 04 11 POD1358 file01.CP3
- ChunkEnd - date and time of the record to nearest minute - format dd/mm/yyyy hh:mm - e.g., 12/4/2011 00:00 UTC
- Minute - minute since start of 1900
- Temp - temperature in Celsius (note - decimal comma)
- Angle - angle of CPOD from vertical (note - decimal comma)
- MinutesON - 1 if CPOD on that minute, 0 if off
- DPM - Detection-positive minutes - 1 if click train detected that minute
- Nfiltered/m - Number of filtered clicks -- i.e., for SAMBAH the number of clicks in trains classified as porpoise click trains (note - decimal comma)
- Nall/m - total (unfiltered) number of clicks in the raw CP1 file for that minute
- %TimeLost - percentage of time in that minute when the CPOD had reached its logging limit and was not recording further detections
- SonarRisk - 0, or 1 if some risk identified
- kHz_continuous_noise - frequency of any continuous noise identified
- LandmarkSeq_total - Not used

DATA-SPECIFIC INFORMATION FOR: click details - validated and croppped - 20141013.txt

This is the second of two "raw" data files that is output by the CPOD processing software CPOD.exe as one of the possible outputs. It contains one record per click detected and classified as part of a harbour porpoise click train. More information about that software is available at the Chelonia web site; currently the user manual is available at https://chelonia.co.uk/downloads/CPOD.pdf

Columns:
- abbreviated file name - station and deployment - e.g., 1001-A
- Date/Time (not in header list) - dd/mm/yyyy hh:mm format - e.g., 15/5/2011 19:36
- Minute - minute since start of 1900
- microsec - number of microseconds since start of minute to that click (recorded at 5 microsecond resolution but converted to single microseconds, so 60 million per minute)
- cycles - number of cycles of click waveform
- Pmax - maximum pressure in Pa, adjusted for the frequency response of the hydrophone
- KhZ - frequency of click calculated from number of cycles and duration of click
- Bandwidth - an arbitrary unit logged by CPOD based on the variation in zero-crossing intervals within the click
- end kHz - frequency based on last zero-crossing interval in the click
- Qn - arbitrary value representing confidence assessed by KERNO classifier that the click train that this click has been assigned to came from a true train source
- Trn - Click train number from classifier (unique within each minute)
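
Since the Date/Time field is given only to the minute and microsec counts from the start of that minute, the time of an individual click can be recovered by adding the two. The Python sketch below shows one way this might be done; the function name click_time and the microsec value in the example are illustrative assumptions, and the result carries no time zone information because none is stated for this file.

```python
from datetime import datetime, timedelta


def click_time(date_time: str, microsec: int) -> datetime:
    """Combine the minute-resolution Date/Time field with the microsec offset."""
    if not 0 <= microsec < 60_000_000:
        # 60 million microseconds make up one minute.
        raise ValueError("microsec should lie within one minute")
    # Day and month may appear without leading zeros, e.g. 15/5/2011.
    minute_start = datetime.strptime(date_time, "%d/%m/%Y %H:%M")
    return minute_start + timedelta(microseconds=microsec)


# Date/Time taken from the example above; the microsec value is made up.
example = click_time("15/5/2011 19:36", 30_000_005)
print(example.isoformat())  # 2011-05-15T19:36:30.000005
```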