Dataset Open Access
Scientific Committee on Antarctic Research
NOTE: an updated version of this data set is available at https://zenodo.org/record/5072528
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.
Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.
A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.
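A range check of this kind might look like the Python sketch below. It is not the checking code used for the database: the function name `check_record` and the specific rules are assumptions made for illustration, and only the column names and the -999 "trace" value come from the schemas in this record.

```python
TRACE = -999  # value the schema uses for "trace" dietary contributions

FRACTION_COLUMNS = (
    "FRACTION_DIET_BY_WEIGHT",
    "FRACTION_DIET_BY_PREY_ITEMS",
    "FRACTION_OCCURRENCE",
)

def check_record(rec):
    """Return a list of problems found in one diet record (a dict keyed by column name)."""
    problems = []
    # Fractions should lie in [0, 1], apart from the -999 trace value.
    for col in FRACTION_COLUMNS:
        value = rec.get(col)
        if value is None or value == TRACE:
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{col}={value} is outside [0, 1]")
    # Latitudes should be valid, and SOUTH should not lie north of NORTH.
    south, north = rec.get("SOUTH"), rec.get("NORTH")
    if south is not None and north is not None:
        if not (-90 <= south <= 90 and -90 <= north <= 90):
            problems.append("latitude outside [-90, 90]")
        elif south > north:
            problems.append("SOUTH is north of NORTH")
    return problems

# Hypothetical record with two deliberate errors.
print(check_record({"FRACTION_DIET_BY_WEIGHT": 1.5, "SOUTH": -60.0, "NORTH": -65.0}))
```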
Notes on names:
Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.
Grouped prey data in the diet sample table need to be handled with care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
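One way to apply this when summing diet fractions is sketched below in Python. Only the column names come from the schema; the example rows, their values, and the function name `drop_aggregates` are assumptions made for illustration.

```python
# Hypothetical rows for a single predator sample; all values are invented.
rows = [
    {"PREY_NAME": "Psychroteuthis glacialis", "PREY_IS_AGGREGATE": "N", "FRACTION_DIET_BY_WEIGHT": 0.20},
    {"PREY_NAME": "Galiteuthis glacialis", "PREY_IS_AGGREGATE": "N", "FRACTION_DIET_BY_WEIGHT": 0.10},
    # Unidentified cephalopods: a genuine record, not an aggregate.
    {"PREY_NAME": "Cephalopoda", "PREY_IS_AGGREGATE": "N", "FRACTION_DIET_BY_WEIGHT": 0.05},
    # Overall cephalopod record that groups the three rows above.
    {"PREY_NAME": "Cephalopoda", "PREY_IS_AGGREGATE": "Y", "FRACTION_DIET_BY_WEIGHT": 0.35},
    {"PREY_NAME": "Euphausia superba", "PREY_IS_AGGREGATE": "N", "FRACTION_DIET_BY_WEIGHT": 0.65},
]

def drop_aggregates(records):
    """Keep only rows that are not aggregations of other rows."""
    return [r for r in records if r["PREY_IS_AGGREGATE"] != "Y"]

non_aggregate = drop_aggregates(rows)
total = sum(r["FRACTION_DIET_BY_WEIGHT"] for r in non_aggregate)
print(len(non_aggregate), round(total, 6))
```

Summing all five rows would count the cephalopod contribution twice; dropping the "Y" row avoids that while keeping the unidentified Cephalopoda record.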
There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.
Data table schemas
Sources data table
- SOURCE_ID: The unique identifier of this source
- DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")
- NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract
- DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"
Diet data table
- RECORD_ID: The unique identifier of this record
- SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)
- SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience
- ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one
- LOCATION: The name of the location at which the data were collected
- WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
- EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
- SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
- NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
- ALTITUDE_MIN: The minimum altitude of the sampling region, in metres
- ALTITUDE_MAX: The maximum altitude of the sampling region, in metres
- DEPTH_MIN: The shallowest depth of the sampling, in metres
- DEPTH_MAX: The deepest depth of the sampling, in metres
- OBSERVATION_DATE_START: The start of the sampling period
- OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), the start and end dates will be the first and last dates overall (in this example, from 1-Jan-2002 to 31-Jan-2003)
- PREDATOR_NAME: The name of the predator. This may differ from PREDATOR_NAME_ORIGINAL if, for example, taxonomy has changed since the original publication, or if the original publication had spelling errors or used common (not scientific) names
- PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source
- PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register
- PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register
- PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
- PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)
- PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"
- PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"
- PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed
- PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).
- PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample
- PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")
- PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")
- PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample
- PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")
- PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents
- PREY_NAME: The scientific name of the prey item (corrected, if necessary)
- PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source
- PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register
- PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register
- PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
- PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses
- PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")
- PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"
- PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of this prey type, unless all individuals in the sample were measured)
- PREY_SIZE_MIN, PREY_SIZE_MAX, PREY_SIZE_MEAN, PREY_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the prey in the sample
- PREY_SIZE_UNITS: The units of size (e.g. "mm", "cm", "m")
- PREY_SIZE_NOTES: Notes on the prey size information, including a definition of what the size value represents (e.g. "total length", "standard length")
- PREY_MASS_MIN, PREY_MASS_MAX, PREY_MASS_MEAN, PREY_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the prey in the sample
- PREY_MASS_UNITS: The units of mass (e.g. "mg", "g", "kg")
- PREY_MASS_NOTES: Notes on the prey mass information, including a definition of what the mass value represents
- FRACTION_DIET_BY_WEIGHT: The fraction by weight of the predator diet that this prey type made up (e.g. if Euphausia superba contributed 50% of the total mass of prey items, this value would be 0.5). Note: many papers represent very small dietary contributions as "trace" or sometimes "less than 0.1%". These have been entered as -999
- FRACTION_DIET_BY_PREY_ITEMS: The fraction (by number) of prey items that this prey type made up (e.g. if 1000 Euphausia superba were found out of a total of 2000 prey items, this value would be 0.5). Note: many papers represent very small dietary contributions as "trace" or sometimes "less than 0.1%". These have been entered as -999
- FRACTION_OCCURRENCE: The number of times this prey item occurred in a predator sample, as a fraction of the number of non-empty samples (e.g. if Euphausia superba occurred in half of the non-empty stomachs examined, this value would be 0.5). Empty stomachs are ignored for the purposes of calculating fraction of occurrence. For gut content analyses (and any other study types where "no prey" can occur in a sample), the fraction of empty stomachs may also be reported, using PREY_NAME "None". Note: many papers represent very small dietary contributions as "trace" or sometimes "less than 0.1%". These have been entered as -999
- PREY_ITEMS_INCLUDED: Which prey items were examined? For example, if the data came from a stomach contents study and all stomach contents were counted, this will be "all". Conversely, if only upper squid beaks were counted, this will be "upper beaks"
- ACCUMULATED_HARD_PARTS_TREATMENT: Only applicable to methods where hard diet remains can accumulate over time (e.g. stomach content of seabirds). How were accumulated hard parts dealt with? Some stomach content studies try to avoid over-estimation of hard parts by discarding anything other than fresh hard parts. Current values here are "included", "excluded", and "unknown"
- QUALITATIVE_DIETARY_IMPORTANCE: A qualitative description of the dietary importance of this prey item (e.g. from comments about certain prey in the discussion text of an article), if numeric values have not been given. Current values are "none", "incidental", "minor", "major", "almost exclusive", "exclusive"
- CONSUMPTION_RATE_MIN, CONSUMPTION_RATE_MAX, CONSUMPTION_RATE_MEAN, CONSUMPTION_RATE_SD: The minimum, maximum, mean, and standard deviation of the consumption rate of this prey item
- CONSUMPTION_RATE_UNITS: The units of consumption rate (e.g. "kg/day")
- CONSUMPTION_RATE_NOTES: Notes about the consumption rate estimates
- IDENTIFICATION_METHOD: How this dietary information was gathered. A single study may have used multiple methods, in which case the IDENTIFICATION_METHOD may contain multiple values (separated by commas). Current values include "scat content" (contents of scats), "stomach flushing" (physical sampling of the stomach contents by flushing the contents out with water), "stomach content" (physical sampling of the stomach contents from a dead animal), "regurgitate content" (physical sampling of the contents of forced or spontaneous regurgitations), "observed predation", "bolus content" (physical sampling of the contents of boluses), "nest detritus", "gut pigment", "unknown"
- QUALITY_FLAG: An indicator of the quality of this record. "Q" indicates that the data are known to be questionable for some reason. The reason should be in the notes column. "G" indicates good data
- IS_SECONDARY_DATA: An indicator of whether this record was entered from its primary source, or from a secondary citation. "Y" here indicates that the data actually came from another paper and were being reported in this paper as secondary data. Secondary data records are likely to be removed at a later date and replaced with information from the original source
- NOTES: Any other notes
- LAST_MODIFIED: The date of last modification of this record
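Two conventions in the diet schema above need handling before analysis: the -999 value used for "trace" contributions, and the S.nnn form of subsample identifiers. The Python sketch below shows one possible treatment. The function names are hypothetical, and it assumes that PREDATOR_SAMPLE_ID values are numeric and have been read as text (e.g. "12.003").

```python
TRACE = -999  # entered where a source reports "trace" or "less than 0.1%"

def clean_fraction(value):
    """Return (fraction, is_trace); fraction is None for trace or missing values."""
    if value is None:
        return None, False
    if value == TRACE:
        return None, True
    return float(value), False

def split_sample_id(sample_id):
    """Split an 'S.nnn' identifier into (parent, subsample).

    The subsample is None when the identifier has no decimal part.
    Assumes numeric identifiers held as text, such as "12" or "12.003".
    """
    parent, dot, sub = str(sample_id).partition(".")
    if not dot:
        return int(parent), None
    return int(parent), int(sub)

print(clean_fraction(0.5), clean_fraction(-999))
print(split_sample_id("12"), split_sample_id("12.003"))
```

Reading the identifier column as text matters here: if it were parsed as a floating-point number, trailing zeros would be lost and subsample 010 could not be told apart from a shortened value such as 12.01.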
Isotopes data table
(Columns that are already described in the "Diet" schema above are not included here)
- TAXON_*: As for "PREDATOR_*" in the diet data table
- TAXON_SAMPLE_ID: The identifier of the animal(s). If animals are being reported at the individual level (i.e. TAXON_SAMPLE_COUNT = 1) then TAXON_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of animals, then the TAXON_SAMPLE_ID identifies that group of animals. TAXON_SAMPLE_ID values are unique within a source. Rows with the same SOURCE_ID and TAXON_SAMPLE_ID values relate to the same individual(s), but may represent different processing methods, different physical samples (see PHYSICAL_SAMPLE_ID) or different analytical replicates (see ANALYTICAL_REPLICATE_ID). In the simplest case, the isotopes of each animal will be reported at the individual-animal level and based on only one processing method, and in this scenario the TAXON_SAMPLE_ID values will simply be 1 to N (for N individual animals)
- PHYSICAL_SAMPLE_ID: Where multiple samples were taken from one individual animal, this column will identify the samples. This will be blank if only one physical sample was taken from each TAXON_SAMPLE_ID, or if the results were aggregated for reporting
- ANALYTICAL_REPLICATE_ID: Where the lab analysis was replicated on each physical sample (i.e. multiple sub-samples of each sample were run through the machine), this column will identify the replicates. This column will be blank if the lab analysis for each PHYSICAL_SAMPLE_ID was not replicated, or if the results were aggregated for reporting
- ANALYTICAL_REPLICATE_COUNT: If lab analyses were replicated but the data here represent the aggregated results over the replicates, this column will indicate the number of replicates. The ANALYTICAL_REPLICATE_ID column in this case will be blank, because the data pertain to multiple replicates
- SAMPLES_WERE_POOLED: If "Y", multiple physical samples were pooled for analysis (likely because of a minimum required volume or mass of matter for the analytical process)
- MEASUREMENT_NAME: The name of the quantity being reported (e.g. "delta_15N", "C:N mass ratio", "standard length", "wet weight")
- MEASUREMENT_MIN_VALUE, MEASUREMENT_MAX_VALUE, MEASUREMENT_MEAN_VALUE, MEASUREMENT_VARIABILITY_VALUE: The minimum, maximum, mean, and variability of the measured values
- MEASUREMENT_VARIABILITY_TYPE: The type of variability reported (e.g. "SD", "SE")
- MEASUREMENT_UNITS: The units of measurement (e.g. "per mil", "mm", "mg")
- MEASUREMENT_METHOD: A description of the measurement method
- ISOTOPES_CARBONATES_TREATMENT: How were carbonates treated in the sample processing? Currently "acidification" (acid used to remove carbonates from samples), "none" (no carbonate treatment), or "unknown"
- ISOTOPES_LIPIDS_TREATMENT: How were lipids treated in the sample processing? Currently either "chemical delipidation" (where lipids were removed chemically), "mathematical correction" (where a mathematical model was used to correct for the effects of lipids), "none" (for no lipid treatment), or "unknown"
- ISOTOPES_PRETREATMENT: Any other pretreatment (free text)
- ISOTOPES_ARE_ADJUSTED: "Y" here indicates that the isotope values have been adjusted in some way not already described in the other columns (e.g. values derived from blood samples might be adjusted to make them comparable to tissue sample values)
- ISOTOPES_ADJUSTMENT_NOTES: If ISOTOPES_ARE_ADJUSTED is "Y", notes on the adjustment applied (e.g. "Adjusted values are corrected to represent muscle tissue")
- ISOTOPES_BODY_PART_USED: Which part of the organism was sampled?
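Because each row of the isotopes table reports one measurement, rows that share a SOURCE_ID and TAXON_SAMPLE_ID usually need to be gathered before analysis. The Python sketch below shows one way to do this. The example rows and the function name `measurements_by_sample` are assumptions made for illustration; only the column names and the measurement names are taken from the schema above.

```python
from collections import defaultdict

# Hypothetical long-format rows; all values are invented.
rows = [
    {"SOURCE_ID": 1, "TAXON_SAMPLE_ID": 1, "MEASUREMENT_NAME": "delta_15N", "MEASUREMENT_MEAN_VALUE": 9.8},
    {"SOURCE_ID": 1, "TAXON_SAMPLE_ID": 1, "MEASUREMENT_NAME": "C:N mass ratio", "MEASUREMENT_MEAN_VALUE": 3.4},
    {"SOURCE_ID": 1, "TAXON_SAMPLE_ID": 2, "MEASUREMENT_NAME": "delta_15N", "MEASUREMENT_MEAN_VALUE": 10.1},
]

def measurements_by_sample(records):
    """Group mean values by (SOURCE_ID, TAXON_SAMPLE_ID), then by MEASUREMENT_NAME.

    Values are kept in lists because one sample may have several rows for the
    same measurement (different physical samples, replicates, or methods).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for r in records:
        key = (r["SOURCE_ID"], r["TAXON_SAMPLE_ID"])
        grouped[key][r["MEASUREMENT_NAME"]].append(r["MEASUREMENT_MEAN_VALUE"])
    return {key: dict(values) for key, values in grouped.items()}

by_sample = measurements_by_sample(rows)
for key, values in by_sample.items():
    print(key, values)
```

Where a list holds more than one value, PHYSICAL_SAMPLE_ID, ANALYTICAL_REPLICATE_ID, and the treatment columns indicate why the rows differ and whether they can sensibly be averaged.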
Lipids data table
(Columns that are already described in the "Diet" or "Isotopes" schemas above are not included here)
- MEASUREMENT_NAME: The name of the quantity being reported (e.g. "lipid content", "monounsaturated fatty alcohol content", "18:1n-7 content", "wet weight")
- MEASUREMENT_CLASS: Where the measurement could apply to e.g. either fatty acids or fatty alcohols, this column is used to clarify (e.g. "fatty acid", "fatty alcohol", "triacylglycerol fatty acid", "wax ester fatty acid")
Energetics data table
All of the columns in this data table have been described in the schemas above.
DNA diet data table
(Columns that are already described in the schemas above are not included here)
- SEQUENCES_TOTAL: The total sequence count for this predator sample
- DNA_CONCENTRATION: Sample DNA concentration if recorded, in nM/µl
- FRACTION_SEQUENCES_BY_PREY: The fraction of SEQUENCES_TOTAL that this prey type made up (e.g. if Euphausia superba contributed 50% of the total sequences of prey items, this value would be 0.5). Note: many papers represent very small dietary contributions as "trace" or sometimes "less than 0.1%". These have been entered as -999
- FRACTION_OCCURRENCE: The fraction of predator samples in which this prey item occurred (e.g. if Euphausia superba occurred in half of the scats collected, this value would be 0.5). Note: many papers represent very small dietary contributions as "trace" or sometimes "less than 0.1%". These have been entered as -999
- SAMPLE_TYPE: Sample type that the DNA was extracted from, e.g. "scat", "stomach content"
- DNA_EXTRACTION_METHOD: The method used to extract DNA (e.g. "DNA stool kit", "Maxwell robot", "salting out procedure")
- ANALYSIS_TYPE: e.g. "High-throughput sequencing", "cloning", "PCR amplification only"
- SEQUENCING_PLATFORM: e.g. "Ion torrent", "Miseq"
- TARGET_GENE: The gene area targeted, e.g. "16S", "12S", "18S", "CO1"
- TARGET_FOOD_GROUP: For the 18S region, this might be "all eukaryotes"; for 16S or 12S, this might be "fish" or "vertebrates"
- FORWARD_PRIMER: The sequence of the forward primer used, in the 5'-to-3' direction
- REVERSE_PRIMER: The sequence of the reverse primer used, in the 5'-to-3' direction
- BLOCKING_PRIMER: The sequence of the blocking primer if used, in the 5'-to-3' direction
- PRIMER_SOURCE_ID: The ID of the paper reference for where the primer was first designed. This reference will likely include the PCR conditions, annealing temperature and alignment of the primers
- PRIMER_SOURCE_DETAILS, PRIMER_SOURCE_DOI: The details and DOI of the PRIMER_SOURCE_ID, copied from the sources data table for convenience
- SEQUENCE_SOURCE_ID: The database that contains the sequence data, e.g. "Dryad", "GenBank"
- SEQUENCE_SOURCE_DETAILS, SEQUENCE_SOURCE_DOI: The details and DOI of the SEQUENCE_SOURCE_ID, copied from the sources data table for convenience
- SEQUENCE: DNA sequence for OTU or OTU cluster
- OTHER_METHODS_APPLIED: Were there any other methods applied to the sample to either improve amplification or block sequences?
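As FRACTION_SEQUENCES_BY_PREY is defined as a fraction of SEQUENCES_TOTAL, an approximate per-prey sequence count can be recovered by multiplying the two. The Python sketch below illustrates this. The function name is hypothetical, and the result is only an estimate because the fractions may have been rounded in the source.

```python
TRACE = -999  # entered where a source reports "trace" or "less than 0.1%"

def estimated_sequence_count(sequences_total, fraction_sequences_by_prey):
    """Estimate the number of sequences assigned to one prey type.

    Returns None when either input is missing or the fraction is the
    -999 trace value, since no count can be recovered in those cases.
    """
    if sequences_total is None or fraction_sequences_by_prey is None:
        return None
    if fraction_sequences_by_prey == TRACE:
        return None
    return round(sequences_total * fraction_sequences_by_prey)

# Hypothetical values: 12000 sequences in total, half assigned to one prey type.
print(estimated_sequence_count(12000, 0.5))
```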