Published May 15, 2017 | Version v1
Dataset Open

Sixty-one thousand Recent planktonic foraminifera from the Atlantic Ocean

  • 1. Yale University
  • 2. Swedish Museum of Natural History
  • 3. University of Kansas
  • 4. University of California, Berkeley

Description

Here we provide an extensive image library of Recent microfossils (primarily planktonic foraminifera) from the Atlantic Ocean, with accompanying 2D and 3D coordinate data and morphometric measurements. These data were generated using high-throughput imaging methods (AutoMorph) developed in P.M. Hull's lab at Yale University.

The dataset consists of microfossils from 34 sediment samples, mounted on 155 micropaleontological slides, primarily from the North Atlantic Ocean (metadata_tables.tar.gz: Table 1). All slides are accessioned to the Yale Peabody Museum of Natural History (YPM) Division of Invertebrate Paleontology with unique YPM catalog numbers (metadata_tables.tar.gz: Table 2). Slides of microfossils were imaged at multiple focal heights (z-planes) using a light microscope with an automated stage and processed with the image processing modules of AutoMorph as detailed in Table 2. AutoMorph software and tutorials can be accessed here: https://github.com/HullLab. For an example of a raw slide scan see: Hsiang, Allison Y., Nealson, Kaylea, Elder, Leanne E., Liu, Yusu, & Hull, Pincelli M. (2016). Slide scan example for Automorph. Zenodo. http://doi.org/10.5281/zenodo.167557.

A total of 124,230 unique objects (primarily microfossils) were identified from the 155 slides of 34 sediment samples and classified into 16 object categories. Object classification is provided in Table 3 and summarized in Table 4 (metadata_tables.tar.gz). Table 5 in metadata_tables.tar.gz provides a technical validation of the automated 2D morphometric measurements; comparable validation for the 3D morphometric measurements is provided in Hsiang et al. 2016 (http://dx.doi.org/10.1098/rstb.2015.0227).

Images and morphometric data are provided in 12 additional datasets:

1) slide_images.tar.gz contains one image for each slide scanned in this study (155 slides), with a red box around each object extracted using the AutoMorph segment module. Slides are named according to their YPM catalog number. Related sample and site information can be found in Tables 1 and 2 (metadata_tables.tar.gz) and, by YPM catalog number, in the YPM database (http://collections.peabody.yale.edu/search/).

2) edf_images.tar.gz contains the extended depth of focus images (EDF: a 2D image composite created from multiple z-stacked photographic images) generated by the AutoMorph focus module for each of the 124,230 individual objects identified by segment.

3 & 4) obj_zstacks_part1.tar.gz and obj_zstacks_part2.tar.gz contain the original z-stack images of each object. 2D outlines and shape measurements for each object are extracted using the AutoMorph module run2dmorph.

5) 2d_outline_check.tar.gz provides an overlay of the extracted 2D outline on the object EDF for quality control purposes for all extracted objects (113,847 objects) and a text file of all objects with failed 2D extractions (10,384 objects).

6) 2d_coordinates.tar.gz provides the 2D coordinates of each object in a single csv (all_coordinates.csv) and by slide (155 csv files named according to YPM catalog number), and a text file of all objects with failed 2D extractions (10,384 objects).

7) shape_measurements.csv contains the complete list of all objects in the dataset (124,230 objects) with the 2D and 3D measurements extracted by the AutoMorph routines run2dmorph and run3dmorph when available.

8 & 9) 3d_pdfs_part1.tar.gz and 3d_pdfs_part2.tar.gz provide 3D pdfs of each 3D object extracted (109,207 objects) for quality control purposes and a text file of all objects with failed 3D extractions (15,023 objects). 3D pdfs, meshes and shape measurements were generated by the AutoMorph module run3dmorph. Note that only some pdf viewers are able to display 3D pdfs properly.

10-12) 3d_obj_files_part1.tar.gz, 3d_obj_files_part2.tar.gz, and 3d_obj_files_part3.tar.gz provide the 3D mesh coordinates as obj files for each extracted object and a text file of all objects with failed 3D extractions (15,023 objects).
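As an illustration of how the 3D mesh files in items 10-12 might be read, the following is a minimal sketch of a Wavefront OBJ reader in plain Python. It assumes the meshes use ordinary `v` (vertex) and `f` (face) records with positive, 1-based indices; the exact contents of the AutoMorph obj files are not described here, so the function names `parse_obj` and `bounding_box` and the sample mesh are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, ...]


def parse_obj(text: str) -> Tuple[List[Vertex], List[Face]]:
    """Collect vertices and faces from OBJ text, ignoring all other records."""
    vertices: List[Vertex] = []
    faces: List[Face] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v" and len(parts) >= 4:
            vertices.append((float(parts[1]), float(parts[2]), float(parts[3])))
        elif parts[0] == "f":
            # Face entries may look like "i", "i/j" or "i/j/k"; keep only the
            # vertex index and convert from 1-based to 0-based.
            # Assumption: indices are positive (no relative negative indices).
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces


def bounding_box(vertices: List[Vertex]) -> Tuple[Vertex, Vertex]:
    """Return the (min corner, max corner) of the axis-aligned bounding box."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Hypothetical four-vertex mesh used only to exercise the functions.
SAMPLE_OBJ = """# comment line
v 0 0 0
v 1 0 0
v 0 2 0
v 0 0 3
f 1 2 3
f 1/1 2/2 4/4
"""

vertices, faces = parse_obj(SAMPLE_OBJ)
print(len(vertices), "vertices,", len(faces), "faces")
print(bounding_box(vertices))
```

For a real file, the text of an obj file extracted from one of the archives would be passed to `parse_obj` in place of `SAMPLE_OBJ`.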

This dataset accompanies the manuscript "Sixty-one thousand Recent planktonic foraminifera from the Atlantic Ocean" submitted to Scientific Data. The manuscript describes important details related to data collection and usage and should be consulted before using the data provided here. One key note about this dataset is repeated here as a precaution. We provide image classification for only 4/5ths of the complete dataset. A random subset (1/5th of the classifications) is excluded as a test set, so that this image database can be used in machine learning.
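Because only part of the objects carry a classification, a user preparing training data will likely want to separate classified objects from withheld ones first. The sketch below shows one way to do that. It is built on assumptions: the column names `object_id` and `category`, the category values, and the idea that withheld objects appear with an empty classification field are all hypothetical, and the real layout of Table 3 should be checked in the manuscript.

```python
import csv
import io

# Hypothetical table layout; the real Table 3 columns and values may differ.
SAMPLE_TABLE = """object_id,category
obj_00001,planktonic foraminifera
obj_00002,
obj_00003,fragment
obj_00004,
obj_00005,planktonic foraminifera
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def split_by_label(rows, label_field="category"):
    """Separate rows that have a classification from rows that do not."""
    labeled, withheld = [], []
    for row in rows:
        value = (row.get(label_field) or "").strip()
        if value:
            labeled.append(row)
        else:
            withheld.append(row)
    return labeled, withheld


labeled, withheld = split_by_label(read_rows(SAMPLE_TABLE))
print(len(labeled), "labeled;", len(withheld), "withheld")
```

Only the labeled rows would then be used for model training and validation; the withheld fifth has no public classifications to evaluate against.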

Notes

This work was funded by the American Chemical Society PRF Grant #55837-DNI8 and by support from Yale University.

Files

shape_measurements.csv

Files (300.4 GB)

Checksum Size
md5:632107fe9e85cbf371900d557d08f071 467.7 MB
md5:c71caf9905b1d402887d6e251eeae977 13.1 GB
md5:a6e04e84c995e4500ebdef7bbac6c3ad 44.2 GB
md5:37d6145e32f58ae33ecb0ecf1a3eb297 38.7 GB
md5:91308a661506f10ef9ddc797c40ceb29 31.2 GB
md5:09a67193e4fbc468dcb8140b7094046b 46.1 GB
md5:df50bf8372f6a035e2041346c717bcea 37.3 GB
md5:738901c0907d1a4ae05b437d9c39133d 8.4 GB
md5:041c6e9abe0caa9b5b6a80fd967a7cb4 2.5 MB
md5:df6e2c64e07b041fdc364d98767b019d 41.2 GB
md5:60e903f5ed81bf38bb3ef395fa3c1910 33.3 GB
md5:042e06975198dfc2ae0f551e0a4dd592 25.4 MB
md5:b05dc29a28185cdc4e24c83c326d4f67 6.4 GB
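
Given the size of these archives, it may be worth confirming each download against its listed md5 value before unpacking. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the local file path is whatever name the archive was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in fixed-size chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum("shape_measurements.csv", "md5:...")`, with the checksum copied from the listing, returns `True` only when the local copy is byte-identical to the published file.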

Additional details

Related works