Published November 20, 2023 | Version 1.0.0
Dataset Open

SCLabels: Labelled rectified RGB images from the Spanish CoastSnap network

  • 1. Balearic Islands Coastal Observing and Forecasting System
  • 2. Universidade de Vigo
  • 1. Balearic Islands Coastal Observing and Forecasting System
  • 2. Universitat de les Illes Balears

Description

Training dataset

The SCLabels dataset is intended for the exploration and development of Artificial Intelligence (AI) applications that automate shoreline extraction from rectified images. SCLabels includes rectified RGB images from the Spanish CoastSnap network and their corresponding masks, together with a metadata file and a README file. The RGB images cover a range of geographic locations, fields of view, beach types and degrees of occupation, tidal regimes, meteoceanic and lighting conditions, and other environmental characteristics. The masks provide dense pixel labels in 5 categories: i) No data; ii) Not classified; iii) Landwards; iv) Seawards; and v) Shoreline. The metadata file links each image to its corresponding mask and provides the geographic location of each image, capture characteristics and image source, shoreline position, and other auxiliary data. The README file makes the dataset easier to understand: it elaborates on the context and contents, and gives detailed explanations of the metadata, potential limitations, technical aspects of the image processing and annotation stages, usage recommendations, and related works.

Technical details

The SCLabels dataset version 1.0.0 is packaged in a compressed file (SCLabels_v1.0.0.zip). It contains a total of 1717 RGB images in JPG format, their corresponding masks in PNG format, a metadata file in JSON format, and the README file in PDF format.

Data preprocessing

To generate the SCLabels masks, rectified RGB images and their corresponding shorelines were used. RGB images were cropped to the minimum and maximum alongshore pixel coordinates of the shoreline (vertical axis), plus 10 additional pixels above and below to preserve contextual information. A greyscale image was then derived from each cropped RGB image for subsequent pixel labelling. First, a binary mask was derived, marking as "NoData" the black and white padded pixels resulting from the registration and rectification steps. Subsequently, the shoreline was densified, ensuring that at least one pixel per row was assigned the "Shoreline" label. Next, "Landwards" and "Seawards" labels were assigned to the right and left of the shoreline. Pixels left unlabelled were categorised as "NotClassified". Finally, mask values were reclassified to align with the predefined labels, and the greyscale masks were exported. For additional information, please consult the README file.
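The labelling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear interpolation used to densify the shoreline, the exact-value test for padded pixels (0 or 255), the precedence of "Shoreline" over "NoData", and the land_on_right switch are all assumptions made for illustration. Only the five mask values come from the dataset description.

```python
# Mask values documented for SCLabels.
NODATA, NOT_CLASSIFIED, LANDWARDS, SEAWARDS, SHORELINE = 0, 25, 75, 150, 255


def densify(vertices, n_rows):
    """Interpolate shoreline vertices (row, col) so that every row between
    the first and last vertex gets one shoreline column.
    Assumption: linear interpolation; the source does not name the method."""
    pts = sorted(vertices)
    cols = {pts[0][0]: pts[0][1]} if pts else {}
    for (r0, c0), (r1, c1) in zip(pts, pts[1:]):
        for r in range(r0, r1 + 1):
            t = (r - r0) / (r1 - r0) if r1 != r0 else 0.0
            cols[r] = round(c0 + t * (c1 - c0))
    return {r: c for r, c in cols.items() if 0 <= r < n_rows}


def label_mask(gray, vertices, land_on_right=True):
    """Build a label mask from a greyscale image (list of rows of 0..255 ints).
    Assumptions: padded pixels are exactly 0 or 255, shoreline pixels take
    precedence over NoData, and land_on_right selects which side is land."""
    shore = densify(vertices, len(gray))
    mask = []
    for r, row in enumerate(gray):
        out = []
        for c, value in enumerate(row):
            if r in shore and c == shore[r]:
                out.append(SHORELINE)
            elif value in (0, 255):
                out.append(NODATA)
            elif r not in shore:
                out.append(NOT_CLASSIFIED)  # no shoreline on this row
            elif (c > shore[r]) == land_on_right:
                out.append(LANDWARDS)
            else:
                out.append(SEAWARDS)
        mask.append(out)
    return mask
```

Which side of the shoreline is land depends on how each station's images are oriented, so the sketch leaves that as a parameter rather than fixing it. The README remains the reference for the actual procedure.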

Data splitting

Data splitting requirements may vary depending on the chosen AI approach (e.g., splitting by entire images, image patches, or image rows). Researchers should use a consistent data splitting method and document the approach and splits used in publications. This transparency enables reproducible results and facilitates comparisons between studies.
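As one possible way to follow this recommendation when splitting by entire images, the sketch below produces a seeded, order-independent split and returns a record that can be saved alongside a publication. The function name, the 70/15/15 fractions, the seed, and the image identifiers are hypothetical choices, not part of the dataset.

```python
import json
import random


def split_by_image(image_ids, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split image identifiers into train/val/test subsets.
    Sorting before shuffling makes the result independent of input order,
    so the same seed always yields the same split."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    return {
        "seed": seed,
        "fractions": list(fractions),
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to the test subset
    }


def save_split(split, path):
    """Write the split record to a JSON file so it can be shared and cited."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(split, handle, indent=2)
```

Splitting by image patches or image rows would need a different grouping key, but the same idea applies: fix the seed, record it, and publish the resulting lists.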

Classes, labels and annotations

The SCLabels dataset includes one mask per rectified RGB image, sharing the same width and height. These masks are in greyscale and PNG format, and consist of five different labels:

Mask value  Label          Description
0           NoData         High probability of being black or white padded pixels, used to pad non-rectangular images during the image registration and rectification processes
25          NotClassified  Unlabelled pixels
75          Landwards      All pixels on the land side of the shoreline (row-wise), excluding "NoData" ones
150         Seawards       All pixels on the sea side of the shoreline (row-wise), excluding "NoData" ones
255         Shoreline      Pixels intersected by the mapped shoreline, densified to cover at least one pixel per row
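Many training pipelines expect contiguous class indices rather than the greyscale values stored in the masks. The sketch below shows one way to remap them and to count pixels per label. The class indices 0 to 4 and the function names are hypothetical; only the five mask values and label names come from the dataset. The mask is assumed to be already loaded as a list of rows of integers, for example with an image library of your choice.

```python
from collections import Counter

# Mask values and label names as documented for SCLabels.
MASK_LABELS = {
    0: "NoData",
    25: "NotClassified",
    75: "Landwards",
    150: "Seawards",
    255: "Shoreline",
}

# Hypothetical contiguous class indices (0..4), in ascending mask-value order.
CLASS_INDEX = {value: index for index, value in enumerate(sorted(MASK_LABELS))}


def to_class_indices(mask_rows):
    """Map a mask (list of rows of ints) to class indices.
    Raises ValueError on any value outside the five documented ones."""
    out = []
    for row in mask_rows:
        try:
            out.append([CLASS_INDEX[value] for value in row])
        except KeyError as exc:
            raise ValueError(f"unexpected mask value {exc.args[0]}") from None
    return out


def label_counts(mask_rows):
    """Count pixels per label name, e.g. to inspect class imbalance."""
    return Counter(MASK_LABELS[value] for row in mask_rows for value in row)
```

Because the "Shoreline" label covers roughly one pixel per row, counts from label_counts are likely to be strongly imbalanced, which is worth checking before choosing a loss function.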

Parameters

RGB values or any transformation in the colour space can be used as parameters.
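As an illustration of a colour-space transformation, the sketch below builds a per-pixel feature vector from the RGB values plus their HSV equivalents, using the Python standard library. The choice of HSV and the function names are assumptions; the dataset does not prescribe any particular transformation.

```python
import colorsys


def pixel_features(r, g, b):
    """Return (R, G, B, H, S, V), each scaled to [0, 1], for one 8-bit RGB pixel."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    h, s, v = colorsys.rgb_to_hsv(rn, gn, bn)
    return (rn, gn, bn, h, s, v)


def row_features(rgb_row):
    """Apply pixel_features to every (r, g, b) tuple in one image row."""
    return [pixel_features(*pixel) for pixel in rgb_row]
```

For whole images, an array library would be faster than this pixel-by-pixel loop; the sketch only shows the shape of the features.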

Data sources

In the CoastSnap initiative, citizens capture images (oblique smartphone photos) from fixed CoastSnap stations and share them with the scientific managers. Images are subjected to a quality control process, spatially registered to a designated target image, and rectified (georeferenced). The shoreline is subsequently digitised from each rectified image.

Data quality

All images included have been supervised by CSs’ scientific managers. However, citizen scientists take the images with smartphones of differing camera quality, at irregular intervals, across various sites, and under varying weather and illumination conditions. Users of the SCLabels dataset must be aware of this variance.

Image resolution

The resolution of the images depends on the CoastSnap station and the length of the shoreline, ranging from 241x188 pixels to 801x796 pixels.

Spatial coverage

The SCLabels dataset version 1.0.0 contains data from five Spanish CoastSnap stations, including sandy beaches in the northwest (agrelo), the Cíes Islands (cies), the south (cadiz), and the Balearic Islands (samarador and arenaldentem).

CoastSnap station  Longitude  Latitude
agrelo             -8.772     42.331
cies               -8.900     42.226
cadiz              -6.288     36.522
samarador          3.185      39.350
arenaldentem       2.974      39.353

Contact information

For further technical inquiries or additional information about the annotated dataset, please contact jsoriano@socib.es.

Notes (English)

The SCLabels dataset is a product of the "Beach Monitoring Use Case" within the "iMagine project" with funding from the European Union's Horizon Europe research and innovation programme. The authors express their gratitude to the project managers and all partners involved for fostering the creation of open-access image repositories for AI-based image analysis services. Special thanks are extended to the researchers and institutions that contributed to the SCShores dataset, which forms the foundation for SCLabels. Additionally, the authors appreciate the invaluable contributions of citizen scientists and fellow practitioners of citizen science for monitoring coastlines.

Files

SCLabels_v1.0.0.zip (571.4 MB)
md5:77eea257c247a5e3f2353f6110c61b64
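After downloading, the archive can be checked against the MD5 checksum listed above. The sketch below uses the Python standard library; the function names are illustrative, and the path passed to them is whatever location the file was saved to.

```python
import hashlib

# Checksum listed on this record for SCLabels_v1.0.0.zip.
EXPECTED_MD5 = "77eea257c247a5e3f2353f6110c61b64"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that the 571.4 MB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling verify_archive("SCLabels_v1.0.0.zip") from the download directory returns False if the file is incomplete or corrupted, in which case it should be downloaded again.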

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.8056415 (DOI)
Journal article: 10.5194/essd-15-4613-2023 (DOI)

Funding

This work was supported by the iMagine project (iMagine – Imaging data and services for aquatic science) with funding from the European Union’s Horizon Europe research and innovation programme (European Commission) under grant agreement No. 101058625.

Dates

Created
2023-11