SCLabels: Labelled rectified RGB images from the Spanish CoastSnap network
Description
Training dataset
The SCLabels dataset is intended for the exploration and development of Artificial Intelligence (AI) applications that automate shoreline extraction from rectified images. SCLabels includes rectified RGB images from the Spanish CoastSnap network and their corresponding masks, together with a metadata file and a README file. The RGB images cover a range of geographic locations, fields of view, beach types and degrees of occupation, tidal regimes, metocean and lighting conditions, and other environmental characteristics. The masks provide dense pixel labels in five categories: i) No data; ii) Not classified; iii) Landwards; iv) Seawards; and v) Shoreline. The metadata file links each image to its corresponding mask and provides the geographic location of each image, capture characteristics and image source, shoreline position, and other auxiliary data. The README file explains the context and contents of the dataset in more detail, covering the metadata, potential limitations, technical aspects of the image processing and annotation stages, usage recommendations, and related works.
Technical details
The SCLabels dataset version 1.0.0 is packaged in a compressed file (SCLabels_v1.0.0.zip). It contains 1717 RGB images in JPG format, their corresponding masks in PNG format, a metadata file in JSON format, and the README file in PDF format.
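As a quick sanity check after download, the archive contents can be tallied by file type. The sketch below uses only the Python standard library; the function name is illustrative, and no assumption is made about the folder layout inside the archive.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_path):
    """Count archive members by lower-cased file extension, skipping directories."""
    with zipfile.ZipFile(zip_path) as archive:
        return Counter(
            PurePosixPath(info.filename).suffix.lower()
            for info in archive.infolist()
            if not info.is_dir()
        )
```

Calling `count_by_extension("SCLabels_v1.0.0.zip")` should report image, mask, metadata and README entries; the exact extensions used inside the archive (for example `.jpg` versus `.jpeg`) are best confirmed against the README.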
Data preprocessing
To generate the SCLabels masks, rectified RGB images and their corresponding shorelines were used. RGB images were cropped to the minimum and maximum alongshore pixel coordinates of the shoreline (vertical axis), plus 10 additional pixels above and below to preserve contextual information. A grayscale image was then derived from each cropped RGB image for subsequent pixel labelling. First, a binary mask was derived, marking as "NoData" the black and white padded pixels resulting from the registration and rectification steps. Subsequently, the shoreline was densified so that at least one pixel per row was assigned the "Shoreline" label. Next, "Landwards" and "Seawards" labels were assigned to the right and left of the shoreline. Pixels left unlabelled were categorised as "NotClassified". Finally, the mask values were reclassified to align with the predefined labels, and the grayscale masks were exported. For additional information, please consult the README file.
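The labelling steps can be illustrated with a minimal NumPy sketch. This is not the code used to build SCLabels. It assumes an already cropped grayscale image, treats every pure black (0) or pure white (255) pixel as padding, places "Landwards" to the right and "Seawards" to the left of the shoreline, and densifies by linear interpolation to exactly one pixel per row (the dataset guarantees at least one). All function names are hypothetical.

```python
import numpy as np

# Mask values taken from the label table of this dataset.
NODATA, NOT_CLASSIFIED, LANDWARDS, SEAWARDS, SHORELINE = 0, 25, 75, 150, 255


def densify_shoreline(rows, cols):
    """Interpolate shoreline vertices so every row between them gets one column."""
    rows = np.asarray(rows, dtype=float)
    cols = np.asarray(cols, dtype=float)
    order = np.argsort(rows)  # np.interp needs increasing x-coordinates
    all_rows = np.arange(int(rows.min()), int(rows.max()) + 1)
    all_cols = np.rint(np.interp(all_rows, rows[order], cols[order])).astype(int)
    return all_rows, all_cols


def build_mask(gray, shore_rows, shore_cols):
    """Label a cropped grayscale image row by row around the shoreline."""
    gray = np.asarray(gray)
    # Assumption: padding is exactly 0 or 255; real scenes may contain such pixels too.
    nodata = (gray == 0) | (gray == 255)
    mask = np.full(gray.shape, NOT_CLASSIFIED, dtype=np.uint8)
    mask[nodata] = NODATA
    columns = np.arange(gray.shape[1])
    for row, col in zip(shore_rows, shore_cols):
        valid = ~nodata[row]
        mask[row, valid & (columns < col)] = SEAWARDS   # assumed: sea on the left
        mask[row, valid & (columns > col)] = LANDWARDS  # assumed: land on the right
        mask[row, col] = SHORELINE
    return mask
```

Rows without a shoreline pixel, such as the 10-pixel margins above and below, keep the "NotClassified" value in this sketch.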
Data splitting
Data splitting requirements may vary depending on the chosen AI approach (e.g., splitting by entire images, image patches, or image rows). Researchers should use a consistent data splitting method and document the approach and splits used in publications. This transparency enables reproducible results and facilitates comparisons between studies.
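One way to keep a split consistent and easy to report is to derive it from a fixed random seed. The sketch below splits by entire images; the fractions, the seed and the function name are illustrative choices, not recommendations from the dataset authors.

```python
import json
import random


def split_by_image(image_ids, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split image identifiers into train/val/test, reproducibly for a given seed."""
    ids = sorted(image_ids)  # sorting makes the result independent of input order
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def describe_split(split, seed):
    """Serialise the split and its seed so they can be published with the results."""
    return json.dumps({"seed": seed, "split": split}, indent=2)
```

Publishing the output of `describe_split` alongside a study lets others reuse exactly the same partitions.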
Classes, labels and annotations
The SCLabels dataset includes one mask per rectified RGB image, sharing the same width and height. The masks are greyscale PNG files and use five labels:
Mask value | Label | Description |
---|---|---|
0 | NoData | High probability of being black or white padded pixels, used to pad non-rectangular images during the image registration and rectification processes |
25 | NotClassified | Unlabelled pixels |
75 | Landwards | All pixels that are towards the landside with respect to the shoreline (row-wise), excluding “NoData” ones |
150 | Seawards | All pixels that are towards the seaside with respect to the shoreline (row-wise), excluding “NoData” ones |
255 | Shoreline | Pixels intersected by the mapped shoreline, densified to cover at least one pixel per row |
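Many segmentation frameworks expect consecutive class indices rather than the greyscale values listed in the table. A minimal sketch of the conversion with a lookup table follows; it assumes the mask has already been loaded as a 2-D unsigned 8-bit array with an image library of the user's choice, and the index order (0 to 4, following the table) is an arbitrary convention.

```python
import numpy as np

# Greyscale value -> label name, in the order of the table above.
MASK_VALUES = {
    0: "NoData",
    25: "NotClassified",
    75: "Landwards",
    150: "Seawards",
    255: "Shoreline",
}


def to_class_indices(mask):
    """Map greyscale mask values to class indices 0..4; reject unexpected values."""
    mask = np.asarray(mask, dtype=np.uint8)
    lookup = np.full(256, -1, dtype=np.int16)  # -1 marks values not in the table
    for index, value in enumerate(MASK_VALUES):
        lookup[value] = index
    indices = lookup[mask]
    if (indices < 0).any():
        raise ValueError("mask contains values outside the SCLabels label set")
    return indices
```

The explicit check guards against masks that were resampled or recompressed, which can introduce intermediate grey values.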
Parameters
RGB values or any transformation in the colour space can be used as parameters.
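As an illustration of such transformations, the sketch below stacks the scaled RGB channels with two derived channels: a luma-style grayscale (ITU-R BT.601 weights) and a red-minus-blue difference. The choice of derived channels is an example only and is not prescribed by the dataset.

```python
import numpy as np


def colour_features(rgb):
    """Return an (..., 5) array: R, G, B scaled to [0, 1], grayscale, red minus blue."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma weights
    red_minus_blue = r - b                    # illustrative colour-difference channel
    return np.stack([r, g, b, gray, red_minus_blue], axis=-1)
```

For an image array of shape (height, width, 3) the result has shape (height, width, 5), so each pixel carries five candidate parameters.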
Data sources
In the CoastSnap initiative, citizens capture images (oblique smartphone photos) from fixed CoastSnap stations and share them with the scientific managers. Images are subjected to a quality control process, spatially registered to a designated target image, and rectified (georeferenced). The shoreline is subsequently digitised from each rectified image.
Data quality
All included images have been reviewed by the CoastSnap stations' scientific managers. However, citizen scientists take the images with smartphones of differing camera quality, at irregular intervals, across various sites, and under varying weather and illumination conditions. Users of the SCLabels dataset should be aware of this variability.
Image resolution
The resolution of the images depends on the CoastSnap station and the length of the shoreline, ranging from 241x188 pixels to 801x796 pixels.
Spatial coverage
The SCLabels dataset version 1.0.0 contains data from five Spanish CoastSnap stations, including sandy beaches in the northwest (agrelo), the Cíes Islands (cies), the south (cadiz), and the Balearic Islands (samarador and arenaldentem).
CoastSnap station | Longitude (decimal degrees) | Latitude (decimal degrees) |
---|---|---|
agrelo | -8.772 | 42.331 |
cies | -8.900 | 42.226 |
cadiz | -6.288 | 36.522 |
samarador | 3.185 | 39.350 |
arenaldentem | 2.974 | 39.353 |
Contact information
For further technical inquiries or additional information about the annotated dataset, please contact jsoriano@socib.es.
Files
Name | Size | Checksum |
---|---|---|
SCLabels_v1.0.0.zip | 571.4 MB | md5:77eea257c247a5e3f2353f6110c61b64 |
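The published MD5 checksum can be used to confirm that the archive was downloaded intact. A minimal sketch using Python's standard `hashlib` module follows; the function name and the local file path in the usage note are placeholders.

```python
import hashlib

# Checksum published for SCLabels_v1.0.0.zip on this record.
EXPECTED_MD5 = "77eea257c247a5e3f2353f6110c61b64"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `md5_of_file("SCLabels_v1.0.0.zip") == EXPECTED_MD5` evaluates to `False`, the download is incomplete or corrupted and should be repeated.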
Additional details
Related works
- Is derived from
- Dataset: 10.5281/zenodo.8056415 (DOI)
- Journal article: 10.5194/essd-15-4613-2023 (DOI)
Dates
- Created: 2023-11