Published February 27, 2024 | Version 1.0
Dataset | Open access

The Turku UAS DeepSeaSalama-GAN dataset 1 (TDSS-G1)

Contributors

Data curator:

  • Turku University of Applied Sciences

Description

The Turku UAS DeepSeaSalama-GAN dataset 1 (TDSS-G1) is an image dataset of a maritime environment. It was collected in August 2022 at Taalintehdas, in the southwest Finnish archipelago, using two stationary RGB fisheye cameras. The technical setup is described in the section “Sensor Platform design” of the report “Development of Applied Research Platforms for Autonomous and Remotely Operated Systems” (https://www.theseus.fi/handle/10024/815628).

Data collection and annotation were carried out in the Autonomous and Intelligent Systems laboratory at Turku University of Applied Sciences. The dataset combines original images captured by these cameras with synthetic images generated by a Generative Adversarial Network (GAN) that simulate 18 distinct weather conditions.

TDSS-G1 contains 199 original images and 3582 synthetic images, for a total of 3781 annotated images. The images cover three types of maritime objects: motorboats, sailing boats, and seamarks.
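The image counts are internally consistent: 199 originals multiplied by the 18 simulated weather conditions gives exactly 3582. This suggests, although the description does not state it, one GAN-generated variant per original image per weather condition. A quick arithmetic check, where the one-variant-per-condition reading is an assumption:

```python
# Counts taken from the dataset description.
ORIGINAL_IMAGES = 199
SYNTHETIC_IMAGES = 3582
WEATHER_CONDITIONS = 18

# Assumption: one synthetic variant per original image per weather condition.
expected_synthetic = ORIGINAL_IMAGES * WEATHER_CONDITIONS
total_images = ORIGINAL_IMAGES + SYNTHETIC_IMAGES

print(expected_synthetic)  # 3582
print(total_images)        # 3781
```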

To create TDSS-G1, images were extracted from MPEG video recorded at 720p and 30 frames per second (FPS), taking one image every 100 milliseconds (that is, every third frame).
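At 30 FPS a frame lasts about 33.3 ms, so sampling every 100 ms amounts to keeping every third frame. The sketch below computes which frame indices such a sampler would keep. The function name and defaults are illustrative, not the tool the authors used:

```python
from __future__ import annotations


def frame_indices(total_frames: int, fps: float = 30.0, interval_ms: float = 100.0) -> list[int]:
    """Return the indices of frames kept when sampling one frame every interval_ms."""
    stride = round(fps * interval_ms / 1000.0)  # 30 FPS * 0.1 s -> every 3rd frame
    if stride < 1:
        raise ValueError("sampling interval is shorter than one frame period")
    return list(range(0, total_frames, stride))


# A 10-second clip at 30 FPS has 300 frames.
indices = frame_indices(300)
print(len(indices))  # 100
print(indices[:4])   # [0, 3, 6, 9]
```

In practice, one could read the video with OpenCV's cv2.VideoCapture and save only the frames whose index appears in this list.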

The distribution of labels within TDSS-G1 is as follows: motorboats (62.1%), sailing boats (16.8%), and seamarks (21.1%).

The classes are therefore imbalanced: motorboats are the most represented class and sailing boats the least. This imbalance should be considered during model training, as it can reduce the model’s ability to recognize the underrepresented classes accurately. Future synthetic datasets will use vision transformers to address this problem.
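One common mitigation during training is to weight each class's loss by its inverse frequency. This is shown here only as a sketch, not as the authors' approach. With class shares p_c over K classes, the weight is w_c = 1 / (K · p_c). This keeps the average weight under the label distribution at 1 while boosting the rarest class, sailing_boat:

```python
from __future__ import annotations

# Label shares reported in the dataset description.
LABEL_SHARE = {"motor_boat": 0.621, "sailing_boat": 0.168, "seamark": 0.211}


def inverse_frequency_weights(shares: dict[str, float]) -> dict[str, float]:
    """Weight each class by 1 / (K * p_c); a perfectly balanced set gets 1.0 everywhere."""
    k = len(shares)
    return {name: 1.0 / (k * p) for name, p in shares.items()}


weights = inverse_frequency_weights(LABEL_SHARE)
for name, w in weights.items():
    print(f"{name}: {w:.2f}")
# motor_boat: 0.54
# sailing_boat: 1.98
# seamark: 1.58
```

How these weights are passed to a loss function depends on the training framework and is not specified by the dataset.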


The TDSS-G1 dataset is split into three subsets for training and evaluating machine learning models:

  • Training Set: Located in dataset/train/images. This set is used to train the model, which learns to recognize the different classes of maritime objects from this data.
  • Validation Set: Stored in dataset/valid/images. This set is used to tune hyperparameters and to detect and limit overfitting during training.
  • Test Set: Found in dataset/test/images. This set is used to evaluate the final performance of the model, giving an unbiased estimate of how it will perform on unseen data.
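To check how many images each split contains after extracting the archive, a short pathlib sketch can count files in the three image directories. It assumes images are stored as .jpg, .jpeg or .png files. The extensions and the function name are assumptions, not documented properties of the archive:

```python
from __future__ import annotations

from pathlib import Path

SPLITS = ("train", "valid", "test")
# Assumption: image file extensions used in the archive; adjust if needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root: Path) -> dict[str, int]:
    """Count image files in <root>/<split>/images for each split (0 if the folder is missing)."""
    counts: dict[str, int] = {}
    for split in SPLITS:
        image_dir = root / split / "images"
        if not image_dir.is_dir():
            counts[split] = 0
            continue
        counts[split] = sum(
            1
            for p in image_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Called as count_images(Path("dataset")), the three counts should add up to the 3781 images stated above if the archive contains the full dataset.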

The dataset comprises three classes (nc: 3), each representing a different type of maritime object. The classes are as follows:

  1. Motor Boat (motor_boat)
  2. Sailing Boat (sailing_boat)
  3. Seamark (seamark)

These labels correspond to the annotated objects in the images, so a model trained on this dataset learns to detect these three types of maritime objects. As noted above, the class distribution is imbalanced and should be taken into account during training.
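The "nc: 3" key suggests a YOLO-style dataset configuration. The label format is not documented here, however. The sketch below assumes YOLO text labels: one .txt file per image, each line holding "class_id x_center y_center width height" with 0-based class indices in the order listed above. It parses such lines and computes each class's share of all annotated objects, which could be compared with the reported distribution:

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path
from typing import Iterable

# Assumption: 0-based class indices in the order listed above (YOLO convention).
CLASS_NAMES = ["motor_boat", "sailing_boat", "seamark"]


def parse_label_line(line: str) -> tuple[str, tuple[float, float, float, float]]:
    """Parse one YOLO-style line: 'class_id x_center y_center width height'."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    x, y, w, h = (float(v) for v in fields[1:])
    return CLASS_NAMES[int(fields[0])], (x, y, w, h)


def class_distribution(label_dirs: Iterable[Path]) -> dict[str, float]:
    """Share of each class among all annotated objects in the *.txt files of label_dirs."""
    counts: Counter[str] = Counter()
    for label_dir in label_dirs:
        for label_file in sorted(label_dir.glob("*.txt")):
            for line in label_file.read_text().splitlines():
                if line.strip():  # skip blank lines
                    name, _box = parse_label_line(line)
                    counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in CLASS_NAMES}
```

For example, class_distribution(Path("dataset", s, "labels") for s in ("train", "valid", "test")) would cover all three splits. The labels directory names are an assumption. The result should be close to 62.1%, 16.8% and 21.1% if the assumed format holds.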

Files (1.7 GB), including TDSS-G1.pdf

  Size        MD5 checksum
  313.1 kB    14c399c86cee28e1787b58c5ffc193c1
  1.7 GB      18833610ffc04279b1b1074a698f53e2
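A downloaded file can be checked against its listed MD5 checksum with Python's standard hashlib module. The sketch reads the file in chunks so the 1.7 GB file does not have to fit in memory. The function names are illustrative. The file path passed in is a placeholder, because the archive's file name is not given above:

```python
import hashlib
from pathlib import Path

# Checksum listed above for the 1.7 GB file.
EXPECTED_MD5 = "18833610ffc04279b1b1074a698f53e2"


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected hex string (case-insensitive)."""
    return md5sum(path) == expected.lower()
```

Usage: verify(Path("path/to/downloaded_file")) returns True only if the download is intact.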

Additional details

Funding

  • European Union: RoboSea (A80845)
  • European Union: TEHOTEKO (A78624)
  • European Union: SafeSea (A80633)

Dates

Created: 2024-02 (dataset with synthetic data created)