The Turku UAS DeepSeaSalama - GAN dataset 1 (TDSS-G1)
Description
The Turku UAS DeepSeaSalama-GAN dataset 1 (TDSS-G1) is an image dataset collected in a maritime environment. It was assembled in the southwest Finnish archipelago at Taalintehdas in August 2022, using two stationary RGB fisheye cameras. The technical setup is described in the section “Sensor Platform design” of the report “Development of Applied Research Platforms for Autonomous and Remotely Operated Systems” (https://www.theseus.fi/handle/10024/815628).
The data collection and annotation process was carried out in the Autonomous and Intelligent Systems laboratory at Turku University of Applied Sciences. The dataset is a blend of original images captured by our cameras and synthetic data generated by a Generative Adversarial Network (GAN), simulating 18 distinct weather conditions.
The TDSS-G1 dataset comprises 199 original images and 3582 synthetic images, for a total of 3781 annotated images. The images show three types of maritime objects: motorboats, sailing boats, and seamarks.
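The image counts can be cross-checked with a few lines of arithmetic. The sketch below assumes that each original image has exactly one synthetic variant per weather condition; the description does not state this explicitly, but the reported numbers are consistent with it.

```python
# Counts taken from the dataset description.
ORIGINAL_IMAGES = 199
WEATHER_CONDITIONS = 18

# Assumption: one synthetic variant per original image and weather condition.
synthetic_images = ORIGINAL_IMAGES * WEATHER_CONDITIONS
total_images = ORIGINAL_IMAGES + synthetic_images

print(f"synthetic: {synthetic_images}, total: {total_images}")
```

Under that assumption the computed values match the 3582 synthetic and 3781 total images reported for the dataset.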
The creation of TDSS-G1 involved extracting images from videos recorded in MPEG format, with a resolution of 720p at 30 frames per second (FPS). An image was extracted every 100 milliseconds.
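At 30 FPS, one image every 100 milliseconds corresponds to keeping every third frame. The following is a minimal sketch of that sampling rule only; the function name `frame_indices` and its parameters are hypothetical, a constant frame rate is assumed, and the actual decoding of the video (for example with a video library) is not shown and may have been done differently by the authors.

```python
def frame_indices(n_frames, fps=30.0, interval_ms=100.0):
    """Return the indices of the frames to keep when sampling every interval_ms."""
    # Number of frames between two consecutive samples (3.0 for 30 FPS, 100 ms).
    step = fps * interval_ms / 1000.0
    if step < 1:
        raise ValueError("sampling interval is shorter than one frame")
    indices = []
    k = 0
    while True:
        idx = round(k * step)  # nearest frame to the k-th sampling instant
        if idx >= n_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# One second of video at 30 FPS yields 10 sampled frames.
print(len(frame_indices(30)))
```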
The distribution of labels within TDSS-G1 is as follows: motorboats (62.1%), sailing boats (16.8%), and seamarks (21.1%).
This distribution shows a class imbalance: motorboats are the most represented class and sailing boats the least. The imbalance should be taken into account during model training, because it can reduce the model’s ability to recognize the underrepresented classes accurately. In future synthetic datasets, vision transformers will be used to address this problem.
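One common way to account for such an imbalance during training is to weight each class inversely to its frequency. The sketch below illustrates that general technique using the shares reported above; it is not described as the authors' method, and the function name `inverse_frequency_weights` is hypothetical.

```python
# Class shares as reported in the dataset description (fractions of all labels).
CLASS_SHARE = {"motor_boat": 0.621, "sailing_boat": 0.168, "seamark": 0.211}


def inverse_frequency_weights(shares):
    """Return weights proportional to 1/share.

    The weights are scaled so that a perfectly balanced dataset
    would give every class a weight of 1.0.
    """
    n_classes = len(shares)
    total = sum(shares.values())
    return {name: total / (n_classes * share) for name, share in shares.items()}


weights = inverse_frequency_weights(CLASS_SHARE)
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```

With these shares, sailing boats receive the largest weight and motorboats the smallest, so errors on the rarer classes would count more in a weighted loss.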
The TDSS-G1 dataset is organized into three distinct subsets for the purpose of training and evaluating machine learning models. These subsets are as follows:
- Training Set: Located in dataset/train/images, this set is used to train the model. The model learns to recognize the different classes of maritime objects from this data.
- Validation Set: Stored in dataset/valid/images, this set is used to tune hyperparameters and to detect overfitting during the training process.
- Test Set: Found in dataset/test/images, this set is used to evaluate the final performance of the model. It provides an unbiased assessment of how the model will perform on unseen data.
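After unpacking the archive, the folder layout described above can be checked with a short script. This is a minimal sketch: the function name `count_images` is hypothetical, and the set of image file extensions is an assumption, since the description does not state the image format.

```python
from pathlib import Path

# Split folder names as given in the dataset description.
SPLITS = ("train", "valid", "test")
# Assumption: the image file extensions are not stated in the description.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(dataset_root):
    """Count image files in <dataset_root>/<split>/images for each split."""
    counts = {}
    for split in SPLITS:
        image_dir = Path(dataset_root) / split / "images"
        if not image_dir.is_dir():
            counts[split] = 0  # a missing folder is reported as zero images
            continue
        counts[split] = sum(
            1
            for path in image_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    print(count_images("dataset"))
```

If the three counts do not add up to the 3781 images reported for the dataset, the assumed extensions or the unpacked folder location are the first things to check.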
The dataset comprises three classes (nc: 3), each representing a different type of maritime object. The classes are as follows:
- Motor Boat (motor_boat)
- Sailing Boat (sailing_boat)
- Seamark (seamark)
These labels correspond to the annotated objects in the images. A model trained on this dataset can learn to identify these three types of maritime objects. The class imbalance noted above should be taken into account during training.
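The `nc: 3` notation suggests a YOLO-style dataset configuration, although the description does not name the annotation format. Assuming YOLO-style text labels (one object per line: class id followed by a normalized bounding box) and class ids that follow the listing order above, a label line could be decoded as sketched here. Both assumptions, and the function name `parse_label_line`, are illustrative only.

```python
# Assumption: class ids 0..2 follow the order in which the classes are listed.
CLASS_NAMES = ["motor_boat", "sailing_boat", "seamark"]


def parse_label_line(line):
    """Parse one assumed YOLO-style label line.

    Expected form: '<class_id> <x_center> <y_center> <width> <height>',
    with the four box values normalized to the range 0..1.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    if not 0 <= class_id < len(CLASS_NAMES):
        raise ValueError(f"unknown class id {class_id}")
    box = tuple(float(value) for value in parts[1:])
    return CLASS_NAMES[class_id], box


name, box = parse_label_line("1 0.5 0.5 0.2 0.1")
print(name, box)
```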
Files
TDSS-G1.pdf
Total size: 1.7 GB
Checksum | Size |
---|---|
md5:14c399c86cee28e1787b58c5ffc193c1 | 313.1 kB |
md5:18833610ffc04279b1b1074a698f53e2 | 1.7 GB |
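The MD5 checksums listed for the files can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path has to be supplied by the user because the file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use low, which matters for the 1.7 GB file.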
Additional details
Dates
- Created: 2024-02 (Dataset with synthetic data is created)