SPARUS-LD: Gilthead seabream (Sparus aurata) landmark detection dataset
Authors/Creators
Description
The dataset contains 2052 high-resolution images of Gilthead seabream specimens, each annotated with 18 landmarks/keypoints. Additionally, each specimen is assigned to one of three classes according to its origin: wild, farmed, or farm-associated.
Landmarks
Short descriptions of the 18 labeled landmarks are as follows:
- anterior tip of snout at the upper jaw
- vertical point above the most anterior point in the eye
- anterior insertion of the dorsal fin
- last spiny ray of the dorsal fin
- posterior insertion of the dorsal fin
- dorsal point at the least depth of the caudal peduncle
- posterior body extremity
- ventral point at the least depth of the caudal peduncle
- posterior insertion of the anal fin
- anterior insertion of the anal fin
- insertion of the pelvic fin
- ventral tip of the insertion of the operculum on the lateral profile
- point of maximum extension of the operculum on the lateral profile
- anterior extremity of the lateral line on the head profile
- dorsal insertion of the pectoral fin
- ventral insertion of the pectoral fin
- the most anterior point in the eye
- the most posterior point in the eye
Subsets
The dataset is split into three subsets for training, validation and testing. The distribution of images according to the subset and the origin is given in the table below.
| Origin | Training | Validation | Test |
|---|---|---|---|
| Wild | 419 | 97 | 190 |
| Farmed | 411 | 100 | 324 |
| Farm-associated | 254 | 51 | 206 |
| Total | 1084 | 248 | 720 |
File structure
Dataset file structure is illustrated below.
├── SPARUS-LD
│   ├── res_5184x3456
│   │   ├── train_landmark_configuration_expert.TPS
│   │   ├── val_landmark_configuration_expert.TPS
│   │   ├── test_landmark_configuration_expert.TPS
│   │   ├── test_landmark_configuration_novice.TPS
│   │   ├── test_landmark_configuration_machine.TPS
│   │   ├── train
│   │   │   ├── *.JPG
│   │   ├── test
│   │   │   ├── *.JPG
│   │   ├── val
│   │   │   ├── *.JPG
Each subset is accompanied by a single TPS file containing the landmark coordinates annotated by the expert and a directory with the corresponding images. Image files follow the naming convention {origin}_{id}.JPG, where origin can be wild, farmed, or farm-assoc, and id ranges from 00001 to 02052. This naming scheme allows the specimen’s origin to be automatically inferred from the image name.
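As an illustration of the naming convention, the origin and id can be recovered from an image name roughly as follows. This is a minimal sketch, not part of the dataset tooling: the function name parse_image_name and the example file name are hypothetical, and the three origin labels are taken from the convention described above. The split is done on the last underscore because farm-assoc itself contains a hyphen.

```python
from pathlib import Path

# Origin labels as they appear in file names, per the naming convention above.
KNOWN_ORIGINS = {"wild", "farmed", "farm-assoc"}


def parse_image_name(name):
    """Split '{origin}_{id}.JPG' into (origin, id), with id returned as an int.

    Hypothetical helper for illustration; not part of the dataset code.
    """
    stem = Path(name).stem                      # drop directories and extension
    origin, _, id_text = stem.rpartition("_")   # split on the last underscore
    if origin not in KNOWN_ORIGINS or not id_text.isdigit():
        raise ValueError(f"unexpected image name: {name!r}")
    return origin, int(id_text)


# Example file name is made up for illustration.
print(parse_image_name("farm-assoc_00042.JPG"))
```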
For the test set, two further TPS files complement the expert annotation: one with landmark coordinates produced by the machine (our deep learning model) and one annotated by the novice annotator.
The TPS file specifies the landmark coordinates for all images of the given subset. TPS is one of the standard file formats in geometric morphometrics. It is a plain-text file, so it can be read and edited with any regular text editor (e.g. Notepad, gedit). It can also be easily read into Python structures using the py-tps library. In our case, the TPS file structure is the following:
LM=18
w_1 h_1
w_2 h_2
...
w_18 h_18
IMAGE=image1_name
ID=image1_id
SCALE=image1_scale
LM=18
w_1 h_1
w_2 h_2
...
w_18 h_18
IMAGE=image2_name
ID=image2_id
SCALE=image2_scale
The line LM=18 marks the beginning of a new TPS record describing one specimen and the corresponding image. Each of the following 18 lines w_i h_i gives the coordinates of one landmark, where w_i is the horizontal position in pixels measured from the left edge of the image and h_i is the vertical position in pixels measured from the bottom edge. Note that many image processing libraries place the origin at the top-left corner, so you may need to convert the vertical coordinates to be measured from the top edge (for example, by subtracting h_i from the image height). The next two lines, IMAGE=... and ID=..., give the corresponding image name and image id. Finally, the line SCALE=... specifies the image scale in pixels/cm, which enables conversion from pixel measurements to real-world metric measurements.
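The record layout described above can also be read with a few lines of plain Python, without any external library. The following is a minimal sketch under stated assumptions: the function names read_tps, to_top_left, and to_cm are hypothetical, the sample record uses made-up values and only two landmarks for brevity, and the image height of 3456 pixels is inferred from the directory name res_5184x3456 rather than read from an actual image. The pixel-to-centimetre conversion takes the SCALE description above (pixels/cm) at face value.

```python
def read_tps(lines):
    """Parse TPS records from an iterable of text lines into a list of dicts."""
    records = []
    current = None
    remaining = 0  # coordinate lines still expected for the current record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if remaining > 0:
            w, h = (float(v) for v in line.split()[:2])
            current["landmarks"].append((w, h))
            remaining -= 1
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that are neither coordinates nor KEY=value
        key = key.strip().upper()
        if key == "LM":
            # LM=<n> starts a new record and announces n coordinate lines.
            current = {"landmarks": []}
            records.append(current)
            remaining = int(value)
        elif current is not None:
            current[key.lower()] = value.strip()  # image, id, scale
    return records


def to_top_left(landmarks, image_height):
    """Re-express h as measured from the top edge instead of the bottom edge."""
    return [(w, image_height - h) for w, h in landmarks]


def to_cm(landmarks, scale_px_per_cm):
    """Convert pixel coordinates to centimetres, assuming SCALE is in pixels/cm."""
    return [(w / scale_px_per_cm, h / scale_px_per_cm) for w, h in landmarks]


# Made-up sample record (2 landmarks instead of 18) for illustration only.
sample = """LM=2
100.0 200.0
150.0 300.0
IMAGE=wild_00001.JPG
ID=1
SCALE=50.0
""".splitlines()

record = read_tps(sample)[0]
IMAGE_HEIGHT = 3456  # assumption based on the directory name res_5184x3456
print(record["image"], to_top_left(record["landmarks"], IMAGE_HEIGHT))
print(to_cm(record["landmarks"], float(record["scale"])))
```

To parse a real file, an open file handle can be passed directly, since read_tps only iterates over lines. If the images are loaded anyway, reading the height from the image itself is safer than relying on the hard-coded value.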
Additional Metadata
A supplementary CSV file (sparusld_per_specimen_metadata.csv) provides per-specimen metadata with the following columns: image, subset, origin, population, year_sampled, latitude, and longitude. This file enables linking each image to its respective subset, specimen origin, sampling population, and collection coordinates.
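As a small illustration of how this file might be used, the sketch below counts specimens per subset and origin using only the Python standard library. The column names come from the list above; the function name count_by_subset_and_origin is hypothetical, and the sample rows (including the way subsets, origins, and populations are spelled) are made up, so the actual values in the CSV may differ.

```python
import csv
import io
from collections import Counter


def count_by_subset_and_origin(csv_file):
    """Count rows per (subset, origin) pair in the per-specimen metadata."""
    reader = csv.DictReader(csv_file)
    return Counter((row["subset"], row["origin"]) for row in reader)


# Made-up rows for illustration; real values in the CSV may be spelled differently.
sample = io.StringIO(
    "image,subset,origin,population,year_sampled,latitude,longitude\n"
    "wild_00001.JPG,train,wild,A,2015,43.5,16.4\n"
    "wild_00002.JPG,train,wild,A,2015,43.5,16.4\n"
    "farmed_00003.JPG,test,farmed,B,2021,43.1,16.7\n"
)

for (subset, origin), n in sorted(count_by_subset_and_origin(sample).items()):
    print(subset, origin, n)
```

For the real file, the same function can be called on a handle opened with open("sparusld_per_specimen_metadata.csv", newline=""), and the resulting counts can be compared against the table in the Subsets section.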
Code repository
Using this dataset, we developed a new deep-learning method for automated landmark detection in Gilthead seabream. See our GitHub repository for details.
Files
iso_meta.xml
Additional details
Related works
- Is part of
- Journal article: 10.3354/aei00294 (DOI)
- Journal article: 10.3389/fmars.2021.694627 (DOI)
Funding
- Croatian Science Foundation
- Enhancing Environmental Performance of Net-Pen Marine Aquaculture HRZZ-IP-2022-10-7232
Dates
- Collected
- 2015/2021: Time period in which all specimen samples were collected.
Software
- Repository URL
- https://github.com/jsaric/sparus-ld
- Programming language
- Python
- Development Status
- Active
Biodiversity
- Life stage
- Adult
- Sample size unit
- 2052
- Locality
- Eastern Adriatic Sea
- Country
- Croatia
- Species
- Gilthead seabream (Sparus aurata)