Published April 24, 2025 | Version v2
Dataset Open

SPARUS-LD: Gilthead seabream (Sparus aurata) landmark detection dataset

Contributors

Contact person:

Data curator:

  • 1. University of Ljubljana
  • 2. Institute of Oceanography and Fisheries

Description

The dataset contains 2052 high-resolution images of Gilthead seabream specimens, each annotated with 18 landmarks (keypoints). Additionally, each specimen is assigned to one of three classes according to its origin: wild, farmed, or farm-associated.

Landmarks

Short descriptions of the 18 labeled landmarks are as follows:

  1. anterior tip of snout at the upper jaw
  2. vertical point above the most anterior point in the eye
  3. anterior insertion of the dorsal fin
  4. last spiny ray of the dorsal fin
  5. posterior insertion of the dorsal fin
  6. dorsal point at the least depth of the caudal peduncle
  7. posterior body extremity
  8. ventral point at the least depth of the caudal peduncle
  9. posterior insertion of the anal fin
  10. anterior insertion of the anal fin
  11. insertion of the pelvic fin
  12. ventral tip of the insertion of the operculum on the lateral profile
  13. point of maximum extension of the operculum on the lateral profile
  14. anterior extremity of the lateral line on the head profile
  15. dorsal insertion of the pectoral fin
  16. ventral insertion of the pectoral fin
  17. the most anterior point in the eye
  18. the most posterior point in the eye

Subsets

The dataset is split into three subsets for training, validation and testing. The distribution of images according to the subset and the origin is given in the table below.

Origin            Training   Validation   Test
Wild                   419           97    190
Farmed                 411          100    324
Farm-associated        254           51    206
Total                 1084          248    720

File structure

Dataset file structure is illustrated below.

├── SPARUS-LD
│   ├── res_5184x3456
│   │   ├── train_landmark_configuration_expert.TPS
│   │   ├── val_landmark_configuration_expert.TPS
│   │   ├── test_landmark_configuration_expert.TPS
│   │   ├── test_landmark_configuration_novice.TPS
│   │   ├── test_landmark_configuration_machine.TPS
│   │   ├── train
│   │   │   ├── *.JPG
│   │   ├── test
│   │   │   ├── *.JPG
│   │   ├── val
│   │   │   ├── *.JPG

Each subset is accompanied by a single TPS file containing the landmark coordinates annotated by the expert and a directory with the corresponding images. Image files follow the naming convention {origin}_{id}.JPG, where origin can be wild, farmed, or farm-assoc, and id ranges from 00001 to 02052. This naming scheme allows the specimen’s origin to be automatically inferred from the image name.
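As a minimal sketch of how the origin can be inferred from an image name, the snippet below splits a name of the form {origin}_{id}.JPG at its last underscore. The function name parse_image_name and the example file names are hypothetical and only follow the naming convention described above.

```python
from pathlib import Path

# Origin tokens as listed in the naming convention {origin}_{id}.JPG.
KNOWN_ORIGINS = {"wild", "farmed", "farm-assoc"}


def parse_image_name(image_name: str) -> tuple[str, int]:
    """Split '{origin}_{id}.JPG' into (origin, numeric id)."""
    stem = Path(image_name).stem  # drop any directory and the extension
    # Split at the LAST underscore so that 'farm-assoc' stays intact.
    origin, _, specimen_id = stem.rpartition("_")
    if origin not in KNOWN_ORIGINS or not specimen_id.isdigit():
        raise ValueError(f"unexpected image name: {image_name!r}")
    return origin, int(specimen_id)


# 'farm-assoc_00042.JPG' is a made-up name that follows the convention.
print(parse_image_name("farm-assoc_00042.JPG"))
```

Names that do not match the convention raise a ValueError rather than being silently assigned an origin.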

For the test set, additional TPS files are provided that include landmark coordinates annotated by the machine (our deep learning model) and the novice annotator, in addition to the expert annotation.
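One simple way to compare two annotations of the same specimen (for example expert vs. novice) is the mean Euclidean distance between matching landmarks. The sketch below is only an illustration of that idea and is not necessarily the evaluation protocol used by the dataset authors. The function name mean_distance_cm and the coordinates are made up, and the scale argument is assumed to be the SCALE value (pixels/cm) stored in the TPS record.

```python
import math


def mean_distance_cm(reference, candidate, scale):
    """Mean Euclidean distance between matching landmarks, in cm.

    reference, candidate: equal-length lists of (w, h) pixel coordinates.
    scale: image scale in pixels per cm (assumed to come from SCALE=...).
    """
    if len(reference) != len(candidate):
        raise ValueError("both annotations must have the same number of landmarks")
    total_px = sum(math.dist(p, q) for p, q in zip(reference, candidate))
    return total_px / len(reference) / scale


# Made-up coordinates for two landmarks, for illustration only.
expert = [(0.0, 0.0), (10.0, 0.0)]
novice = [(3.0, 4.0), (10.0, 0.0)]
print(mean_distance_cm(expert, novice, scale=10.0))
```

Because the distance is a difference of coordinates, it does not matter whether heights are measured from the bottom or the top of the image, as long as both annotations use the same convention.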

Each TPS file specifies the landmark coordinates for all images in its subset. TPS is one of the standard file formats in geometric morphometrics. It is a plain-text format, so it can be read and edited with any text editor (e.g. Notepad, gedit). It can also be read into Python structures using the py-tps library. In our case, the TPS file structure is as follows:

LM=18
w_1 h_1
w_2 h_2
...
w_18 h_18
IMAGE=image1_name
ID=image1_id
SCALE=image1_scale

LM=18
w_1 h_1
w_2 h_2
...
w_18 h_18
IMAGE=image2_name
ID=image2_id
SCALE=image2_scale

The line LM=18 marks the beginning of a new TPS record describing one specimen and the corresponding image. Each of the following 18 lines, w_i h_i, gives the coordinates of landmark i, where w_i is the horizontal position in pixels measured from the left edge of the image and h_i is the vertical position in pixels measured from the bottom edge. Note that many image processing libraries place the origin at the top-left corner, so you may need to convert the height coordinates to be measured from the top edge (image height minus h_i). The next two lines, IMAGE=... and ID=..., give the corresponding image name and image id. Finally, the line SCALE=... specifies the image scale in pixels/cm, which allows pixel measurements to be converted to real-world metric measurements.
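The record layout described above can be read with a few lines of plain Python. The following is a minimal sketch, not the py-tps API: the names TpsRecord, parse_tps, to_top_left and to_cm are hypothetical, the sample record uses made-up values with only two landmarks for brevity, and the image height of 3456 pixels is an assumption taken from the directory name res_5184x3456 (reading the height from the image file itself would be safer).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class TpsRecord:
    landmarks: List[Point] = field(default_factory=list)
    image: str = ""
    specimen_id: str = ""
    scale: Optional[float] = None  # pixels per cm


def parse_tps(text: str) -> List[TpsRecord]:
    """Parse TPS text with the layout LM / coordinates / IMAGE / ID / SCALE."""
    records: List[TpsRecord] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip().upper(), value.strip()
        if sep and key == "LM":
            records.append(TpsRecord())  # LM=... opens a new record
        elif not records:
            continue  # ignore anything before the first LM line
        elif sep and key == "IMAGE":
            records[-1].image = value
        elif sep and key == "ID":
            records[-1].specimen_id = value
        elif sep and key == "SCALE":
            records[-1].scale = float(value)
        elif not sep:
            # A coordinate line: "w h", measured from the bottom-left corner.
            w, h = (float(token) for token in line.split()[:2])
            records[-1].landmarks.append((w, h))
    return records


def to_top_left(landmarks: List[Point], image_height: float) -> List[Point]:
    """Measure heights from the top edge instead of the bottom edge."""
    return [(w, image_height - h) for w, h in landmarks]


def to_cm(landmarks: List[Point], scale: float) -> List[Point]:
    """Convert pixel coordinates to centimetres (scale is pixels per cm)."""
    return [(w / scale, h / scale) for w, h in landmarks]


# Made-up record with two landmarks, for illustration only.
SAMPLE = """LM=2
100.0 200.0
300.0 400.0
IMAGE=wild_00001.JPG
ID=1
SCALE=50.0
"""

records = parse_tps(SAMPLE)
first = records[0]
print(first.image, to_top_left(first.landmarks, image_height=3456))
print(to_cm(first.landmarks, first.scale))
```

Whether the flipped height should be image_height - h or image_height - 1 - h depends on whether coordinates refer to pixel edges or pixel centres in the target library; the sketch uses the simpler form.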

Additional Metadata

A supplementary CSV file (sparusld_per_specimen_metadata.csv) provides per-specimen metadata with the following columns: image, subset, origin, population, year_sampled, latitude, and longitude. This file enables linking each image to its respective subset, specimen origin, sampling population, and collection coordinates.
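As a rough sketch of how the metadata file might be used, the snippet below counts images per subset and origin with the standard csv module. The column names come from the description above; the comma delimiter, the cell values (such as "train" or "P1") and the function name count_by_subset_and_origin are assumptions made for illustration, so check them against the actual file.

```python
import csv
import io
from collections import Counter


def count_by_subset_and_origin(csv_file) -> Counter:
    """Count rows per (subset, origin) pair in an open metadata CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter((row["subset"], row["origin"]) for row in reader)


# Made-up rows standing in for sparusld_per_specimen_metadata.csv.
sample_csv = io.StringIO(
    "image,subset,origin,population,year_sampled,latitude,longitude\n"
    "wild_00001.JPG,train,wild,P1,2015,43.5,16.4\n"
    "wild_00002.JPG,train,wild,P1,2015,43.5,16.4\n"
    "farmed_00003.JPG,test,farmed,P2,2021,43.0,17.0\n"
)

counts = count_by_subset_and_origin(sample_csv)
for (subset, origin), n in sorted(counts.items()):
    print(subset, origin, n)
```

With the real file, the same function can be called on a handle opened with open("sparusld_per_specimen_metadata.csv", newline=""), and the resulting counts can be compared with the subset table above.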

Code repository

Using this dataset, we developed a new deep learning method for automated landmark detection in Gilthead seabream. See our GitHub repository (https://github.com/jsaric/sparus-ld) for details.

Files

iso_meta.xml

Files (12.0 GB)

Checksum                               Size
md5:273fe8c83f670a24a7592bb8159bb98b   21.5 kB
md5:05e34077cbc727c97dac2c555f0ff646   12.0 GB
md5:3d45b5c1ac85a63644b56fe6cc8433a4   125.5 kB

Additional details

Related works

Is part of
Journal article: 10.3354/aei00294 (DOI)
Journal article: 10.3389/fmars.2021.694627 (DOI)

Funding

Croatian Science Foundation
Enhancing Environmental Performance of Net-Pen Marine Aquaculture HRZZ-IP-2022-10-7232

Dates

Collected
2015/2021
Time period during which all specimens were collected.

Software

Repository URL
https://github.com/jsaric/sparus-ld
Programming language
Python
Development Status
Active

Biodiversity

Life stage
Adult
Sample size unit
2052
Locality
Eastern Adriatic Sea
Country
Croatia
Species
Gilthead seabream (Sparus aurata)