Published June 12, 2025 | Version v1
Dataset Open

CzechLynx Dataset (v1.0)

  • 1. University of West Bohemia
  • 2. National Institute for Research in Computer and Control Sciences
  • 3. Czech Technical University in Prague
  • 4. Mendel University in Brno
  • 5. Friends of the Earth Czech Republic, Carnivore Conservation Programme
  • 6. Czech University of Life Sciences Prague
  • 7. Šumava National Park Administration
  • 8. Friends of the Earth Czech Republic

Description

The CzechLynx dataset includes real camera trap photographs and synthetic samples of the Eurasian lynx (Lynx lynx), organized around three downstream tasks:

  1. individual identification, 
  2. pose estimation, and
  3. instance segmentation. 

The main part of the dataset, consisting of 39,760 manually verified and labeled camera-trap images, is fixed, whereas the synthetic part can, in practice, be scaled to any size. For convenience, a synthetic subset with a similar number of individuals and images is provided above in the CzechLynx.zip file. The real images span more than 15 years and come from two geographically distinct regions in Central Europe: Southwest Bohemia and the Western Carpathians.

All images are stored in JPEG format (with 90% compression), with metadata provided in a structured CSV file.
To simplify access to the data and support standardized development and evaluation of downstream tasks, the images and metadata are distributed in a single zip file, even though not all components are required for every task. Instead of maintaining separate annotation files for each downstream task, a single shared CSV file with all annotations and necessary information is provided.
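Because images and metadata ship in one archive, the shared CSV can be read without unpacking the images. The following is a minimal sketch, not the authors' loader: the CSV file name and its column headers are not stated on this page, so the hypothetical helper `read_metadata_from_zip` simply picks the first archive member ending in `.csv`.

```python
import csv
import io
import zipfile


def read_metadata_from_zip(zip_source, csv_suffix=".csv"):
    """Return the rows of the first CSV file found in the archive as dicts.

    zip_source may be a path or a binary file-like object. The CSV name is
    an unknown here, so the first member ending in csv_suffix is used.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_members = [
            name for name in archive.namelist()
            if name.lower().endswith(csv_suffix)
        ]
        if not csv_members:
            raise FileNotFoundError("no CSV file found in the archive")
        with archive.open(csv_members[0]) as raw:
            # Decode the byte stream so csv.DictReader receives text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

If the archive turns out to contain more than one CSV file, pass the exact member name instead of relying on the first match.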

Summary of data sources

Source | # Images | # Observations | # Individuals | Sites | Localities | Period
FoE CZ – The Western Carpathians | 17,997 | 9,753 | 95 | 361 | 39 | 2009 – 2025
FoE CZ – Southwest Bohemia | 6,822 | 1,957 | 102 | 79 | 32 | 2015 – 2023
Šumava National Park Administration | 14,941 | 7,072 | 169 | 219 | 27 | 2016 – 2024
Total | 39,760 | 18,782 | 319 | 659 | 86 | 2009 – 2025

Metadata

Most images in the CzechLynx dataset come with rich metadata that records when and where each image was captured, which individual it shows, and how it can be used in downstream tasks. For each observation, we include basic provenance (which monitoring project it came from), temporal information (date of capture, time elapsed since the individual was first seen, and an encounter ID for sequences from the same trap), and spatial context (10 × 10 km ETRS89-LAEA grid-cell code, nearest administrative region, trap ID, and GPS coordinates of the grid-cell centroid).

On top of that, we provide phenotypic labels (lynx coat pattern), computer-vision annotations (instance segmentation masks and 2D pose keypoints), and flags for predefined dataset splits (geo-aware, time-aware open/closed, and pose splits). Together, these fields make it easy to filter and group images by identity, time, space, appearance, or benchmark split, so you can quickly set up reproducible experiments for re-identification, pose estimation, and segmentation.
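As an illustration of this kind of filtering and grouping, the sketch below counts images per individual, optionally restricted to one data provider. It works on rows represented as dictionaries. The column keys `unique_name` and `source` are assumptions, since the exact CSV header names are not given here; the provider strings (`foe_carpaths`, `foe_bohemia`, `snpa`) are the ones documented for the Source field.

```python
from collections import defaultdict


def images_per_individual(rows, identity_key="unique_name",
                          source_key="source", source=None):
    """Count images per individual, optionally for a single provider.

    rows is an iterable of dicts (one per image). The default key names
    are assumed, not confirmed; pass the real header names if they differ.
    """
    counts = defaultdict(int)
    for row in rows:
        if source is not None and row[source_key] != source:
            continue  # skip images from other providers
        counts[row[identity_key]] += 1
    return dict(counts)
```

The same pattern extends to any other field, for example grouping by trap ID or by one of the split columns.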

Metadata | Description
Source | Data provider. The string foe_carpaths, foe_bohemia, or snpa corresponds to FoE CZ – The Western Carpathians, FoE CZ – Southwest Bohemia, and Šumava National Park Administration sources, respectively.
Unique name | Unique identification of Lynx lynx individual. The format is lynx_<integer>.
Path | Relative path to the file in the dataset.
Date | Date when the animal was observed in yyyy-mm-dd format.
Relative age | Relative age derived from the difference between the actual date and the first observation of the individual in the dataset.
Encounter | ID of a unique sequence of images in the same camera trap location.
Coat pattern | Describes the lynx’s coat pattern with values marbled and spotted.
Latitude, Longitude | WGS84 coordinates of the center of the 10×10 km grid cell containing the observation.
Cell code | 10×10 km grid-cell identifier in the ETRS89-LAEA (EPSG:3035) pan-European coordinate system. Each entry has the form 10kmE<easting_index>N<northing_index>.
Location | Unique location identifier. The closest geopolitical region to the center of the 10×10 km cell.
Trap ID | Unique identification of the camera trap. There may be multiple in each grid cell.
Geo-aware split | Train/test split. Distinct populations belong to one or the other.
Time-open split | Train/test split. Individuals unseen in the train split are included in the test split.
Time-closed split | Train/test split. All individuals are included in both the training and test subsets.
Pose split | Train/test split. Empty if the image is not used for pose estimation.
Mask | Pixel-level instance segmentation mask, stored as a COCO-style RLE.
Pose | 2D pose annotation, with up to 20 visible keypoints per individual, stored as a dict {<keypoint_name>: [x, y]}; empty if no pose annotation is available.
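Two of the fields above, Cell code and Pose, hold structured values that need parsing before use. The sketch below shows one way to do it under two assumptions. First, the grid indices follow the usual EEA reference-grid convention, where each index is the lower-left corner coordinate in EPSG:3035 divided by 10,000 m. Second, the Pose dict is serialized as either JSON or Python-literal text. The helper names, the example cell code, and the keypoint name are hypothetical.

```python
import ast
import json
import re

CELL_CODE_PATTERN = re.compile(r"^10kmE(\d+)N(\d+)$")
CELL_SIZE_M = 10_000  # 10 km cell edge in metres


def parse_cell_code(cell_code):
    """Split '10kmE<easting_index>N<northing_index>' into two integers."""
    match = CELL_CODE_PATTERN.match(cell_code)
    if match is None:
        raise ValueError(f"unexpected cell code: {cell_code!r}")
    return int(match.group(1)), int(match.group(2))


def cell_center_epsg3035(cell_code):
    """Cell centre in EPSG:3035 metres (assumes lower-left-corner indices)."""
    easting_index, northing_index = parse_cell_code(cell_code)
    half = CELL_SIZE_M / 2
    return (easting_index * CELL_SIZE_M + half,
            northing_index * CELL_SIZE_M + half)


def parse_pose(pose_field):
    """Return {keypoint_name: (x, y)}; an empty field gives an empty dict."""
    if pose_field is None or not pose_field.strip():
        return {}
    try:
        raw = json.loads(pose_field)
    except json.JSONDecodeError:
        # Fall back to Python-literal syntax, e.g. single-quoted keys.
        raw = ast.literal_eval(pose_field)
    return {name: (float(x), float(y)) for name, (x, y) in raw.items()}
```

For example, `parse_cell_code("10kmE453N298")` yields the index pair `(453, 298)`; the code itself is a made-up value used only for illustration.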

 

Task-specific subsets

The CzechLynx dataset is organized into three subsets tailored for (i) individual re-identification, (ii) animal pose estimation, and (iii) instance segmentation. The subsets largely overlap but differ in inclusion criteria and annotation detail. Individual identification and instance segmentation share the same set of images: those with clearly visible coat patterns for which human experts have confirmed the identity. Each image is paired with an identity label and a pixel-level mask outlining the lynx body, which enables the training of segmentation models while providing suitable input for re-identification. The pose estimation part is a subset of the identification/segmentation images and is smaller due to the labor-intensive annotation process.
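The pixel-level masks are described as COCO-style RLE. How they are serialized in the CSV is not specified here, so the following is only a sketch for the uncompressed variant: a dict with `size` as `[height, width]` and `counts` as a list of run lengths that alternate between background and foreground, starting with background, in column-major order. Compressed RLE strings would instead need a COCO tool such as pycocotools.

```python
def decode_uncompressed_rle(rle):
    """Decode an uncompressed COCO-style RLE into a list of 0/1 rows.

    rle is assumed to look like {"size": [height, width], "counts": [...]}.
    Runs alternate 0, 1, 0, ... and pixels are ordered column by column.
    """
    height, width = rle["size"]
    flat = []
    value = 0  # COCO runs always start with background (0)
    for run_length in rle["counts"]:
        flat.extend([value] * run_length)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    # Column-major layout: pixel (row, col) sits at index col * height + row.
    return [
        [flat[col * height + row] for col in range(width)]
        for row in range(height)
    ]
```

The result is a plain nested list, which keeps the sketch dependency-free; converting it to an array type is straightforward if one is needed for training.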

Predefined Splits

To support robust evaluation under real-world constraints, CzechLynx provides three distinct splits:

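  • Geo-aware-open: Train on Southwest Bohemia, test on the Western Carpathians (disjoint individuals), consistent with the split statistics, where the test set matches the Western Carpathians source (17,997 images, 95 identities).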
  • Time-aware-open: Train on the earlier period, test on the later period with some unseen individuals.
  • Time-aware-closed: Train/test split by time, all identities in test appear in training.
Split | Training images | Test images | Training identities | Test identities | Training sites | Test sites | Training locations | Test locations
Geo-aware-open | 21,763 | 17,997 | 224 | 95 | 298 | 361 | 47 | 39
Time-aware-open | 27,587 | 12,173 | 275 | 126 | 565 | 313 | 82 | 63
Time-aware-closed | 27,836 | 11,924 | 319 | 319 | 603 | 464 | 83 | 77
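The three splits differ in how test identities relate to training identities: fully disjoint, partly unseen, or fully covered. A small check like the one below can confirm that a loaded split behaves as described. This is an illustrative sketch; the function name and the returned labels are hypothetical and not part of the dataset.

```python
def describe_identity_overlap(train_ids, test_ids):
    """Classify a split by how its test identities relate to training.

    Returns "closed" when every test identity also appears in training,
    "disjoint" when none do, and "open" when only some do.
    """
    train_ids, test_ids = set(train_ids), set(test_ids)
    unseen = test_ids - train_ids
    if not unseen:
        return "closed"
    if unseen == test_ids:
        return "disjoint"
    return "open"
```

Under the definitions above, the geo-aware-open split would be expected to report "disjoint", the time-aware-open split "open", and the time-aware-closed split "closed".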

 

Files

CzechLynx-Synthetic.zip

Files (12.1 GB)

Checksum | Size
md5:8f48d6078e80ce287164aca2d0250b8b | 3.5 GB
md5:3a2002ffa6a17b1e60e34c9ced2e7598 | 7.6 GB
md5:8d1d0933aaddae8d7a2491a8c70abcd6 | 110.0 MB
md5:d71f94829ef83e2a21614010494c36e1 | 81.9 MB
md5:e5aba5c6eb346075b8568f1151582347 | 819.6 MB
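The MD5 values listed for each file can be used to verify a download. The sketch below streams a file through `hashlib.md5` in chunks, so multi-gigabyte archives do not need to fit in memory, and compares the digest with a listed value. The helper names are hypothetical, and the local file path is whatever name the download was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:8f48...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.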

Additional details

Additional titles

Alternative title
CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx

Identifiers

Software

Repository URL: https://github.com/WildlifeDatasets
Programming language: Python
Development Status: Active