CzechLynx Dataset (v1.0)

Picek, Lukas; Jiřík, Miroslav; Čermák, Vojtěch; Straka, Jakub; Duľa, Martin; Kutal, Miroslav; Belotti, Elisa; Bojda, Michal; Bufka, Luděk; Dvořák, Rostislav; Hrdý, Luboslav; Kocourek, Václav; Labuda, Jiří; Toman, Luděk; Trulík, Vlado; Váňa, Martin; Krausová, Josefa

doi:10.5281/zenodo.17592004

Published June 12, 2025 | Version v1

Dataset Open

1. University of West Bohemia
2. National Institute for Research in Computer and Control Sciences
3. Czech Technical University in Prague
4. Mendel University in Brno
5. Friends of the Earth Czech Republic, Carnivore Conservation Programme
6. Czech University of Life Sciences Prague
7. Šumava National Park Administration
8. Friends of the Earth Czech Republic

The CzechLynx dataset includes real camera trap photographs and synthetic samples of the Eurasian lynx (Lynx lynx), organized around three downstream tasks:

individual identification,
pose estimation, and
instance segmentation.

The main part of the dataset, 39,760 manually verified and labeled camera-trap images, is fixed, whereas the synthetic part can, in practice, be scaled to any size; for convenience, a synthetic subset with a similar number of individuals and images is provided in the CzechLynx-Synthetic.zip file. The real images span more than 15 years (2009–2025) and come from two geographically distinct regions in Central Europe: Southwest Bohemia and the Western Carpathians.

All images are stored in JPEG format (90% quality), and the metadata are provided in a structured CSV file.
To simplify access to the data and support standardized development and evaluation of downstream tasks, the images and metadata are distributed in a single zip file, even though not every component is required for every task. Rather than maintaining separate annotation files for each downstream task, a single shared CSV file contains all annotations and the necessary information.
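Because a single CSV serves every task, one loader covers all of them. The sketch below uses only Python's standard `csv` module. It assumes the real-image metadata was downloaded as `CzechLynxDataset-Metadata-Real.csv`, that `CzechLynx.zip` was extracted to a local directory, and that the relative image path lives in a column named `path`; that column name is a guess based on the Metadata table, so check the actual CSV header first.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical column name for the relative image path; check the real CSV header.
PATH_COLUMN = "path"


def read_metadata(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (any iterable of lines) into one dict per image row."""
    return list(csv.DictReader(lines))


def load_metadata(csv_path: str) -> List[Dict[str, str]]:
    """Read the metadata CSV from disk; UTF-8 encoding is assumed."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return read_metadata(handle)


def image_path(row: Dict[str, str], image_root: str) -> Path:
    """Resolve a row's relative path against the directory where the zip was extracted."""
    return Path(image_root) / row[PATH_COLUMN]
```

For example, `rows = load_metadata("CzechLynxDataset-Metadata-Real.csv")` followed by `image_path(rows[0], "data/CzechLynx")` gives the location of the first image, where `data/CzechLynx` stands for wherever the archive was unpacked.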

Summary of data sources

Source	# Images	# Observations	# Individuals	Sites	Localities	Period
FoE CZ – The Western Carpathians	17,997	9,753	95	361	39	2009 – 2025
FoE CZ – Southwest Bohemia	6,822	1,957	102	79	32	2015 – 2023
Šumava National Park Administration	14,941	7,072	169	219	27	2016 – 2024
Total	39,760	18,782	319	659	86	2009 – 2025
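The per-source image and individual counts and the observation period can be recomputed from the metadata as a consistency check. This sketch assumes columns named `source`, `unique_name`, and `date` holding the values described in the Metadata table; these names are hypothetical. It does not reproduce the Observations, Sites, or Localities columns, because how those map onto CSV fields is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical column names; check them against the real CSV header.
SOURCE, IDENTITY, DATE = "source", "unique_name", "date"


def summarize_by_source(
    rows: Iterable[Dict[str, str]],
) -> Dict[str, Tuple[int, int, str, str]]:
    """Return (images, individuals, first date, last date) for each source."""
    images: Dict[str, int] = defaultdict(int)
    individuals: Dict[str, Set[str]] = defaultdict(set)
    first: Dict[str, str] = {}
    last: Dict[str, str] = {}
    for row in rows:
        src, day = row[SOURCE], row[DATE]
        images[src] += 1
        individuals[src].add(row[IDENTITY])
        # yyyy-mm-dd strings sort chronologically, so plain min/max works.
        first[src] = min(first.get(src, day), day)
        last[src] = max(last.get(src, day), day)
    return {s: (images[s], len(individuals[s]), first[s], last[s]) for s in images}
```

Summed over sources, the individual counts exceed the total (95 + 102 + 169 = 366 versus 319), presumably because some individuals were recorded by more than one provider, so the dataset-wide figure should be computed from unique identities across all rows rather than by adding per-source numbers.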

Metadata

Most images in the CzechLynx dataset come with rich metadata that help you understand when, where, and who was captured, and how each image can be used in downstream tasks. For each observation, we include basic provenance (which monitoring project it came from), temporal information (date of capture, how long since the individual was first seen, and an encounter ID for sequences from the same trap), and spatial context (10 × 10 km ETRS89-LAEA grid-cell code, nearest administrative region, trap ID, and centroid GPS coordinates).
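The grid-cell code can be turned back into projected coordinates. This sketch assumes the European Environment Agency reference-grid convention, in which the easting and northing indices of a code such as `10kmE463N287` (a made-up example) are the cell's lower-left corner in EPSG:3035 metres divided by the 10 km cell size. Converting the result to WGS84 latitude and longitude would require a projection library such as pyproj, which is left out here.

```python
import re
from typing import Tuple

# Assumed EEA reference-grid convention: the indices are the lower-left corner
# coordinates in EPSG:3035 metres divided by the 10 km cell size.
CELL_SIZE_M = 10_000
CELL_CODE = re.compile(r"10kmE(\d+)N(\d+)")


def cell_center(code: str) -> Tuple[int, int]:
    """Return the (easting, northing) of a 10 km cell center in EPSG:3035 metres."""
    match = CELL_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a 10 km grid-cell code: {code!r}")
    east_index, north_index = (int(part) for part in match.groups())
    half = CELL_SIZE_M // 2
    return east_index * CELL_SIZE_M + half, north_index * CELL_SIZE_M + half
```

Because EPSG:3035 is expressed in metres, the straight-line distance between two cell centers (for example with `math.dist`) gives an approximate distance between cells within the study area.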

On top of that, we provide phenotypic labels (lynx coat pattern), computer-vision annotations (instance segmentation masks and 2D pose keypoints), and flags for predefined dataset splits (geo-aware, time-aware open/closed, and pose splits). Together, these fields make it easy to filter and group images by identity, time, space, appearance, or benchmark split, so you can quickly set up reproducible experiments for re-identification, pose estimation, and segmentation.
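Filtering and grouping of the kind described above needs only a few lines. The sketch below keeps rows with a given coat pattern inside a date range and groups rows by any column, such as identity or encounter. The column names `unique_name`, `date`, `coat_pattern`, and `encounter` are hypothetical stand-ins for the fields in the Metadata table.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional

# Hypothetical column names; check the real CSV header.
IDENTITY, DATE, COAT, ENCOUNTER = "unique_name", "date", "coat_pattern", "encounter"
Row = Dict[str, str]


def filter_rows(
    rows: Iterable[Row],
    coat: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Row]:
    """Keep rows with the given coat pattern whose date lies in [start, end]."""
    kept = []
    for row in rows:
        if coat is not None and row[COAT] != coat:
            continue
        day = date.fromisoformat(row[DATE])  # dates are stored as yyyy-mm-dd
        if (start is not None and day < start) or (end is not None and day > end):
            continue
        kept.append(row)
    return kept


def group_by(rows: Iterable[Row], column: str) -> Dict[str, List[Row]]:
    """Group rows by the value of one column, e.g. identity or encounter."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)
```

Grouping by encounter can help keep image sequences from the same trap visit together, for instance when building a custom split without near-duplicate frames on both sides.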

Field	Description
Source	Data provider. The values `foe_carpaths`, `foe_bohemia`, and `snpa` correspond to FoE CZ – The Western Carpathians, FoE CZ – Southwest Bohemia, and Šumava National Park Administration, respectively.
Unique name	Unique identifier of the individual lynx, in the format `lynx_<integer>`.
Path	Relative path to the image file within the dataset.
Date	Date of the observation in `yyyy-mm-dd` format.
Relative age	Time elapsed between the observation date and the individual's first observation in the dataset.
Encounter	Identifier of an image sequence captured at the same camera-trap location.
Coat pattern	Lynx coat pattern, either `marbled` or `spotted`.
Latitude, Longitude	WGS84 coordinates of the center of the 10×10 km grid cell containing the observation.
Cell code	Identifier of the 10×10 km grid cell in the pan-European ETRS89-LAEA (EPSG:3035) coordinate system, in the form `10kmE<easting_index>N<northing_index>`.
Location	Unique location identifier corresponding to the geopolitical region closest to the center of the 10×10 km cell.
Trap ID	Unique identifier of the camera trap; a grid cell may contain several traps.
Geo-aware split	Train/test assignment; each geographically distinct population belongs entirely to one of the two subsets.
Time-open split	Train/test assignment by time; the test subset also contains individuals not seen in training.
Time-closed split	Train/test assignment by time; every individual appears in both the training and test subsets.
Pose split	Train/test assignment for pose estimation; empty if the image is not used for pose estimation.
Mask	Pixel-level instance segmentation mask, stored in COCO-style run-length encoding (RLE).
Pose	2D pose annotation with up to 20 visible keypoints per individual, stored as a dict `{<keypoint_name>: [x, y]}`; empty if no pose annotation is available.

Task-specific subsets

The CzechLynx dataset is organized into three subsets tailored for (i) individual re-identification, (ii) animal pose estimation, and (iii) instance segmentation. The subsets largely overlap but differ in inclusion criteria and annotation detail. Individual identification and instance segmentation share the same set of images: those with clearly visible coat patterns whose identity has been confirmed by human experts. Each image is paired with an identity label and a pixel-level mask outlining the lynx body, which enables training segmentation models while also providing suitable input for re-identification. The pose estimation subset is drawn from the identification/segmentation images and is smaller because pose annotation is labor-intensive.
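One way to derive these subsets from the shared CSV is sketched below. It assumes hypothetical `unique_name`, `mask`, and `pose` columns and treats empty cells (or an empty dict for pose) as missing annotations; rows with both an identity and a mask form the identification/segmentation set, and the pose subset is taken from within that set.

```python
from typing import Dict, List

# Hypothetical column names; check the real CSV header.
IDENTITY, MASK, POSE = "unique_name", "mask", "pose"
Row = Dict[str, str]


def has_annotation(row: Row, column: str) -> bool:
    """True if the cell is present and not empty (an empty dict counts as empty)."""
    value = (row.get(column) or "").strip()
    return value not in ("", "{}")


def task_subsets(rows: List[Row]) -> Dict[str, List[Row]]:
    """Select the shared identification/segmentation images and the pose subset."""
    id_and_seg = [
        r for r in rows if has_annotation(r, IDENTITY) and has_annotation(r, MASK)
    ]
    pose = [r for r in id_and_seg if has_annotation(r, POSE)]
    return {"identification": id_and_seg, "segmentation": id_and_seg, "pose": pose}
```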

Predefined Splits

To support robust evaluation under real-world constraints, CzechLynx provides three distinct splits:

Geo-aware-open: Train on Southwest Bohemia, test on the Western Carpathians (disjoint individuals).
Time-aware-open: Train on the earlier period, test on the later period with some unseen individuals.
Time-aware-closed: Train/test split by time; all test identities also appear in training.

Split	Training images	Test images	Training identities	Test identities	Training sites	Test sites	Training locations	Test locations
Geo-aware-open	21,763	17,997	224	95	298	361	47	39
Time-aware-open	27,587	12,173	275	126	565	313	82	63
Time-aware-closed	27,836	11,924	319	319	603	464	83	77
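The open/closed distinction can be checked directly from the split columns. This sketch assumes one column per split (the column names used are hypothetical) holding `train` or `test`, with other values such as empty cells ignored. Following the split descriptions above, the geo-aware split should share no identities between train and test, and the time-aware-closed split should have no test-only identities.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical identity column; split labels are assumed to be "train"/"test".
IDENTITY = "unique_name"
Row = Dict[str, str]


def split_rows(rows: List[Row], split_column: str) -> Tuple[List[Row], List[Row]]:
    """Partition rows by a split column; rows with other or empty labels are skipped."""
    train: List[Row] = []
    test: List[Row] = []
    for row in rows:
        label = (row.get(split_column) or "").strip().lower()
        if label == "train":
            train.append(row)
        elif label == "test":
            test.append(row)
    return train, test


def identity_overlap(
    train: List[Row], test: List[Row]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return identities seen only in train, in both subsets, and only in test."""
    train_ids = {row[IDENTITY] for row in train}
    test_ids = {row[IDENTITY] for row in test}
    return train_ids - test_ids, train_ids & test_ids, test_ids - train_ids
```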

Files (12.1 GB)

Name	MD5	Size
CzechLynx-Synthetic.zip	8f48d6078e80ce287164aca2d0250b8b	3.5 GB
CzechLynx.zip	3a2002ffa6a17b1e60e34c9ced2e7598	7.6 GB
CzechLynxDataset-Metadata-Real.csv	8d1d0933aaddae8d7a2491a8c70abcd6	110.0 MB
CzechLynxDataset-Metadata-Synthetic.csv	d71f94829ef83e2a21614010494c36e1	81.9 MB
synthetic-meshes.zip	e5aba5c6eb346075b8568f1151582347	819.6 MB
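The listed MD5 checksums can be used to confirm that the multi-gigabyte downloads arrived intact. A minimal sketch using Python's `hashlib`, assuming the files keep their original names and sit together in one local directory:

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable

# MD5 checksums copied from the Files table.
EXPECTED_MD5 = {
    "CzechLynx-Synthetic.zip": "8f48d6078e80ce287164aca2d0250b8b",
    "CzechLynx.zip": "3a2002ffa6a17b1e60e34c9ced2e7598",
    "CzechLynxDataset-Metadata-Real.csv": "8d1d0933aaddae8d7a2491a8c70abcd6",
    "CzechLynxDataset-Metadata-Synthetic.csv": "d71f94829ef83e2a21614010494c36e1",
    "synthetic-meshes.zip": "e5aba5c6eb346075b8568f1151582347",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream piece by piece so large archives never sit fully in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 by reading it in 1 MiB chunks."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def verify_downloads(directory: str) -> Dict[str, bool]:
    """Map each expected file name to True if it exists and its checksum matches."""
    results: Dict[str, bool] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `verify_downloads(".")` in the download directory returns one boolean per file; a `False` entry means the file is missing or its contents do not match the published checksum.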

Additional details

Alternative title: CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx

arXiv: arXiv:2506.04931

Repository URL: https://github.com/WildlifeDatasets
Programming language: Python
Development Status: Active
