CzechLynx Dataset (v1.0)
Authors/Creators
- Picek, Lukas (Project leader)1, 2
- Jiřík, Miroslav (Researcher)1
- Čermák, Vojtěch (Researcher)3
- Straka, Jakub (Researcher)1
- Duľa, Martin (Data collector)4
- Kutal, Miroslav (Data collector)4, 5
- Belotti, Elisa (Data manager)6, 7
- Bojda, Michal4, 8
- Bufka, Luděk (Data manager)7
- Dvořák, Rostislav (Data collector)8
- Hrdý, Luboslav (Data collector)8
- Kocourek, Václav (Data collector)8
- Labuda, Jiří (Data collector)4, 8
- Toman, Luděk (Data collector)8
- Trulík, Vlado (Data collector)8
- Váňa, Martin (Data collector)8
- Krausová, Josefa (Data manager)8, 7
- 1. University of West Bohemia
- 2. National Institute for Research in Computer and Control Sciences
- 3. Czech Technical University in Prague
- 4. Mendel University in Brno
- 5. Friends of the Earth Czech Republic, Carnivore Conservation Programme
- 6. Czech University of Life Sciences Prague
- 7. Šumava National Park Administration
- 8. Friends of the Earth Czech Republic
Description
The CzechLynx dataset includes real camera trap photographs and synthetic samples of the Eurasian lynx (Lynx lynx), organized around three downstream tasks:
- individual identification,
- pose estimation, and
- instance segmentation.
The main part of the dataset, consisting of 39,760 manually verified and labeled camera-trap images, is fixed, whereas the synthetic part can, in practice, be scaled to any size. For convenience, a synthetic subset with a similar number of individuals and images is provided in the CzechLynx.zip file. The real images span more than 15 years and come from two geographically distinct regions in Central Europe: Southwest Bohemia and the Western Carpathians.
All images are stored in JPEG format (with 90% compression), with metadata provided in a structured CSV file.
To simplify access to the data and support standardized development and evaluation of downstream tasks, the Images and Metadata are distributed in a single zip file, even though not all components are required for every task. Instead of maintaining separate annotation files for each downstream task, a single shared CSV file with all annotations and necessary information is provided.
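Because every task reads from the same CSV, a first step is usually to load it and narrow it down to the rows of interest. The following is a minimal sketch using only the Python standard library. The column names (`source`, `unique_name`, `path`) and the sample rows are assumptions made for illustration; check the header of the actual CSV before relying on them. Only the provider strings (`foe_carpaths`, `foe_bohemia`, `snpa`) come from the metadata description.

```python
import csv
import io


def read_metadata(handle):
    """Read the shared annotation CSV into a list of dicts, one per image."""
    return list(csv.DictReader(handle))


def select_source(rows, source):
    """Keep only the rows that come from one data provider."""
    return [row for row in rows if row["source"] == source]


# Tiny in-memory stand-in for the real CSV.
# Column names and values here are hypothetical.
sample_csv = io.StringIO(
    "source,unique_name,path\n"
    "foe_carpaths,lynx_1,images/a.jpg\n"
    "snpa,lynx_2,images/b.jpg\n"
    "foe_carpaths,lynx_3,images/c.jpg\n"
)

rows = read_metadata(sample_csv)
carpathian_rows = select_source(rows, "foe_carpaths")
print(len(rows), len(carpathian_rows))
```

With the extracted archive, the same function could be called on a real file handle, for example one opened with `open(csv_path, newline="", encoding="utf-8")`, where `csv_path` is wherever the CSV ends up after unzipping.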
Summary of data sources
| Source | # Images | # Observations | # Individuals | # Sites | # Localities | Period |
|---|---|---|---|---|---|---|
| FoE CZ – The Western Carpathians | 17,997 | 9,753 | 95 | 361 | 39 | 2009 – 2025 |
| FoE CZ – Southwest Bohemia | 6,822 | 1,957 | 102 | 79 | 32 | 2015 – 2023 |
| Šumava National Park Administration | 14,941 | 7,072 | 169 | 219 | 27 | 2016 – 2024 |
| Total | 39,760 | 18,782 | 319 | 659 | 86 | 2009 – 2025 |
Metadata
Most images in the CzechLynx dataset come with rich metadata that describe when and where each image was captured, which individual it shows, and how it can be used in downstream tasks. For each observation, we include basic provenance (which monitoring project it came from), temporal information (date of capture, how long since the individual was first seen, and an encounter ID for sequences from the same trap), and spatial context (10 × 10 km ETRS89-LAEA grid-cell code, nearest administrative region, trap ID, and centroid GPS coordinates).
On top of that, we provide phenotypic labels (lynx coat pattern), computer-vision annotations (instance segmentation masks and 2D pose keypoints), and flags for predefined dataset splits (geo-aware, time-aware open/closed, and pose splits). Together, these fields make it easy to filter and group images by identity, time, space, appearance, or benchmark split, so you can quickly set up reproducible experiments for re-identification, pose estimation, and segmentation.
| Metadata | Description |
|---|---|
| Source | Data provider. The string foe_carpaths, foe_bohemia, or snpa corresponds to FoE CZ – The Western Carpathians, FoE CZ – Southwest Bohemia, and Šumava National Park Administration sources, respectively. |
| Unique name | Unique identifier of the individual lynx. The format is lynx_<integer>. |
| Path | Relative path to the file in the dataset. |
| Date | Date when the animal was observed in yyyy-mm-dd format. |
| Relative age | Relative age, derived from the difference between the observation date and the date the individual was first observed in the dataset. |
| Encounter | ID of a unique sequence of images taken at the same camera trap location. |
| Coat pattern | Describes lynx’s coat pattern with values marbled and spotted. |
| Latitude, Longitude | WGS84 coordinates of the center of the 10×10 km grid cell containing the observation. |
| Cell code | 10×10 km grid‐cell identifier in the ETRS89-LAEA (EPSG:3035) pan-European coordinate system. Each entry has the form 10kmE<easting_index>N<northing_index>. |
| Location | Unique location identifier. The closest geopolitical region to the center of the 10×10 km cell. |
| Trap ID | Unique identification of the camera trap. There may be multiple in each grid cell. |
| Geo-aware split | Train/test split. Each geographically distinct population belongs entirely to one subset or the other. |
| Time-open split | Train/test split. Individuals unseen in the train split are included in the test split. |
| Time-closed split | Train/test split. All individuals are included in both the training and test subsets. |
| Pose split | Train/test split. Empty if the image is not used for pose estimation. |
| Mask | Pixel-level instance segmentation mask, stored as a COCO-style RLE. |
| Pose | 2D pose annotation, with up to 20 visible keypoints per individual, stored as a dict {<keypoint_name>: [x, y]}; empty if no pose annotation is available. |
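Two of the fields above are structured strings that need decoding before use: the grid-cell code and the pose dictionary. The sketch below shows one way to handle both, under stated assumptions. For the cell code, it assumes the two indices count 10 km steps from the EPSG:3035 grid origin and identify the lower-left corner of the cell, so the cell centre lies 5 km further along each axis. For the pose, it assumes the dictionary is serialized in the CSV cell as a Python/JSON-style literal. The keypoint names in the example are hypothetical.

```python
import ast
import re

CELL_CODE = re.compile(r"^10kmE(\d+)N(\d+)$")
CELL_SIZE_M = 10_000  # 10 km cells, expressed in metres


def cell_code_to_centre(cell_code):
    """Return the (easting, northing) of the cell centre in EPSG:3035 metres.

    Assumption: the indices mark the lower-left corner in 10 km steps.
    """
    match = CELL_CODE.match(cell_code)
    if match is None:
        raise ValueError(f"unexpected cell code: {cell_code!r}")
    easting_index, northing_index = (int(group) for group in match.groups())
    half = CELL_SIZE_M // 2
    return (easting_index * CELL_SIZE_M + half,
            northing_index * CELL_SIZE_M + half)


def parse_pose(raw):
    """Parse a pose cell into {keypoint_name: (x, y)}; empty cells give {}.

    Assumption: the cell holds a dict literal such as '{"nose": [1, 2]}'.
    """
    if raw is None or not raw.strip():
        return {}
    pose = ast.literal_eval(raw)
    return {name: (float(x), float(y)) for name, (x, y) in pose.items()}


print(cell_code_to_centre("10kmE465N298"))
print(parse_pose('{"nose": [120, 88.5], "left_ear": [101, 60]}'))
print(parse_pose(""))
```

Converting the projected centre to WGS84 would require a projection library and is not shown, since the Latitude and Longitude columns already provide those coordinates. The Mask column is not decoded here either; COCO-style RLE is normally handled with a COCO tooling library.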
Task-specific subsets
The CzechLynx dataset is organized into three subsets tailored for: (i) Individual re-identification, (ii) Animal pose estimation, and (iii) Instance segmentation. The subsets largely overlap but differ in inclusion criteria and annotation detail. Individual identification and instance segmentation share the same set of images: those with clearly visible coat patterns for which human experts have confirmed the identity. Each image is paired with an identity label and a pixel-level mask outlining the lynx body, which enables the training of segmentation models while providing suitable input for re-identification. The pose estimation part is a subset of the identification/segmentation images and is smaller due to the labor-intensive annotation process.
Predefined Splits
To support robust evaluation under real-world constraints, CzechLynx provides three distinct splits:
- Geo-aware-open: Train on Carpathians, test on southwest Bohemia (disjoint individuals).
- Time-aware-open: Train on the earlier period, test on the later period with some unseen individuals.
- Time-aware-closed: Train/test split by time, all identities in test appear in training.
| Split | Training images | Test images | Training identities | Test identities | Training sites | Test sites | Training locations | Test locations |
|---|---|---|---|---|---|---|---|---|
| Geo-aware-open | 21,763 | 17,997 | 224 | 95 | 298 | 361 | 47 | 39 |
| Time-aware-open | 27,587 | 12,173 | 275 | 126 | 565 | 313 | 82 | 63 |
| Time-aware-closed | 27,836 | 11,924 | 319 | 319 | 603 | 464 | 83 | 77 |
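The difference between an open and a closed split comes down to whether the test subset contains identities that never occur in training. The following sketch checks that property on a toy table. The column names (`time_open`, `time_closed`, `unique_name`) and the `train`/`test` values are assumptions for illustration, not the confirmed CSV schema.

```python
def identities(rows, split_column, subset):
    """Collect the identities assigned to one subset of a given split."""
    return {row["unique_name"] for row in rows if row[split_column] == subset}


def unseen_test_identities(rows, split_column):
    """Identities that occur in the test subset but never in training."""
    train_ids = identities(rows, split_column, "train")
    test_ids = identities(rows, split_column, "test")
    return test_ids - train_ids


# Toy rows; column names and values are hypothetical.
rows = [
    {"unique_name": "lynx_1", "time_open": "train", "time_closed": "train"},
    {"unique_name": "lynx_1", "time_open": "test", "time_closed": "test"},
    {"unique_name": "lynx_2", "time_open": "test", "time_closed": "train"},
    {"unique_name": "lynx_2", "time_open": "test", "time_closed": "test"},
]

# An open split may leave some test identities unseen during training.
print(sorted(unseen_test_identities(rows, "time_open")))
# A closed split should leave none.
print(sorted(unseen_test_identities(rows, "time_closed")))
```

Running the same check on the real metadata is a quick way to confirm that a split has been read correctly before training a re-identification model on it.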
Files
CzechLynx-Synthetic.zip
Total size: 12.1 GB

| MD5 checksum | Size |
|---|---|
| 8f48d6078e80ce287164aca2d0250b8b | 3.5 GB |
| 3a2002ffa6a17b1e60e34c9ced2e7598 | 7.6 GB |
| 8d1d0933aaddae8d7a2491a8c70abcd6 | 110.0 MB |
| d71f94829ef83e2a21614010494c36e1 | 81.9 MB |
| e5aba5c6eb346075b8568f1151582347 | 819.6 MB |
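Since the archives are several gigabytes each, it is worth checking a download against the MD5 checksums listed above before unpacking. The sketch below hashes a file in chunks so that it never has to fit in memory. The function names are illustrative, and the self-check at the end runs on a throwaway temporary file rather than on a dataset archive.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Self-check on a small temporary file with a known digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name
try:
    self_check_ok = matches_checksum(tmp_path, "5d41402abc4b2a76b9719d911017c592")
finally:
    os.remove(tmp_path)
print(self_check_ok)
```

For a real download, `matches_checksum` would be called with the local path of the archive and the checksum listed for the file of the same size.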
Additional details
Additional titles
- Alternative title: CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx
Identifiers
- arXiv: arXiv:2506.04931
Software
- Repository URL: https://github.com/WildlifeDatasets
- Programming language: Python
- Development status: Active