LSST light curves for constant and variable sources, and for point-like and extended objects microlensing

Crispim Romão, Miguel; Croon, Djuna; Godines, Daniel

doi:10.5281/zenodo.15005108

Published March 11, 2025 | Version v1

Dataset | Open access


1. Durham University
2. New Mexico State University

This repository contains the dataset accompanying the paper Anomaly Detection to Identify Transients in LSST Time Series Data, which should be consulted for further details, together with the artefacts of the trained machine learning models. The light curves were simulated with rubin-sim for the Vera C. Rubin Observatory LSST cadence and observing conditions. The dataset comprises approximately 600 000 light curves (597 013 in total) covering the signals targeted by the transient search, including microlensing events and variable stars, as well as non-variable, signal-less sources used to train the anomaly detection model.

The dataset includes six classes: Constant (non-variable, signal-less sources), RR Lyrae variables, Point-like Microlensing (ML), Binary Microlensing (Binary ML), Boson Stars (BS), and Navarro-Frenk-White Subhalos (NFW). The number of simulated light curves in each class is:

BS: 320 494
Binary ML: 84 022
ML: 53 565
RR Lyrae: 49 573
NFW: 47 837
Constant: 41 522
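The classes are strongly imbalanced. A quick stdlib-only sketch using just the counts listed above (no data file needed) makes this concrete: BS accounts for roughly 54% of the light curves, while Constant, the class used to train the anomaly detector, is about 7%. The dictionary keys reuse this record's class abbreviations and are an assumption, not necessarily the label strings stored in processed.parquet.

```python
"""Class balance of the dataset, computed from the counts listed in this record."""

# Counts copied from this record; keys are the record's class abbreviations
# (assumed), not necessarily the label strings stored in processed.parquet.
CLASS_COUNTS = {
    "BS": 320_494,
    "Binary ML": 84_022,
    "ML": 53_565,
    "RR Lyrae": 49_573,
    "NFW": 47_837,
    "Constant": 41_522,
}


def class_fractions(counts):
    """Return the total count and each class's fraction of that total."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total, fractions = class_fractions(CLASS_COUNTS)
    print(f"total light curves: {total}")
    # Largest class first.
    for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:>10}: {frac:6.1%}")
```

This imbalance is worth keeping in mind when reading per-class metrics or resampling for training.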

The light curves include noise simulated with rubin-sim and follow the LSST 10-year baseline cadence strategy (v2.0). The Constant, RR Lyrae, and point-like microlensing light curves were simulated with MicroLIA, and the binary microlensing light curves with pyLIMA. The BS and NFW light curves were simulated with the code developed for the accompanying paper.

The dataset contains 182 columns covering simulation and generation parameters, observable time series features, the time series itself, and the predictions from the machine learning models used in the paper. The columns are organised by type using prefixes and suffixes:

'timestamps', 'mag', 'magerr': Light curve data.
'gen': Generation parameters (metadata).
'sim': Simulation parameters (metadata).
'feature_' prefix: Features extracted from the light curve and from its derivative; derivative-based features carry the suffix 'deriv'.
'iforest_output': Isolation Forest (iForest) anomaly score.
'pred_' prefix: Class probabilities and predicted class from the multiclass classifier.
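The naming conventions above can be used to split the 182 columns into groups programmatically. The sketch below is a minimal stdlib-only version; it assumes that 'gen' and 'sim' appear as column-name prefixes and that derivative feature names end in 'deriv', both of which should be checked against data_header.txt. The function group_columns and the example column names are hypothetical.

```python
from collections import defaultdict

LIGHT_CURVE_COLUMNS = ("timestamps", "mag", "magerr")


def group_columns(columns):
    """Group column names by the prefix/suffix conventions described above.

    Assumption (hypothetical): 'gen' and 'sim' are used as prefixes and
    derivative features end in 'deriv'; verify against data_header.txt.
    """
    groups = defaultdict(list)
    for col in columns:
        if col in LIGHT_CURVE_COLUMNS:
            groups["light_curve"].append(col)
        elif col == "iforest_output":
            groups["iforest"].append(col)
        elif col.startswith("pred_"):
            groups["classifier"].append(col)
        elif col.startswith("feature_"):
            # Features of the light-curve derivative carry the 'deriv' suffix.
            key = "feature_deriv" if col.endswith("deriv") else "feature"
            groups[key].append(col)
        elif col.startswith("gen"):
            groups["generation"].append(col)
        elif col.startswith("sim"):
            groups["simulation"].append(col)
        else:
            groups["other"].append(col)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    example = ["timestamps", "mag", "magerr", "gen_u0", "sim_m5",
               "feature_skew", "feature_skew_deriv", "iforest_output", "pred_BS"]
    for group, cols in group_columns(example).items():
        print(f"{group}: {cols}")
```

With a pandas DataFrame df, df[group_columns(df.columns)["feature"]] would then select the non-derivative features.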

The dataset is provided in Parquet format and can be read in Python with pandas after installing its 'parquet' optional dependency (i.e., pip install pandas[parquet]).
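A minimal loading sketch, assuming processed.parquet sits in the working directory. Passing columns= to pandas.read_parquet reads only the requested columns, which helps with a 1.4 GB file. The helpers load_processed and light_curve_arrays are hypothetical, and the second one assumes each row stores its light curve as array-like values in the 'timestamps', 'mag', and 'magerr' columns; data_header.txt should be consulted for the actual layout.

```python
import numpy as np
import pandas as pd

LIGHT_CURVE_COLUMNS = ["timestamps", "mag", "magerr"]


def load_processed(path="processed.parquet", columns=None):
    """Read the dataset, optionally restricted to a subset of columns."""
    return pd.read_parquet(path, columns=columns)


def light_curve_arrays(df, i):
    """Return (time, magnitude, magnitude error) for row i as float arrays.

    Assumption: each cell of the light-curve columns holds a sequence of values.
    """
    row = df.iloc[i]
    return tuple(np.asarray(row[c], dtype=float) for c in LIGHT_CURVE_COLUMNS)


# Example (requires the downloaded file):
# df = load_processed(columns=LIGHT_CURVE_COLUMNS + ["iforest_output"])
# t, mag, magerr = light_curve_arrays(df, 0)
```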

The artefacts were generated with Python 3.9.21 and scikit-learn 1.4.1. The iForest model (final_isolation_forest_model.pkl) does not handle missing (NaN) values, so imputer_train.pkl must be used to impute them before predicting with it. The multiclass classifier (classifier.pck) handles missing (NaN) values directly and was trained on non-imputed data.

If you use this dataset, please cite the paper alongside this Zenodo entry:

@article{CrispimRomao:2025pyl,
author = "Crispim Romao, Miguel and Croon, Djuna and Godines, Daniel",
title = "{Anomaly Detection to identify Transients in LSST Time Series Data}",
eprint = "2503.09699",
archivePrefix = "arXiv",
primaryClass = "astro-ph.SR",
reportNumber = "IPPP/25/15",
month = "3",
year = "2025"
}

Files (1.5 GB)

Name	MD5	Size
classifier.pck	5a17bc19f54c957d893e2bd59af22e8c	4.5 MB
data_header.txt	e55b58ba98ea5ab0fc0e222ef8a5e5c2	4.3 kB
final_isolation_forest_model.pkl	ee15234217eef8ca6d478689540a8f79	14.0 MB
imputer_train.pkl	63647f36c8f67d4e4dbf3ac5ec02fa27	27.5 MB
processed.parquet	643872dcada50354d91c3faed3de886f	1.4 GB
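The MD5 checksums in the table allow the downloads to be verified. A minimal stdlib sketch, assuming all files sit in one local directory under their original names; the helpers md5sum and verify_downloads are hypothetical. Reading in chunks avoids loading the 1.4 GB Parquet file into memory at once.

```python
import hashlib
import os

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "classifier.pck": "5a17bc19f54c957d893e2bd59af22e8c",
    "data_header.txt": "e55b58ba98ea5ab0fc0e222ef8a5e5c2",
    "final_isolation_forest_model.pkl": "ee15234217eef8ca6d478689540a8f79",
    "imputer_train.pkl": "63647f36c8f67d4e4dbf3ac5ec02fa27",
    "processed.parquet": "643872dcada50354d91c3faed3de886f",
}


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its checksum matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5sum(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```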

Additional details

Derived from: preprint arXiv:2503.09699

Funding: UK Research and Innovation, Proposal for IPPP (UK National Phenomenology Institute), 2020-2023 (ST/T001011/1)

