Published January 25, 2024 | Version v1
Dataset | Open access

Light curves for variable, point-like microlensing, and extended-object microlensing sources with regular cadence and OGLE-II timestamp cadence

  • 1. Institute for Particle Physics Phenomenology (IPPP)
  • 2. Durham University

Description

This is the dataset used in the paper "Microlensing signatures of extended dark objects using machine learning", which should be consulted for further details. The simulation code is available in the code repository listed under Software below.

This dataset comprises 600,000 light curves designed for the detection of microlensing events. They are categorised into six classes: Cataclysmic Variables (CV), RR Lyrae and Cepheid Variables (VARIABLE), Mira Long-Period Variables (LPV), Point-like Microlensing (ML), Boson Stars (BS), and NFW Subhalos (NFW).

The dataset includes simulated light curves for each class, with 100,000 instances per class. ML light curves were simulated using MicroLIA, while BS and NFW light curves were simulated using the respective mass profiles first computed here. Selection criteria, including a minimum magnification of 1.34, were applied to mimic survey selection. The light curves have magnitudes between 15 and 20, include Gaussian noise, and were generated with two cadence scenarios: OGLE-II timestamps and a regular daily cadence. For the extended microlensing sources (BS and NFW), the mass profiles depend on a parameter, τₘ, sampled log-uniformly. The minimum impact parameter, u₀, is sampled differently for each microlensing source class.

A total of 148 features are computed for each light curve, comprising statistics of the time series and of its derivative.
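For the point-like (ML) class, the 1.34 threshold coincides with the standard point-source, point-lens (Paczyński) magnification at an impact parameter of one Einstein radius, A(1) = 3/√5 ≈ 1.342. The sketch below illustrates that relation and log-uniform sampling under stated assumptions; it is not the dataset's simulation code. It assumes a point source and a point lens (so it does not describe the BS or NFW profiles), applies the cut to the peak magnification A(u₀), and draws τₘ between hypothetical bounds, since the actual range is not stated here. All helper names are hypothetical.

```python
import math
import random


def paczynski_magnification(u: float) -> float:
    """Point-source, point-lens magnification A(u) = (u^2 + 2) / (u * sqrt(u^2 + 4))."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def passes_magnification_cut(u0: float, a_min: float = 1.34) -> bool:
    """Keep an event if its peak magnification A(u0) reaches a_min.

    Assumption: a point lens, whose peak magnification occurs at u = u0.
    """
    return paczynski_magnification(u0) >= a_min


def sample_log_uniform(low: float, high: float, n: int, rng: random.Random) -> list[float]:
    """Draw n values whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    lo, hi = math.log10(low), math.log10(high)
    return [10.0 ** rng.uniform(lo, hi) for _ in range(n)]


if __name__ == "__main__":
    print(paczynski_magnification(1.0))  # 3/sqrt(5), about 1.3416
    print(passes_magnification_cut(0.5), passes_magnification_cut(1.5))  # True False
    # Hypothetical bounds for tau_m; the dataset's actual range is not given on this page.
    rng = random.Random(42)
    print(sample_log_uniform(1e-2, 1e2, 5, rng))
```

Sampling the exponent uniformly gives each decade the same expected number of draws, which suits a parameter spanning several orders of magnitude.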

The dataset has 189 columns (see the 'columns.txt' file), grouped by type and identified by a prefix:

  • 'lc' columns: light curve.
  • 'gen' columns: generation parameters (metadata).
  • 'sim' columns: simulation parameters (metadata).
  • 'feat' columns: light curve time series features. See MicroLIA (and its accompanying paper) for more details. Features computed on the derivative of the time series are marked with the suffix 'deriv'.
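A minimal sketch of splitting the column names into these groups. It assumes that 'columns.txt' lists one column name per line and that every name begins with its group prefix; the separator after the prefix is not specified here, so the sketch matches on the prefix alone. The helper names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Column-group prefixes described in the dataset documentation.
PREFIXES = ("lc", "gen", "sim", "feat")


def read_column_names(path: str = "columns.txt") -> list[str]:
    """Read column names, assuming one name per line (blank lines are skipped)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def group_columns(names: list[str]) -> dict[str, list[str]]:
    """Map each prefix to the names that start with it; unmatched names go under 'other'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        prefix = next((p for p in PREFIXES if name.startswith(p)), "other")
        groups[prefix].append(name)
    return dict(groups)


def derivative_features(names: list[str]) -> list[str]:
    """Feature columns computed on the derivative time series (suffix 'deriv')."""
    return [n for n in names if n.startswith("feat") and n.endswith("deriv")]
```

For example, `group_columns(read_column_names("columns.txt"))` would give the per-group column lists, whose lengths can be compared against the 189 total and 148 feature columns stated above.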

The dataset files are stored in Parquet format, which can be read in Python with pandas after installing its 'parquet' optional dependency (i.e. 'pip install pandas[parquet]').

Files (15.4 GB)

  • columns.txt: 4.1 kB, md5:c04d0e99c345bead437e7b25460c76b7
  • 7.8 GB, md5:778150045ed8799627e19f0c99fe52eb
  • 7.6 GB, md5:9a2ec8c8df318da6107e06ecd42cf3a3
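Given the file sizes, a download can be checked against the MD5 checksums listed above. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so it never has to fit in memory; the helper names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare against a checksum written as 'md5:<hex>' or as bare hex."""
    return md5_of_file(path) == expected.removeprefix("md5:").lower()
```

For example, `matches_checksum("columns.txt", "md5:c04d0e99c345bead437e7b25460c76b7")` should return True for an intact copy of that file.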

Additional details

Related works

Is derived from
Preprint: arXiv:2402.00107 (arXiv)

Funding

Proposal for IPPP (UK National Phenomenology Institute) ST/T001011/1
UK Research and Innovation

Software

Repository URL
https://gitlab.com/miguel.romao/microlensing-extended-objects-machine-learning
Programming language
Python
Development Status
Active