Published October 29, 2025 | Version v2
Dataset Open

Machine-learning analysis of solar flare light-curve morphology and implications for stellar CME prediction

  • 1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, People's Republic of China
  • 2. College of Physics, Guizhou University, Guiyang 550025, People's Republic of China

Description

This repository contains the training and testing datasets; the features extracted from solar flares, from superflares on FGKM-type (late-type) main-sequence stars, and from superflares on Sun-like stars; the event catalogs; the five trained LR sub-models; and the prediction code.

"train_datasets.zip" and "test_datasets.zip". These files contain the training and testing datasets, respectively. The solar flare events were randomly split into training and testing sets with a ratio of 8:2 using five different random seeds, resulting in five independent subsets. The column names are consistent with the feature extraction descriptions in the paper.

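The seeded 8:2 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used to build the archives: the seed values, the `split_events` helper, and the use of event serial numbers as stand-ins for full feature rows are all assumptions.

```python
import random


def split_events(event_ids, seed, train_fraction=0.8):
    """Shuffle a copy of event_ids with a fixed seed and cut it at train_fraction."""
    shuffled = list(event_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# 1,156 is the number of solar flare events stated for the solar catalog.
event_ids = list(range(1, 1157))

# Hypothetical seeds: the five seeds actually used are not stated on this page.
SEEDS = [0, 1, 2, 3, 4]

# One (train, test) pair per seed, i.e. five splits of the same event list.
splits = {seed: split_events(event_ids, seed) for seed in SEEDS}

for seed, (train, test) in splits.items():
    print(f"seed={seed}: {len(train)} training events, {len(test)} testing events")
```

Fixing the seed makes each split reproducible, while changing it reshuffles which events land in the testing set.
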
"solar_feature.csv", "FGKM_late_main_sequence_stars_feature.csv" and "sun_like_stars_feature.csv". These files contain the extracted features of solar flares, FGKM-type late-type main-sequence star superflares, and Sun-like star superflares, respectively.

"solar_flare_event_catalog.csv" and "stellar_superflare_catalog.csv". These files represent the catalogs of solar flare events and stellar superflare events. "solar_flare_event_catalog.csv"— SDO/EVE-ESP 0.1-7nm broadband solar flare samples containing 1,156 events. Columns: No. — Event serial number; Start_Time(UTC), Peak_Time(UTC), End_Time(UTC) — Event start, peak, and end times in UTC; ED — Equivalent duration (s); Energy — Bolometric energy (erg); Class_Type — Flare class; CME_Association — 1 = eruptive flare (associated with CME), 0 = confined flare (no CME). "stellar_superflare_catalog.csv" — After quality selection, the final Kepler 30-minutes cadence white-light stellar flare sample contains 17,717 events. Columns: No. — Event serial number; Star_Name — Host star identifier; Start_Time, End_Time — Flare start and end times (BJD_TDB − 2,454,833; days); ED — Equivalent duration (s); Energy — Bolometric energy (erg); CME_Association — 1 = with CME, 0 = without CME; Reference — Source of flare.

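As a reading aid for the stellar catalog columns, the sketch below parses rows with the documented column names, converts the offset times back to full BJD_TDB, and counts the events flagged as CME-associated. It assumes a comma-separated file whose header matches the column names listed above; the two sample rows, the `load_stellar_catalog` helper, and the output field names are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Catalog times are given as BJD_TDB minus this offset, in days.
KEPLER_BJD_OFFSET = 2454833.0

# Two invented rows that only mimic the documented column layout.
SAMPLE = """No.,Star_Name,Start_Time,End_Time,ED,Energy,CME_Association,Reference
1,KIC 0000001,500.00,500.10,12.5,1.0e34,1,example
2,KIC 0000002,731.25,731.50,40.0,5.0e34,0,example
"""


def load_stellar_catalog(handle):
    """Read catalog rows and add full BJD_TDB start times and durations in hours."""
    events = []
    for raw in csv.DictReader(handle):
        start = float(raw["Start_Time"])
        end = float(raw["End_Time"])
        events.append({
            "star": raw["Star_Name"],
            "start_bjd_tdb": start + KEPLER_BJD_OFFSET,
            "duration_hours": (end - start) * 24.0,
            "ed_seconds": float(raw["ED"]),
            "energy_erg": float(raw["Energy"]),
            # Assumes the flag is stored as the number 1 or 0.
            "has_cme": int(float(raw["CME_Association"])) == 1,
        })
    return events


events = load_stellar_catalog(io.StringIO(SAMPLE))
n_with_cme = sum(event["has_cme"] for event in events)
print(f"{n_with_cme} of {len(events)} sample events are flagged as CME-associated")
```

To read the real file, the `io.StringIO(SAMPLE)` argument would be replaced by a handle from `open("stellar_superflare_catalog.csv", newline="")`, provided the header assumption holds.
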
"logic_models.zip". This file contains the five trained LR sub-models.

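This page does not say how the five sub-models are combined into one prediction. The sketch below shows one common option, averaging the per-model probabilities, under two explicit assumptions: that LR stands for logistic regression, and that each sub-model reduces to a weight vector plus a bias. The coefficients, the two-feature input, and the 0.5 threshold are hypothetical and do not come from "logic_models.zip".

```python
import math


def lr_probability(weights, bias, features):
    """Logistic-regression probability: sigmoid of bias + weights . features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_probability(sub_models, features):
    """Average the probabilities of all sub-models (an assumed combination rule)."""
    probabilities = [lr_probability(w, b, features) for w, b in sub_models]
    return sum(probabilities) / len(probabilities)


# Hypothetical (weights, bias) pairs for five sub-models over two made-up features.
SUB_MODELS = [
    ([0.8, -0.3], 0.10),
    ([0.7, -0.2], 0.05),
    ([0.9, -0.4], 0.00),
    ([0.6, -0.1], 0.20),
    ([0.8, -0.3], -0.10),
]

features = [1.2, 0.5]
probability = ensemble_probability(SUB_MODELS, features)
label = int(probability >= 0.5)  # 1 = eruptive (with CME), 0 = confined
print(f"ensemble probability = {probability:.3f}, predicted label = {label}")
```

Averaging over models trained on different splits reduces the dependence of the prediction on any single random split; majority voting over the five labels would be an equally plausible alternative.
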
"prediction_code.zip". This file contains the prediction code.

Files

Files (8.0 MB)

md5:b1ad065f0ddda33ec7e1760ba1d90c0c  4.7 MB
md5:928454bec58d37cb8a49cd15ae2226af  9.0 kB
md5:cd6a7f807e308852b0526aa8ac684d37  1.2 kB
md5:eb1a4e07d8fc199ab5d2647f05c61ed9  347.2 kB
md5:b786495a22612e084f3ff3da64f75385  92.2 kB
md5:339c15fac50704b07b49992a9513186f  1.3 MB
md5:257f1609a122beeac05b95c9f71254b5  846.0 kB
md5:b191ed82026bde6ae3f305722cd58415  149.5 kB
md5:47c89275f82ccd92fdcf10698dc5ddd5  569.5 kB
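
The md5 values in the file listing can be used to confirm that a download is intact. The sketch below is one way to do that with Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are illustrative, and the path and checksum passed in are whatever file and listing entry the reader wants to compare.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Calling `matches_listing` with the path of a downloaded file and its entry from the listing returns `True` only when the local copy is byte-for-byte identical to the published one.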