Published August 1, 2021 | Version v1
Poster Open

Leveraging Statistical Analysis to Develop Classification Labels for Time Series Data

  • 1. Western Washington University
  • 2. University of Washington

Description

Developing effective training labels for machine learning can be time-consuming when done manually, particularly when the target class has many different representations. The task takes even longer when only a small portion of the data contains the desired class. Using statistical analysis to pre-classify a portion of the data can potentially lessen the burden of manual classification and provide more effective scientific outcomes than manual classification alone. However, this approach has not been heavily utilized in research. Using data from the Transiting Exoplanet Survey Satellite (TESS) as a use case, we applied a binary statistical classification to ~275,000 light curves (LCs) to determine whether each curve contains an eclipse. This narrowed the data to 6,000 LCs to classify manually, which resulted in a label set with a balanced variety of eclipses, labeled as eclipsing binaries (EBs), and non-eclipses, labeled as non-EBs. The statistical analysis classified the full set of LCs in less than ten hours, reducing the time spent on manual classification.
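
The description does not state which statistic was used for the binary pre-classification, so the following is only a minimal sketch of one plausible approach, not the poster's method. It flags a light curve as an eclipse candidate when several consecutive flux samples fall well below the median, using the median absolute deviation as a robust noise estimate. The function names, the thresholds (`n_sigma`, `min_points`), and the synthetic light curves are all illustrative assumptions.

```python
import random
from statistics import median


def has_eclipse_candidate(flux, n_sigma=5.0, min_points=3):
    """Hypothetical binary pre-classifier (not the poster's actual statistic).

    Returns True when at least `min_points` consecutive samples lie more
    than `n_sigma` robust standard deviations below the median flux.
    """
    med = median(flux)
    # Median absolute deviation, scaled to approximate a Gaussian sigma.
    mad = median(abs(f - med) for f in flux)
    sigma = 1.4826 * mad
    if sigma == 0:
        # A perfectly constant curve has no measurable dip.
        return False
    threshold = med - n_sigma * sigma
    run = 0
    for f in flux:
        # Count the length of the current run of below-threshold samples.
        run = run + 1 if f < threshold else 0
        if run >= min_points:
            return True
    return False


def synthetic_light_curve(n=1000, eclipse=False, seed=0):
    """Build an assumed toy light curve: flat flux plus Gaussian noise,
    with optional periodic box-shaped dips standing in for eclipses."""
    rng = random.Random(seed)
    flux = [1.0 + rng.gauss(0.0, 0.001) for _ in range(n)]
    if eclipse:
        for start in range(100, n, 250):
            for i in range(start, min(start + 10, n)):
                flux[i] -= 0.02
    return flux


if __name__ == "__main__":
    curves = {
        "with_dips": synthetic_light_curve(eclipse=True, seed=1),
        "flat": synthetic_light_curve(eclipse=False, seed=2),
    }
    for name, flux in curves.items():
        print(name, has_eclipse_candidate(flux))
```

In a workflow like the one described, flags of this kind would only decide which curves are sent for manual review. The final EB and non-EB labels would still come from a human classifier.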

Files

Leveraging Statistical Analysis to Develop Classification Labels for Time Series Data-Poster.pdf

Additional details

Funding

National Science Foundation
Sustainable Diversity in the Computing Research Pipeline (award 1246649)