Leveraging Statistical Analysis to Develop Classification Labels for Time Series Data

Howard, Erin; Covey, Kevin R.; Davenport, James R. A.

doi:10.5281/zenodo.5151000

Published August 1, 2021 | Version v1

Poster Open

1. Western Washington University
2. University of Washington

Developing effective machine learning training labels manually can be time-consuming, particularly for a class that has many representations. Manual classification takes even longer when only a small portion of the data contains the desired class. Using statistical analysis to pre-classify a portion of the data can potentially lessen the burden of manual classification and provide more effective scientific outcomes than manual classification alone. However, this method has not been heavily utilized in research. Using data from the Transiting Exoplanet Survey Satellite (TESS) as a use case, we applied a binary statistical classification to ~275,000 light curves (LCs) to determine whether each curve contains an eclipse. This gave us 6,000 LCs to classify manually, which resulted in a label set that included a balanced variety of eclipses, labeled as eclipsing binaries (EBs), and non-eclipses, labeled as non-EBs. The statistical analysis classified the more than 275,000 LCs in less than ten hours, reducing the time spent on manual classification.
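As a rough illustration of the kind of statistical pre-classification described above, the following minimal sketch flags a light curve as an eclipse candidate when several consecutive flux points fall well below the median. The specific statistic (median absolute deviation), the function name `has_candidate_eclipse`, and the thresholds `n_sigma` and `min_points` are assumptions made for illustration; the abstract does not state which statistics were actually used.

```python
import random
import statistics


def has_candidate_eclipse(flux, n_sigma=5.0, min_points=3):
    """Return True if `flux` contains a run of deep dips below its median.

    Hypothetical criterion (not taken from the poster): at least
    `min_points` consecutive samples lie more than `n_sigma` robust
    standard deviations below the median flux.
    """
    median = statistics.median(flux)
    # Median absolute deviation, scaled so it approximates a Gaussian sigma.
    mad = statistics.median(abs(f - median) for f in flux)
    sigma = 1.4826 * mad
    if sigma == 0:
        return False  # perfectly constant curve: nothing to flag
    threshold = median - n_sigma * sigma

    run = 0
    for f in flux:
        # Count consecutive points below the threshold; reset on recovery.
        run = run + 1 if f < threshold else 0
        if run >= min_points:
            return True
    return False


# Synthetic example: a flat noisy curve, and a copy with a 5% dip injected.
rng = random.Random(0)
flat = [1.0 + rng.gauss(0, 0.001) for _ in range(500)]
eclipsing = list(flat)
for i in range(200, 210):
    eclipsing[i] -= 0.05

flat_flag = has_candidate_eclipse(flat)
eclipse_flag = has_candidate_eclipse(eclipsing)
```

In a workflow like the one described, only curves flagged by such a filter, plus a sample of unflagged ones for balance, would be passed on for manual labeling.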

Files

Name	Size
Leveraging Statistical Analysis to Develop Classification Labels for Time Series Data-Poster.pdf (md5:894665d04c38773b0eeb8f4e71b92318)	770.7 kB

Additional details

U.S. National Science Foundation
Sustainable Diversity in the Computing Research Pipeline 1246649

