Handling complex missing data problems in time series, why R? Conference 2020, Warsaw, Poland

DOI: 10.5281/zenodo.4054594
URL: https://zenodo.org/records/4054594
OAI identifier: oai:zenodo.org:4054594
Authors: Steffen Moritz (ORCID 0000-0002-0085-1804), Institute for Data Science, Engineering, and Analytics, Technische Hochschule Köln; Thomas Bartz-Beielstein (ORCID 0000-0002-5938-5158), Institute for Data Science, Engineering, and Analytics, Technische Hochschule Köln
Publisher: Zenodo
Publication year: 2020
Subjects: Time Series, Time Series Imputation, Missing Data, Preprocessing, Imputation
Dates: 2020-09-26, 2020-09-28
Language: English (eng)
Resource type: Presentation
Related identifier: 10.5281/zenodo.4054593
License: Creative Commons Attribution 4.0 International

Description: Presentation 'Handling complex missing data problems in time series' at why R? Conference in Warsaw, Poland.

Abstract: Missing data is a common problem in time series. For example, when sensors are used for data recording, missing values can have several causes. There can be problems with the data recording itself (e.g. defective sensors), with the data transmission (e.g. internet outages), or with the data processing (e.g. faulty program code). These missing values often complicate further processing and analysis steps. Replacing the missing values with reasonable values (‘imputation’) is one way to mitigate this problem. Here it is crucial to choose the right algorithm for the data at hand (as it is for most machine-learning-related tasks). Sometimes the solution to these time series missing data problems is surprisingly easy, and a simple linear interpolation already gives reasonably good results. This is often the case for short gaps (only a few successive NAs) in processes that move slowly relative to the measuring interval. For example, the water temperature in a big lake won’t change significantly from one minute to the next. Additionally, these changes happen continuously, without big offsets.
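As a minimal sketch of the simple case described above, the following base R snippet fills short gaps by linear interpolation. The temperature readings are hypothetical values made up for illustration, not data from the talk. The snippet uses `stats::approx()` rather than imputeTS so that it runs without extra packages; recent imputeTS versions provide `na_interpolation()` for the same purpose.

```r
# Hypothetical example data: lake water temperature in degrees Celsius,
# one reading per minute, with two short gaps (NA).
temp <- c(14.20, 14.21, NA, NA, 14.24, 14.25, NA, 14.27)

# Positions of all readings and a flag for the observed ones.
idx <- seq_along(temp)
observed <- !is.na(temp)

# approx() interpolates linearly between the observed points.
# rule = 2 reuses the nearest observed value if a gap touches
# the start or end of the series (not the case in this example).
imputed <- temp
imputed[!observed] <- approx(
  x = idx[observed],
  y = temp[observed],
  xout = idx[!observed],
  method = "linear",
  rule = 2
)$y

print(imputed)
```

Because the process changes slowly and smoothly between neighbouring readings, the interpolated values (14.22, 14.23 and 14.26 here) sit on the straight line between the surrounding observations, which is the situation in which this simple approach tends to work well.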
But there are also more complex cases: long periods of missing data, fast-moving processes, noncontinuous changes, strong periodicities, and seasonalities. In these cases, a simple interpolation usually won’t provide good imputation results. This talk looks at how these problems can be approached for (univariate) time series and how the imputeTS R package can help. The imputeTS package offers several imputation functions for (univariate) time series. Some of the more advanced functions the package provides, such as ‘Seasonally Decomposed Imputation’ or ‘Kalman Smoothing on Structural Time Series Models’, can be good choices for these more complex imputation problems. The goal of the talk is to give a short introduction to imputeTS and its usage for handling missing data problems that are not straightforward to solve.

Keywords: Time Series Imputation, Imputation, Time Series, Missing Data, Preprocessing
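The more advanced functions named in the abstract can be tried side by side with plain interpolation. The sketch below is an illustration under assumptions, not code from the talk: it assumes imputeTS version 3.0 or later (where the functions are named `na_interpolation()`, `na_seadec()` and `na_kalman()`) and uses the `tsAirgap` and `tsAirgapComplete` example series that ship with the package. The `rmse()` helper is introduced here only for the comparison.

```r
# Requires the imputeTS package: install.packages("imputeTS")
library(imputeTS)

# tsAirgap is a seasonal monthly series containing NAs;
# tsAirgapComplete is the same series without missing values.
x <- tsAirgap
truth <- tsAirgapComplete
missing_idx <- which(is.na(x))

# Baseline: plain linear interpolation.
imp_linear <- na_interpolation(x, option = "linear")

# Seasonally Decomposed Imputation: remove the seasonal component,
# interpolate the remainder, then add the seasonal component back.
imp_seadec <- na_seadec(x, algorithm = "interpolation")

# Kalman Smoothing on a structural time series model.
imp_kalman <- na_kalman(x, model = "StructTS", smooth = TRUE)

# Hypothetical helper: root mean squared error on the positions
# that were missing in x, measured against the complete series.
rmse <- function(imputed) {
  sqrt(mean((imputed[missing_idx] - truth[missing_idx])^2))
}

errors <- sapply(
  list(linear = imp_linear, seadec = imp_seadec, kalman = imp_kalman),
  rmse
)
print(errors)
```

Comparing the errors on a series where the true values are known is one way to choose an algorithm for the data at hand. Which method comes out ahead depends on the series, so the result for this example should not be read as a general ranking.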