Presentation Open Access

Handling complex missing data problems in time series, why R? Conference 2020, Warsaw, Poland

Moritz, Steffen; Bartz-Beielstein, Thomas


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.4054594</identifier>
  <creators>
    <creator>
      <creatorName>Moritz, Steffen</creatorName>
      <givenName>Steffen</givenName>
      <familyName>Moritz</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-0085-1804</nameIdentifier>
      <affiliation>Institute for Data Science, Engineering, and Analytics, Technische Hochschule Köln</affiliation>
    </creator>
    <creator>
      <creatorName>Bartz-Beielstein, Thomas</creatorName>
      <givenName>Thomas</givenName>
      <familyName>Bartz-Beielstein</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-5938-5158</nameIdentifier>
      <affiliation>Institute for Data Science, Engineering, and Analytics, Technische Hochschule Köln</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Handling complex missing data problems in time series, why R? Conference 2020, Warsaw, Poland</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <subjects>
    <subject>Time Series</subject>
    <subject>Time Series Imputation</subject>
    <subject>Missing Data</subject>
    <subject>Preprocessing</subject>
    <subject>Imputation</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2020-09-26</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Text">Presentation</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4054594</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.4054593</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;Presentation &amp;#39;Handling complex missing data problems in time series&amp;#39; at the why R? Conference in&amp;nbsp;Warsaw, Poland.&lt;/p&gt;

&lt;p&gt;Abstract:&lt;/p&gt;

&lt;p&gt;Missing data is a common problem in time series. For example, when sensors are used for data recording, missing values can have several causes. There can be problems with the data recording itself (e.g. defective sensors), with the data transmission (e.g. internet outages) or with the data processing (e.g. faulty program code).&lt;/p&gt;

&lt;p&gt;These missing values often complicate further processing and analysis steps. Replacing the missing values with reasonable values (&amp;lsquo;imputation&amp;rsquo;) is one way to mitigate this problem. Here it is crucial to choose the right algorithm for the data at hand (as it is for most machine-learning tasks).&lt;/p&gt;

&lt;p&gt;Sometimes the solution to these time series missing data problems is surprisingly easy, and a simple linear interpolation already gives reasonably good results. This is often the case for short gaps (only a few successive NAs) in processes that move slowly relative to the measuring interval. For example, the water temperature in a big lake won&amp;rsquo;t change significantly from one minute to the next. Additionally, these changes happen continuously, without big jumps.&lt;/p&gt;

&lt;p&gt;But there are also more complex cases: long periods of missing data, fast-moving processes, noncontinuous changes, strong periodicities, and seasonalities. In these cases, a simple interpolation usually won&amp;rsquo;t provide good imputation results.&lt;/p&gt;

&lt;p&gt;This talk looks at how these problems can be approached for (univariate) time series and how the imputeTS R package can help. The imputeTS package offers several different imputation functions for (univariate) time series. Some of the more advanced functions the package provides, such as &amp;lsquo;Seasonally Decomposed Imputation&amp;rsquo; or &amp;lsquo;Kalman Smoothing on Structural Time Series Models&amp;rsquo;, can be good choices for these more complex imputation problems.&lt;/p&gt;

&lt;p&gt;The goal of the talk is to give a short introduction to imputeTS and its use for handling missing data problems that are not straightforward to solve.&lt;/p&gt;

&lt;p&gt;Keywords: Time Series Imputation, Imputation, Time Series, Missing Data, Preprocessing&lt;/p&gt;</description>
    <description descriptionType="Other">{"references": ["Moritz, S., Bartz-Beielstein, T. (2017). imputeTS: time series missing value imputation in R. The R Journal, 9(1), 207-218. doi: https://doi.org/10.32614/RJ-2017-009"]}</description>
  </descriptions>
</resource>