Presentation Open Access

Handling complex missing data problems in time series, why R? Conference 2020, Warsaw, Poland

Moritz, Steffen; Bartz-Beielstein, Thomas


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Moritz, S., Bartz-Beielstein, T. (2017). imputeTS: time series missing value imputation in R. The R Journal, 9(1), 207-218. doi: https://doi.org/10.32614/RJ-2017-009</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Time Series</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Time Series Imputation</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Missing Data</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Preprocessing</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Imputation</subfield>
  </datafield>
  <controlfield tag="005">20200928122653.0</controlfield>
  <controlfield tag="001">4054594</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">24 September - 27 September 2020</subfield>
    <subfield code="a">why R? Conference</subfield>
    <subfield code="c">Warsaw, Poland</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Institute for Data Science, Engineering, and Analytics, Technische Hochschule Köln</subfield>
    <subfield code="0">(orcid)0000-0002-5938-5158</subfield>
    <subfield code="a">Bartz-Beielstein, Thomas</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">13939405</subfield>
    <subfield code="z">md5:3cb4d7549834b6007b336440523d9dd0</subfield>
    <subfield code="u">https://zenodo.org/record/4054594/files/Presentation_Steffen_Moritz_whyR_Warsaw_26_09_2020.pdf</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u">https://2020.whyr.pl/</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-09-26</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="o">oai:zenodo.org:4054594</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Institute for Data Science, Engineering, and Analytics, Technische Hochschule Köln</subfield>
    <subfield code="0">(orcid)0000-0002-0085-1804</subfield>
    <subfield code="a">Moritz, Steffen</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Handling complex missing data problems in time series, why R? Conference 2020, Warsaw, Poland</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Presentation &amp;#39;Handling complex missing data problems in time series&amp;#39; at why R? Conference in&amp;nbsp;Warsaw, Poland.&lt;/p&gt;

&lt;p&gt;Abstract:&lt;/p&gt;

&lt;p&gt;Missing data is a common problem in time series. For example, when sensors are used for data recording, missing values can have multiple causes: problems with the data recording itself (e.g. defective sensors), with the data transmission (e.g. internet outages), or with the data processing (e.g. faulty program code).&lt;/p&gt;

&lt;p&gt;These missing values often complicate further processing and analysis steps. Replacing them with reasonable values (&amp;lsquo;imputation&amp;rsquo;) is one way to mitigate this problem. As with most machine-learning-related tasks, it is crucial to choose the right algorithm for the data at hand.&lt;/p&gt;

&lt;p&gt;Sometimes the solution to these time series missing data problems is surprisingly easy, and a simple linear interpolation already gives reasonably good results. This is often the case for short gaps (only a few successive NAs) in processes that move slowly relative to the measuring interval. For example, the water temperature in a large lake won&amp;rsquo;t change significantly from one minute to the next. Moreover, such changes happen continuously, without large jumps.&lt;/p&gt;

&lt;p&gt;But there are also more complex cases: long periods of missing data, fast-moving processes, discontinuous changes, and strong periodicities and seasonalities. In these cases, simple interpolation usually won&amp;rsquo;t provide good imputation results.&lt;/p&gt;

&lt;p&gt;This talk looks at how these problems can be approached for univariate time series and how the imputeTS R package, which offers several different imputation functions for univariate time series, can help. Some of its more advanced functions, such as &amp;lsquo;Seasonally Decomposed Imputation&amp;rsquo; or &amp;lsquo;Kalman Smoothing on Structural Time Series Models&amp;rsquo;, can be good choices for these more complex imputation problems.&lt;/p&gt;

&lt;p&gt;The goal of the talk is to give a short introduction to imputeTS and its use for handling missing data problems that are not straightforward to solve.&lt;/p&gt;

&lt;p&gt;Keywords: Time Series Imputation, Imputation, Time Series, Missing Data, Preprocessing&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4054593</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4054594</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">presentation</subfield>
  </datafield>
</record>
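The abstract above contrasts two situations. In short gaps in slow-moving series, linear interpolation suffices. In gaps in strongly seasonal series, it does not. The following Python sketch makes that contrast concrete. The talk itself uses R and imputeTS; Python is used here only for illustration.

A few things in the sketch are hypothetical:
- The function names `interpolate_linear` and `impute_seasonal` are invented for this example.
- The edge handling (leading and trailing gaps copy the nearest observed value) is an assumption.
- The toy series is made up.
- `impute_seasonal` is a deliberately simplified stand-in that uses per-phase means rather than a full seasonal decomposition. It is not how imputeTS implements 'Seasonally Decomposed Imputation'.

```python
import bisect
from typing import List, Optional


def interpolate_linear(series: List[Optional[float]]) -> List[float]:
    """Fill None entries with straight lines between the nearest observed neighbours.

    Gaps at the start or end (no neighbour on one side) copy the nearest
    observed value; this edge handling is an assumption of the sketch.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    out: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(float(v))
            continue
        pos = bisect.bisect_left(known, i)  # index of the first observed position after i
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            out.append(float(series[right]))
        elif right is None:
            out.append(float(series[left]))
        else:
            frac = (i - left) / (right - left)
            out.append(series[left] + frac * (series[right] - series[left]))
    return out


def impute_seasonal(series: List[Optional[float]], period: int) -> List[float]:
    """Simplified seasonal imputation: subtract a per-phase mean, interpolate, add it back.

    The per-phase mean is a crude, hypothetical stand-in for a proper seasonal
    decomposition; it is not the imputeTS implementation.
    """
    seasonal: List[float] = []
    for phase in range(period):
        observed = [v for i, v in enumerate(series) if i % period == phase and v is not None]
        # Phases with no observations get a zero seasonal component (assumption).
        seasonal.append(sum(observed) / len(observed) if observed else 0.0)
    deseasonalized = [None if v is None else v - seasonal[i % period] for i, v in enumerate(series)]
    filled = interpolate_linear(deseasonalized)
    return [f + seasonal[i % period] for i, f in enumerate(filled)]


if __name__ == "__main__":
    # Toy series with period 4; the gap at positions 5 and 6 hides a seasonal peak.
    series = [10, 20, 10, 0, 10, None, None, 0, 10, 20, 10, 0]
    print([round(x, 2) for x in interpolate_linear(series)[5:7]])  # [6.67, 3.33]
    print([round(x, 2) for x in impute_seasonal(series, period=4)[5:7]])  # [20.0, 10.0]
```

On the toy series, linear interpolation draws a straight line through the missing peak. Removing the seasonal pattern first and interpolating only the remainder restores it. In R, imputeTS exposes these ideas through functions such as `na_interpolation()` and `na_seadec()`.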