Conference paper Open Access

Linking provenance with system logs: a context aware information integration and exploration framework for analyzing workflow execution

el Khaldi Ahanach, Elias; Koulouzis, Spiros; Zhao, Zhiming

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <controlfield tag="005">20200120164544.0</controlfield>
  <controlfield tag="001">3466766</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">12-14, June 2019</subfield>
    <subfield code="g">IWSG19</subfield>
    <subfield code="a">11th International Workshop on Science Gateways</subfield>
    <subfield code="c">Ljubljana, Slovenia</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Amsterdam</subfield>
    <subfield code="a">Koulouzis, Spiros</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Amsterdam</subfield>
    <subfield code="0">(orcid)0000-0002-6717-9418</subfield>
    <subfield code="a">Zhao, Zhiming</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">633302</subfield>
    <subfield code="z">md5:20ab8e0bfc5f7dd1bd094fe5fa1b4991</subfield>
    <subfield code="u"></subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u"></subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-10-01</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="o"></subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Amsterdam</subfield>
    <subfield code="a">el Khaldi Ahanach, Elias</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Linking provenance with system logs: a context aware information integration and exploration framework for analyzing workflow execution</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">676247</subfield>
    <subfield code="a">A Europe-wide Interoperable Virtual Research Environment to Empower Multidisciplinary Research Communities and Accelerate Innovation and Collaboration</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">654182</subfield>
    <subfield code="a">Environmental Research Infrastructures Providing Shared Solutions for Science and Society</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">643963</subfield>
    <subfield code="a">Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">825134</subfield>
    <subfield code="a">smART socIal media eCOsytstem in a blockchaiN Federated environment</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">824068</subfield>
    <subfield code="a">ENVironmental Research Infrastructures building Fair services Accessible for society, Innovation and Research</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u"></subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;When executing scientific workflows in a distributed environment, anomalies in workflow behavior are often caused by a mixture of different issues, e.g., careless design of the workflow logic, buggy workflow components, unexpected performance bottlenecks, or resource failures in the underlying infrastructure. Provenance information only describes data evolution at the workflow level and has no explicit connection with the system logs provided by the underlying infrastructure. Analyzing provenance information and apposite system metrics requires expertise and a considerable amount of manual effort. Moreover, it is often time-consuming to aggregate this information and correlate events occurring at different levels of the infrastructure. In this paper, we propose an architecture that automates the integration of workflow provenance information with the performance information collected from the infrastructure nodes running workflow tasks. Our architecture enables workflow developers and domain scientists to effectively browse workflow execution information together with the system metrics, and to analyze contextual information for possible anomalies.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3466765</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3466766</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">conferencepaper</subfield>
  </datafield>
</record>
