{
  "access": {
    "embargo": {
      "active": false,
      "reason": null
    },
    "files": "public",
    "record": "public",
    "status": "open"
  },
  "created": "2019-11-22T13:32:52.899852+00:00",
  "custom_fields": {},
  "deletion_status": {
    "is_deleted": false,
    "status": "P"
  },
  "files": {
    "count": 1,
    "enabled": true,
    "entries": {
      "Report_Michal_Bien.pdf": {
        "checksum": "md5:f4ba7f91816350a8bcd872a4ba138c71",
        "ext": "pdf",
        "id": "f304b657-d867-4289-b620-55b746f21c72",
        "key": "Report_Michal_Bien.pdf",
        "metadata": null,
        "mimetype": "application/pdf",
        "size": 1943637
      }
    },
    "order": [],
    "total_bytes": 1943637
  },
  "id": "3550777",
  "is_draft": false,
  "is_published": true,
  "links": {
    "access": "https://zenodo.org/api/records/3550777/access",
    "access_grants": "https://zenodo.org/api/records/3550777/access/grants",
    "access_links": "https://zenodo.org/api/records/3550777/access/links",
    "access_request": "https://zenodo.org/api/records/3550777/access/request",
    "access_users": "https://zenodo.org/api/records/3550777/access/users",
    "archive": "https://zenodo.org/api/records/3550777/files-archive",
    "archive_media": "https://zenodo.org/api/records/3550777/media-files-archive",
    "communities": "https://zenodo.org/api/records/3550777/communities",
    "communities-suggestions": "https://zenodo.org/api/records/3550777/communities-suggestions",
    "doi": "https://doi.org/10.5281/zenodo.3550777",
    "draft": "https://zenodo.org/api/records/3550777/draft",
    "files": "https://zenodo.org/api/records/3550777/files",
    "latest": "https://zenodo.org/api/records/3550777/versions/latest",
    "latest_html": "https://zenodo.org/records/3550777/latest",
    "media_files": "https://zenodo.org/api/records/3550777/media-files",
    "parent": "https://zenodo.org/api/records/3550776",
    "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.3550776",
    "parent_html": "https://zenodo.org/records/3550776",
    "requests": "https://zenodo.org/api/records/3550777/requests",
    "reserve_doi": "https://zenodo.org/api/records/3550777/draft/pids/doi",
    "self": "https://zenodo.org/api/records/3550777",
    "self_doi": "https://zenodo.org/doi/10.5281/zenodo.3550777",
    "self_html": "https://zenodo.org/records/3550777",
    "self_iiif_manifest": "https://zenodo.org/api/iiif/record:3550777/manifest",
    "self_iiif_sequence": "https://zenodo.org/api/iiif/record:3550777/sequence/default",
    "versions": "https://zenodo.org/api/records/3550777/versions"
  },
  "media_files": {
    "count": 0,
    "enabled": false,
    "entries": {},
    "order": [],
    "total_bytes": 0
  },
  "metadata": {
    "creators": [
      {
        "person_or_org": {
          "family_name": "Bie\u0144",
          "given_name": "Micha\u0142",
          "name": "Micha\u0142 Bie\u0144",
          "type": "personal"
        }
      }
    ],
    "description": "<p>This work successfully deployed two use cases of interest for High Energy Physics using cloud resources:</p>\n<ul>\n<li>CMS Big data reduction: This use case consists of running data reduction workloads for physics data. The code and implementation were originally developed by CERN openlab in collaboration with CMS and Intel in 2017-2018. It aims to demonstrate the scalability of a data reduction workflow by processing ROOT files using Apache Spark.</li>\n<li>Spark DL Trigger: This use case consists of the deployment of a full data preparation and machine learning pipeline, from data ingestion (4.5 TB of ROOT data) to the training of a classifier using neural networks. It is implemented using Apache Spark and the Keras API, following previous work in collaboration with CERN openlab.</li>\n</ul>\n<p>Resources for this work were deployed using Oracle Cloud Infrastructure (OCI). In particular, this project completed:</p>\n<ul>\n<li>Setup of the project using Oracle Container Engine for Kubernetes and Oracle Cloud resources</li>\n<li>Troubleshooting of the oci-hdfs-connector to run Apache Spark at scale on OCI Object Storage</li>\n<li>Measurements of OCI Object Storage performance for the selected use cases</li>\n<li>Investigations and performance measurements of resource utilisation on Oracle Container Engine for Kubernetes (OKE) when running TensorFlow/Keras neural network model training at scale, using CPU resources and when using GPUs</li>\n</ul>\n<p>Notable results of this project:</p>\n<ul>\n<li>Produced several key improvements to the oci-hdfs-connector. The improvements are necessary to run the latest Spark version (Spark 2.4.x) on Oracle Cloud. The connector is distributed by Oracle with open source licensing, and the improvements will be fed back to Oracle.</li>\n<li>Improved instrumentation infrastructure for measuring Spark workloads on cloud resources, by streamlining the deployment of the Spark performance dashboard on Kubernetes and developing a Helm chart</li>\n<li>Produced a solution for direct measurement of I/O latency for Spark workloads reading from OCI or S3 storage. The results are of general interest for Spark users, notably including the Spark service at CERN.</li>\n<li>Developed methods to parallelize TensorFlow/Keras on Kubernetes using the new tf.distribute features of TensorFlow 2.0. These are of general interest for ML practitioners, notably including the users of CERN cloud services.</li>\n</ul>",
    "publication_date": "2019-11-22",
    "publisher": "Zenodo",
    "resource_type": {
      "id": "publication-report",
      "title": {
        "de": "Bericht",
        "en": "Report"
      }
    },
    "rights": [
      {
        "description": {
          "en": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited."
        },
        "icon": "cc-by-icon",
        "id": "cc-by-4.0",
        "props": {
          "scheme": "spdx",
          "url": "https://creativecommons.org/licenses/by/4.0/legalcode"
        },
        "title": {
          "en": "Creative Commons Attribution 4.0 International"
        }
      }
    ],
    "subjects": [
      {
        "subject": "CERN openlab"
      },
      {
        "subject": "summer-student programme"
      }
    ],
    "title": "Big Data Analysis and Machine Learning at Scale with Oracle Cloud Infrastructure"
  },
  "parent": {
    "access": {
      "owned_by": {
        "user": "31739"
      }
    },
    "communities": {
      "default": "ef357ed7-957c-47d5-8bde-96876ebedc7b",
      "entries": [
        {
          "access": {
            "member_policy": "open",
            "members_visibility": "public",
            "record_policy": "open",
            "review_policy": "open",
            "visibility": "public"
          },
          "children": {
            "allow": false
          },
          "created": "2013-09-23T10:06:38+00:00",
          "custom_fields": {},
          "deletion_status": {
            "is_deleted": false,
            "status": "P"
          },
          "id": "ef357ed7-957c-47d5-8bde-96876ebedc7b",
          "links": {},
          "metadata": {
            "curation_policy": "<p>New uploads in this community must contain publicly distributable material related to the activities of CERN openlab and its partners.</p>",
            "description": "CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community.",
            "organizations": [
              {
                "id": "01ggx4157"
              }
            ],
            "page": "<p>CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community. This ZENODO community contains open access material published by the CERN openlab, such as reports, white papers, presentations, videos, etc.</p>",
            "title": "CERN openlab",
            "website": "https://openlab.cern"
          },
          "revision_id": 2,
          "slug": "cernopenlab",
          "updated": "2023-12-08T15:10:25.673707+00:00"
        }
      ],
      "ids": [
        "ef357ed7-957c-47d5-8bde-96876ebedc7b"
      ]
    },
    "id": "3550776",
    "pids": {
      "doi": {
        "client": "datacite",
        "identifier": "10.5281/zenodo.3550776",
        "provider": "datacite"
      }
    }
  },
  "pids": {
    "doi": {
      "client": "datacite",
      "identifier": "10.5281/zenodo.3550777",
      "provider": "datacite"
    },
    "oai": {
      "identifier": "oai:zenodo.org:3550777",
      "provider": "oai"
    }
  },
  "revision_id": 4,
  "stats": {
    "all_versions": {
      "data_volume": 1220604036.0,
      "downloads": 628,
      "unique_downloads": 594,
      "unique_views": 690,
      "views": 734
    },
    "this_version": {
      "data_volume": 1220604036.0,
      "downloads": 628,
      "unique_downloads": 594,
      "unique_views": 689,
      "views": 733
    }
  },
  "status": "published",
  "updated": "2023-07-14T08:44:47.072950+00:00",
  "versions": {
    "index": 1,
    "is_latest": true
  }
}