Published September 1, 2015 | Version v1
Report | Open Access

CMS Data-Services Ingestion into CERN’s Hadoop Big Data Analytics Infrastructure

  • 1. CERN openlab Summer Student
  • 2. Summer Student Supervisor

Description

Abstract

This document introduces a new data ingestion framework called HLoader, built around Apache Sqoop to run data ingestion jobs between relational database management systems (RDBMS) and the Hadoop Distributed File System (HDFS). Deployed as a service at CERN, the HLoader framework will be used to ingest CMS Data Popularity data into Hadoop clusters. HLoader could also serve similar use cases, such as CMS and ATLAS Job Monitoring or the ACCLOG databases. The first part of the report describes the architecture of HLoader and gives background information on Apache Sqoop. The rest of the report focuses on the HLoader programming API; it is intended as an entry point for developers, describing how HLoader works and possible directions for extending the framework in the future.
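At their core, the ingestion jobs that HLoader schedules are Sqoop import jobs that copy tables from an RDBMS into HDFS. The following is a minimal Python sketch of launching such a job, not HLoader's actual API: the connection string, credentials path, table name, and target directory are hypothetical placeholders, and the real framework adds the scheduling, monitoring, and access-control logic described in the report.

    import subprocess

    # Hypothetical example of the kind of Sqoop import job HLoader orchestrates:
    # copy one Oracle table into HDFS as Avro files.
    # All connection details, table names, and paths below are placeholders.
    sqoop_command = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost.example.cern.ch:1521/SERVICE",
        "--username", "cms_popularity_reader",
        "--password-file", "/user/hloader/.db_password",  # password file kept in HDFS
        "--table", "CMS_POPULARITY.T_RAW_ACCESSES",
        "--target-dir", "/project/cms/popularity/raw/2015-09-01",
        "--num-mappers", "4",                              # parallel map tasks
        "--as-avrodatafile",                               # store records as Avro
    ]

    # Run the job and raise an error if Sqoop exits with a non-zero status.
    subprocess.run(sqoop_command, check=True)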

Files

SummerStudentReport-AnirudhaBose.pdf (973.0 kB)
md5:c3d246ce62d1f993d1272b9e5542291a