Published October 23, 2017 | Version v1
Report Open

Anomaly detection for database connections

  • 1. CERN openlab summer student
  • 2. CERN

Description

This project analyses database connection logs for potential anomalies. We apply several models to the data and combine them into an ensemble of classifiers that flags potentially anomalous or malicious connections to database instances within the CERN network. We also use this work to shed light on usage patterns within the network, so that temporal dependencies can be better understood and incorporated into the decision-making process within the CERN system.
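As a rough illustration of what a temporal usage pattern could look like, the sketch below counts connections per hour of day and lists hours that account for an unusually small share of the traffic. The record layout (`timestamp`, `client_host`), the sample data, and the `min_share` cut-off are all assumptions made for this example; they are not taken from the CERN logs or from the report.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical log lines: field names and values are invented for illustration.
SAMPLE_LOGS = [
    json.dumps({"timestamp": f"2017-08-01T{h:02d}:{m:02d}:00", "client_host": "app-01"})
    for h, m in [(9, 0), (9, 10), (9, 20), (10, 5), (10, 30), (3, 45)]
]


def hourly_profile(lines):
    """Count connections per hour of day across JSON log lines."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        hour = datetime.fromisoformat(record["timestamp"]).hour
        counts[hour] += 1
    return counts


def rare_hours(profile, min_share=0.2):
    """Return the hours whose share of all connections is below min_share."""
    total = sum(profile.values())
    return sorted(h for h, n in profile.items() if n / total < min_share)


profile = hourly_profile(SAMPLE_LOGS)
print(rare_hours(profile))
```

A connection arriving in one of the rare hours is not necessarily malicious; a profile like this would only be one input among several to a later decision step.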

These models are trained on subsets of a data lake comprising daily connection logs from all database instances on the network. The logs are stored as JavaScript Object Notation (JSON) and may either be kept in short-term storage and visualised using Elasticsearch, Grafana and Kibana, or pushed to long-term storage on the Hadoop Distributed File System (HDFS). We extract subsets of this data and apply models that rely on different parameters of the data to classify outliers in the dataset as anomalies. The approaches adopted are based on distance, density, and classification. Finally, we compare the results of the models against one another in order to understand the distribution of anomalies and to eliminate false positives that arise across the different models.
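The idea of comparing a distance-based and a density-based detector can be sketched as follows. This is a minimal example under stated assumptions, not the method used in the report: the feature vectors in `POINTS`, the thresholds, and the choice to keep only points flagged by both detectors are hypothetical.

```python
import math

# Hypothetical, already-scaled feature vectors, one per connection.
POINTS = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),  # dense cluster
    (4.0, 4.0), (4.3, 4.1),                          # small isolated pair
    (8.0, 8.0),                                      # lone point
]


def knn_distance_scores(points, k=1):
    """Distance-based score: distance to the k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def neighbour_counts(points, radius=1.5):
    """Density-based score: number of other points within the given radius."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def consensus_outliers(points, k=1, dist_threshold=3.0, radius=1.5, min_neighbours=2):
    """Return the indices flagged by each detector and by both together."""
    by_distance = {
        i for i, s in enumerate(knn_distance_scores(points, k)) if s > dist_threshold
    }
    by_density = {
        i for i, c in enumerate(neighbour_counts(points, radius)) if c < min_neighbours
    }
    return by_distance, by_density, by_distance & by_density


by_distance, by_density, both = consensus_outliers(POINTS)
print(sorted(by_distance), sorted(by_density), sorted(both))
```

With this toy data the density detector also flags the isolated pair, while the distance detector does not, so only the lone point survives the intersection. Requiring agreement trades recall for fewer false positives; a union or a weighted vote would be equally valid choices depending on how costly a missed anomaly is.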

Files

report_Swapneel_Mehta.pdf

Size: 1.5 MB
md5:3b7567bfb16d660ec9dd824b58f06569