Conference paper Open Access

Initial Encryption of large Searchable Data Sets using Hadoop

Wang; Kohler; Schaad

With the introduction and widespread use of externally hosted infrastructures, secure storage of sensitive data is becoming increasingly important. Systems exist for storing and querying encrypted data in a database, but not all applications start with empty tables; many come with sets of legacy data. Hence, existing plaintext databases need to be transformed into encrypted form. Existing enterprise databases may contain terabytes of data, and a single machine would require many months for the initial encryption of such a large data set. We propose encrypting the data in parallel using a Hadoop cluster, in a simple five-step process: setting up Hadoop, preparing the target, importing the source data, encrypting the data, and finally exporting it to the target. We evaluated our solution on real-world data and report on performance and data consumption. The results show that encrypting data in parallel scales very well. Compared to a single server machine, a parallelized encryption cluster reduces the encryption time from months to days or even hours.
