Conference paper Open Access

Initial Encryption of large Searchable Data Sets using Hadoop

Wang; Kohler; Schaad

With the introduction and widespread use of externally hosted infrastructures, secure storage of sensitive data is becoming increasingly important. Systems are available to store and query encrypted data in a database, but not every application starts with empty tables; many come with sets of legacy data. Hence, there is a need to transform existing plaintext databases into encrypted form. Existing enterprise databases may contain terabytes of data, and a single machine would require many months for the initial encryption of such a large data set. We propose encrypting the data in parallel using a Hadoop cluster in a simple five-step process: Hadoop setup, target preparation, source data import, data encryption, and export to the target. We evaluated our solution on real-world data and report on performance and data consumption. The results show that encrypting data in parallel can be done in a very scalable manner. Compared to a single server machine, a parallelized encryption cluster reduces the encryption time from months to days or even hours.
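The partition-and-map structure behind the parallel encryption step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a local thread pool stands in for the Hadoop cluster (so it shows the structure only, not a real speedup), and a keyed HMAC-SHA256 token stands in for the actual searchable-encryption routine, which the abstract does not specify. All names (`KEY`, `protect`, `split`, `encrypt_partition`, `encrypt_in_parallel`) are hypothetical.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

# Hypothetical key; a real deployment would rely on proper key management.
KEY = b"demo-key-not-for-production"


def protect(value: str) -> str:
    """Stand-in for the encryption step: a deterministic keyed token.

    This is NOT the paper's scheme and is not reversible; it only marks
    where a real searchable-encryption routine would be called.
    """
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def split(rows, n_parts):
    """Cut the imported rows into roughly equal, order-preserving partitions."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def encrypt_partition(rows):
    """Work done by one worker: transform every field of every row."""
    return [tuple(protect(field) for field in row) for row in rows]


def encrypt_in_parallel(rows, n_workers=4):
    """Map partitions across workers, then concatenate the results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(encrypt_partition, split(rows, n_workers))
        return [row for part in parts for row in part]


# Toy "source table"; the result would be exported to the target database.
source = [("alice", "berlin"), ("bob", "paris"), ("carol", "berlin")]
target = encrypt_in_parallel(source, n_workers=2)
```

Because each partition is processed independently, the output is the same regardless of the number of workers, which is what makes the task easy to spread across cluster nodes.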

Files (150.8 kB)
Initial encryption of large searchable data sets using Hadoop (Demo Paper).pdf (150.8 kB)
md5:7d9d319b9b0a7b43730c8cff0d78da04