Performance Evaluation of Parallel International Data Encryption Algorithm on IMAN1 Super Computer

Distributed security is an evolving sub-domain of information and network security. Security applications play an important role in data exchange, where different volumes of data must be transferred from one site to another safely and at high speed. In this paper, the parallel International Data Encryption Algorithm (IDEA), one such security application, is implemented and evaluated in terms of running time, speedup, and efficiency. The parallel IDEA was implemented using the Message Passing Interface (MPI) library, and the experiments were conducted on the IMAN1 supercomputer, where a set of simulation runs was carried out on different data sizes to determine the best number of processors for each data size and to show how the suitable number of processors changes as the data size increases. The experimental results show good performance: the running time is reduced and the speedup of the encryption and decryption processes increases for parallel IDEA when the number of processors ranges from 2 to 8, with an achieved efficiency of 97% to 83%, respectively.


INTRODUCTION
Communication between different devices on computer networks can be a serious issue, which has motivated various researchers to develop secure communication methods that protect the data exchanged between sender and receiver. They have proposed cryptographic algorithms such as DES, AES, RC2 [14] [15], and IDEA [1]. Other work focuses on improving the speed of cryptographic algorithms to reduce end-to-end delay on networks.
Cryptography is a method of storing and transmitting data in a particular form so that only those for whom it is intended can read and process it. IDEA is a cipher that encrypts text into an unreadable format and secures it so that it can be sent over the network. The IDEA encryption algorithm provides high-level security that is not based on keeping the algorithm secret, but rather on an attacker's ignorance of the secret key.
Many algorithms are used in the encryption field, and they are divided into symmetric and asymmetric encryption approaches. The International Data Encryption Algorithm (IDEA) is a symmetric secret-key cipher used to encrypt and decrypt data, which was developed in 1991 by James L. Massey and Xuejia Lai [1]. It was originally named the Improved Proposed Encryption Standard (IPES), as described in [5], and has been used in different applications such as financial applications.
Electronic copy available at: https://ssrn.com/abstract=3350418
Parallel and distributed computing systems are high-performance computing systems that spread a single application over many multi-core and multi-processor computers in order to complete the task rapidly. They divide large problems into smaller sub-problems and assign each of them to a different processor, typically in a distributed system, so that the sub-problems run concurrently [7] [8] [9] [10] [11] [12] [13].
In this paper, the parallel International Data Encryption Algorithm (IDEA) is implemented, and its performance is evaluated in terms of execution time, speedup, and parallel efficiency for different data sizes and different numbers of processors, using the Message Passing Interface (MPI) on the IMAN1 supercomputer. IMAN1 is Jordan's first and fastest supercomputer. It is available for use by academia and industry in Jordan and the region, and provides multiple resources and clusters to run and test High-Performance Computing (HPC) codes. It uses 2260 PlayStation3 devices [7] [16]. Our work has two limitations. First, it may yield different results with other programming languages and parallel frameworks. Second, it does not account for the communication time between processors and the processing time of the processors separately; rather, they are measured together as a single value.
The rest of this paper is organized as follows: Section 2 presents the background and related work, Section 3 presents the experiments and results, and Section 4 presents the conclusion.

BACKGROUND AND RELATED WORK
IDEA belongs to a class of secret-key cryptosystems characterized by the symmetry of the encryption and decryption processes and by the possibility of deriving the decryption key from the encryption key and vice versa [5]. In the encryption process, IDEA divides the 64-bit plaintext into four 16-bit sub-blocks (P1 to P4), as shown in Figure 1. These four sub-blocks pass through eight rounds. In each round, the four 16-bit sub-blocks are processed with six different sub-keys out of the 52 sub-keys derived from the 128-bit cipher key, which is agreed upon by sender and receiver. The data produced by the previous round (P1 to P4) is the input to the current round, where it is processed by logical and arithmetic operations with the six sub-keys assigned to that round. After the eight rounds are completed, the four sub-blocks are processed by the OUTPUT TRANSFORMATION phase, which contains arithmetic operations with just four sub-keys and produces the cipher data (C1 to C4), each block being 16 bits in size [2] [3].
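The structure described above (four 16-bit sub-blocks, eight rounds with six sub-keys each, and an output transformation with four sub-keys) can be illustrated with a minimal Python sketch. It is not the C++/MPI code evaluated in this paper. The key schedule (taking eight 16-bit words from the key, then rotating the 128-bit key left by 25 bits) and the order of operations inside a round follow the standard published description of IDEA rather than this paper's text, and all function names are illustrative assumptions.

```python
MASK16 = 0xFFFF
MASK128 = (1 << 128) - 1


def mul(a, b):
    """Multiply modulo 2^16 + 1; the 16-bit value 0 stands for 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & MASK16


def add(a, b):
    """Add modulo 2^16."""
    return (a + b) & MASK16


def expand_key(key):
    """Derive the 52 16-bit encryption sub-keys from a 128-bit integer key."""
    subkeys = []
    while len(subkeys) < 52:
        # Take the eight 16-bit words of the current key, most significant first.
        for i in range(8):
            subkeys.append((key >> (112 - 16 * i)) & MASK16)
        # Rotate the 128-bit key left by 25 bits before the next batch.
        key = ((key << 25) | (key >> 103)) & MASK128
    return subkeys[:52]


def idea_round(x, z):
    """Apply one round to sub-blocks x = (x1..x4) with six sub-keys z."""
    a = mul(x[0], z[0])
    b = add(x[1], z[1])
    c = add(x[2], z[2])
    d = mul(x[3], z[3])
    e = mul(a ^ c, z[4])
    f = mul(add(b ^ d, e), z[5])
    g = add(e, f)
    # The two inner sub-blocks are swapped on output.
    return (a ^ f, c ^ f, b ^ g, d ^ g)


def encrypt_block(block, subkeys):
    """Encrypt one 64-bit integer block with 52 sub-keys."""
    x = tuple((block >> s) & MASK16 for s in (48, 32, 16, 0))
    for r in range(8):
        x = idea_round(x, subkeys[6 * r:6 * r + 6])
    # Output transformation: four sub-keys, undoing the last inner swap.
    z = subkeys[48:52]
    y = (mul(x[0], z[0]), add(x[2], z[1]), add(x[1], z[2]), mul(x[3], z[3]))
    return (y[0] << 48) | (y[1] << 32) | (y[2] << 16) | y[3]


if __name__ == "__main__":
    keys = expand_key(0x00010002000300040005000600070008)
    print(f"{encrypt_block(0x0000000100020003, keys):016X}")
```

Because each 64-bit block is encrypted independently of the others in this sketch, the blocks of a large input can be handed to different processors, which is the property the parallel implementation relies on.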
In the decryption phase, the sub-keys are derived from the key in a different way, while the same techniques used in the encryption process are applied. The logical and arithmetic operations of each round are multiplication modulo $2^{16}+1$, addition modulo $2^{16}$, and XOR, whereas the final phase uses only multiplication modulo $2^{16}+1$ and addition modulo $2^{16}$ [3]. These operations are applied to the blocks (P1 to P4) with the assigned sub-keys. Figure 2 shows the logical and arithmetic operations that IDEA executes in each round to produce the encrypted data that becomes the input of the next round. The steps are summarized as follows:
(1) Multiply X1 and Z1.
(2) Add X2 and Z2. [2]
Cipher speed is one of the major functional features of cryptographic techniques, and it is especially important because these techniques usually work on huge datasets. Many studies and researchers have therefore sought to increase the speed of encryption techniques using parallelism.
In [16], the authors evaluated the performance of the Blowfish algorithm on a parallel platform. The algorithm was implemented using the MPI library, and the experiments were performed on the IMAN1 supercomputer. The experimental results showed that the parallel algorithm achieved its best value when the number of processors was 32 for a plaintext size of 160 MByte.
The author of [17] implemented the encryption algorithm in parallel using eight Quad-Core (32) Intel Xeon 7310 Series processors at 1.60 GHz and the Intel C++ Compiler. The experimental results showed that running the parallel encryption algorithm on 2 to 32 processors improved the data encryption and decryption time.
The authors of [6] presented results of parallelizing IDEA on 1 to 4 processors. The OpenMP standard was chosen to express the parallelism of the algorithm, and an efficiency measurement for the parallel program was presented. They did not determine the best number of processors to use for different data sizes.
In this paper, we implemented the IDEA algorithm on a parallel platform that differs from the above studies in its architecture and in the number of processors used. This work evaluates the parallel IDEA to find the best number of processors to use for different data sizes.

EXPERIMENTS AND RESULTS
In parallel computations, the number of processors that run concurrently must be specified through special instructions in the programming language. In this section, the results of parallel IDEA are evaluated in terms of execution time, speedup, and efficiency, comparing serial and parallel IDEA.
All experiments were conducted on the IMAN1 cluster, whose hardware is "Dual Quad-Core Intel Xeon CPU with SMP, 16 GB RAM", running Scientific Linux 6.4 [7]. The parallel IDEA implementation is written in C++ and uses the MPI library.
The MPI library provides functions for distributing data across the available machines so that they can process it simultaneously. One of them, MPI_Scatterv, is used in our implementation to split the data into equal-sized portions and to allocate each portion to one processor. The IDEA implementation is based on partitioning the data: every processor executes the same code with the same key, concurrently, on the data portion allocated to it. Parallel IDEA was evaluated with multiple input sizes (0.09, 0.19, 0.39, 0.78, 1.56, 3.12, 6.25, 12.5, 25, and 50 MByte) and different numbers of processors (1, 2, 3, 4, 8, 16, 32, 64, and 128).
Our work makes the following assumptions. First, all processors have the same capacity and throughput. Second, the execution time for a plaintext size on the assigned processors is taken from the processor that consumes the most time. Finally, the data is evenly partitioned among the assigned processors.
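The even partitioning assumed above can be sketched as the computation of the per-process byte counts and displacements that a call such as MPI_Scatterv takes as arguments. The sketch below is plain Python without MPI, the helper name `scatter_layout` is hypothetical, and two details are assumptions not stated in this paper: the input is already padded to whole 8-byte IDEA blocks, and any leftover blocks are given to the lowest-ranked processes.

```python
BLOCK_BYTES = 8  # one IDEA block is 64 bits


def scatter_layout(total_bytes, num_procs):
    """Return (counts, displs) in bytes, one entry per process.

    Whole 8-byte blocks are divided as evenly as possible; when the
    number of blocks is not a multiple of num_procs, the first few
    processes receive one extra block (an assumption of this sketch).
    """
    if num_procs < 1:
        raise ValueError("at least one process is required")
    if total_bytes % BLOCK_BYTES:
        raise ValueError("pad the input to a multiple of 8 bytes first")
    blocks = total_bytes // BLOCK_BYTES
    base, extra = divmod(blocks, num_procs)
    counts = [(base + (1 if rank < extra else 0)) * BLOCK_BYTES
              for rank in range(num_procs)]
    # Each displacement is the sum of the counts of all lower ranks.
    displs = [sum(counts[:rank]) for rank in range(num_procs)]
    return counts, displs


if __name__ == "__main__":
    counts, displs = scatter_layout(80, 3)
    print(counts, displs)
```

Keeping every portion a multiple of the block size means no 64-bit block is split between two processors, so each processor can encrypt its portion without communicating with the others.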

ENCRYPTION AND DECRYPTION TIME EVALUATION
Each experiment was repeated 10 times for every number of processors, and the average was recorded in Table 1, which shows the encryption and decryption time of serial and parallel IDEA for multiple input sizes. Figure 3 shows that the execution time of the IDEA algorithm on a single processor (sequential) increases as the plaintext size increases. Figure 6 illustrates the execution time for different numbers of processors. We chose ten data sizes from 0.09 up to 50 MByte, which cover small and large data sizes. The figures show that the behavior of parallel IDEA is the same for all input sizes and can be described as follows:
• When the number of processors increases, the encryption and decryption time decreases because the work is distributed among the processors. This is evident when moving to 2, 3, 4, 8, 16, 32, or 64 processors.
• The encryption and decryption time increases when using 128 processors because of the increased communication overhead among processors and the additional time needed to split a particular data size. Therefore, adding more processors for a particular input size does not always reduce the execution time.

SPEEDUP EVALUATION
Speedup is calculated by taking the ratio between the serial and parallel time [7]. Figure 7 shows the speedup for five different plaintext sizes on 2, 3, 4, 8, 16, 32, 64, and 128 processors. The results show the following:
• The speedup increases when the number of processors increases, and the increments are the same for all plaintext sizes on 2 to 16 processors.
• When the number of processors equals 32 or 64, the speedup for the plaintext sizes 1.56, 3.12, 25, and 50 MByte increases, while the speedup for 0.19 MByte decreases. The 50 MByte size achieved the best speedup compared with the other plaintext sizes for all numbers of processors.
• When the number of processors equals 128, the speedup is smaller for all evaluated plaintext sizes. Thus, the speedup for large data sizes is better when more than 16 processors are used.
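The way a speedup value is obtained from raw timings can be sketched as follows, using the conventions stated in this paper: the time of one run is that of the slowest processor, the reported time is the average over repeated runs, and speedup is the ratio of serial to parallel time. The function names and the timings in the usage example are made up for illustration and are not measurements from this study.

```python
from statistics import mean


def run_time(per_processor_times):
    """Time of one run: the processor that takes longest determines it."""
    return max(per_processor_times)


def average_time(runs):
    """Average the run times over repeated runs (10 in the experiments)."""
    return mean(run_time(times) for times in runs)


def speedup(serial_time, parallel_time):
    """Ratio between the serial and the parallel execution time."""
    return serial_time / parallel_time


if __name__ == "__main__":
    # Illustrative values only: two runs on two processors, in seconds.
    parallel_runs = [[1.0, 1.2], [1.1, 0.9]]
    t_parallel = average_time(parallel_runs)
    print(speedup(2.3, t_parallel))
```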

PARALLEL EFFICIENCY
Parallel efficiency is computed by taking the ratio between the speedup and the number of processors [6]. Figure 8 shows the parallel efficiency of the parallel IDEA algorithm for different plaintext sizes on different numbers of processors. The results show that the parallel efficiency of the parallel IDEA algorithm is best when the number of processors equals 2, 3, 4, or 8. The parallel efficiency decreases when the number of processors increases from 16 to 128. Finally, the large plaintext sizes achieved the best efficiency and maintained high efficiency values across the different numbers of processors.
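Since efficiency is speedup divided by the number of processors, a reported efficiency can be turned back into the speedup it implies. The sketch below does this for the best-case efficiency percentages quoted in the conclusion of this paper. It assumes those percentages all refer to comparable runs, which the paper does not state explicitly, and the function names are illustrative.

```python
def efficiency(speedup, num_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / num_procs


def implied_speedup(eff, num_procs):
    """Invert the definition: speedup = efficiency * number of processors."""
    return eff * num_procs


# Best-case efficiencies quoted in the conclusion ("up to" values).
REPORTED_EFFICIENCY = {2: 0.99, 4: 0.97, 8: 0.94, 16: 0.83,
                       32: 0.68, 64: 0.49, 128: 0.23}

if __name__ == "__main__":
    for procs, eff in REPORTED_EFFICIENCY.items():
        print(procs, round(implied_speedup(eff, procs), 2))
```

Under that assumption, the implied speedup is largest at 64 processors and lower at 128, which agrees with the speedup behavior described in the previous section.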

CONCLUSION
In this paper, the performance of parallel IDEA was evaluated in terms of execution time, speedup, and efficiency for different data sizes and various numbers of processors. Parallel IDEA was implemented in C++ using the Open MPI library and executed on the IMAN1 supercomputer. The experimental results show that the execution time of parallel IDEA decreases and the speedup increases as the number of processors increases. However, when a large number of processors is used to process a small data size, the run time increases because the amount of communication between processors becomes large. The best speedup was achieved when the number of processors equals 64 for data of 50 MByte. Moreover, the best parallel efficiency is obtained when the number of processors is 2, 4, or 8, reaching up to 99%, 97%, and 94%, respectively, whereas for 16, 32, 64, and 128 processors the parallel efficiency reaches up to 83%, 68%, 49%, and 23%, respectively.