File Encryption and Hiding Application Based on AES and Append Insertion Steganography

Steganography is a method of hiding secret message in other innocuous looking object so that its presence is not revealed. In this paper, a message, which is a computer file of any type and size, is hidden in a selected cover or carrier, which is a computer file of certain types. The steganography method used in this paper is called append insertion steganography method. This method is chosen as an attempt to remove the limitation of message format, which appears in many popular steganography methods. To scramble the meaning of the hidden message, AES-256 (Rijndael algorithm) is used to encrypt the message with a secret passphrase. A special block of bytes is used to identify and verify the original message so it can be recovered while retaining its integrity. The C# programming language and .NET Framework are chosen to implement the algorithm into a Windows application. In this paper, one cover file contains exactly one message file. In the testing, five random files are used as secret message. Their integrity is calculated using SHA-256 before hiding and after recovery. In the testing, they all retain their integrity, proven by exact hash values. Thus, the application as the implementation of proposed algorithm is proven feasible but only for personal use as some improvements still have to be implemented.


Introduction
Steganography is the art of concealed communication, which keeps the existence of a message secret. It is an alternative tool for privacy and security, which may suppress and prevent unnecessary attention of third parties, as encrypted messages are obvious even though they are in nonsense form [1]. Thus, steganography can be applied to avoid unwanted attention or in places where the usage of encryption is illegal. Encryption and steganography can be used together to ensure data security. Any possible eavesdropper should not realize any attempt of secret communication at all, and when the presence of the message is compromised, it is still not in meaningful form.

Append Insertion Steganography
Appending data to the end of file is one of the most flexible form of digital steganography, as many file types allow data to be appended into them, without causing error, and it has potential to remove limitations such as size and type of data that could be hidden. Many types of file have end-of-file (eof) marker inside it, and most file readers will simply ignore any data after the eof [2].

Related Works
Many articles are reviewed not only to study the algorithm, but also to find similarity in them that can be considered as limitation, which solution can be proposed. In [3], grayscale secret images with bit depth of 8-bits are hidden in cover images with bit depth of 24-bits. Cover images have size of 800x800 pixels, while secret images have 100x100 pixels. The resulting stego images have 99% conformity with the original images and it can withstand both intentional and unintentional attacks to some extent.In [4][5], images are processed using binary mode. Transform domain method such as DWT and DCT are used. In addition, the size of the images is often too small [6] uses pixel differencing with n-bit Least Significant Bit (LSB) substitution to provide higher embedding capacity without sacrificing the imperceptibility of the host data. Some images used in the testing have up to 359 kilobytes hiding capacity. In [7], a steganography technique using LSB substitution and PVD method is presented as an adaptive scheme in the spatial domain. The method partitions the grayscale image into several nonoverlapping blocks with three consecutive pixels. This can also increase hiding capacity. Both [6] and [7] have not yet implemented encryption.

Typical Limitation of Various Steganography Methods
As mentioned previously, many steganography methods limit the type and size of message that can be hidden inside cover/carrier files. Some even do not implement any encryption. At best, the hiding capacity of cover files can be optimized using methods that are relatively new and they were tested against various attacks with certains parameters.
This study aims to design and implement an algorithm based on append insertion steganography method that is flexible for the user to choose what type and size of sec ret file he/she wants to hide, instead of focusing to limit the message in some way to meet the requirements of a steganography method.

Research Method 2.1. Limitation of Cover File Types
While the message file can be practically any file, the carrier file must have certain characteristic so it can be opened after being embedded with encrypted data. To avoid losing the embedded data, the best types of carrier file are the files intended for sharing and viewing only. The file format must also have some kind of stop marker, found either at the end of file or at other structure, which is used as a point for their default handler application to stop reading the stream of data. This way, the resulted stego file can still be opened normally. Some file formats which have that characteristic, are: a. Portable Document Format (pdf), b. Bitmap image (bmp), c. Joint Photographic Expert Group (jpg/jpeg), and d. Portable network graphics (png).

Proposed Algorithm for Identifying Secret Message
A special block of bytes, called Identification (ID) Block, is needed for the algorithm to find out the following information of a secret message: a. Hash value. This is used as secondary measure for authentication and integrity. The original message hash value is kept and later compared when the message is extracted. Forty digits long SHA1 hexadecimal value is chosen. b. Type (extension) of file. Even though the file has no extension at all, it will pose no problem because empty string value ("") is supported. c. Key index. A 0-based index value is needed to tell the application at which point the message sequence start in the stego file, so it can be extracted completely. This value is equal to the length of cover file. Vertical line/pipe (|) character is used to separate those three values . The key index is generated last and always padded so the total length of ID block is 64 bytes. Therefore, the structure of this vector is: This ID block is encrypted and then appended to the end of encrypted message.

AES-256 Encryption
Advanced Encryption Standard (AES) with Rijndael algorithm, which replaced the aging DES, is used to enhance the security and confidentality of data [8]. In this study, 256-bits key length with block size of 128 bits is used. The core of Rijndael algorithm is the round function, which consists of four steps; each has its own particular function [9]: a. Bytesub (BSB), b. Shift Row (SR), c. Mix Column (MC), and d. Round Key Addition (RKA)

Cipher Block Chaining
CBC is a block cipher mode of operation to define how block ciphers (such as in DES and AES) may be applied correctly depending on the specification [9]. CBC eliminates the near obvious data pattern weakness found in the older Electronic Codebook (ECB) mode.
In CBC mode, each block of plaintext (128 bit for AES) is XORed with the previous ciphertext block before being encrypted using chosen algorithm, hence the name "chaining". For the first plaintext, an initialization vector (IV), which best randomly and securely generated, is used to XOR the first plaintext. For the decryption, previous ciphertext block is used to XOR the decrypted ciphertext block to generate the plaintext. The same IV used in the encryption is used to XOR the first decrypted ciphertext. Figure 1 shown the diagram of CBC operations.

PKCS7 Byte Padding
During encryption process, the message chosen by user may come in various sizes , and the plaintext block cannot be empty and must be filled completely with the help of padding. Thus, PKCS7 padding is used in this study for its simplicity. In PKCS7, the value of each added byte is the number of bytes that are added. For example, if 3 bytes of message is put into the 8bytes block, additional 5 bytes are needed to complete the block. Using PKCS7, the value of those bytes is 05. Even if the message size in bytes divided by block size results in integer value, padding is still used for consistency. PKCS7 is thoroughly described in [10].

Proposed Algorithm for File Hiding and Recovery 2.4.1. File Hiding
The steganography method used to hide the confidential message file is append insertion method. The primary reason is to remove the file type and size limitation found in many other methods. The activity diagram of the proposed file hiding scheme is illustrated in Figure 2. The algorithm is as follows: a. Choose a carrier file and a message file. b. Read all contents of those files as binary into array of byte. c. Generate 64 bytes identification (ID) block. d. Encrypt the of the message byte array. e. Append the ID block to the encrypted message byte array.  ; < number of blocks; ++: a. If , XOR the message block with generated IV, encrypt the resulting block using the secret key to generate ciphertext block. b. Else, XOR the message block with previous ciphertext, encrypt the resulting block using the secret key to generate ciphertext block. 5. Prepend the salt and the IV to the generated ciphertext.
The functions to generate salt, IV, and secret key need to be secure. The resulting stego file will have this structure:

File Recovery
To recover the hidden message file, the ID block has to be retrieved first before encrypted message bytes can be extracted and decrypted with sec ret passphrase. The algorithm is as follows: a. Choose a stego file. b. Read all contents as binary into a byte array. c. Get the 64 bytes ID block at the end of stego byte array. d. Extract and read the encrypted message bytes into a byte array. e. Decrypt the encrypted byte array. a. If = 0, decrypt the ciphertext block using the key, and XOR it with IV to generate the message block. b. Else, decrypt the ciphertext block using the key, and XOR it with previous ciphertext block to generate the message block.

Results and Analysis
The application was made using C# programming language and NET Framework optimized for Windows operating system. The size of final assembly is approximately 58.5 KB. Figure 4 shown user interface of application. The integrity testing examined whether the message file, which is the file to be hidden in the carrier/cover file, retained its integrity before being encrypted and embedded into the cover file, and after being successfully extracted and decrypted. SHA-256 hash function was used to produce hash value (also called signature or checksum). The same message file would always produce the same hash value.
In this testing, five randomly generated files, each with different size, got their hash value computed. Those files was encrypted and hidden in a carrier file and then retrieved back, using proposed algorithm explained previously. Table 1 shown the details of those files. A random 2 MB portable document format (PDF) file was used as cover. Note that the characteristics of cover files would not matter most of the times, as long as they comply with the    Table 3 shown the result: During the testing, CPU, memory, and hard drive were on idle and used to process only necessary programs and services, including the operating system, the IDE and its debugger, and the application itself. Based on the result, obviously, the bigger the message file, the longer the encryption and hiding time. Exception was found at the last test, which the hiding time for a file with 50000000 bytes in size only took 1272 milliseconds. This could be caused by the .NET Framework optimization when operating with large data.

Conclusion
The proposed file hiding and file recovery algorithm involving identification (ID) block was proven usable for hiding any kind and any size of data (file) using append insertion method in the selected cover file types. In addition, this method allows the usage of current standard AES to scramble the message/hidden files, further enhancing data security. The application as the implementation of proposed concept could be used to hide and later recover a message file up to 47 MB in roughly 2005 milliseconds, which is acceptable for personal usage. The maximum size of files depends on the system that runs this application. The hash values of each dummy file before hiding operation and after successfully recovered were the same. The final assembly can be found at https://github.com/GottfriedCP/FileStegano. Per January 2018, it has been downloaded 2100 times. This study is an initial attempt to remove the limitation of message type and size in in steganography method, thus giving users freedom to choose any file format. Some improvements have to be done, such as to increase its robustness to detection and various attacks by making the cover and the message seem inseparable through implementation of special file formatting and handling.