Design of optimal short-length LT codes using evolution strategies

Luby Transform (LT) and its companion Raptor codes are the most popular implementations of digital fountain codes. The performance of these rateless forward erasure correction codes is determined mainly by the degree distributions of their encoded symbols. Although the asymptotic behaviors of LT codes with large (>10^5) symbol blocks have been deduced analytically, a proficient method for finding the optimal degree distributions of short-length (<10^3) LT codes is still absent. In this paper, we propose a practical approach that employs evolution strategies to find the degree distributions of optimal short-length LT codes for different applications. Our approach begins with the development of a new performance model for LT codes based on three measurements: coding overhead ε, failure ratio r and failure occurrence probability p. Three evolution strategies (DE, CMA-ES and NES) were then employed to minimize these performance measurements separately, with careful design of fitness functions and the necessary transformations of decision variables. Throughout the evolution process, the performance of each LT code in the population was evaluated with numerical simulations. Our experiments showed that optimal degree distributions can be found using all three evolution strategies, albeit with different convergence rates, and that the (r, p, ε) values of the optimized codes are all distributed on a smooth concave surface.


I. INTRODUCTION
Digital fountain codes [1], [2] were introduced by Byers et al. in 1998 [3] as a new class of forward erasure correction (FEC) codes [4] for protecting asynchronous data transfer over packetized digital networks. These rateless codes can produce a limitless supply of randomly encoded symbols from given source symbols. The encoded symbols can then be put into data packets and sent via multicasting or broadcasting. Receivers over the networks can simply capture any packets in the data streams at any time. Once a sufficient number of packets is captured (usually equal to the number of source symbols plus a small overhead), a receiver can decode the received packets and most likely recover all the source symbols using a belief propagation algorithm with linear computation complexity. If the receiver fails to recover some of the source symbols, it may capture a few more packets and continue the decoding process until all the symbols are recovered. The scenario is like filling a cup from a water fountain, hence the name of these codes.
Luby [5] proposed the first practical implementation of digital fountain codes in 2002 and christened them the Luby Transform (LT) codes. The decoding failure rate of an LT code is completely determined by the number of source symbols it handles, the degree distribution it uses to conduct the randomized encoding process, and the number of encoded symbols it receives before starting the decoding process. In order to improve the performance of LT codes, Shokrollahi added a low-density parity check (LDPC) code to the LT codes as a precoder and called these composite codes the Raptor codes [6]. Due to their rateless forward erasure correcting capability and Shannon-capacity-approaching coding efficiency, Raptor codes have become widely used in data transfers over wireless networks and the Internet.
Since their inception, the asymptotic behaviors of LT codes with an infinite number of source symbols have been deduced analytically. However, the deduction cannot be extended to the cases with a small number of source symbols because the assumption of ergodic behavior is no longer valid. Over the years, researchers have made various attempts to establish design guidelines for optimal "short-length" LT codes, or SL-LT codes. Those attempts [8]-[11] were met with limited success. Thus, the design of optimal SL-LT codes remains an open problem.
In this paper, we propose a practical and robust approach that employs evolution strategies to design optimal SL-LT codes with decoding performance that suits different applications. The success of our approach depends largely on a new performance model we devised for the SL-LT codes in terms of their coding overhead ε, failure ratio r and failure occurrence probability p. Special efforts have also been made to ensure proper use of the evolution strategies. They include (1) the selection of evolution strategies by balancing their convergence rates and computation efficiency, (2) the choice of decision variables, (3) the specification of fitness functions, (4) the choice of initial population and (5) the criteria for selecting population samples in every generation. Our experiment outcomes were remarkably promising: significant reductions of overheads, failure rates and failure probabilities have been achieved for SL-LT codes with various block sizes.
The remainder of this paper is organized as follows. Section II provides a brief overview of LT encoding/decoding operations, including the soliton distributions. Section III describes the (ε, r, p)-based LT code performance model. Section IV formulates the optimization problem and describes its two scenarios. Section V discusses the rationale for choosing the three evolution strategies, while Section VI specifies the designed initial distributions, the transformation of decision variables and the fitness penalty added to out-of-range samples. Experiment results and their analyses are given in Section VII and contributions are summarized in Section VIII.

II. OVERVIEW OF LT CODES

A. Encoding and Decoding Operations
In binary LT codes, each source symbol is a binary tuple. These source symbols are randomly chosen and combined into an encoded symbol or codeword by a bitwise exclusive OR (XOR) operation. The number of source symbols combined into an encoded symbol is known as its degree. The relation between the source and encoded symbols can be represented as a bipartite graph or a binary encoding matrix with each column showing the presence or absence of connections between an encoded symbol and the source symbols.
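As a concrete illustration, the encoding operation just described can be sketched in a few lines of Python. This is our own minimal sketch, not a reference implementation; source symbols are represented as equal-length byte strings.

```python
import random

def lt_encode_symbol(source, degree, rng=random):
    """Produce one LT encoded symbol: the bitwise XOR of `degree` randomly
    chosen source symbols.

    Returns the chosen neighbour indices (one column of the binary encoding
    matrix) together with the encoded payload.
    """
    neighbours = rng.sample(range(len(source)), degree)
    payload = bytes(source[neighbours[0]])
    for i in neighbours[1:]:
        # XOR combines the chosen source symbols into a single codeword.
        payload = bytes(a ^ b for a, b in zip(payload, source[i]))
    return neighbours, payload
```

A degree-1 codeword is simply a copy of one source symbol, which is what lets the belief propagation decoder get started.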
A block of source symbols can be decoded or recovered from the same number or more encoded symbols with the use of Gaussian elimination. However, this maximum likelihood algorithm of cubic complexity can be replaced by a suboptimal belief propagation (BP) algorithm [12] of only linear complexity. Figure 3 illustrates the BP decoding process with a simple example. The only price for using this simpler process is a small increase in the number of encoded symbols necessary to ensure successful decoding, and the need to design "good" degree distributions that work with the BP algorithm.
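The BP (peeling) decoding process can likewise be sketched as follows. This is a simplified illustration in our own notation: `received` pairs each codeword's neighbour indices with its payload, and decoding repeatedly releases a degree-one codeword and XORs the recovered symbol out of every codeword still connected to it.

```python
def bp_decode(K, received):
    """Peeling-style BP decoder sketch for binary LT codes.

    `received` is a list of (neighbour_indices, payload_bytes) pairs.
    Returns a list of K recovered symbols, with None where decoding stalled.
    """
    symbols = [None] * K
    # Mutable working copies: (set of unresolved neighbours, payload bytes).
    cw = [[set(n), bytes(p)] for n, p in received]
    ripple = [i for i, c in enumerate(cw) if len(c[0]) == 1]
    while ripple:
        i = ripple.pop()
        if len(cw[i][0]) != 1:
            continue  # this codeword was already fully peeled
        s = cw[i][0].pop()
        if symbols[s] is not None:
            continue  # symbol already recovered by another codeword
        symbols[s] = cw[i][1]
        # Peel the recovered symbol off every codeword still connected to it.
        for j, (nbrs, payload) in enumerate(cw):
            if s in nbrs:
                nbrs.discard(s)
                cw[j][1] = bytes(a ^ b for a, b in zip(payload, symbols[s]))
                if len(nbrs) == 1:
                    ripple.append(j)  # a new degree-one codeword appears
    return symbols
```

Decoding stops when no degree-one codeword remains; any `None` entries then correspond to unrecovered source symbols.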

B. Degree Distribution
The degree distribution of an LT code is a discrete probability mass function $\Omega(d)$ that specifies the probability of producing an encoded symbol from $d$ source symbols [13]. In his LT code proposal [5], Luby deduced an ideal soliton degree distribution based on an analogy of throwing an infinite number of balls randomly into $K$ bins:

$$\rho(d) = \begin{cases} 1/K & \text{for } d = 1 \\ \dfrac{1}{d(d-1)} & \text{for } d = 2, 3, \dots, K \end{cases}$$

Despite its theoretical rigor, the ideal soliton distribution is not suitable for practical use because the decoding behavior of finite-length LT codes may fluctuate among randomly generated codewords. In the same paper, Luby proposed another degree distribution, known as the robust soliton distribution. Its probability mass function can be derived by adding an adjustment term $\tau(d)$ to that of the ideal soliton distribution and normalizing:

$$\mu(d) = \frac{\rho(d) + \tau(d)}{\beta}, \qquad \beta = \sum_{d=1}^{K} \left( \rho(d) + \tau(d) \right)$$

$$\tau(d) = \begin{cases} \dfrac{S}{Kd} & \text{for } d = 1, \dots, \lceil K/S \rceil - 1 \\ \dfrac{S \ln(S/\delta)}{K} & \text{for } d = \lceil K/S \rceil \\ 0 & \text{for } d > \lceil K/S \rceil \end{cases}, \qquad S = c \ln(K/\delta) \sqrt{K}$$

where $K$ is the number of source symbols and $\delta$ bounds the failure probability of the decoding process, for some real constant $c > 0$.
The robust soliton distribution is good for practical use, but only in the cases with very many source symbols. Even though Karp et al. later published a recursive formula for estimating the performance of finite-length LT codes in [7], no formula is currently available for constructing optimal degree distributions for finite-length LT codes, let alone for SL-LT codes.
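For reference, both soliton distributions can be computed directly from the standard definitions above; the following sketch (function names are ours) returns each PMF as a list indexed by degree.

```python
from math import log, sqrt

def ideal_soliton(K):
    """Ideal soliton PMF: rho(1) = 1/K, rho(d) = 1/(d(d-1)) for d >= 2.

    Returned list is indexed by degree; index 0 is unused.
    """
    return [0.0, 1.0 / K] + [1.0 / (d * (d - 1)) for d in range(2, K + 1)]

def robust_soliton(K, c, delta):
    """Robust soliton PMF mu = (rho + tau) / beta, S = c * ln(K/delta) * sqrt(K)."""
    rho = ideal_soliton(K)
    S = c * log(K / delta) * sqrt(K)
    pivot = int(round(K / S))  # the spike location, K/S
    tau = [0.0] * (K + 1)
    for d in range(1, min(pivot, K + 1)):
        tau[d] = S / (K * d)          # heavy extra mass on small degrees
    if 1 <= pivot <= K:
        tau[pivot] = S * log(S / delta) / K  # the spike at d = K/S
    beta = sum(rho) + sum(tau)        # normalization constant
    return [(rho[d] + tau[d]) / beta for d in range(K + 1)]
```

Both PMFs sum to one by construction; the ideal soliton case follows from the telescoping sum of 1/(d(d-1)).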

C. Code Performance
The performance of finite-length LT codes has been hard to predict because these codes employ a randomized encoding process. Unlike fixed-rate deterministic codes, which offer sturdy erasure correction capability until the symbol loss rate exceeds a limit, the randomized encoding process of an LT code may produce a sequence of highly decodable codewords in one instance and a sequence of poorly decodable ones in another. Figure 2 shows the "bi-modal" profile of the decoding failure rates of an SL-LT code with K = 1000. There is a high chance that up to 70% of the source symbols may not be recovered if only a small number of codewords is used for decoding. In these situations, the conventional approaches of minimizing the average failure rate or the total area under the failure probability profile may be misleading. The proper objective should be the reduction of the maximum failure rate or the probability of high-failure instances.

D. Code Applications
Another reason for devising a new model to quantify the performance of finite-length LT codes is that different applications want the performance of the codes to be optimized in different fashions. To our knowledge, there are at least three distinct applications of finite-length LT codes.

1) Erasure protection for lossless data transfer:
Applications such as file downloads demand perfect data reception among the receivers. These applications also prefer to transmit as few symbols as possible in order to avoid waste of communication bandwidth or increase in transport latency. These applications may also conduct data transfers in asynchronous receiver-driven fashion so that a receiver can obtain data from several broadcasting/multicasting sources.
2) Data transfer with limited overhead allowance: Applications such as video streaming can tolerate a small amount of decoding failures as long as the pictures can be played back with reasonable quality. These real-time applications, however, cannot tolerate a large increase in bandwidth or latency. In other words, they expect to have the best picture quality (or lowest error rates) attainable under the available data rates.
3) Postcoding in rateless composite codes: In these cases, the LT codes only need to bring the decoding failure rates below certain threshold so that the precoder can recover the missing data symbols. Hence, the LT codes can tolerate certain amount of decoding failures but expect the failure rates to remain almost always below the threshold level.

III. PERFORMANCE MODELS
Based on our understanding of the "bi-modal" performance profile and the different uses of finite-length LT codes, we devised a new performance model for these codes in terms of three basic measurements:

Overhead ε (ε ≥ 0): the ratio between the number of extra encoded symbols received and the number of source symbols.

Failure Rate r (0 ≤ r ≤ 1): the fraction of the source symbols left unrecovered by a decoding process.

Failure Probability p: the probability that the decoding failure rate is higher than a threshold value r while the code is decoded with an overhead ε.

Each type of application mentioned above would like to use LT codes that are optimized with respect to one of these three measurements while keeping the other two within acceptable ranges. Obviously, lossless data transfers want to minimize the overhead ε; multimedia streaming wants to reduce its transfer failure rate r; and the post-coders should always minimize their decoding failure probability p.

IV. PROBLEM FORMULATION

A. Design Variables

Because our ultimate goal is to find the best degree distributions using evolution strategies, we shall begin with the design of decision variables that can properly represent the probability mass function $\Omega(d)$. For that purpose, we created two M-tuples: the degree tuple $(d_1, \dots, d_M)$ and the probability tuple $(p_1, \dots, p_M)$. Each pair of corresponding elements $(d_i, p_i)$ captures a non-trivial entry of the probability mass function: $\Omega(d_i) = p_i$. Since the degree distributions of SL-LT codes are sparse in most actual applications, the dimension M of the tuples can be set to a much lower value than the maximum degree K.
We also encountered a few practical issues when these tuples were employed in the actual optimization processes: (1) Degree elements need to be rounded to the closest integers within [1, K]. (2) Several degree elements may arrive at the same value in the final result; in those cases, we need to sum up all the corresponding probability elements in order to obtain the actual probability value of that degree. (3) Since a fixed dimension M was chosen for each optimization process, some probability elements may become insignificant in the final results; in those cases, these probability values may be added to the significant probability elements of the closest degree(s).
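The three fixes above can be combined into one small conversion routine. The following Python sketch is a hypothetical helper (ours, not the authors' code) that turns a degree/probability tuple pair into a valid sparse probability mass function:

```python
def tuples_to_pmf(degrees, probs, K):
    """Convert a (degree tuple, probability tuple) pair into a valid sparse PMF.

    Implements the practical fixes discussed in the text:
    (1) round each degree to the nearest integer clamped to [1, K],
    (2) merge the probability mass of duplicate degrees,
    (3) renormalise so the masses sum to one.
    Returns a {degree: probability} dict.
    """
    pmf = {}
    for d, p in zip(degrees, probs):
        d = min(max(int(round(d)), 1), K)        # (1) round and clamp
        pmf[d] = pmf.get(d, 0.0) + max(p, 0.0)   # (2) merge duplicates
    total = sum(pmf.values())
    return {d: p / total for d, p in pmf.items()}  # (3) renormalise
```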

B. Fitness Function Evaluation
Although Karp et al. derived a recursive formula for estimating the decoding failure rates of finite-length LT codes [7], the use of this formula is computationally feasible only if the number of source symbols remains very small. Hence, we decided to evaluate the performance of the different SL-LT code samples in each generation by means of numerical simulation of the actual decoding process. In order to speed up the computation, we used incremental BP decoding and stopped the simulation once approximately 10 failures had accumulated at every test point.
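A minimal Monte-Carlo fitness evaluation along these lines might look as follows. This is a structural sketch of our own (neighbour sets only, no payloads, fixed trial count) rather than the authors' incremental BP simulator; only the bipartite graph structure matters for erasure decoding success.

```python
import random

def simulate_failure_rate(K, n_codewords, pmf, trials, rng=None):
    """Monte-Carlo estimate of the average fraction of unrecovered symbols.

    `pmf` maps degree -> probability. Each trial samples codeword degrees
    from the PMF, builds random neighbour sets, and runs a peeling decoder
    on the graph alone.
    """
    rng = rng or random.Random()
    degrees = list(pmf)
    weights = [pmf[d] for d in degrees]
    total_unrecovered = 0
    for _ in range(trials):
        cw = [set(rng.sample(range(K), min(rng.choices(degrees, weights)[0], K)))
              for _ in range(n_codewords)]
        recovered = set()
        progress = True
        while progress:
            progress = False
            for nbrs in cw:
                live = nbrs - recovered
                if len(live) == 1:
                    # A degree-one codeword releases its remaining symbol.
                    recovered.add(live.pop())
                    progress = True
        total_unrecovered += K - len(recovered)
    return total_unrecovered / (trials * K)
```

A production version would instead decode incrementally, adding codewords one at a time to measure the failure rate at each overhead test point, and would stop once enough failure events had accumulated.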

C. Optimization Scenarios
Since we were unsure of the convergence behaviors of the designed optimization scheme, we devised two scenarios to test how the evolution strategies would behave in different settings.

1) Fixed Degree Scenario:
In this scenario, the components of the degree tuple are kept constant throughout the optimization process. Thus, the design variables consist only of the components of probability tuple.

2) Variable Degree Scenario:
In this scenario, the design variables consist of the components of both the degree and the probability tuples. In other words, the design variables become heterogeneous in nature: some of them represent the degrees of encoding symbols while the others specify the probability of having those degrees.

V. OPTIMIZATION METHODS
We employed our new performance model along with three different evolution strategies to search for SL-LT codes with specific performance characteristics.

A. Covariance Matrix Adaptation Evolution Strategy
CMA-ES [14] is the de-facto standard evolutionary optimization strategy. It works by iteratively updating the covariance matrix of the multivariate normal distribution from which the mutated population is sampled. CMA-ES is especially successful in solving badly conditioned, multimodal and noisy problems with high-dimensional rugged search landscapes. We chose CMA-ES as the benchmark method for our experiments.

B. Natural Evolution Strategy
NES [15] is a newer numerical optimization method that performs gradient ascent along the natural gradient in the population parameter space [16]. The use of the natural gradient helps prevent oscillations, premature convergence and other undesired effects. Akimoto et al. [17] showed that CMA-ES can be regarded as a special case of NES if its step-size control value and rank-one update value are set to zero. In our experiments we used the exponential NES (xNES) [18] for its more robust convergence behavior.

C. Differential Evolution
DE [19] is an evolution strategy that creates new candidate solutions by combining existing ones according to simple formulae based on vector differences. DE performs especially well on noisy, constrained optimization problems over multidimensional real-valued functions, and on problems that change over time. In our experiments we obtained consistent and robust results with the DE variants Rand/1/Bin and Rand/2/Bin, while the variants Rand/1/Exp and Rand/2/Exp, as well as all the corresponding Best variants, suffered from premature convergence in most cases.
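The Rand/1/Bin variant mentioned above can be sketched as a single DE generation. This is a textbook implementation of ours (not the authors' code), assuming a fitness function to be minimized:

```python
import random

def de_rand_1_bin(population, fitness, F=0.7, CR=0.5, rng=None):
    """One generation of DE/Rand/1/Bin.

    For each target vector, a mutant v = a + F*(b - c) is built from three
    distinct random population members and crossed over coordinate-wise with
    probability CR; the trial replaces the target only if its fitness is no
    worse.
    """
    rng = rng or random.Random()
    dim = len(population[0])
    new_pop = []
    for i, target in enumerate(population):
        a, b, c = rng.sample([p for j, p in enumerate(population) if j != i], 3)
        jrand = rng.randrange(dim)  # force at least one mutated coordinate
        trial = [a[j] + F * (b[j] - c[j])
                 if (rng.random() < CR or j == jrand) else target[j]
                 for j in range(dim)]
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop
```

Because selection is greedy, the best fitness in the population never worsens from one generation to the next.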

VI. OPTIMIZATION SCHEME
Various issues needed to be considered as we employed different evolution strategies (ES) for the design of optimal SL-LT codes with specific performance characteristics.

A. Decision Variables

1) Transformations between Decision and Design Variables:

Since the design variables of SL-LT codes, i.e. the degree tuple and the probability tuple, have bounded value ranges, it is necessary to adapt their ranges to those of the decision variables of the evolution strategies. Towards that end, we defined two decision variable tuples as well as the necessary transformations between the decision and the design variables. ES population samples are specified in terms of decision variables, while the fitness functions are evaluated on design variables. Separate transformations were devised for the different evolution strategies, because CMA-ES and NES are stochastic strategies that use Gaussian distributions to produce random offspring over an unbounded variable space, while DE is a population-based evolution strategy that can evolve its offspring in a bounded variable space.
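The paper does not publish its exact transformations, but one typical choice for mapping the unbounded Gaussian samples of CMA-ES/NES onto bounded design variables is sketched below. Both transforms here (a logistic squash for degrees and a softmax for probabilities) are our assumptions, chosen only to illustrate the idea:

```python
from math import exp

def to_design(x_deg, x_prob, K):
    """Map unbounded decision variables to bounded design variables.

    Hypothetical transforms: a logistic squash sends each degree variable
    into the open interval (1, K), and a softmax turns the probability
    variables into a valid probability tuple that sums to one.
    """
    degrees = [1 + (K - 1) / (1 + exp(-x)) for x in x_deg]
    m = max(x_prob)
    e = [exp(x - m) for x in x_prob]   # shift by the max for stability
    total = sum(e)
    probs = [v / total for v in e]
    return degrees, probs
```

With a mapping like this, any point the ES samples in decision space corresponds to a feasible degree distribution, so no repair step is needed for the probabilities.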

2) Penalty for Out-of-range Degree Values:
The fact that CMA-ES and NES can generate unbounded values for their variables affects both degrees and probabilities. A quadratic function was thus introduced to add a monotonically increasing penalty to the fitness value when a degree value lies beyond its admissible range $[1, K]$, of the form

$$\phi(d) = \begin{cases} \alpha (1 - d)^2 & \text{for } d < 1 \\ 0 & \text{for } 1 \le d \le K \\ \alpha (d - K)^2 & \text{for } d > K \end{cases}$$

where $\alpha > 0$ is a weighting constant.
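The quadratic out-of-range penalty described above can be sketched directly; the weighting constant `alpha` is a hypothetical parameter of our own.

```python
def degree_penalty(d, K, alpha=1.0):
    """Quadratic out-of-range penalty added to the fitness value.

    Zero inside the admissible range [1, K] and growing monotonically
    (quadratically) with the size of the violation outside it.
    """
    if d < 1:
        return alpha * (1 - d) ** 2
    if d > K:
        return alpha * (d - K) ** 2
    return 0.0
```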

B. Initial Degree Distribution
Both CMA-ES and NES require their users to specify the initial values of their decision variables. For the optimization of SL-LT codes, these initial values correspond to their initial degree distribution.

1) Choices of Initial Degrees:

Since SL-LT codes often have small average degrees and sparse degree distributions, we chose their initial degrees from the range [1, K/5]. As shown in TABLE I, we chose the degree values in the fixed degree optimization scenario to be either prime numbers or powers of two, so that they spread evenly over a logarithmic scale and do not possess a common factor. The initial degree values used in the variable degree scenario are the underlined elements of TABLE I; they were the ones having the highest probabilities in the fixed degree scenario.

2) Specification of Initial Probabilities:

We started our optimization experiments using three different initial degree distributions. Two of them were derived from the soliton distributions; we chose them because the ideal and robust soliton distributions work perfectly well in the asymptotic case of infinite-length LT codes. We chose the uniform distribution as the third alternative so that we could perform a "stress test" on the optimization strategies with initial values located far from the optimal solution.

Sparse Robust Soliton Distribution: This sparse initial degree distribution was created by gathering the probability values under the adjacent degrees of the robust soliton distribution into those under the selected initial degrees. The boundaries of these gathering intervals are set at the midpoints between two adjacent initial degrees:

$$p_i = \sum_{d = \lceil (d_{i-1} + d_i)/2 \rceil}^{\lfloor (d_i + d_{i+1})/2 \rfloor} \mu(d)$$

where $\mu(d)$ is the probability mass function of the robust soliton distribution, the degree tuple is sorted in increasing order, and the outermost interval boundaries are taken as 1 and K.

Sampled Ideal Soliton Distribution: Except for the last one, every component of the probability tuple was assigned the corresponding value of the ideal soliton distribution; the last component then absorbed the remaining probability [20]:

$$p_i = \begin{cases} \rho(d_i) & \text{for } i = 1, 2, \dots, M - 1 \\ 1 - \sum_{j=1}^{M-1} \rho(d_j) & \text{for } i = M \end{cases}$$

Uniform Distribution: Every component of the probability tuple was assigned the same value:

$$p_i = 1/M \quad \text{for } i = 1, 2, \dots, M$$

VII. EXPERIMENT RESULTS

A. Specific Values of Performance Measurements

To test the optimization strategies, we tried to find optimal SL-LT codes based on performance measurement values that may appear in real-world situations. In each optimization task, we fixed the values of two performance measurements while optimizing the third one.

1) CMA-ES and NES Cases:
For CMA-ES and NES, users only need to specify the initial decision variable values and their standard deviations. For the standard deviations, we chose values from [10, 30] for the decision variables representing degrees and from [0.02, 0.2] for those representing probabilities, depending on the kind of initial degree distribution being used.

2) DE Parameters:
In our experiments, we set the crossover probability and the scaling factor to 0.5 and 0.7 respectively, as suggested in [19], [21], [22] for optimization problems that are continuous and noisy and have comparable dimensionality.

C. Comparison of Convergence Behaviors
We traced the fitness function values and average degrees of every generation throughout the evolution process and plotted the behaviors of different optimization runs in Figure 4 to Figure 12. Among them, Figure 4 to Figure 6 show the results of the experiments in which each of the three performance measurements (ε, r, p) was being optimized. The convergence behaviors of CMA-ES, NES and DE are compared in Figure 7 to Figure 9, while the convergence behaviors of CMA-ES with the three different initial degree distributions are shown in Figure 10 to Figure 12. Figure 3 shows the optimized performance measurements obtained using CMA-ES as coordinates in the (ε, r, p) space. The blue, purple and red dots highlight the results that minimized the overhead ε, failure rate r and failure probability p respectively. These dots are clearly distributed over a concave surface. We regard this phenomenon as evidence that CMA-ES has found the truly optimal values, and that certain basic properties of the LT codes are dictating their performance.

E. Observations
We can summarize the results of our experiments in the following observations. Evolution strategies were shown to be a practical method for designing optimal SL-LT codes: all three strategies used in our experiments managed to converge and produce degree distributions for SL-LT codes that perform much better than the robust soliton distribution. DE appeared to be inferior to CMA-ES and NES in terms of the rate and robustness of its convergence behavior. CMA-ES and NES showed similar convergence behaviors regardless of the choice of initial degree distribution. NES appeared to be the most robust evolution strategy for finding optimal SL-LT codes; in a few stringent cases, NES succeeded in producing optimal results while CMA-ES failed to converge. The optimization scenarios with fixed and variable degrees produced similar performance measurement values as well as similar degree distribution profiles, although the variable degree scenario tends to produce sparser distributions.

VIII. CONCLUSION
In this paper, we proposed a practical approach that employs evolution strategies (ES) to find the degree distributions of optimal short-length LT codes. Experiment results confirmed that ES, especially NES, can produce optimal degree distributions suitable for different applications. Moreover, we believe that the consistent results obtained by applying three different ES to different optimization tasks with different initial degree distributions are a sign that certain basic properties of the SL-LT codes are dictating the performance of these codes.