Experimental Verification of Complex-Valued Artificial Neural Network for Nonlinear Equalization in Coherent Optical Communication Systems

We propose a novel neural network design for mitigating fiber nonlinearity, employing a structure based on physical modelling. The neural network achieved a nearly 5 times BER reduction in a field trial when transmitting WDM 200G DP-16QAM over a 612 km legacy link.

sion compensation aided by a phase/amplitude normalization. Moreover, when compared to NLE based on the dynamic neural network [7], [11] with 2 layers and 192 neurons, a BER decline of ≈ 5× and ≈ 2.5× was obtained for LEAF and SSMF, respectively. Finally, the proposed technique was experimentally validated by applying it to a 31×200G channel WDM system employing a DP-16QAM signal, transmitted over a 612 km SSMF legacy link [12]. A ≈ 5× BER drop was demonstrated relative to electronic dispersion compensation combined with phase/amplitude normalization only.

Neural Network Design
We start from the standard path-averaged Manakov equation [13], Eq. (1), which describes the DP signal propagation along the optical channel. In Eq. (1), $u_{H/V}(t, z)$ are the normalized optical fields of the H and V polarizations, respectively, $\beta_2$ is the group velocity dispersion (GVD) coefficient, and $\bar{\gamma}$ is the effective averaged nonlinearity coefficient. The latter includes the effective length scale $L_{\mathrm{eff}} = (1 - e^{-\alpha L})/\alpha$, which emerges from averaging over the periodic loss and gain of the $N_s$ spans, where $\alpha$ is the fiber loss coefficient, $\gamma$ is the effective nonlinear coefficient, and $L$ is the span length. In the absence of GVD ($\beta_2 = 0$), the analytical solution of Eq. (1), written in terms of the transmitted symbols $x_k$ and the received symbols $y_k$, is given by Eq. (2) [14].
We assume that Eq. (2), once adaptive parameters are added, can describe the nonlinear signal distortion remaining after chromatic dispersion compensation (CDC) in links with weak dispersive broadening. This leads to the nonlinear distortion model of Eq. (3), where $x^{(V/H)}_k$ is the $k$-th recovered transmitted symbol, $Y^{(V/H)}_k$ is the sequence of received soft symbols $[y^{(V/H)}_{k-N}, \ldots, y^{(V/H)}_k, \ldots, y^{(V/H)}_{k+N}]$, $2N + 1$ is the sequence length, $\{c_1, \ldots, c_5\}$ are complex-valued parameter vectors of the same length $2N + 1$, and $\Theta_{1/2}$ and $\Xi_{1/2}$ are two nonlinear functions that depend on the sequences $Y^H_k$ and $Y^V_k$. The sequences $Y^{(V/H)}_k$ account for the complex GVD-induced memory in the nonlinear interactions, which is equalised by the adaptive filters represented by the vectors $c_1, \ldots, c_5$. The nonlinear functions $\Theta_{1/2}$ and $\Xi_{1/2}$ mimic the inter-channel Kerr nonlinearity and the transceiver impairments.
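The input sequences $Y^{(V/H)}_k$ described above can be illustrated with a minimal sketch that cuts a stream of received soft symbols into windows of length $2N + 1$ centred on symbol $k$, one window per polarization. The function names `make_windows` and `make_dual_pol_inputs` are hypothetical, and dropping the first and last $N$ symbols (which lack a full set of neighbours) is an assumption, since the text does not say how the sequence edges are handled.

```python
from typing import List, Sequence, Tuple


def make_windows(y: Sequence[complex], n: int) -> List[List[complex]]:
    """Return [y[k-n], ..., y[k], ..., y[k+n]] for every k with n neighbours on both sides."""
    if n < 0:
        raise ValueError("memory size n must be non-negative")
    # Assumption: edge symbols without a full window are simply skipped.
    return [list(y[k - n:k + n + 1]) for k in range(n, len(y) - n)]


def make_dual_pol_inputs(
    y_h: Sequence[complex], y_v: Sequence[complex], n: int
) -> List[Tuple[List[complex], List[complex]]]:
    """Pair the H and V windows that share the same centre index k."""
    if len(y_h) != len(y_v):
        raise ValueError("both polarizations must carry the same number of symbols")
    return list(zip(make_windows(y_h, n), make_windows(y_v, n)))


# Toy data standing in for received soft symbols (not measured values).
y_h = [complex(k, -k) for k in range(10)]
y_v = [complex(-k, k) for k in range(10)]
pairs = make_dual_pol_inputs(y_h, y_v, n=2)  # windows of length 2*2 + 1 = 5
```

Each element of `pairs` holds the two sequences that would feed one equalizer decision; the centre entry of each window is the symbol being recovered.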
The proposed NN topology, implementing the nonlinear distortion model of Eq. (3), is schematically depicted in Fig. 1. Since our approach uses complex-valued symbols, weights and functions, we design our NN as a complex-valued one [15], [16]. We use the non-conventional neuron activation functions $|x|^2$, $\ln(x)$ and $e^x$ so that the NN resembles Eq. (3). Furthermore, we introduce $\Theta$ and $\Xi$ as three-layer perceptrons, since a multilayer perceptron is a universal nonlinear function approximator [6]. The NN that learns $\Theta$ has three layers with $n_1$, $n_2$, and $n_3$ neurons in the first, second, and third layer, respectively. In the same way, the NN representing $\Xi$ has layers containing $n_4$, $n_5$, and $n_6$ neurons. All the neurons in both the $\Theta$ and $\Xi$ NNs share the same complex-valued activation function.
The NN is applied to the received soft symbols already pre-processed by the linear digital signal processing (DSP) [17], including full frequency-domain CDC (see Fig. 1). Since the optimal input size for nonlinearity compensation (NLC) and the functions $\Theta$ and $\Xi$ may change from case to case, we implemented a Bayesian optimization (BO) algorithm [18] to find, in each scenario, the best values of the following NN hyper-parameters: the input memory size $N$ and the neuron numbers $n_1, \ldots, n_6$. The target parameter for BO was the best BER reached by the NN during 5000 training iterations. The NN itself was trained with the Adam optimizer [19], minimising the mean-squared error (MSE) between the predicted and actual transmitted symbols, with a learning rate of $10^{-3}$ and a batch size of 1000. The optimal configurations found for each studied test case are listed in Table 1. Notably, the NN training data (both simulated and experimental) was randomly shuffled before every training iteration to eliminate possible dataset periodicity and avoid the risk of NN overfitting [20].
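As a rough illustration of the building blocks named in this section, the sketch below implements the three activation functions on complex numbers, a complex-valued MSE, and a generator that reshuffles the sample order before each pass. It is not the authors' network: the helper names, the coefficient `c`, and the sample value `y` are hypothetical, and `cmath.log` takes the principal branch, which is an assumption about how $\ln(x)$ is evaluated. The short demonstration at the end shows one reason such activations fit an exponential-type model: a power-dependent phase rotation, which is multiplicative on the symbol, becomes an additive term after the logarithm.

```python
import cmath
import random
from typing import Iterator, List, Sequence


def sq_abs(x: complex) -> float:
    """|x|^2 activation: instantaneous symbol power."""
    return abs(x) ** 2


def clog(x: complex) -> complex:
    """ln(x) activation (principal branch; undefined for x == 0)."""
    return cmath.log(x)


def cexp(x: complex) -> complex:
    """e^x activation."""
    return cmath.exp(x)


def complex_mse(pred: Sequence[complex], target: Sequence[complex]) -> float:
    """Mean of |pred - target|^2 over all symbols."""
    return sum(abs(p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def shuffled_batches(
    n_samples: int, batch_size: int, rng: random.Random
) -> Iterator[List[int]]:
    """Yield index batches after reshuffling the whole sample order."""
    order = list(range(n_samples))
    rng.shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


# Hypothetical received symbol and nonlinear phase coefficient.
y = complex(0.6, -0.8)
c = 0.1

# A phase rotation proportional to the symbol power, applied directly ...
rotated_direct = y * cexp(1j * c * sq_abs(y))
# ... and the same operation written as exp(ln(y) + additive term).
rotated_via_log = cexp(clog(y) + 1j * c * sq_abs(y))

# One pass over 2500 samples with the batch size quoted in the text.
batches = list(shuffled_batches(2500, 1000, random.Random(0)))
```

Calling `shuffled_batches` again at the start of every training iteration reproduces the reshuffling described above; the optimizer step itself (Adam in the text) is deliberately left out of this sketch.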

Results and Discussion
First, we numerically considered the transmission of DP-64QAM symbols shaped by a root-raised-cosine (RRC) filter with roll-off 0.06 over a metro coherent link consisting of 6×80 km spans of LEAF or SSMF. The fiber loss, dispersion and effective nonlinear coefficients used were, for LEAF, $\alpha = 0.225$ dB/km, $D = 4.2$ ps/(nm·km), $\gamma = 1.3$ W$^{-1}$ km$^{-1}$ and, for SSMF, $\alpha = 0.21$ dB/km, $D = 16.8$ ps/(nm·km), $\gamma = 1.14$ W$^{-1}$ km$^{-1}$. Every span was followed by an ideal lumped optical amplifier (OA) fully compensating for the span loss and injecting additive white Gaussian noise (AWGN) with a noise figure NF = 4.5 dB. The transmission was simulated by the split-step Fourier method [14]. The proposed NN was benchmarked in the role of NLC in the receiver-based DSP (see Fig. 1) against the dynamic neural network (DNN) suggested for NLC in [11] and the classic DBP [21]. To ensure a fair comparison, the employed DNN had two layers with 192 neurons each, as in [7], used the same input symbol sequence length as our NN, and operated on the symbol sequences from both polarizations, $Y^H_k$ and $Y^V_k$, as in [22]. The DBP operated on a 2 samples/symbol signal with 2 steps per span (StPS).
Fig. 2 and Fig. 3 show the results obtained by the considered NLC strategies in the simulated LEAF and SSMF based transmission systems, respectively. For the LEAF test case, the proposed NN reduced the BER by up to ≈ 10× when compared with CDC followed by a least-squares amplitude-phase shift (Norm) and by up to ≈ 5× when compared with the two-layer DNN. The proposed NN also outperformed the DBP with 2 StPS. Additionally, the optimum launch power increased from about 0 dBm to about 3 dBm when using the proposed NN instead of CDC plus Norm. For the SSMF system, we observed a BER reduction of up to ≈ 2.5× when using the proposed NN with respect to both CDC plus Norm and the DNN; the DNN produced negligible BER improvement in this case.
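The fiber parameters quoted above can be related to the quantities of the Manakov model with a few standard conversions, sketched below. The conversion of $\alpha$ from dB/km to 1/km and the relation $\beta_2 = -D\lambda^2/(2\pi c)$ are textbook formulas rather than steps stated in this paper, and the carrier wavelength of 1550 nm is an assumption, since the text does not give it. The function names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def alpha_db_to_per_km(alpha_db_per_km: float) -> float:
    """Convert power attenuation from dB/km to 1/km."""
    return alpha_db_per_km * math.log(10) / 10


def effective_length_km(alpha_db_per_km: float, span_km: float) -> float:
    """L_eff = (1 - exp(-alpha * L)) / alpha, with alpha in 1/km."""
    a = alpha_db_to_per_km(alpha_db_per_km)
    return (1 - math.exp(-a * span_km)) / a


def span_loss_db(alpha_db_per_km: float, span_km: float) -> float:
    """Loss that the ideal lumped amplifier has to compensate per span."""
    return alpha_db_per_km * span_km


def beta2_ps2_per_km(d_ps_nm_km: float, wavelength_nm: float = 1550.0) -> float:
    """beta_2 = -D * lambda^2 / (2 * pi * c); 1550 nm is an assumed carrier wavelength."""
    d_si = d_ps_nm_km * 1e-6            # ps/(nm km) -> s/m^2
    lam = wavelength_nm * 1e-9          # nm -> m
    beta2_si = -d_si * lam ** 2 / (2 * math.pi * C_M_PER_S)  # s^2/m
    return beta2_si * 1e27              # s^2/m -> ps^2/km


SPAN_KM = 80.0
fibers = {"LEAF": (0.225, 4.2), "SSMF": (0.21, 16.8)}  # (alpha dB/km, D ps/(nm km))

summary = {
    name: {
        "L_eff_km": effective_length_km(alpha, SPAN_KM),
        "span_loss_dB": span_loss_db(alpha, SPAN_KM),
        "beta2_ps2_per_km": beta2_ps2_per_km(d),
    }
    for name, (alpha, d) in fibers.items()
}
```

Under these assumptions, both fibers have an effective length of roughly 19 to 20 km per 80 km span, while the magnitude of $\beta_2$ for SSMF is four times that of LEAF, in line with the ratio of the quoted dispersion parameters.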
The experimental setup used for the field trial is described in [12]. The transmission link consists of multiple pairs of G.652 SSMF deployed between Torino and Chivasso in Italy, giving a total length of 612 km. The transmitted spectrum consists of the channel under test surrounded by 15 neighboring 200G 16QAM channels on each side, resulting in a 31-channel WDM transmission system on the 37.5 GHz grid (33.01 GBaud). Fig. 4 shows the NLC performance obtained in the field trial by the proposed NN and the DNN. The optimal launch power increased from 3 to 4 dBm, and the BER fell by up to ≈ 5× and ≈ 3× when compared with the CDC plus Norm and the DNN approaches, respectively. Moreover, in the experimental study, the NN also led to a BER improvement in the linear regime. This result highlights the ability of the suggested NN to mitigate not only the impact of the Kerr nonlinearity but also additional limitations such as narrow transmitter bandwidth, low-resolution digital-to-analog converters, and imperfections of the driver amplifier and the dual-polarization Mach-Zehnder modulator.

Conclusions
We investigated the NLC performance of the suggested complex-valued artificial NN in different fiber systems and compared it with the dynamic multi-layer neural network. Our results demonstrated that the proposed NN leads to a significant system performance improvement on both simulated and experimental data. Furthermore, the NN was shown to be able to compensate not only channel nonlinearities but also transceiver distortions.