Domain Adaptation in Power Line Segmentation: A New Synthetic Dataset

Power line segmentation is a critical component of UAV-based intelligent inspection systems, which help ensure the safe and reliable operation of power grids. For hard-to-label tasks like this, simulators can efficiently generate large amounts of labeled data. In this work, a large-scale annotated synthetic power line dataset is generated using the Unity game engine and the Unity Perception package¹. To address the domain shift between the real and synthetic domains, input-level adaptation is performed. Additionally, a new power line segmentation loss is developed to mitigate the effects of the unbalanced pixel distribution between power lines and background. Experiments demonstrate that our approach achieves state-of-the-art performance on the power line segmentation task.


INTRODUCTION
Transmission line networks have expanded almost everywhere due to rising electricity consumption. To ensure the safety of low-altitude flights, such as those performed by Unmanned Aerial Vehicles (UAVs), which are seriously affected by the widespread power network, it is necessary to recognize power lines beforehand. In particular, low-altitude flight accidents involving power lines may cause significant damage to those lines, resulting in widespread power outages and reducing transmission line reliability. Traditional methods for inspecting electricity networks, used for decades, include visual examination by human inspectors and helicopter-assisted inspection. The major limitation of these methods is that they are relatively slow, expensive, and labor-intensive, expose inspectors to hazardous working conditions, and are restricted by the inspectors' visual observation skills.
Power line segmentation is a crucial component of the UAV-based intelligent power line inspection process for power grid security and low-altitude safety. The effectiveness of segmentation methods depends on the availability of labeled training data. Collecting and annotating large datasets for each new task and area is costly, time-consuming, labor-intensive, and error-prone. For example, annotation and annotation quality evaluation for a single image in the well-known Cityscapes [1] dataset took over 1.5 hours. The complexity and diversity of natural scenes also contribute to the difficulty of the image segmentation task. Building a synthetic dataset with a rendering engine is highly effective because annotations are generated automatically; in some cases it is the only practical option, given the enormous amount of labor required to collect and label a real-world dataset of this scale. In this work, a large-scale annotated synthetic dataset with power lines is generated. Nevertheless, due to the domain shift, a model trained on simulated data does not always yield satisfactory results when applied to real data. Domain Adaptation (DA) has recently been used in semantic segmentation to bridge the distribution gap between the real and synthetic domains, thereby boosting the generalization of learned models to real data. Our framework explores the idea that performance can be improved, without any training beyond the core semantic segmentation task, by simply aligning low-level statistics between the synthetic and real distributions.
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871479 (AERIAL-CORE).
¹ https://github.com/george-kalitsios/Synthetic-Power-Lines-SPL-Dataset
The major contributions of this work are the following: i) A large-scale synthetic power line dataset. To the best of our knowledge, this is the first synthetic dataset including power lines that will be made publicly available, and it constitutes a significant contribution for both the scientific community and industry. ii) A new DA framework for power line segmentation, in which input-level adaptation is performed using a lightweight Fourier-based image-to-image translation strategy. This strategy outperforms well-known domain adversarial adaptation strategies in terms of training speed, ease of implementation, and memory. iii) A weighted power line segmentation loss function that combines focal loss and dice loss to mitigate the impact of the unbalanced distribution of pixels between power lines and background on segmentation accuracy. iv) The proposed framework achieves SOTA performance in the task of power line segmentation when applied to the well-known TTPLA power lines dataset [2].

RELATED WORK
Power Line Segmentation: Traditional approaches identify power lines primarily by recognizing power line features [3] or objects associated with power lines [4], based on the assumption that power lines are either straight lines or polynomial curves that are parallel to each other. Such methods begin by distinguishing potential power line pixels from the background using an edge detector, then apply the Hough transform [5], and finally use prior knowledge to refine the detections. Most of these approaches rely on complex parameter tuning, which makes them less stable in practice.
Since Deep Neural Networks (DNNs) can automatically learn useful features and enable an end-to-end solution, researchers have started to build power line inspection systems based on deep learning (DL). In [6], the authors proposed an attentional convolutional network for pixel-level power line detection, which consists of an encoder-decoder information fusion module as well as an attention module. A fast single-shot line segment detector trained with artificially generated power line images was proposed in [7]. In [8], a network for pixel-wise detection of straight and curved power lines was presented; it uses an edge attention fusion module and a high-pass block to extract semantic and spatial information and improve the detection result along the boundaries. In [9], an approach based on the edge structure and scene constraints was proposed. In [10], an approach based on Generative Adversarial Networks (GANs) is described for segmenting power lines from aerial images.
Domain Adaptation: Adaptation methods can be applied at several levels, including the input-image level, the internal feature representations, and the output level. Recently, the majority of approaches have relied either on adversarial learning [11], enabling pixel-level or feature-level adaptation, or on self-training [12] through the refinement of pseudo-labels. The core issues of adversarial learning are the need to train multiple networks in addition to the target one and the difficulty of stabilizing adversarial training, while self-training is self-referential and demands careful design to avoid error propagation.
Image-to-image translation [13] approaches have also been investigated; they typically transfer an image's color, lighting, and other stylization characteristics from one domain to the other, or from both domains to a neutral domain. To that end, a variety of GANs have been developed for transferring image styles while modifying image structures as little as possible. Nevertheless, GAN-based translation models operate in the spatial domain, where image style and image structure are closely entangled, and so they end up modifying image structures in undesirable ways.

INTRODUCED SYNTHETIC DATASET
Two previous works use artificial datasets containing power lines [7], [14], but neither dataset is very photorealistic or publicly available. These shortcomings motivated this work. First and foremost, there is a demand for a publicly available, large-scale Synthetic Power Lines (SPL) dataset consisting of thousands of RGB images. Moreover, this dataset can be expanded at any time and at any desired resolution, without adjusting the sensor or rebuilding any previously used environments, and it is considerably more photorealistic, as shown in Fig. 1. The SPL dataset was generated using Unity, one of the most popular game engines in the world, and Unity's Perception package. The dataset contains aerial images of power lines across a wide range of combinations of locations, lighting conditions, tilts, angles, and FOVs. All power lines in SPL are recorded from several viewpoints, including front view, top view, and side view. As a result, there are almost no occlusions, and UAVs can fly in any direction without concern for detection accuracy. Because power lines must be correctly detected against cluttered backgrounds, SPL includes a large number of images of power lines with noisy backgrounds, which makes extracting power lines challenging and close to a real-world scenario. The SPL dataset statistics are summarized in Table 1; the dataset was divided into 70% for training, 10% for validation, and 20% for testing.
Fig. 1: Images from the synthetic power lines dataset captured in various environments, angles, and camera distances.
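The 70% / 10% / 20% split described above can be reproduced with a few lines of Python. This is a minimal sketch, not the procedure used to build SPL: the function name `split_dataset`, the shuffling step, and the fixed seed are assumptions made for illustration.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle items and split them into train / val / test subsets.

    The test subset receives whatever remains after train and val
    (20% with the default fractions). Seeded shuffling is an assumption.
    """
    items = list(items)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(items)
    n_train = int(round(len(items) * train_frac))
    n_val = int(round(len(items) * val_frac))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

Calling `split_dataset(image_paths)` on a list of image paths (a hypothetical variable) returns three disjoint lists that together cover every image exactly once.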

PROPOSED FRAMEWORK
Fourier-based Adaptation: Although most inter-domain differences in low-level statistics carry no semantic meaning, they are likely to cause an unexpected performance drop on target samples, even when the source and target images are semantically similar in terms of scene structure and content. This matters because DNNs do not appear to transfer well across different low-level statistics. Inspired by [15], we employ a Fourier-based translation technique at the image input level, swapping the low-frequency part of a synthetic image's amplitude spectrum with that of a random real image. This method employs neither discriminators that align pixel-level or feature-level distributions nor image-to-image translation networks that generate training images. Fourier DA is used as a separate step and requires no training to achieve domain alignment, relying instead on a simple Fourier Transform and its inverse.
In DA, we are given a source (synthetic) dataset $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ and a target (real) dataset $D_t = \{(x_i^t, y_i^t)\}_{i=1}^{N_t}$, where $x^s, x^t \in \mathbb{R}^{H \times W \times 3}$ are RGB images and $y^s, y^t \in \mathbb{R}^{H \times W}$ are the segmentation maps associated with $x^s$ and $x^t$, respectively. Using Fourier-based DA, we aim to bridge the domain discrepancy between the two datasets and enhance performance on $D_t$. The amplitude and phase components of the Fourier Transform are defined as:

$$\Pi(x) = \sqrt{R^2(x) + I^2(x)}, \qquad \Phi(x) = \arctan\left(\frac{I(x)}{R(x)}\right),$$

where the real and imaginary components of the Fourier Transform $F(x)$ are represented by $R(x)$ and $I(x)$, respectively. Assuming that the image's center is $(0, 0)$, we denote by $M(\beta)$ a mask whose value is zero everywhere except in a central region whose size is controlled by $\beta \in [0, 1]$. The altered spectral representation of $x^s$, denoted $X(x^s, x^t)$, in which the low-frequency component of the amplitude of the source image $\Pi(x^s)$ is swapped with that of the target image $\Pi(x^t)$, can be formalized as:

$$X(x^s, x^t) = M(\beta) \circ \Pi(x^t) + (1 - M(\beta)) \circ \Pi(x^s).$$

Given a pair of randomly sampled images $x^s$, $x^t$, Fourier-based adaptation is formalized as:

$$x^{s \to t} = F^{-1}\left(X(x^s, x^t), \Phi(x^s)\right),$$

where the altered spectral representation $X(x^s, x^t)$ is projected back to the image $x^{s \to t}$ while keeping the phase component $\Phi(x^s)$ unaltered. The content of $x^{s \to t}$ is similar to that of $x^s$, but its appearance is similar to that of a sample from $D_t$.
Segmentation Architecture: DeepLabV3+ [16], the most recent version of the DeepLab family, which combines a wide range of strategies such as skip connections, dilated convolution, global context, and a robust backbone network, was selected in this work. This high-quality segmentation model has an encoder-decoder architecture with dilated separable convolution built from depthwise and pointwise convolutions. As its encoder, DeepLabV3+ employs DeepLabV3 [17]. The segmentation architecture, as shown in Fig. 2, was trained using the proposed power line segmentation loss.
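The Fourier-based adaptation step described above (swap the low-frequency amplitude, keep the source phase, invert the transform) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function names `low_freq_mask` and `fourier_adapt`, the exact window size derived from β, and the choice to always include the zero-frequency term in the swapped window are assumptions.

```python
import numpy as np


def low_freq_mask(height, width, beta):
    """Mask M(beta): 1 inside a centered low-frequency window, 0 elsewhere.

    Assumption: the window half-sizes are int(beta * height) and
    int(beta * width), and the zero-frequency term is always included.
    """
    mask = np.zeros((height, width), dtype=np.float64)
    ch, cw = height // 2, width // 2  # zero frequency after fftshift
    bh, bw = int(beta * height), int(beta * width)
    mask[max(ch - bh, 0):ch + bh + 1, max(cw - bw, 0):cw + bw + 1] = 1.0
    return mask


def fourier_adapt(src, tgt, beta=0.01):
    """Give `src` the low-frequency amplitude of `tgt`, keeping src's phase.

    src, tgt: float arrays of identical shape (H, W, C).
    Returns a real-valued array with the same shape as `src`.
    """
    if src.shape != tgt.shape:
        raise ValueError("source and target images must have the same shape")
    h, w = src.shape[:2]

    # 2-D FFT per channel, shifted so low frequencies sit at the center.
    fft_src = np.fft.fftshift(np.fft.fft2(src, axes=(0, 1)), axes=(0, 1))
    fft_tgt = np.fft.fftshift(np.fft.fft2(tgt, axes=(0, 1)), axes=(0, 1))

    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_tgt = np.abs(fft_tgt)

    # Swap only the masked (low-frequency) part of the amplitude.
    mask = low_freq_mask(h, w, beta)[..., None]
    amp_mixed = mask * amp_tgt + (1.0 - mask) * amp_src

    # Recombine with the untouched source phase and invert the transform.
    fft_mixed = amp_mixed * np.exp(1j * phase_src)
    out = np.fft.ifft2(np.fft.ifftshift(fft_mixed, axes=(0, 1)), axes=(0, 1))
    return np.real(out)
```

The result is not guaranteed to stay within the input value range, so a caller would typically clip it (for example with `np.clip`) before feeding it to the segmentation network. Because the step involves only two forward FFTs and one inverse FFT, it can run as plain data augmentation with no trainable parameters.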
Power line segmentation loss: Power line inspection is challenging due to the poor visual appearance of power lines and complex backgrounds. Dice loss [18] considers both local and global loss information, has no trouble learning from classes with little spatial representation inside an image, and focuses mostly on mining the foreground during the training phase. In its standard form, dice loss is formulated as follows:

$$L_{dice} = 1 - \frac{2 \sum_{i} p_i g_i}{\sum_{i} p_i + \sum_{i} g_i},$$

where $p_i$ is the i-th pixel's estimated probability and $g_i$ is the i-th pixel's ground truth. However, dice loss only addresses the imbalance between foreground and background, while ignoring another imbalance between easy and hard examples.
To address this limitation of dice loss, we employ focal loss [19], which emphasizes examples where the model is inaccurate rather than examples it can already estimate reliably, allowing predictions on difficult examples to improve over time instead of the model becoming overconfident on easy ones. This is accomplished by a process known as down-weighting. In its standard form [19], focal loss is formulated as follows:

$$L_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t),$$

where $p_t$ is the estimated probability of the ground-truth class and $\alpha_t$ is the corresponding class weight. The weight vector is $\alpha = [0.25, 0.75]$ for the background and power line classes, respectively. The higher the value of the focusing parameter $\gamma$, the greater the attention paid to misclassified hard examples (power lines) and the smaller the loss propagated from easy examples (background). Based on our experiments, we set $\gamma = 3$.
In this work, to enhance the performance of power line segmentation on aerial images, we introduce a weighted loss function that combines focal loss and dice loss to benefit from the advantages of both. During training, focal loss encourages the network to pay more attention to challenging examples such as power lines, while dice loss helps the network make better use of foreground regions and learn proper boundary representations. Combining the two loss functions gives our complete learning objective:

$$L = \lambda_1 L_{dice} + \lambda_2 L_{focal},$$

where $\lambda_1$ and $\lambda_2$ regulate the trade-off between the dice loss and the focal loss. Based on the results of our experimental analysis, $\lambda_1$ and $\lambda_2$ are both set to 0.5.
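The weighted combination of dice loss and focal loss described above can be sketched in NumPy for a binary (background vs. power line) probability map. This is a minimal sketch under stated assumptions, not the authors' training code: the function names, the smoothing constant `eps`, the unsquared dice denominator, and the averaging of the focal term over pixels are illustrative choices, while α = [0.25, 0.75], γ = 3, and the 0.5 / 0.5 weights come from the text.

```python
import numpy as np


def dice_loss(probs, target, eps=1e-7):
    """Soft dice loss for the foreground (power line) class.

    probs: predicted foreground probabilities; target: 0/1 ground truth.
    eps is an assumed smoothing term that avoids division by zero.
    """
    probs = np.asarray(probs, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(probs * target)
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)


def focal_loss(probs, target, alpha=(0.25, 0.75), gamma=3.0, eps=1e-7):
    """Binary focal loss averaged over pixels (averaging is an assumption).

    alpha[0] weights background pixels, alpha[1] weights power line pixels.
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target)
    # p_t is the probability assigned to the true class of each pixel.
    p_t = np.where(target == 1, probs, 1.0 - probs)
    alpha_t = np.where(target == 1, alpha[1], alpha[0])
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def power_line_loss(probs, target, lam_dice=0.5, lam_focal=0.5):
    """Weighted sum of dice loss and focal loss."""
    return lam_dice * dice_loss(probs, target) + lam_focal * focal_loss(probs, target)
```

With γ = 3, a confidently correct background pixel (p_t close to 1) contributes almost nothing to the focal term, so the gradient is dominated by the few thin power line pixels that the model still gets wrong.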

Datasets and Implementation Details
There are few publicly available datasets for power lines. For experimental evaluation, the SPL dataset presented in this work and the well-known real-world TTPLA dataset [2] were employed. We used 905 images from the TTPLA dataset for training, 110 images for validation, and 217 images for testing. We resized the images from both datasets to 1024 × 1024.
A ResNet-50 backbone pre-trained on ImageNet was used for feature extraction. To optimize the network, we use SGD with an initial learning rate of 0.01, momentum of 0.9, and weight decay of 4e-5. The "poly" learning rate policy is employed to control how the learning rate decays during training. The model was trained on a single NVIDIA RTX 2080 Ti GPU with 11 GB of VRAM for 100 epochs. The output stride was set to 8, the ASPP module's output channels were set to 256, and the dilation rates in the ASPP module were set to [1, 12, 24, 36].
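For readers unfamiliar with the "poly" policy mentioned above, the schedule multiplies the initial learning rate by (1 - step / max_steps) raised to a fixed power. The sketch below illustrates it; the exponent 0.9 is a commonly used value and an assumption here, since the text does not state it, and the function name `poly_lr` is hypothetical.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial ("poly") learning rate decay.

    Returns base_lr * (1 - step / max_steps) ** power, decaying from
    base_lr at step 0 to 0 at max_steps. power=0.9 is an assumed default.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    step = min(max(step, 0), max_steps)  # clamp to the valid range
    return base_lr * (1.0 - step / max_steps) ** power
```

With the initial learning rate of 0.01 from the text, `poly_lr(0.01, step, max_steps)` would be evaluated once per iteration (or per epoch) and written into the optimizer before the next update.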

CONCLUSIONS
In this work, a novel large-scale synthetic dataset for training and testing power line segmentation approaches was generated, reducing the need to gather and label a large number of real-world images. Furthermore, a novel DA framework for power line segmentation was presented that can close the gap produced by the domain shift between synthetic and real images; based on our experimental results, it is a highly promising direction. Our approach is motivated by the fact that the low-level spectrum can differ significantly without affecting the perception of higher-level semantics. In addition, a new weighted loss function that combines focal loss and dice loss was proposed to improve the performance of power line segmentation on aerial images. Finally, our framework achieves state-of-the-art performance on the power line segmentation task on the well-known real-world TTPLA dataset. We hope our work offers useful insights to the community.

Fig. 2: The proposed framework consists of two components: a) an input-level DA module that employs a Fourier-based image translation strategy, b) a high-performance semantic segmentation architecture trained with a power line segmentation loss.

Table 2: Dataset and loss function experiments.

Table 2, second row, shows that training with only the SPL dataset yields lower TTPLA performance than training with only the TTPLA dataset (Table 2, first row). This is a result of the domain shift from synthetic to real. However, when the TTPLA and SPL datasets are combined (third row), there is a considerable improvement on TTPLA over the first row of Table 2: +5.19% on the validation set and +8.32% on the test set.
Loss function: In these experiments, we compare the introduced power line segmentation loss with the standard cross-entropy (CE) loss used in semantic segmentation. The power line segmentation loss improves performance by +4.47% on the validation set and +2.41% on the test set, as shown in Table 2.
Domain adaptation: We tune β until artifacts become visible in the transformed images, which occurs when β exceeds 0.25. According to our findings, an intermediate value of β = 0.01 yields the best results (58.47% on the TTPLA test set). We gain +2.10% on the validation set and +3.23% on the test set with input-level DA, as shown in Table 4.
Comparison with SOTA method on the TTPLA dataset: To ensure fairness, the same ResNet-6 backbone and 512 × 512 resolution are chosen. Based on our previous results, we apply Fourier-based adaptation with β = 0.01. On the TTPLA test set, our method outperforms the recently presented PL-GAN architecture [10] by +3.82%, as shown in Table 5.