A Deep Neural Network for Oil Spill Semantic Segmentation in SAR Images

Oil spills pose a major threat to oceanic and coastal environments; hence, an automatic detection and continuous monitoring system is an appealing option for minimizing the response time of the relevant operations. Numerous efforts have been directed towards such solutions by exploiting a variety of sensing systems, such as satellite Synthetic Aperture Radar (SAR), which can identify oil spills over sea surfaces under any weather conditions and at any time of day. Such approaches include the use of artificial neural networks to identify the polluted areas. Considering their remarkable performance in many applications, deep Convolutional Neural Networks (DCNNs) could surpass the limitations and the performance of previously proposed methods. This paper describes an approach that combines the merits of a DCNN with SAR imagery in order to provide a fully automated oil spill identification system. The model semantically segments the input SAR images into multiple areas of interest. The deployed DCNN was trained on multiple SAR images acquired from the Sentinel-1 satellite, provided by ESA, based on EMSA records of maritime pollution events. Experiments on this challenging benchmark dataset demonstrate that the algorithm can accurately identify oil spills, leading to an effective detection solution.


INTRODUCTION
Oil spill pollution is tightly connected not only with the ocean ecosystem but also with the growth of maritime commerce. Since early measures in such cases are of major importance, numerous algorithms have been presented to automatically identify such pollution spots. The vast majority of relevant methods exploit data acquired from satellites equipped with Synthetic Aperture Radar (SAR) capabilities due to the advantages they offer. Such satellites can cover large areas of interest without the need to deploy extra equipment, while SAR imagery is indispensable because it is invariant to light and weather conditions. On the other hand, SAR images introduce a degree of abstraction to the problem due to the existence of areas that may look like oil slicks but do not constitute pollution.

This work was supported by the ROBORDER and EOPEN projects, partially funded by the European Commission under grant agreements No 740593 and No 776019, respectively.
A typical oil spill detection process can be divided into four discrete phases [1]. The first stage includes the detection of dark formations, while in the second, features of these formations are extracted. Subsequently, the extracted features are compared with predefined values, and a decision-making model follows to label each formation. This approach poses several disadvantages, mainly due to the necessity of extracting a number of features and the lack of unanimous agreement over their nature. In addition, the limitation of providing a single label to each input image poses constraints regarding the identified objects. The most common approaches involve a two-class classification process: one class that includes only oil spills and a second, more diverse class that involves any other dark formation. The latter can be further divided into subclasses such as current shear, internal waves, algae blooms, shoals, floating vegetation and grease ice [2]. Also, contextual information around the detected instances, such as ship routes, may significantly affect the final characterization of these "dark spots".
In contrast to the above approach, the presented work deploys a deep Convolutional Neural Network (DCNN) for oil spill detection in order to alleviate the above shortcomings and mitigate the abstraction of the problem. While there have been some efforts using conventional neural networks [1, 3], these require spatial features to be computed beforehand. Moreover, those algorithms were limited to image classification, i.e. labeling the whole image rather than semantically segmented territories inside the image. The problem can be considered a rather complicated task, since it requires specialized knowledge of the various phenomena related to marine environments, while optical means often fail to provide adequate solutions. Thus, the final pollution confirmation involves the mobilization of the relevant national/regional authorities and in situ identification. The proposed approach comprises a novel application of such models to accurately identify oil spills in SAR images without the prerequisite of extracting additional features, leading to a completely automatic detection system.

RELATED WORK
One of the first attempts at oil spill detection relied on the use of visible-spectrum images. Various approaches were proposed, such as utilizing polarized lenses [4] and hyperspectral imaging [5], among others. Although relevant studies have shown that there is no clear distinction between oil and water in this spectrum, the field is active and the research is ongoing.
On the contrary, microwave sensors, including radars, are widely utilized for such applications in order to overcome the constraints that optical sensors pose (weather and operation time dependency). For radar imaging, Synthetic Aperture Radar (SAR) is predominantly used [6], as it has been proven to be largely invariant to light condition changes and cloud/fog occurrences [7]. Capillary waves produce "bright" image regions known as sea clutter, which in the presence of oil spills is suppressed, so the spills appear as dark formations. This effect is not exclusive to oil spills but is also caused by, among others, wind slicks, wave shadows behind land, algae blooms and land territories [7, 2], rendering the problem abstract. Since the identification of the above instances requires the definition of separate classes, an acceptable simplification of the desired oil spill identification is the reduction of the problem to a two-class problem, i.e. oil spills vs. other phenomena. Until recently, oil spill detection approaches were based on the initial extraction of features that represent and/or simulate the physics of oil dispersion. Such features include geometrical, physical or textural distinctive marks [1, 8], based on which the models are trained.
In contrast to the above approaches, the proposed application introduces a DCNN which does not involve the extraction of any features, leading to a self-learning scheme that captures the natural phenomena behind oil slicks on the sea surface. The novelty of the application lies in the fact that no similar work addresses the problem in the same way, since the presented model semantically annotates the identified oil spills.

METHODOLOGY
The proposed application aims at identifying the status of each SAR pixel and highlighting the included objects instead of merely assigning a single label to the entire representation. The assignment of multiple labels/tags to each image [9] or the extraction of bounding boxes with the use of object detection techniques [10] could be potential alternatives. Nonetheless, since oil spills display a large variety of irregular shapes and can heavily intersect with look-alike objects, semantic segmentation can be considered the most effective solution. The advantage of this approach lies in handling images that can contain multiple objects of different nature without the prerequisite of splitting the image into multiple patches containing oil spills, look-alikes, etc. The model was inspired by "DeepLab", initially proposed in [11], which has proven sufficiently effective in multi-class segmentation. In the referred work, multiple experiments were conducted on a variety of methods and network models, including VGG-16 [12] and ResNet-101 [13], with the latter yielding the best performance.
Similar to "DeepLab", the proposed oil spill detector employs a deep convolutional neural network trained for semantic segmentation together with convolution with upsampled filters, which was originally developed in [14] and utilized in the DCNN context by [15]. To efficiently extract the required dense features and widen the field of view of the filters, atrous convolution was utilized, as well as Atrous Spatial Pyramid Pooling (ASPP) to employ parallel filters with various rates. The resulting maps are enlarged with bilinear interpolation to restore their original resolution. A high-level representation is provided in Fig. 1.
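To illustrate the ASPP idea, the sketch below applies the same kernel to a 2-D feature map at several dilation rates and fuses the responses. This is a pure-numpy toy (the names `dilated_conv2d_same` and `aspp`, zero padding, and fusion by summation are our own simplifications, not the model's learned branches); it only demonstrates that every rate preserves the output resolution while widening the field of view.

```python
import numpy as np

def dilated_conv2d_same(feat, w, r):
    """'Same'-size 2-D convolution with dilation rate r (zero padding).

    Dilating the kernel widens the field of view without reducing
    the output resolution, which is the point of atrous convolution.
    """
    K = w.shape[0]
    pad = r * (K - 1) // 2
    fp = np.pad(feat, pad)
    out = np.zeros_like(feat, dtype=float)
    for a in range(K):
        for b in range(K):
            out += w[a, b] * fp[a * r:a * r + feat.shape[0],
                                b * r:b * r + feat.shape[1]]
    return out

def aspp(feat, w, rates):
    """Toy ASPP head: the same kernel applied at several rates, fused by summation."""
    return sum(dilated_conv2d_same(feat, w, r) for r in rates)

feat = np.random.rand(16, 16)
w = np.ones((3, 3)) / 9.0
out = aspp(feat, w, rates=[1, 2, 4])
print(out.shape)  # (16, 16): resolution preserved at every rate
```

In the actual model each rate has its own learned filters and the branches are processed further before fusion; the point here is only the parallel multi-rate sampling.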
More specifically, the deployed model initially uses a DCNN in a fully convolutional fashion and relies on a ResNet-101 network. ResNet-101 was selected because it yields the highest detection rates in image semantic segmentation tasks. The model was redefined, trained from scratch and properly fine-tuned for this specific scenario. However, the repeated combination of max-pooling and striding at subsequent layers decreases the final resolution of the extracted feature maps and significantly increases the overall computational cost. Thus, atrous convolution was applied to explicitly control the resolution at which feature responses are extracted. In the context of DCNNs, atrous convolution can be applied in a chain of layers to compute the network's responses at an arbitrarily high resolution.
For example, in one-dimensional signals, the output y[i] of the atrous convolution of a 1-D input signal x[i] with a filter w[k] of length K is defined as:

y[i] = Σ_{k=1}^{K} x[i + r·k] w[k]

The rate parameter r corresponds to the stride with which the input signal is sampled; basic convolution is the special case r = 1. Despite their ability to represent scale when trained on multi-resolution images, DCNNs' competence regarding object scale can still be improved in order to detect both large and small objects. In applications that involve satellite image processing, where operational heights vary, and oil spill detection, where the size and shape of the objects display extreme diversity, the scale problem can significantly affect the detection results. Therefore, to handle scale variability, the deployed model adopted an approach based on the R-CNN spatial pyramid pooling method initially proposed in [16], where regions of an arbitrary scale can be efficiently classified by resampling features extracted at a single scale. These features are extracted for each sampling rate and further processed and fused to compute the final result. The deployed ASPP is depicted in Fig. 2. The final processing step restores the feature maps to the original resolution by applying basic bilinear interpolation, as in the "DeepLab" system. It should be highlighted that the final CRF module of the "DeepLab" system was excluded from the model, since it is mostly used for refining the segmentation results. For this specific problem, the instances in SAR images display vague optical boundaries; hence, the CRF would not significantly improve the segmented regions and would only add pointless computational overhead.
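The 1-D definition above can be checked in a few lines of code. The following minimal numpy sketch (the function name `atrous_conv1d` is ours; output is computed only where the dilated filter fits fully) makes the role of the rate r concrete:

```python
import numpy as np

def atrous_conv1d(x, w, r):
    """1-D atrous convolution: y[i] = sum_k x[i + r*k] * w[k].

    x: input signal, w: filter of length K, r: sampling rate (dilation).
    """
    K = len(w)
    span = r * (K - 1)                # extent of the dilated filter
    out_len = len(x) - span
    y = np.empty(out_len)
    for i in range(out_len):
        y[i] = sum(x[i + r * k] * w[k] for k in range(K))
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
w = np.array([1.0, 1.0, 1.0])
print(atrous_conv1d(x, w, 1))  # [ 6.  9. 12. 15.] - basic convolution (r = 1)
print(atrous_conv1d(x, w, 2))  # [ 9. 12.] - samples every 2nd input value
```

With r = 1 the result reduces to basic convolution, exactly as stated for the special case above.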

DATASET
One of the main challenges that the research community faces is the lack of a publicly available dataset for such applications. Previous works [1, 17, 8] confronted this problem by utilizing manually created datasets. Nonetheless, the comparison with other related works using common standards is limited. These restrictions in utilizing public benchmark datasets also limited our options to collecting SAR data from the European Space Agency (ESA) databases, namely the Copernicus Open Access Hub. The downloaded SAR images were acquired by the Sentinel-1 European satellite. The geographic coordinates and times of the confirmed oil spills were provided by the European Maritime Safety Agency (EMSA), based on the CleanSeaNet service and its records covering the period from 28/09/2015 to 31/10/2017. The SAR raw data were preprocessed using fundamental remote sensing preprocessing algorithms, as follows:

1. All potential oil spills were localized.
2. Regions containing the oil spills were cropped from the initial SAR data. All resulting images were rescaled to the same resolution of 1252x609 pixels.
3. A radiometric calibration was applied to project the images onto the same plane.
4. A speckle filtering process followed to mitigate the effects of sensor noise. A basic median filter with a 7x7 mask was applied, since speckle noise in remote sensing is similar to salt-and-pepper noise in image processing.
5. A conversion from dB to actual (linear-scale) luminosity values was finally applied.
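Steps 4 and 5 of the pipeline above can be sketched as follows. The localization, cropping and radiometric calibration steps are sensor-specific and are omitted; the helper names are hypothetical, and the zero-padded median filter is only one straightforward implementation of the 7x7 mask:

```python
import numpy as np

def median_filter(img, size=7):
    """Step 4: a plain 7x7 median filter to suppress speckle,
    treated here like salt-and-pepper noise (edges are zero-padded)."""
    pad = size // 2
    p = np.pad(img, pad)
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(p[i:i + size, j:j + size])
    return out

def db_to_linear(img_db):
    """Step 5: map backscatter values from dB to linear intensity."""
    return 10.0 ** (img_db / 10.0)

patch = np.random.rand(32, 32)
patch[5, 5] = 50.0                      # a speckle-like outlier
clean = median_filter(patch, size=7)
print(clean[5, 5] < 1.0)                # True: outlier suppressed by the median
```

In practice a dedicated remote sensing toolbox would perform these steps on the full 1252x609 scenes; the sketch only shows the operations' effect.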
Numerous SAR images were processed in order to create a convenient database with a sufficient number of images that include confirmed oil spills, look-alikes and other geographical regions. The annotation of the images was based on information provided by EMSA and on human identification (manual annotation). This process produced image masks where every desired object was marked with a distinct color (two foreground classes plus one background class). The processed images were randomly divided into a training and a testing set comprising 571 and 106 images, respectively. Considering that relevant AI methods exploit a significantly smaller number of training and testing images, the size of the utilized dataset can be considered sufficient for both processes. The database is continuously updated and will be made publicly available to the community in order to provide a common benchmark basis.
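A random split such as the 571/106 division above can be reproduced with a seeded shuffle; the file names below are hypothetical placeholders, and the counts simply mirror the numbers stated in the text:

```python
import random

def split_dataset(paths, n_test=106, seed=0):
    """Randomly split image paths into training and testing sets.

    A fixed seed makes the split reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

paths = [f"sar_{i:04d}.png" for i in range(677)]   # hypothetical file names
train, test = split_dataset(paths)
print(len(train), len(test))  # 571 106
```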

RESULTS
For our experiments, two foreground classes were defined for the classification process, oil spills and look-alikes, plus one class for background pixels. The performance of the deployed model was initially measured in terms of pixel intersection-over-union (IoU) averaged across all classes (mIoU). In addition, the resulting IoU for each class is also provided, so as to clarify the individual performance of the model and its effectiveness per class. Table 1 includes the results of the initial experiments.
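The reported metrics can be computed as follows. This is a generic per-class IoU/mIoU sketch on a toy 2x4 label map (the function name `per_class_iou` and the toy arrays are ours, not the authors' evaluation code):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Pixel IoU per class: |pred ∩ gt| / |pred ∪ gt| over all pixels."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# 0 = background, 1 = oil spill, 2 = look-alike
gt   = np.array([[0, 0, 1, 1],
                 [0, 2, 2, 1]])
pred = np.array([[0, 0, 1, 0],
                 [0, 2, 2, 2]])
ious = per_class_iou(pred, gt, num_classes=3)
miou = np.nanmean(ious)
print([round(v, 2) for v in ious], round(miou, 2))  # [0.75, 0.33, 0.67] 0.58
```

Averaging the per-class values yields the mIoU reported in Table 1; `nan` entries (classes absent from both masks) are skipped by `np.nanmean`.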
Table 1. Segmentation results using mIoU/IoU.

A comparison with relevant approaches would be unfair due to the lack of a common basis (no proper database is available) and because no current method approaches the problem in the same way. Nonetheless, we utilized accuracy so that the model could be loosely compared to pure image classification models. Every image pair (ground-truth and resulting detection mask) was automatically cropped into a predefined number of overlapping patches. In order to acquire a dataset which complies with the rules of image representation, some constraints were imposed:

1. A minimum number of pixels belonging to either of the two foreground classes should be present in the patch. A threshold of 2% was applied, meaning that the number of pixels of the largest class should be at least 2% of the number of background pixels.
2. One of the two main classes should be dominant in order to label the patch. The threshold was set so that the non-dominant class contains at most 50% as many pixels as the dominant one.
3. Patches not complying with the above rules were discarded from the accuracy calculation.
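One way to read the three rules above as code (the thresholds' interpretation and the function name `label_patch` are ours):

```python
import numpy as np

def label_patch(mask, min_frac=0.02, dominance=0.5):
    """Assign a single class label to a patch, following the three rules:

    1. the larger foreground class must cover at least `min_frac`
       of the background pixel count, otherwise the patch is rejected;
    2. the smaller foreground class must be at most `dominance`
       times the larger one, otherwise no class is dominant;
    3. rejected patches (return value None) are excluded from accuracy.

    Classes: 0 = background, 1 = oil spill, 2 = look-alike.
    """
    bg = (mask == 0).sum()
    counts = {c: (mask == c).sum() for c in (1, 2)}
    major = max(counts, key=counts.get)
    minor = 2 if major == 1 else 1
    if bg == 0 or counts[major] < min_frac * bg:
        return None                       # rule 1: too few foreground pixels
    if counts[minor] > dominance * counts[major]:
        return None                       # rule 2: no dominant class
    return major

patch = np.zeros((10, 10), dtype=int)
patch[:3, :5] = 1                         # 15 oil-spill pixels
patch[9, 0] = 2                           # 1 look-alike pixel
print(label_patch(patch))                 # 1: oil spill is dominant
```

Each surviving patch then contributes one label to the accuracy computation of Table 2, in the same way as a pure image classification sample.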
The results of the patch image classification are provided in Table 2. The values depend on the number of patches cropped from each sample; two different value pairs for the horizontal and vertical numbers of patches, respectively, are presented. The results provided in Tables 1 and 2 are dissimilar, since a single label per patch is evaluated rather than every pixel. Despite the lack of a valid comparison with other relevant applications, some results were juxtaposed. The neural-network-based method in [3] reported 91.6% and 98.3% accuracy for oil spills and look-alikes, respectively, with a much higher number of look-alikes. The method in [1], using a decision tree forest, achieved 85.0% accuracy as its highest value. Also, relevant methods such as the probabilistic one in [18] reported results equal to 78% for oil spills and 99% for look-alikes. The initial results of the proposed application are quite similar to those of the corresponding state-of-the-art methods, but without the need for feature extraction and with the merit of semantically annotated regions.

CONCLUSIONS
In this paper, we introduced a new approach for oil spill identification using SAR images for maritime applications. With the adoption of accurate DCNNs, the process can be further automated and incorporated into a larger detection pipeline. The novelty of the application lies in the fact that no similar approach addresses the problem in the same way, while no feature extraction is initially required. The use of similar deep learning techniques may further improve the identification of oil slicks and decrease the abstraction of the problem. Improvements could be achieved by utilizing more advanced datasets, which may include larger training sets and images from enhanced SAR sensors. Though this is preliminary work, the results can be considered promising for the further exploitation of deep learning algorithms in the oil spill detection field.

Fig. 3. Example of four testing images (from top to bottom): SAR images, ground truth masks and resulting detection masks.

Table 2. Segmentation results with accuracy.