A Physics-Assisted Deep Learning Microwave Imaging Framework for Real-Time Shape Reconstruction of Unknown Targets

In this article, an innovative approach to microwave imaging, which combines a qualitative imaging technique and deep learning (DL), is presented. The goal is to develop a tool for reliable and user-independent retrieval of the shape of unknown targets from the knowledge of the scattered fields. Qualitative imaging methods are powerful inverse scattering tools, as they provide morphological information in real time. However, their outcome is a continuous map, which has to be hard-thresholded to clearly identify the targets. This thresholding unavoidably results in case-dependent, often user-biased, results. To deal with this issue, a DL approach, based on a physics-assisted deep neural network, is proposed to automatically classify image pixels, i.e., to generate binary masks, separating the targets (foreground) from the background. In particular, the proposed network binarizes the output of a qualitative imaging inversion technique known as the orthogonality sampling method. For the sake of comparison, a DL method is also exploited, which generates the binary masks directly from the scattered fields without any qualitative imaging aid. A quantitative assessment of the performances of both methods and a test on experimental data are provided.

The mathematical problem underlying MWI is a nonlinear and ill-posed inverse scattering problem (ISP), and many techniques have been proposed in the literature to achieve reliable and accurate solutions [6]. Among them, qualitative imaging approaches, which aim at recovering just the presence, position, and shape of the unknown targets from the knowledge of the field they scatter, are worth mentioning [7]. As a matter of fact, in many practical cases, retrieving the morphological information of a target is sufficient [1], and qualitative methods pursue this goal efficiently, without introducing approximations and in real time [6], [7].
The typical outcome of qualitative imaging methods is a continuous map of an indicator function, which usually takes on large values where the target is supposed to be located and low values elsewhere [7]. While this can be sufficient to provide qualitative visual information about the target, it does not provide a clear indication of the actual morphological properties. This is particularly important when an objective characterization of the target is needed, or when the morphological information is exploited as a priori knowledge by a quantitative imaging technique aimed at achieving the estimation of the electromagnetic properties of the target. The issue of deriving the morphological information from the indicator is typically tackled by hard-thresholding the continuous map at some ad hoc defined values. However, this is a case-dependent approach, often biased by the user's expertise [8].
In this article, an innovative approach to tackle the shape retrieval problem is presented. The proposed approach consists of enhancing qualitative imaging with automated classification, based on deep learning (DL), to enable a user-independent procedure.
In recent years, machine learning, and especially its sub-field of DL, proved to be extremely successful in solving complex problems and attracted the attention of many researchers who explored its potential in the solution of ISPs [9]. In this regard, families of approaches can be identified according to the role of DL in the devised framework [9]. For instance, direct learning methods provide a reconstruction of a target directly from the scattered fields. In this family, works have been carried out to improve the accuracy of
well-known neural network architectures, e.g., by incorporating a switch layer into an existing architecture [10]. Another option is given by learning-assisted loss function methods, where the goal is to use DL to find the optimal parameters of an iterative optimization algorithm in the solution of the ISP. Notable examples of these methods employ the learning power to find the best searching directions for an iterative solver [11] or to retrieve prior information on the scatterers [12], [13]. Finally, with the incorporation of domain knowledge into either the inputs or the internal structure of the DL architecture, the learning is assisted by the physics of the ISP. The most relevant contributions belonging to this family supply neural networks with prior information on the currents induced on the scatterers to retrieve a refined version of those currents [14], [15], but also to recover the dielectric properties of the targets [16], [17]. This article is focused on a physics-assisted approach. In particular, a convolutional neural network (CNN) [18] is exploited to build the DL model, into which domain knowledge achieved through a qualitative imaging algorithm is incorporated. CNNs have been extensively used for classification purposes in different fields involving imaging [19]–[21]. As a matter of fact, the problem at hand can be seen as a pixel-wise classification (segmentation) problem, where the end goal is the location of structures and their boundaries, instead of a regular image classification problem.
As far as the qualitative imaging approach is concerned, the proposed processing framework exploits the orthogonality sampling method (OSM) [22], [23]. In OSM, the evaluation of the indicator function does not require the explicit solution of an inverse problem so that there is no need for determining a regularization parameter. Such a circumstance is particularly attractive in view of the implementation of an automated processing chain. Moreover, OSM can be applied to a wide variety of measurement configurations (single-view, multiview, multifrequency, and a combination of them) and provides imaging results in real time.
Preliminary results related to this article have been presented in [24], where qualitative microwave imaging was combined with the DL approach, and imaging results were given for a limited number of numerically simulated geometries.
In the following, the proposed automated inversion framework is presented and assessed in the canonical 2-D scalar problem (TM polarized fields) in free space with both simulated and experimental data taken from the Institut Fresnel database [25], [26]. In addition, to prove the superior performance of the proposed approach, a comparison with a direct learning method is presented as well.
This article is organized as follows. In Section II, the problem is formulated. In Section III, a description of qualitative imaging via the OSM is given. Section IV provides a description of the physics-assisted framework proposed to cope with the shape retrieval problem. Section V describes the network implementation, the optimization, and the adopted training set. Section VI reports the assessment on simulated data, generated in the same way as the training set, while Section VII presents the assessment against the Fresnel experimental data [25], [26] to appraise the performances in conditions different from those used for the network optimization. Conclusions follow. Throughout this article, a time-harmonic behavior is supposed; the corresponding time factor e^{jωt} is assumed and dropped.

II. FORMULATION OF THE PROBLEM
Let Ω denote the imaging domain, which hosts the cross sections of a collection of targets invariant along one direction (say the z-axis). The targets are embedded in a homogeneous and lossless medium of relative permittivity ε_b, and each target is characterized by a relative dielectric permittivity ε(r) and an electric conductivity σ(r), with r = (x, y). All materials are supposed to be nonmagnetic, i.e., the magnetic permeability is everywhere that of vacuum, μ_0.
The unknown targets are probed with TM-polarized incident fields E_inc, transmitted by a set of antennas located at r_t ∈ Γ, with Γ being a closed curve located in the far zone of the imaging domain Ω. For each transmitter, the interaction between the incident field and the targets gives rise to the scattered field E_s, and the superposition of these two fields is the total field E = E_inc + E_s, which is measured by a set of receivers that, without any loss of generality, are assumed to be located on Γ as well, with the receiver position being r_s.
The overall phenomenon is cast through a Fredholm-type integral equation as

E_s(r_s, r_t) = k_b² ∫_Ω G(r_s, r′) τ(r′) E(r′, r_t) dr′    (1)

where G(r_s, r′) is the Green's function of the assumed homogeneous background medium, k_b is the background wavenumber, and τ(r) = ε_eq(r)/ε_b − 1 is the contrast function encoding the properties of the targets. ε_eq(r) = ε(r) + jσ(r)/(ωε_0) denotes the relative complex permittivity of the targets, with j being the imaginary unit, ω = 2πf being the angular frequency for the corresponding working frequency f, and ε_0 being the dielectric permittivity of vacuum. The problem that is faced is the retrieval of the shapes of the targets from measurements of the fields that they scatter. While simpler than the problem of achieving a complete characterization of the targets (i.e., not only their shape but also their dielectric properties), this problem is still nonlinear and ill-posed if (1) is directly solved.
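As a small illustration of the contrast function defined above, the following numpy sketch evaluates τ(r) on a toy grid; the permittivity, conductivity, and frequency values are arbitrary placeholders, not taken from the article.

```python
import numpy as np

# Contrast function tau(r) = eps_eq(r)/eps_b - 1, with the complex
# permittivity eps_eq = eps + j*sigma/(omega*eps0) as defined in the text.
def contrast(eps, sigma, freq, eps_b=1.0):
    eps0 = 8.8541878128e-12          # vacuum permittivity [F/m]
    omega = 2 * np.pi * freq         # angular frequency
    eps_eq = eps + 1j * sigma / (omega * eps0)
    return eps_eq / eps_b - 1.0

# Two example pixels: one dielectric target pixel (eps = 2) and one
# background pixel (eps = 1, sigma = 0, hence tau = 0).
tau = contrast(eps=np.array([[2.0, 1.0]]),
               sigma=np.array([[0.0, 0.0]]), freq=4e9)
```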

III. QUALITATIVE IMAGING: ORTHOGONALITY SAMPLING METHOD
The above issue has been addressed in the literature by resorting to the so-called qualitative imaging methods [7]. These methods pursue the goal of solving (1) by resorting to an auxiliary problem, which allows reformulating it in terms of a linear integral equation. However, such an auxiliary problem, while being simpler than the original one, is still ill-posed, thus requiring the introduction of a suitable regularization strategy. This is a nontrivial issue since the estimation of the proper regularizer is an optimization problem often more cumbersome than the solution of the auxiliary problem itself [27].
A remarkable exception is represented by the OSM [22], in which regularization is enforced by means of the proper definition of an indicator function [23]. In particular, the OSM indicator function is built exploiting the reduced field E_red, which, for each received scattered field, is computed as [22], [23]

E_red(r_p, r_t) = ⟨E_s(r_s, r_t), G(r_s, r_p)⟩    (2)

where ⟨·, ·⟩ denotes the scalar product on the measurement curve Γ and r_p denotes a point of an arbitrary grid sampling the imaging domain Ω.
The OSM indicator function is then obtained as

I(r_p) = ‖E_red(r_p, ·)‖²    (3)

with ‖·‖ denoting the L² norm computed on Γ (i.e., over the transmitter positions). As shown in [23], the reduced field can be seen as an implicit way to enforce the regularization of the underlying inverse problem, as E_red is linked to the radiating component of the currents induced in the targets by the incident fields, i.e., the one that is not affected by ill-conditioning. This implicit regularization bestows the method with numerical stability, thus making the OSM robust against noise [28].
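The reduced field in (2) and the resulting indicator map can be sketched in a few lines of numpy, under the assumption that the scalar product is a discrete sum over the receivers and that the norm runs over the transmitters; the field and Green's function samples below are random placeholders, not simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_t, n_pix = 36, 18, 64 * 64   # receivers, transmitters, grid points

# Placeholder scattered field E_s(r_s, r_t) and Green's function G(r_s, r_p).
E_s = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
G = rng.standard_normal((n_r, n_pix)) + 1j * rng.standard_normal((n_r, n_pix))

# Reduced field: inner product of each scattered field with G(., r_p),
# i.e., E_red[p, t] = sum_r conj(G[r, p]) * E_s[r, t].
E_red = G.conj().T @ E_s                    # shape (n_pix, n_t)

# Indicator: squared L2 norm of the reduced field over the transmitters.
I = np.sum(np.abs(E_red) ** 2, axis=1)      # shape (n_pix,)
I_map = (I / I.max()).reshape(64, 64)       # normalized 64x64 indicator image
```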
Taking into account the aforementioned properties of the OSM, two advantages over other techniques stand out. First, the introduction of an auxiliary linear problem allows a wider range of validity as opposed to Born-like approximations, where assumptions on the targets' "weak-scattering" nature are adopted. Second, regardless of the adopted linear approximation, the problem is still ill-posed, requiring the enforcement of an explicit regularization (such as the Tikhonov penalty or spectral truncation), whereas the OSM does not require such an effort.
Although linear approximations can be partially avoided using backpropagated sources, explicit regularization must be dealt with anyway. Furthermore, the number of backpropagated sources is dictated by the number of transmitters, reducing the applicability of the technique in the proposed DL framework.
The indicator I provides an estimation of the geometrical properties of the unknown target, as it usually assumes large values when r_p belongs to the support of the targets and low values elsewhere. As anticipated, the evaluation of I is straightforward and avoids any explicit inversion process, as it only requires the computation of a scalar product in each sampling point to test the orthogonality between the scattered field pattern and Green's function. Moreover, calculations are extremely fast; therefore, image formation can be considered to be performed in real time.
It is worth noting that the above formulation of the OSM is valid for single-frequency data. When data at multiple frequencies are available, a collection of indicators can be generated, say I_k for each frequency k = 1, . . . , N_F. Then, a multifrequency indicator can be calculated by combining the single-frequency indicators as [22]

I_MF(r_p) = Σ_{k=1}^{N_F} I_k(r_p).    (4)

While not requiring the determination of an optimal regularization parameter, the OSM, similar to other qualitative methods, still does not provide an objective estimation of the shape of the target. As a matter of fact, the transition of I from low to high values is smooth, and there is no general criterion to exactly determine whether the sampling points in the vicinity of the target's boundary actually belong to the target or not. In practice, this is solved by hard-thresholding the indicator, which is obviously a suboptimal solution and leads to case-dependent and user-biased results.
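A minimal sketch of the multifrequency combination, assuming the single-frequency maps are simply summed; the maps below are random placeholders rather than OSM outputs, and the final normalization to [0, 1] anticipates the preprocessing discussed in Section V.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder single-frequency indicator maps I_k, k = 1..N_F (N_F = 4 here).
indicators = [rng.random((64, 64)) for _ in range(4)]

# Multifrequency indicator: sum of the single-frequency maps.
I_MF = np.sum(indicators, axis=0)
I_MF /= I_MF.max()   # normalized multifrequency indicator image in [0, 1]
```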
IV. PHYSICS-ASSISTED DEEP LEARNING FOR OBJECTIVE SHAPE ESTIMATION

DL techniques are increasingly used to solve ISPs [9], so it is worth considering them as alternative approaches to tackle the problem of estimating the shapes of the targets from measurements of the fields they scatter, with the goal of providing an objective and effective estimation.
From the perspective of DL, the resolution of an inverse problem is driven by data [9]. In particular, assuming a supervised learning approach and denoting with F_θ the adopted DL architecture, the key step is the training process. In it, for a given set of N pairs (x_n, y_n), where x_n denotes the unknown characteristics the network should learn to retrieve and y_n denotes the corresponding available knowledge that leads to x_n, the set of parameters θ that characterize the network is iteratively optimized through the (nonlinear) regression

θ̂ = arg min_θ Σ_{n=1}^{N} M(F_θ(y_n), x_n) + R(θ).    (5)

In (5), M stands for the measure of the mismatch between the outputs of the architecture F_θ(y_n) and their corresponding ground-truth values x_n, and R(θ) is the regularization applied to overcome overfitting. To exploit the above general scheme for the specific problem at hand, the training pairs (x_n, y_n), the loss function M, the regularization R, and the architecture F_θ have to be determined.
As far as the architecture F_θ is concerned, since the problem can be cast as a classification task (i.e., assigning each pixel to either a target or the background), a special type of CNN, called fully convolutional network (FCN), can be exploited [29]. Due to its inner structure, an FCN is capable of retrieving modified versions of an input image. In particular, an FCN allows retrieving a direct segmentation map by means of an image-to-image mapping.
A peculiar implementation of the FCN, called U-Net [16], [30], is explored here. The original implementation of U-Net is meant to generate a binary mask that allows the user to objectively discriminate the foreground (the targets) from the background in the output image. As such, a convenient choice of the loss function is the binary cross-entropy [31], which enforces a maximization of the probability of correctly labeling the pixels of an input image. Moreover, to avoid introducing regularization terms in the loss function, overfitting is addressed via dropout [32], which works by randomly deactivating connections in the inner structure of the network during the training.
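For reference, the pixel-wise binary cross-entropy can be written in a few lines of numpy (DL frameworks provide it built in); the masks and probabilities below are toy examples.

```python
import numpy as np

# Pixel-wise binary cross-entropy between a binary ground-truth mask
# y_true and the predicted per-pixel foreground probability y_pred.
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1.0 - y_true) * np.log(1.0 - y_pred)))

mask = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_good = binary_cross_entropy(mask, np.array([[0.9, 0.1], [0.1, 0.9]]))
loss_bad = binary_cross_entropy(mask, np.array([[0.5, 0.5], [0.5, 0.5]]))
# Confident, correct predictions give a lower loss than uninformative ones.
```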
Fig. 1. Physics-assisted DL method for the automatic generation of binary masks with the MUMU approach. From the scattered fields, a stack of OSM indicator maps is built and fed into U-Net. The network automatically retrieves the predicted binary mask for those inputs. When U-Net is in the training stage, the feedback loop is used to optimize the inner parameters of the network by minimizing the binary cross-entropy between the ground truth and the prediction.

As for the training pairs, the first choice can be that of considering the pairs given by the scattered field collected for all illuminations and receivers and the corresponding binary mask encoding the targets. While being straightforward, this approach has the typical disadvantages of any "direct" learning approach, since the network spends a significant computational cost to learn the underlying physics, and it basically works as a black box [9]. Moreover, in this specific case, there is an unnecessary effort due to the fact that the scattered field data encode more information than needed, as they convey information not only on the shape but also on the permittivity of the targets. As such, the learning process associated with such an approach is unavoidably less efficient.
To avoid the abovementioned issues, physics-assisted learning, in which some domain knowledge is fed into or incorporated into the network, can be exploited. For the problem at hand, a convenient choice is to take advantage of the estimation of the targets' support that can be achieved by means of qualitative imaging, as this is expected to reduce the effort needed by the network to learn how to translate the inputs into the expected outputs. As a matter of fact, since qualitative processing provides an image of the scenario, by simply normalizing such an image, the task of the network becomes more consistent, as it deals with transforming an image continuously varying between 0 and 1 into a binary image. In addition, while working directly on the data entails that the size of the input matrix changes when the measurement configuration changes (e.g., the number of transmitters and receivers), the qualitative preprocessing provides an image whose size does not depend on the measurement configuration, since the sampling grid is arbitrary and can be fixed according to the needs of the network. Of course, the measurement configuration affects the quality of the outcome of the qualitative preprocessing, but the possibility of working directly on images allows a certain degree of flexibility in the use of the resulting physics-assisted DL architecture, as will be shown in Section VII.
Among qualitative methods, the OSM is suitable to implement an automatic and efficient physics-assisted classification architecture. In fact, as explained in Section III, the OSM can supply an estimate of the unknown target shape in real time (thus affecting neither the computational time of the training nor the performance of the network once put in action). In addition, it does not need an explicit regularization, which would require fixing an input parameter for each sample and may imply either suboptimal or user-biased inputs, depending on the way in which the regularization is enforced.
Moreover, when data collected at multiple frequencies are available, the use of the OSM as a physics-assisted preprocessing step to feed U-Net offers additional possibilities. As shown in Section III, the OSM can either provide a single image through the indicator I_MF by combining the results at the available N_F frequencies or provide a set of images through the indicator I computed at each frequency. This latter circumstance nicely matches the approach usually exploited when training CNNs on regular images, where the information is separated into three channels, one for each RGB color component. In the following, the case in which learning is carried out on I_MF is referred to as the single indicator single-channel (SISI) approach, whereas the case of learning with multifrequency information on separate indicators I is referred to as the multiple indicators multiple channels (MUMU) approach. Fig. 1 shows the complete processing scheme for the MUMU case. The scheme can be easily particularized to the SISI case. When direct learning (single frequency or multifrequency) on scattered field data is considered, the OSM processing block in Fig. 1 is skipped.

A. Training and Test Data Generation
To build the training set needed to optimize U-Net parameters, an approach similar to the one in [16] has been used since a similar problem (the 2-D canonical ISP in free space) is therein considered. Accordingly, the training is concerned with the simulation of a number of scattering experiments involving combinations of lossless homogeneous circular cylinders enclosed in the imaging domain. For each computed scattered field, the OSM indicator function is built according to (3) or (4) depending on the adopted approach.
In particular, cylinders placed in groups of two or three with variable sizes, locations, and permittivities were considered for the generation of the training set. In doing so, no profile was allowed to be partially outside of the imaging domain, while some targets could overlap. Details of the setup conditions are listed in Table I. From the table, it can be noted that the training set and initial validation were restricted to lossless targets.
Differently from [16], the training set does not consider the case of a single target, and, given that the proposed network aims at retrieving the shape of the targets, a broader range of admissible permittivities was considered. Moreover, the measurement configuration adopted for the training is inspired by the one used in the Fresnel database [25], [26], which is used in Section VII to benchmark the proposed approach. For the training and the performance assessment, a total set of 2000 scattering experiments was simulated. Among them, 90% (1800) were used as the training set and 10% (200) as the test set. The selection of the samples belonging to either set was performed using a tenfold cross-validation scheme, leading to ten experiments with varying training and test sets to be used in the performance evaluation, as detailed in Section VI.
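The tenfold split described above can be sketched as follows; only the index bookkeeping is shown, not the generation of the scattering samples themselves.

```python
import numpy as np

# Tenfold cross-validation over 2000 simulated experiments: each fold
# holds out 200 samples (10%) for testing and trains on the other 1800.
n_samples, n_folds = 2000, 10
rng = np.random.default_rng(42)
folds = np.array_split(rng.permutation(n_samples), n_folds)

splits = []
for k in range(n_folds):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[i] for i in range(n_folds) if i != k])
    splits.append((train_idx, test_idx))
# Each of the ten (train_idx, test_idx) pairs defines one experiment.
```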
For each experiment, the input data-ground-truth pair for the network training was generated using a proprietary forward solver based on the method of moments. For the direct learning approach, a stack of N F matrices encoding the scattered field data at the different frequencies made the input data. Each matrix has size N R × N T . For the physics-assisted learning, the input is the 64 × 64 pixels image of the indicator I M F for the SISI approach, while the input for MUMU is a stack of N F matrices each encoding the 64 × 64 pixels of the OSM indicator I at each frequency.

B. Implementation Details
One important feature for faster convergence of the training is image normalization to [0, 1] [33]. Such normalization was performed differently in the two versions of the physics-assisted approach. For the SISI, the four OSM indicator functions were added up, and the final image (the multifrequency OSM indicator) was normalized afterward, while, in the MUMU, the same four OSM indicators were individually normalized. Finally, the normalization of the scattering matrices performed for the direct learning approach can be seen as an equalization of the information across the different frequencies.
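The two normalization strategies can be summarized in numpy as follows; the indicator maps are random placeholders with deliberately different scales so that the effect of the per-channel normalization is visible.

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-ins for the four single-frequency OSM indicator maps I_k.
maps = [10.0 ** k * rng.random((64, 64)) for k in range(4)]

# SISI: add up the maps, then normalize the sum once.
sisi_input = np.sum(maps, axis=0)
sisi_input /= sisi_input.max()          # single 64x64 image in [0, 1]

# MUMU: normalize each map individually and stack them as channels.
mumu_input = np.stack([m / m.max() for m in maps], axis=-1)  # 64x64x4 stack
```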
Ground-truth 64 × 64 images were obtained from the binarization of the simulated targets. In order to make the output of U-Net suitable for the computation of the binary cross-entropy, a Softmax activation function [18] is employed in the last layer. When Softmax is used, the output becomes an N-stack of images, with N being the number of classes (2 for binary classification) instead of a single image. As a consequence, the ground truth is transformed as well to match the dimensions of the prediction. The transformation of an image where each pixel belongs to a class into an N-stack of images is known as one-hot encoding [34].
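A toy numpy sketch of the one-hot encoding and the Softmax output layer described above, on a 2 × 2 mask with N = 2 classes.

```python
import numpy as np

def one_hot(mask, n_classes=2):
    """(H, W) integer mask -> (H, W, N) stack of per-class binary images."""
    return np.eye(n_classes)[mask.astype(int)]

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

mask = np.array([[0, 1], [1, 0]])
target = one_hot(mask)                # ground truth reshaped to match the output
probs = softmax(np.zeros((2, 2, 2)))  # uniform logits -> 0.5 for each class
```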
The optimization of the binary cross-entropy loss function was carried out using the Adam optimizer [35], with a learning rate of 10 −5 and a batch size of 16. An optimal solution is

C. Performance Evaluation Metrics
To quantitatively assess the performances of the optimized models, several metrics were used.
The first considered metric is the accuracy (ACC) [36], which is defined as

ACC = (TP + TN) / (TP + TN + FP + FN)    (6)

where TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives, respectively. In the problem at hand, TP was considered to be the count of all pixels correctly labeled as foreground (targets), whereas TN was considered the count of all pixels correctly labeled as background. However, ACC scores may be influenced by data unbalance, that is, when there are considerably more pixels in one class than in the others. Such unbalance results in TN and FN being much larger than TP and FP, hence making the former dominate (6). Accordingly, a second metric, the Dice similarity coefficient (DSC) [36], was employed. DSC is defined as

DSC = 2TP / (2TP + FP + FN).    (7)

DSC does not rely on TN to give an estimation of the performance; therefore, it is more robust against data unbalance. Furthermore, it doubles the contribution of TP, rewarding algorithms with better target detection capabilities.
Nevertheless, DSC can also overestimate the performance when the data unbalance has a higher proportion of pixels in the foreground than in the background, i.e., in those situations in which TP and FP outnumber FN. The double contribution of TP to DSC worsens the estimation of the performance in such cases, which could lead to an even more optimistic DSC than ACC [37]. For these reasons, a further metric, the Matthews correlation coefficient (MCC) [38], was calculated as well

MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).    (8)

Finally, to further appraise the performance of the various approaches, the behavior of the training and validation losses has been observed. The training losses are the values reached by the binary cross-entropy loss function during the training and provide a measure of the quality of the optimized parameters θ. Simultaneously, validation losses over the test set can be provided.
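The three metrics can be computed from a pair of binary masks as follows; the masks are toy examples, not the article's data, and they reproduce the data-unbalance effect discussed above, with ACC higher than DSC and MCC.

```python
import numpy as np

def metrics(y_true, y_pred):
    """ACC, DSC, and MCC from two binary masks (1 = foreground)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + tn + fp + fn)
    dsc = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return acc, dsc, mcc

# Unbalanced case: 4 foreground pixels out of 16, one missed, one false alarm.
gt = np.zeros((4, 4), dtype=int)
gt[1:3, 1:3] = 1
pred = gt.copy()
pred[1, 1] = 0   # one false negative
pred[0, 0] = 1   # one false positive
acc, dsc, mcc = metrics(gt, pred)
# ACC stays high thanks to the many true-negative background pixels,
# while DSC and MCC penalize the foreground errors more visibly.
```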
Jointly appraising training and validation losses can be useful to identify overfit models, whose training losses tend to continuously drop over the epochs without a corresponding drop in the validation losses. In fact, while training losses for overfitting models consistently drop over the epochs, their corresponding validation losses start to increase after a certain number of epochs.
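A simple programmatic version of this check can be sketched as follows; the loss curve is synthetic, and the patience threshold is an arbitrary choice, not taken from the article.

```python
def overfit_epoch(val_loss, patience=3):
    """Return the epoch at which the validation loss started rising
    `patience` times in a row (a crude overfitting flag), else None."""
    rises = 0
    for e in range(1, len(val_loss)):
        rises = rises + 1 if val_loss[e] > val_loss[e - 1] else 0
        if rises >= patience:
            return e - patience
    return None

# Synthetic validation loss: drops, bottoms out at epoch 3, then rises.
val = [1.0, 0.7, 0.5, 0.45, 0.47, 0.50, 0.55, 0.60]
```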

VI. ASSESSMENT WITH SIMULATED DATA
When training is complete, U-Net can be used to make predictions on new data. Accordingly, the 200 samples excluded from the training in each of the tenfold experiments were fed into the network, and the corresponding predictions were retrieved.
For each experiment, the three considered metrics were calculated and averaged over the ten folds. The average performance scores for the direct learning approach, SISI, and MUMU are shown in Table II. Among the experiments, an average proportion of foreground pixels around 10.7% of the total, with a minimum of 4.3% and a maximum of 29.2%, was found, thus revealing some degree of data unbalance; therefore, unreliable ACC scores should be expected. Consequently, U-Net showed a very optimistic ACC, while the DSC score was lower than ACC, as expected. On the other hand, the MCC score was close to DSC, thus confirming the reliability of these two metrics in the proposed scenarios.
The binary cross-entropy losses evaluated during the training for the cross-validation fold with the MCC score closest to average (fourth fold) are plotted in Fig. 2. The training losses show a regular decay with an advantage of SISI and MUMU over the direct learning approach. When validation losses are compared, it is clearly evident that direct learning reaches the loss plateau earlier, followed by SISI and MUMU, which is consistent with the performance metrics in Table II. For the sake of comparison, the same number of epochs was used in the three cases though SISI and MUMU were not fully in the plateau.
It is worth mentioning that the validation loss is usually calculated over a subset of the training set different from the test set. This allows for an unbiased evaluation of the models, and it is a good practice when fine-tuning the nontrainable parameters of the network (hyperparameters) [18]. Nevertheless, since SISI and MUMU reached a sufficiently accurate level of performance in the first implementation attempts, no further hyperparameter tweaking was carried out, and the test set was directly employed for the calculation of the validation loss.

Fig. 3. Imaging results of four test samples. One sample per row, with the first column depicting the contrast values, followed by four columns representing the OSM indicator function from simulated scattered fields at 2, 4, 6, and 8 GHz, and the multifrequency OSM indicator in the sixth column. The seventh column depicts the prediction made by U-Net using the direct learning approach. The eighth and ninth columns depict the prediction made by U-Net when using the physics-assisted approach, SISI and MUMU, respectively. The last column shows the ground-truth binary mask used to evaluate the scattering data.

The imaging results of four test samples are shown in Fig. 3. The samples were carefully chosen to show the processing performance with different contrasts. Their corresponding individual metrics are reported in Table III. These results are consistent with both the individual metrics and the average performance metrics of Table II, with particular reference to DSC and MCC. Most notably, the direct learning approach is outperformed by the physics-assisted approach, whose outputs appear closer to the ground truths.
The individual metrics in Table III demonstrate that SISI and especially MUMU achieve comparably high reconstruction accuracies with different combinations of high- and low-contrast targets, as well as overlapping shapes.
In commenting on such a result, it should be noted that the better performance of MUMU comes with a lack of flexibility, since the number of images provided to the network must necessarily equal the number of selected frequencies. In contrast, SISI exhibits more flexibility, as it only requires a single image as input, independently of the number of selected frequencies.
An input could be constructed by adding up more OSM indicator functions than the four considered in the experiments, thus possibly allowing the network to improve the predicted binary mask. Extending a MUMU-like framework to a wider selection of frequencies, however, would be limited by the available memory. Such differences suggest that the two techniques can be conveniently exploited in a cross-validated framework.
In order to evaluate the numerical stability of the OSM, further testing was performed by corrupting the scattered fields with different levels of additive white Gaussian noise prior to the calculation of the OSM indicators. Direct learning was not included in this experiment due to its previously reported worse performance. From the metrics reported in Table IV, it can be seen that the performances remain almost unchanged, with a slight degradation at 0 dB.

VII. ASSESSMENT AGAINST EXPERIMENTAL DATA
As a final validation, U-Net was tested with the experimental data taken from [25], [26]. The aim of this test was to provide an assessment on samples collected in a completely different way from those used for training.
In particular, there are differences in the measurement configurations with respect to the simulations described in Section V. Also, the Fresnel database includes cases, depicted in Fig. 4, which are not considered in the training, such as single targets, metallic targets, and targets with concave and rectangular shapes. Moreover, while the targets considered in the simulated experiments were lossless, the dielectric materials employed to build the Fresnel targets are not exactly lossless.
Taking into account the better performance of the OSM-based physics-assisted approach, only SISI and MUMU were employed in this analysis. The results of the analysis are shown in Fig. 5, while the metrics are reported in Table V. Such metrics were calculated using generated ground truths, as depicted in the last column of Fig. 5, based on the description of the Fresnel database. Therefore, depending on the interpolation method used for the ground truth, performance scores may vary. In case (b), two cylinders, there is a known discrepancy between the nominal and the actual position of the targets (see the special issue [26]). Such a discrepancy complicates a proper computation of the ground truth. Accordingly, the quality of the reconstruction for this target was assessed by computing the estimated radii of the targets, which are 13.7 and 15.6 mm, whereas the ground-truth radius for both targets is 15.0 mm.
As can be seen, the developed framework, while not optimized for the considered experimental data, successfully tackles most of the targets, estimating their shape or providing useful morphological information. In particular, the following observations hold.
1) The shapes of targets (a) and (b) are properly retrieved, despite the fact that the case of a single cylinder like (a) was not included in the training.
2) For targets (c) and (d), a sort of convex envelope is retrieved, in agreement with the fact that these targets have shapes (rectangular, concave) different from those the network was trained with; therefore, U-Net tends to round their corners and fill the gaps. This explains the low DSC and MCC values. Moreover, these targets are metallic. Hence, this is an interesting example showing that, although the approach is unavoidably driven by the training data, the physics-assisted scheme retains a certain degree of robustness.
3) Target (e) is made up of two almost concentric cylinders whose external radius is underestimated. The inner cylinder has a permittivity of 3.0 ± 0.3, while that of the outer one is 1.45 ± 0.15. Such a large jump in permittivity influences the behavior of the OSM indicator [23], causing the lower-permittivity material to be confused with the background and partially lost.
4) As for target (f), it is interesting to observe how the network predictions improve the shape estimate with respect to the input indicator maps.
5) Targets (g) and (h) are hard to retrieve. The former because nested targets modify the OSM indicator function, while the ground-truth binary mask remains the same as that of a nonnested profile of similar dimensions; hence, it is difficult for the network to process this type of target. The latter is hardly retrieved because the network was not trained to deal with metallic targets.
Nevertheless, it is interesting to notice that SISI retrieves a somewhat accurate result for profile (h).
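For reference, the DSC and MCC scores used throughout this assessment can be computed from a predicted and a ground-truth binary mask as in the following sketch (the function name is illustrative; both metrics follow their standard definitions from the confusion-matrix counts):

```python
import numpy as np

def dice_and_mcc(pred: np.ndarray, truth: np.ndarray):
    """Dice similarity coefficient (DSC) and Matthews correlation
    coefficient (MCC) for two binary masks of identical shape."""
    p = pred.astype(bool).ravel()
    t = truth.astype(bool).ravel()
    tp = np.sum(p & t)
    tn = np.sum(~p & ~t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)
    dsc = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return float(dsc), float(mcc)
```

Both metrics equal 1 for a perfect reconstruction; MCC additionally accounts for true negatives, which makes it more informative when the background dominates the image.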
The assessment against the Fresnel data shows the capabilities of the proposed approach on cases not derivable from the training set and on nonsimulated data. Moreover, it also provides an example of the flexibility of the framework with respect to the measurement configuration used to collect the data. As a matter of fact, while the network was trained with a full-aperture configuration (wherein both the transmitters and the receivers are located on a circumference), the Fresnel measurement configuration is aspect-limited, since for each transmitter position the receivers cover only an angular sector [25], [26].

VIII. CONCLUSION
In this work, an innovative microwave imaging framework for effective, reliable, and user-independent retrieval of the shapes of unknown targets from scattered fields was presented. To this end, a processing approach, which combines qualitative imaging and DL, was provided.
As is well known, qualitative imaging methods are powerful tools for solving ISPs. However, their outcome is a continuous map that requires postprocessing to provide an objective output clearly identifying the targets. Hard thresholding of the continuous map is typically employed to deal with this issue, but it unavoidably yields case-dependent and user-biased results.
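The user dependence of hard thresholding can be illustrated with a minimal sketch: the extracted foreground shrinks or grows with the chosen threshold, so different operators obtain different shapes from the same indicator map (the function name and normalization are illustrative assumptions).

```python
import numpy as np

def hard_threshold(indicator: np.ndarray, tau: float) -> np.ndarray:
    """Binarize a continuous indicator map with a user-chosen
    threshold tau in (0, 1), after peak normalization."""
    ind = indicator / indicator.max()
    return ind >= tau
```

Two users picking, say, tau = 0.3 and tau = 0.7 would report foregrounds of different extents, which is exactly the case-dependent behavior the learned binarization avoids.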
The above shortcomings were overcome by adopting U-Net as the DL tool and the orthogonality sampling method as the qualitative imaging approach. In particular, the capability of the OSM to produce images at multiple frequencies was exploited to feed U-Net with information at different frequencies.
Information was supplied to the network in two ways: jointly in a single image (SISI) or as a stack of images (MUMU). In this article, only four frequencies were used, but more frequencies could potentially be employed.
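The two input compositions can be sketched as follows; the channel-stacking for MUMU mirrors the multi-frequency input described above, while the 2×2 mosaic for SISI is only one plausible single-image layout, assumed here for illustration rather than taken from the article.

```python
import numpy as np

def to_mumu(maps):
    """Stack per-frequency OSM maps along a leading channel axis
    (MUMU-style input): shape (F, H, W)."""
    return np.stack(maps, axis=0)

def to_sisi(maps):
    """Tile four per-frequency OSM maps into one 2x2 mosaic
    (an assumed SISI-style single-image composition): shape (2H, 2W)."""
    top = np.hstack(maps[:2])
    bottom = np.hstack(maps[2:])
    return np.vstack([top, bottom])
```

A MUMU-style network fixes the number of input channels at training time, which is the frequency-count limitation discussed below, whereas a SISI-style mosaic keeps a single-channel input at the cost of a larger image.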
Performance analysis of the processing framework was carried out on simulated data and quantitatively assessed by means of several metrics, showing, on average, excellent capabilities and robustness against noise.
The two versions of the physics-assisted approach were tested on experimental samples, some of which were considerably different from the samples used in the optimization of the network, showing compelling capabilities. Even so, performance should be expected to worsen the more the conditions differ from the training ones, as shown with the experimental samples (c) and (d).
Another limitation, specifically relevant for MUMU-like frameworks, is the need to know in advance the number of frequencies to be used and the obligation to stick to them once the training has been carried out. This should not be a burden in specific applications where an optimal frequency selection can be performed beforehand.
It is worth noting that the proposed approach can even be used to retrieve targets that are not in free space, i.e., embedded in a nonhomogeneous background, provided that the training is carried out accordingly.
In order to address the listed limitations, further research will be carried out with different targets to verify the performance in less controlled experiments, possibly in medical scenarios involving lossy media. Furthermore, the scope of the framework will be extended to 3-D. Also, regarding the MUMU implementation, a study investigating the implications of using a larger number of frequencies is needed; specifically, possible performance improvements, memory limitations, and processing-speed burdens will be analyzed.