Ultrasonic fatty liver imaging

Fatty liver disease is a prevalent condition that may result in serious liver complications, and it currently lacks an effective and efficient approach for its quantification. In this paper, we propose to directly image the fat content distribution in the liver based on ultrasound echo radio-frequency signals. In the proposed method, the spectral difference is utilized to represent small pieces of liver tissue. The connection between the data representation and liver tissue is then directly established by an elaborately designed learning process in the high-dimensional feature space, which includes comprehensive hyper-parameter learning and model learning. Experimental results demonstrate the effectiveness of the proposed method, which is able to visualize the fat distribution and achieves a correlation coefficient of 0.93 with the fat-percentage quantification results of the doctors' pathological analysis.


INTRODUCTION
Fatty liver, or liver steatosis, is the buildup of triglycerides in the form of lipid droplets in the liver, which can result from several causes such as alcohol consumption, viral hepatitis or metabolic dysfunction [1,2]. If the fat proportion in the liver is larger than 5-10%, it is considered fatty liver disease, which is highly reversible when the extent of fatty change is not high [2]. However, with further progress associated with inflammation, the disease becomes much less reversible and may result in severe conditions such as steatohepatitis, liver cirrhosis and hepatocellular carcinoma [1]. Since the prevalence of fatty liver disease is as high as around 30% of the population [3], an efficient and accurate method for diagnosing the extent of fatty liver is important for clinical practice.
Nowadays liver biopsy is the gold standard for evaluating a patient's fat fraction. Histopathological analysis is reliable for tissue characterization. However, the approach may encounter the problem of limited tissue sampling [4]. It is also an invasive method, and patients may suffer from serious complications. These drawbacks significantly restrict the application of liver biopsy in fatty liver quantification. This work is supported by Philips Research China.
Ultrasound is widely used as a diagnostic imaging modality due to its noninvasive nature, real-time imaging and low cost. In ultrasound echo images, fatty liver disease manifests as increased echogenicity and signal attenuation. Experienced doctors may diagnose the existence of fatty liver disease, but the diagnosis is subjective, operator-dependent and not quantitative. Research on quantitative ultrasound has demonstrated, to some extent, its capability for the diagnosis of fatty liver disease [5,6].
As liver fat content results in increased attenuation, conventional attenuation estimation based on echo signals is one major research direction for fatty liver disease diagnosis [5,7]. These methods assume that the total attenuation is an exponential function of the attenuation coefficient. An echo signal from a well-characterized reference target is then utilized, and the attenuation coefficients can be estimated for the processed target [8]. However, parenchyma heterogeneity may negatively impact these methods. Meanwhile, the signal beamforming for the processed and reference data is required to be consistent. There are also approaches that do not use a reference signal [9,10], but they are often based on strong assumptions or require additional signal information to constrain the estimation performance. Recently, shear-wave-based parameter estimation methods [11,12] have demonstrated the potential for reliable output; however, they are still in the preliminary research phase. Other acoustic parameters such as thermal strain [13], the acoustic nonlinearity parameter, etc. [14] vary with the change of fat content. For the practical estimation of these parameters, rigorous approaches are required for specific signal acquisition.
For fatty liver quantification, the parameter-based methods further need to establish a connection between parameter values and the corresponding extents of fatty liver. The above-mentioned ultrasound parameters often demonstrate an approximately linear relationship with the extent of fat content. However, the exact relationship is not yet fully understood. Also, since fatty liver tissue is fundamentally a composite structure, the estimated ultrasound parameters interact with each other. Such interaction may significantly affect the diagnosis of fatty liver disease and is still under research.
Another research direction for fatty liver quantification is the learning-based discrimination between fatty liver and normal liver. Authors in [15,16] [...] [17] did a study on distinguishing the normal, fatty, cirrhosis and carcinoma cases. A neural network with two hidden layers is applied with the features of image statistics, and the total accuracy is 96% as reported. These methods highly depend on device settings due to the utilization of echo images. Meanwhile, the binary classification of normal and fatty patients is the concerned problem. The quantification of fatty liver extent is beyond the scope of these methods.
In the paper, we propose to directly image the fat content distribution in the liver based on ultrasound echo radio-frequency (RF) signals. With the visualized fat distribution, doctors may conveniently identify the fatty liver extent of one patient. In the proposed method, to represent one small piece of liver tissue, the spectrum difference is simply derived from the RF signal around the small piece of liver tissue and directly used as one sample. With a large number of such liver samples, a learning process is performed to directly model the connection between the samples and their corresponding histopathological analysis results. The learning process is based on an elaborately designed machine learning approach which includes comprehensive hyper-parameter learning and model learning. The histopathological results for every ultrasound data set are based on the analysis of tissue slices that are obtained from the corresponding patient's liver surgery. With the established learning-based model, fatty liver tissues can be directly identified in real time. The imaging of the visualized fat distribution in the liver is termed ultrasonic fatty liver imaging in our paper.

METHODS

Data representation
When an ultrasound echo signal is transmitted to a small piece of liver tissue, the signal interacts with the small piece of liver tissue. Such interaction is a complex process. Generally, the signal "before" entering the small piece of liver tissue will be different from the signal "after" the interaction with the small piece of liver tissue. In the proposed method, one fundamental assumption is that the difference between the "before" and "after" signals contains the related characteristic information of the small piece of liver tissue and may be used to represent the small piece of liver tissue. Therefore, to denote such a "difference", it is heuristically proposed to apply the following equation for the characteristic representation of the small piece of liver tissue:

$$S = \left|F(S_{\mathrm{before}})\right| - \left|F(S_{\mathrm{after}})\right| \qquad (1)$$

where $S$ is the representation for one small piece of liver tissue, which is considered as one sample in the following learning process. $S_{\mathrm{before}}$ and $S_{\mathrm{after}}$ are the "before" and "after" RF signals in the time domain, respectively. $F(\cdot)$ denotes the Fourier transform, and the result is a vector of the corresponding complex coefficients. $|\cdot|$ denotes the norm calculation for each frequency component.
The design of this signal representation is based on the following considerations. First, for ultrasound RF signals, the Fourier transform can be applied to represent the signal with a basis of frequency components. Therefore one sample $S$ as calculated in (1) has the form of a high-dimensional vector. The elements in the vector are obtained from the Fourier frequency components, which are orthogonal to each other. Considering the elements as features from the learning point of view, this property is important for sample representation in the following learning process. Secondly, it is straightforward to apply the "minus" operation to denote the "difference". With such an operation, the signal information in the data representation part is supposed to be "complete" for the following process. Thirdly, for one sample $S$, the "before" and "after" signals are the portions of the received RF signal around the corresponding small piece of liver tissue. The calculation of the spectrum difference as in (1) can be directly and simply performed. The following processes are based on the samples generated in this way. No other assumption related to any ultrasound signal property or to further signal processing settings of the ultrasound machine is made. This design avoids the further requirements on signal or machine constraints found in other methods as mentioned in Section 1. Fourthly, the following process is to directly link the samples with their corresponding histopathological analysis results. The relationship between the ultrasound RF spectrum difference and the fat content is directly established without any intermediate calculation. Therefore the application to fat-extent quantification can be straightforwardly and conveniently performed.
Now the question is, does such a data representation really contain enough or correct signal information for the objective of fatty liver quantification? Will any other ultrasound signal property change impact the final model output? In the proposed method, the questions are not directly or theoretically answered from the signal itself. Instead, a learning process is proposed and applied to first learn the connection between the data representation and the fat-content quantification and later validate whether such a connection can be found and robustly established. The impacts from other ultrasound signal properties are designed to be handled by the learning process, and the learned connection is supposed to be intrinsic for the application. The learning is proposed to be directly performed in the high-dimensional space of the data representation, which strongly enhances the learning's potential capability. An elaborately designed machine learning approach is correspondingly proposed, which will be described in detail in the next section.
Fig. 1. Diagram of the model-learning framework
Since the small pieces of liver tissue (not the patients) are the processed objects, the number of samples for the learning could reach hundreds of thousands even with a limited number of patients. This further supports the feasibility of the proposed method. Experimental results preliminarily validated the effectiveness of the proposed data representation, as described in detail in Section 3.
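The spectrum difference described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. The segment layout around the tissue piece, the subtraction order ("before" minus "after"), the use of only the non-negative frequency bins, and the names `magnitude_spectrum` and `spectral_difference` are all assumptions made for illustration. A naive DFT is used so that the sketch needs only the standard library.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Return |F(signal)| for the non-negative frequency bins via a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff))
    return spectrum


def spectral_difference(rf_line, center, segment_len, interval_len):
    """Build one sample S for the tissue piece centered at index `center`.

    Assumed layout (not specified in the text): the "before" segment ends
    interval_len // 2 points ahead of the center, the "after" segment starts
    interval_len // 2 points behind it, and S = |F(before)| - |F(after)|.
    """
    half = interval_len // 2
    start = center - half - segment_len
    stop = center + half + segment_len
    if start < 0 or stop > len(rf_line):
        raise ValueError("segments fall outside the RF line")
    before = rf_line[start : center - half]
    after = rf_line[center + half : stop]
    mag_before = magnitude_spectrum(before)
    mag_after = magnitude_spectrum(after)
    return [b - a for b, a in zip(mag_before, mag_after)]


# Synthetic RF line (made-up data): a tone with 2 cycles per 16 points
# whose amplitude halves from point 32 onward.
rf = [math.sin(2 * math.pi * 2 * t / 16) * (1.0 if t < 32 else 0.5) for t in range(64)]
sample = spectral_difference(rf, center=32, segment_len=16, interval_len=8)
```

In this toy case the only clearly non-zero feature is the bin of the tone, where the loss of amplitude between the two segments shows up as a positive difference.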

Learning process
The proposed learning process consists of two parts. The first is hyper-parameter learning, which aims to find the proper biological representation of fatty and normal liver content for the spectrum difference and to adjust the number of samples to an appropriate level for the following process. The second is to learn the optimal model for connecting the samples with the corresponding histopathological results. To achieve real-time imaging, the classifier, which is the core part of the learned model, is restricted to those of appropriate complexity. Meanwhile, the computational complexity of the model optimization is deliberately and greatly increased to achieve high performance for the entire learning process.

Hyper-parameter learning
For one sample as defined by (1), there are two hyper-parameters related to the biological representation. The first is the RF signal length of either $S_{\mathrm{before}}$ or $S_{\mathrm{after}}$, which determines the quantity of signal information that has interacted with the liver tissue. A large length may provide more sufficient signal information, but it may also lead to a loss of localization in the data representation. The second hyper-parameter is the interval length between the "before" and "after" signals, which is related to the appropriate tissue size for representing normal or fatty liver content. Since the number of samples for the learning process could be extremely large, which may not be suitable for some kinds of models, it is necessary to adjust the number of samples to an appropriate level considering the different computing capabilities of the following learning models. Therefore, the sample-density coefficient is the third hyper-parameter to be optimized; it denotes the region size for selecting one small piece of liver tissue and generating the corresponding sample as defined by (1). Note that "sample" and the related descriptions here and in the following are terms from the learning point of view, which have nothing to do with the notion of sampling when receiving the RF echo signal.
For the three hyper-parameters described above, an intensive grid search is applied. In the entire learning process, the hyper-parameter learning is used as the wrapper for the following model learning.
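As a rough illustration of the wrapper idea, the sketch below runs an exhaustive grid search over the three hyper-parameters. The function names, the dictionary keys and the toy scoring function are hypothetical. The end points of each list are the ranges reported in the experimental section, while the intermediate values are invented for the example. In a real run, `evaluate` would stand for the complete inner model learning and return its cross-validation result.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Exhaustively score every hyper-parameter combination and keep the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # stands in for the full inner model learning
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: end points follow the reported ranges,
# intermediate values are made up for illustration.
grid = {
    "signal_length_mm": [0.62, 1.54, 2.46],
    "interval_length_mm": [0.39, 0.97, 1.54],
    "sample_density_mm2": [0.0706, 0.1413],
}


def toy_evaluate(params):
    """Placeholder score; a real run would return a cross-validation result."""
    return -abs(params["signal_length_mm"] - 1.54) - abs(
        params["interval_length_mm"] - 0.97
    )


best, score = grid_search(grid, toy_evaluate)
```

Because every grid point triggers one full model learning, the cost of the wrapper grows with the product of the list lengths, which is consistent with the large computational cost discussed in the text.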

Model learning
This study focuses on identifying liver fat tissue within normal liver parenchyma, which is technically treated as a binary classification problem, and supervised learning is utilized in the proposed method. The ground truth comes from the histopathological analysis results. However, the histopathological analysis result for each fatty liver case is a percentage of fat content. It does not provide the exact correspondence between every small piece of liver tissue and its histopathological analysis result, which is impossible to obtain in practice even with a large histopathological slice. To handle this problem, fatty liver cases with a high percentage of fat content are selected in the proposed method to extract the fatty liver training samples, all of which are labeled as complete fatty tissue. Correspondingly, normal liver samples are extracted from normal liver cases. Consequently, an unknown proportion of the fatty liver samples is given an incorrect ground truth. However, since the majority of the samples have the correct ground truth, it is assumed that this majority dominates the determination of the hyper-plane for the binary classification problem. The impact of the incorrect ground truth is left to be handled by the model learning itself in the sample space. The feasibility of this assumption has been validated by the experimental results.
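The case-level labeling described above can be sketched as follows. The data layout (a list of dictionaries with `kind` and `samples` keys) and the function name `label_samples` are assumptions made for illustration only. Every sample simply inherits the label of the case it was extracted from, which is why some samples from fatty cases necessarily carry an incorrect label.

```python
def label_samples(cases):
    """Give every sample the label of its case: 1 for fatty, 0 for normal."""
    features, labels = [], []
    for case in cases:
        label = 1 if case["kind"] == "fatty" else 0
        for sample in case["samples"]:
            features.append(sample)
            labels.append(label)
    return features, labels


# Made-up cases with one-dimensional samples, for illustration only.
toy_cases = [
    {"kind": "fatty", "samples": [[0.1], [0.2]]},
    {"kind": "normal", "samples": [[0.3]]},
]
toy_features, toy_labels = label_samples(toy_cases)
```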
Based on the extracted training samples and the corresponding ground truth, a learning framework is established to perform the model learning. The diagram of the framework is shown in Figure 1. The features of one sample are the frequency components of the spectrum difference as calculated in (1). The sampling method first deals with the imbalance between the numbers of fatty and normal samples. Feature filtering is then applied to preselect the relatively significant features and preliminarily reduce the feature dimension. The optimization criterion is further used to measure the performance of the cross-validation results and guide the learning within the search algorithms. When every step of the framework selects one specific method, the whole framework can be performed once and the specific learning will stop after convergence. In the proposed method, we make several different methods available for each step of the framework. Different methods may make different assumptions or have different computational advantages for the classification problem. The detailed methods for each step are listed in Table 1.

Table 1
Framework step | Applied methods
Sampling method | None-sampling, over-sampling, under-sampling [18]
Feature filtering | None-filtering, T-test [19], Wilcoxon rank-sum test [20], Kolmogorov-Smirnov test [21], MR-MR [22], KL divergence, Rf-Gini importance, Rf-Mda importance [23], ReliefF [24]
Applied classifiers | KNN, SVM, linear discriminant analysis [25], logistic regression [26], AdaBoost [27], Random forests [28]
Optimization criteria | Accuracy, g-mean sensitivity, AUROC [29]
Search algorithms | Grid search, stepwise optimization, constrained stepwise optimization, genetic algorithm [30], simulated annealing [31]

The model learning based on the framework is then performed by exhaustively running all combinations of the different essential methods for every step. Meanwhile, we greatly increase the number of optimized parameters and their optimization ranges for the different methods. The search algorithms are performed simultaneously in both the feature space and the parameter space. Considering that the grid search of the hyper-parameter learning wraps the model learning, the computational cost of the entire learning process is extremely large. This helps the learning process converge to the optimal model for representing the intrinsic connection between the data representation and the histopathological results. Meanwhile, after the entire learning process, the computational cost of applying the learned model to imaging mainly comes from the classifier complexity. As previously mentioned, the applied classifiers are restricted to those of appropriate complexity. Therefore, the extremely high complexity of the learning process does not affect the imaging cost, and real-time imaging can still be guaranteed.
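To make the exhaustive combination of framework steps more concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the step options are an abbreviated subset of Table 1 and each name merely stands in for a real implementation, the helper names are hypothetical, under-sampling is shown as simple random removal of majority-class samples, and g-mean is taken in its usual sense as the geometric mean of sensitivity and specificity, which the text does not spell out.

```python
import itertools
import math
import random

# Step options abbreviated from Table 1; each name stands in for a real method.
FRAMEWORK = {
    "sampling": ["none", "over", "under"],
    "filtering": ["none", "t-test", "ReliefF"],
    "classifier": ["KNN", "SVM", "random forest"],
    "criterion": ["accuracy", "g-mean", "AUROC"],
    "search": ["grid", "genetic algorithm"],
}


def all_configurations(framework):
    """Yield one dict per combination of methods, one method per step."""
    steps = list(framework)
    for choice in itertools.product(*(framework[s] for s in steps)):
        yield dict(zip(steps, choice))


def under_sample(features, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for x, y in zip(features, labels):
        by_class[y].append(x)
    n = min(len(by_class[0]), len(by_class[1]))
    balanced = [(x, y) for y in (0, 1) for x in rng.sample(by_class[y], n)]
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return math.sqrt((tp / pos) * (tn / neg))


configs = list(all_configurations(FRAMEWORK))
```

Even this abbreviated set of options already yields well over a hundred configurations, each of which would need its own parameter search, so the full framework multiplies quickly.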

EXPERIMENTAL RESULTS
Ultrasound RF signals were collected from liver parenchyma regions before surgery, with the patients' consent, using a Philips iU22 system with an L9-3 transducer. The corresponding liver parenchyma slices were obtained during surgery, and the pathologic analysis was performed by at least two experienced doctors. In this study, normal liver cases are from patients whose parenchyma is pathologically confirmed to have no fibrosis and no inflammation. For fatty liver cases, the patients are also free of fibrosis and inflammation, and the fat-content percentage is quantified by the doctors. In total, there are 16 normal cases from 9 patients and 11 fatty cases from 6 patients. For the hyper-parameter learning, the RF signal length, the interval length and the sample-density coefficient are in the ranges of 0.62 mm to 2.46 mm, 0.39 mm to 1.54 mm and 0.0706 mm² to 0.1413 mm², respectively. Training samples are generated from 5 normal cases and 4 fatty cases, for which the two corresponding patients' fat-content percentages are 40% and 60%, respectively. Generally, the number of generated training samples is on the order of 30,000 for one specific set of hyper-parameters. All the remaining samples are used as testing samples. The output of the learned model is a normalized score in the range of 0 to 1. For one processed small piece of liver tissue, a higher value denotes a higher probability of being fat content.
The extensive learning process in the proposed method is performed automatically with parallel computing and takes 27 days in total. Based on the cross-validation results, a model with Random forests as the applied classifier is selected for further testing. For one testing case, it takes around 0.3 second to produce the ultrasonic fatty liver image, which shows that the design for real-time imaging is realized.
For the experimental results, a reasonable and meaningful way to evaluate the model performance in fat content quantification is to directly correlate the model's fat-percentage estimates on the testing fatty liver data with the percentage estimates from the pathologic analysis. A threshold is required to determine from its model score whether a small piece of liver tissue is fat content. Figure 2(a) shows the calculated correlation coefficients. With a threshold of 0.58, the correlation coefficient reaches 0.93, which verifies the effectiveness of the proposed method to some extent. The corresponding fat-percentage estimation on the testing normal liver data is 0.9 2%, which is also consistent with the pathologic analysis results.
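The evaluation described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, the scores and pathology percentages are made-up toy values, and scanning a list of candidate thresholds for the highest Pearson correlation is an assumption about how a threshold such as 0.58 might be chosen.

```python
import math


def fat_percentage(scores, threshold):
    """Percentage of tissue-piece scores at or above the threshold."""
    if not scores:
        raise ValueError("no scores given")
    return 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_threshold(case_scores, pathology_percent, thresholds):
    """Scan candidate thresholds; return (threshold, correlation) of the best one."""
    results = []
    for th in thresholds:
        estimates = [fat_percentage(scores, th) for scores in case_scores]
        if len(set(estimates)) > 1:  # correlation is undefined for constant estimates
            results.append((pearson(estimates, pathology_percent), th))
    corr, th = max(results)
    return th, corr


# Made-up model scores for three cases and made-up pathology percentages.
toy_scores = [
    [0.1, 0.2, 0.7, 0.3],
    [0.6, 0.7, 0.2, 0.9],
    [0.9, 0.8, 0.7, 0.65],
]
toy_pathology = [25.0, 75.0, 100.0]
chosen, corr = best_threshold(toy_scores, toy_pathology, [0.3, 0.58, 0.85])
```

Each case contributes one estimated percentage, so the correlation is computed across cases rather than across individual tissue pieces.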
For further comparison, we used parametric methods including attenuation [9], envelope statistics, peak-to-peak imaging, frequency shift and the Nakagami parameter [32] to substitute the data representation part of the proposed method. Every sample becomes a 16-dimensional vector, and the other related processing is the same as in the proposed method. For the correspondingly learned model, the correlation with the pathologic analysis is shown in Figure 2(b), and the correlation coefficient can reach 0.86. On the one hand, this shows the effectiveness of the learning process in modeling the connection between the ultrasound echo signal and liver tissue. On the other hand, it indicates that the proposed spectral difference is the superior data representation for the proposed method, mainly due to its "complete" information.
The fatty liver images generated by the proposed method are shown in Figure 3, where red denotes a high probability of fat content. The fat distribution can be clearly visualized. For Figures 3(d) and 3(e), the fat percentage should be 5% in both, since they come from the same patient. However, Figure 3(d) shows more fat content. The discrepancy is possibly due to the limited scope of the pathologic slice.

CONCLUSION
In this paper, we propose a method to model the connection between the ultrasound echo signal and fatty liver tissue. Instead of an explicit description or theoretical proof, the connection is directly established by learning in the high-dimensional space. Experimental results show a correlation coefficient as high as 0.93 between the proposed method and the pathologic analysis in quantifying the percentage of fat content. The in-vivo nature and the visualized fat distribution make the proposed method well suited for wide and regular screening in fatty liver quantification.

ACKNOWLEDGEMENT
The assistance of Dr. Jiawu Li, Dr. Wenwu Ling and Dr. Changli Lu from West China Hospital is gratefully acknowledged.