Feature tracking for automated volume of interest stabilization on 4D-OCT images

A common representation of volumetric medical image data is the triplanar view (TV), in which the surgeon manually selects slices showing the anatomical structure of interest. In addition to common medical imaging such as MRI or computed tomography, recent advances in the field of optical coherence tomography (OCT) have enabled live processing and volumetric rendering of four-dimensional images of the human body. Because the region of interest undergoes motion, it is challenging for the surgeon to keep track of an object while continuously adjusting the TV to the desired slices. To select these slices in subsequent frames automatically, it is necessary to track movements of the volume of interest (VOI). This has not yet been addressed for 4D-OCT images. Therefore, this paper evaluates motion tracking by applying state-of-the-art tracking schemes to maximum intensity projections (MIP) of 4D-OCT images. The estimated VOI location is used to conveniently show the corresponding slices and to improve the MIPs by calculating thin-slab MIPs. Tracking performance is evaluated on an in-vivo sequence of human skin, captured at 26 volumes per second. Among the investigated tracking schemes, our recently presented tracking scheme for soft tissue motion provides the highest accuracy, with an error of under 2.2 voxels for the first 80 volumes. Object tracking on 4D-OCT images enables sub-epithelial tracking of microvessels for image guidance.


INTRODUCTION
Optical coherence tomography (OCT) is a three-dimensional imaging modality that provides high-accuracy depth information on the light scattering properties of biological tissue [1]. Recent advances have enabled live processing and volumetric rendering of four-dimensional (3D plus time as fourth dimension) OCT images [2]. The integration of such an OCT into an operating microscope enables its use for high-accuracy intra-operative guidance with visibility of subcutaneous structures up to 3 mm in depth [3, 4]. Within this setup, redundant combination or fusion of RGB camera and OCT data brings additional information to the live camera viewer [5]. An envisioned future application is soft tissue laser ablation with compensated tissue motion [6].
By extracting three orthogonal slices from the regarded volume, the triplanar view (TV) is generated as a common representation of volumetric images. While this works well for static single-shot 3D volumes, in time-varying volumes the surgeon can easily lose track of the volume of interest (VOI) due to observer or tissue movement. Manually adjusting the TV to the desired slices is highly inconvenient and limits the potential of 4D-OCT as a tool for intra-operative guidance. An alternative is the maximum intensity projection (MIP) and the sliding thin-slab MIP (STS-MIP). MIPs do not have to be adjusted, but the resulting 2D image is low in contrast and noisy, and occlusions by brighter overlying structures can occur.
which is necessary for usage on 4D images [7]. Recent work has covered OCT pose estimation on 2D B-scans for visual servoing purposes [8]. However, this is limited to planar pose estimation with 3 degrees of freedom.
The purpose of this paper is to provide a convenient view of 4D-OCT data by automatically and continuously selecting slices, based on a user-defined, pre-selected VOI. To this end, we present suitable image processing frameworks and investigate different vision-based tracking algorithms for their use on MIPs of 4D-OCT data. The estimated location of the VOI is used to select the corresponding slices. The experimental setup (see Fig. 1) is composed of a 4D-OCT image acquisition unit, acquiring at a rate of 26 volumes per second, and an image guidance interface. Each volume consists of 320 × 320 depth scans, each with 400 pixels in depth, resulting in a rate of approximately 1 GVoxel per second [2]. This study addresses the image processing required for tracking a user-selected VOI. The motion estimate is subsequently used to adapt the corresponding slices and STS-MIPs for view stabilization.
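As a plausibility check, the stated data rate follows directly from the volume dimensions and the acquisition rate given above. The worked calculation below is not part of the original text:

```latex
\[
320 \times 320 \times 400 \,\tfrac{\text{voxels}}{\text{volume}} \times 26 \,\tfrac{\text{volumes}}{\text{s}}
= 40\,960\,000 \times 26 \,\tfrac{\text{voxels}}{\text{s}}
= 1\,064\,960\,000 \,\tfrac{\text{voxels}}{\text{s}}
\approx 1.06\ \text{GVoxel/s}
\]
```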

Triplanar view of the OCT volume
The triplanar view shows three orthogonal slices (axial, coronal, and sagittal, see Fig. 2) of 3D image datasets, usually referred to as anatomical planes. The coronal or sagittal slice corresponds to the OCT B-scan, depending on the scan direction. To visualize a certain VOI, e.g. cancerous tissue under the epithelium, the user has to manually select slices that intersect this structure.
Without any slice selection, the maximum intensity projection calculates 2D planar projections from 3D volumetric image data [9]. Parallel rays from a certain direction are cast through the OCT volume and the intensities along the rays are analyzed. For each ray, only the maximum intensity value is projected onto the resulting image plane. As OCT images represent light scattering properties, the maximum value corresponds either to a matter interface or to background noise. As a result, MIP images are low in contrast but have less speckle noise than single B-scans (see Fig. 2).
To overcome these drawbacks, the sliding thin-slab MIP (STS-MIP) offers improved visualization of tomographic data. As the name suggests, the MIPs are computed from thin-slab subvolumes that are only a small number of voxels thick [10]. The slabs can vary in thickness, and the resulting projections have higher contrast than conventional MIPs. The slabs containing the tracked VOI are selected based on the tracking result.
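The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `mip_views`, the axis convention `volume[x, y, z]`, and the small synthetic volume are assumptions made for the example.

```python
import numpy as np

def mip_views(volume):
    """Return axial, sagittal and coronal MIPs of a 3D volume.

    Assumption: the volume is indexed as volume[x, y, z].
    """
    axial = volume.max(axis=2)       # rays along z, image indexed [x, y]
    sagittal = volume.max(axis=0).T  # rays along x, image indexed [z, y]
    coronal = volume.max(axis=1)     # rays along y, image indexed [x, z]
    return axial, sagittal, coronal

# Small synthetic stand-in for an OCT volume (values in [0, 1)).
rng = np.random.default_rng(0)
vol = rng.random((32, 24, 40)).astype(np.float32)
vol[10, 5, 20] = 2.0  # one bright voxel that must survive every projection

axial, sagittal, coronal = mip_views(vol)
print(axial.shape, sagittal.shape, coronal.shape)
```

Each output pixel keeps only the brightest voxel along its ray, which is why a single bright structure can occlude everything below it.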
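A thin-slab MIP restricts the projection to a few voxels around a chosen position, for example the tracked VOI center. The sketch below illustrates the idea under assumptions: the function name `thin_slab_mip`, the symmetric slab definition via `half_thickness`, and the toy volume are hypothetical and not taken from the paper.

```python
import numpy as np

def thin_slab_mip(volume, center, half_thickness, axis):
    """MIP over a slab of up to 2 * half_thickness + 1 voxels around `center`.

    The slab is clipped at the volume borders.
    """
    lo = max(center - half_thickness, 0)
    hi = min(center + half_thickness + 1, volume.shape[axis])
    slab = np.take(volume, np.arange(lo, hi), axis=axis)
    return slab.max(axis=axis)

# Toy volume: a bright overlying structure occludes a dimmer deeper one.
vol = np.zeros((20, 20, 20), dtype=np.float32)
vol[5, 5, 2] = 9.0   # bright overlying voxel
vol[5, 5, 10] = 3.0  # structure of interest

full_mip = vol.max(axis=2)
slab_mip = thin_slab_mip(vol, center=10, half_thickness=2, axis=2)
print(full_mip[5, 5], slab_mip[5, 5])
```

In the full MIP the overlying voxel dominates the pixel, whereas the slab centered on the structure of interest shows it unoccluded.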

Investigated tracking schemes
The different tracking schemes evaluated in this study are summarized in Tab. 1. Implementations of schemes 1-7 are taken from the computer vision library "OpenCV 3.2" [18]. The first six use a brute-force matcher to find feature correspondences. BMATCH is used with the normalized correlation coefficient as similarity measure (refer to the OpenCV documentation [19] for further explanation). SOFT aims at minimizing the energy function $\varepsilon$ of the tissue model $S$. $\varepsilon$ is defined as the sum of the correspondence energy $\varepsilon_C$ and the deformation energy $\varepsilon_D$, which provides regularization. The weight parameter is empirically set to $\lambda_D = 6.0$. This makes SOFT robust towards outliers.
The following tracking setup is used to assess the performance of the tracking schemes with respect to precision and computational cost. First, a volume of interest with dimensions of 100 × 100 × 100 voxels and a starting center voxel $c_0 = (150, 120, 150)^T$ is defined within the first OCT volume. In the tracking pipeline (see Fig. 4), the MIPs of the current volume are computed and $c_t$ is projected onto them. Due to their low contrast, the MIPs are preprocessed using histogram equalization. The actual tracking is performed on the resulting MIPs, so the original three-dimensional tracking problem turns into multiple two-dimensional tracking problems. To reconstruct all components of $c_t$, it is sufficient to track in only two of the three possible MIPs. This results in an artificial stereo view with a rectangular epipolar geometry (see Fig. 3, left). Pixels that correspond to the same voxel share one coordinate: for $c_t = (x, y, z)^T$, the projection onto the axial MIP is $c_{t,ax} = (x, y)^T$ and the projection onto the sagittal MIP is $c_{t,sag} = (z, y)^T$. SOFT enforces consistency between the axial and sagittal y-coordinates, whereas the other schemes merge the common dimension of the results by taking the arithmetic mean.
As shown in Fig. 4, the tracking pipeline splits into three paths, depending on the selected method. The result of the selected path is used to update the current VOI and to extract new tracking templates for the next iteration.
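The projection of the VOI center onto the two MIPs and the reconstruction of the 3D position from two 2D tracking results can be sketched as follows. The function names `project` and `reconstruct` and the example tracker outputs are hypothetical. The merge by arithmetic mean mirrors the description for the schemes other than SOFT, which enforces the shared coordinate inside its own model instead.

```python
def project(c):
    """Project a 3D center c = (x, y, z) onto the axial and sagittal MIPs."""
    x, y, z = c
    c_ax = (x, y)   # axial MIP sees (x, y)
    c_sag = (z, y)  # sagittal MIP sees (z, y)
    return c_ax, c_sag

def reconstruct(c_ax, c_sag):
    """Rebuild (x, y, z) from two 2D results.

    The shared y-coordinate is merged by the arithmetic mean.
    """
    x, y_ax = c_ax
    z, y_sag = c_sag
    return (x, 0.5 * (y_ax + y_sag), z)

c0 = (150, 120, 150)
c_ax0, c_sag0 = project(c0)

# Hypothetical 2D tracker outputs for the next volume; the two
# y-estimates disagree slightly and are averaged.
c_ax1 = (152.0, 121.0)
c_sag1 = (149.0, 123.0)
c1 = reconstruct(c_ax1, c_sag1)
print(c1)
```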
Since generating ground truth for soft tissue motion remains an open problem, the forward-backward (FB) error metric [20] is used to quantitatively evaluate tracking precision. The FB error for the tracked VOI center $c_t$ is defined as
$$e_{FB} = \lVert c_0 - c_{0,\mathrm{end}} \rVert_2$$
with initial position $c_0$ and final position $c_{0,\mathrm{end}}$ after tracking forward to volume $t = n$ and then backward to the first volume $t = 0$. The test sequence comprises 200 OCT volumes, and errors are calculated at every 10th volume, i.e. $n \in \{10, 20, \ldots, 200\}$.
Qualitative results are shown in Fig. 6. The first line shows four consecutive axial MIPs beginning at an arbitrary volume $k$. The images contain motion of the specimen relative to the OCT as well as tissue deformation due to manipulation with the needle. The green dots represent tracked features. The VOI center $c_{t,ax}$ is highlighted by an orange reticle. The needle induces partial occlusion. Coronal and sagittal slices transecting $c_t$ are automatically selected and shown in the lower lines of Fig. 6. SOFT robustly stabilizes the triplanar view.
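The forward-backward evaluation can be illustrated with a short sketch. It assumes the Euclidean distance between start and end position as the error measure. The function `forward_backward_error`, the `track_step` interface, and the two toy trackers are hypothetical stand-ins and not the evaluated schemes.

```python
import math

def forward_backward_error(track_step, volumes, c0, n):
    """Track c0 forward from volume 0 to n, then backward to volume 0,
    and return the Euclidean distance between start and end position."""
    c = c0
    for t in range(1, n + 1):        # forward pass: 0 -> n
        c = track_step(volumes[t - 1], volumes[t], c)
    for t in range(n - 1, -1, -1):   # backward pass: n -> 0
        c = track_step(volumes[t + 1], volumes[t], c)
    return math.dist(c0, c)

# Toy data: each "volume" is reduced to the true VOI position, which
# moves by one voxel in x per volume.
volumes = [(150.0 + t, 120.0, 150.0) for t in range(21)]

def ideal_tracker(prev, cur, c):
    """Follows the true displacement exactly."""
    return tuple(ci + (b - a) for ci, a, b in zip(c, prev, cur))

def drifting_tracker(prev, cur, c):
    """Like the ideal tracker, but adds 0.1 voxel of drift in x per step."""
    x, y, z = ideal_tracker(prev, cur, c)
    return (x + 0.1, y, z)

c0 = (150.0, 120.0, 150.0)
err_ideal = forward_backward_error(ideal_tracker, volumes, c0, n=10)
err_drift = forward_backward_error(drifting_tracker, volumes, c0, n=10)
print(err_ideal, err_drift)
```

In this toy setup the drift accumulates over all 2n steps, so the measure grows with n even though no ground truth is available.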

CONCLUSION
This study presents first results of different feature tracking techniques applied to novel 4D-OCT imaging. It has been shown that conventional feature trackers, such as SIFT or SURF, do not perform well on this kind of data. A simple block matching algorithm performed surprisingly well, but the tracking results indicate that an algorithm specifically tailored to soft tissue environments is required for this dataset. The achieved accuracy is sufficient for the intended use in 4D-OCT-guided interventions, even though the best performing algorithm was developed for use on camera images. Therefore, future work will address the advancement of our soft tissue tracking scheme for use on 4D-OCT images, including an efficient GPU implementation.
Optical coherence tomography, as a relatively new and emerging medical imaging modality, will attract more attention in the future. With recent advances in the performance of graphics processing units (GPU), real-time processing of 4D-OCT data to extract additional information for intra-operative guidance and navigation will be possible and worth investigating. Object tracking on OCT MIPs for better visualization by VOI stabilization is an important step towards enabling 4D-OCT for a whole new class of medical imaging products.

Figure 2. Visualization of OCT data of a finger tip. Top line: volume rendering with corresponding triplanar slices. Bottom line: associated maximum intensity projections. (a) Volume rendering. (b-d) Triplanar views.

Table 1. A summary of tracking schemes evaluated in this paper.

| # | Scheme | Description |
|---|--------|-------------|
| 7 | BMATCH | Simple Block Matching slides an image patch over the reference image and calculates the normalized correlation coefficient. The best match is found at maximum correlation. |
| 8 | SOFT | Soft Tissue Motion Tracking is a recently proposed tracking scheme combining a piecewise affine deformation model with the epipolar geometry of stereo images [6]. |
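The block matching principle described for BMATCH can be sketched in plain NumPy. This is an unoptimized illustration with hypothetical names (`match_template_ncc`, the synthetic frames), not the OpenCV implementation used in the study. In OpenCV, the same similarity measure is available through `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED`, which is presumably what the evaluated scheme relies on.

```python
import numpy as np

def match_template_ncc(image, template):
    """Exhaustive block matching with the normalized correlation coefficient.

    Returns the (row, col) of the best-matching top-left corner and its score.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template.astype(np.float64) - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw].astype(np.float64)
            p = patch - patch.mean()
            denom = t_norm * np.sqrt((p ** 2).sum())
            if denom == 0.0:
                continue  # flat patch or template: correlation undefined
            score = (t * p).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Synthetic test: cut a template from frame 0 and find it again in a
# shifted copy of the frame (3 rows down, 2 columns left).
rng = np.random.default_rng(1)
frame0 = rng.random((40, 40))
template = frame0[12:20, 25:33]
frame1 = np.roll(frame0, shift=(3, -2), axis=(0, 1))

pos, score = match_template_ncc(frame1, template)
print(pos, score)
```

Subtracting the patch means makes the score insensitive to brightness offsets, and the normalization makes it insensitive to contrast scaling between the template and the candidate patch.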

Figure 3. Tracking setup. Left: The axial and sagittal MIPs are used to generate an artificial stereo view. Right: VOI within the OCT volume and corresponding 2D templates used for the tracking schemes. The VOI is tracked throughout the sequence with respect to image acquisition unit (OCT) and tissue motion.

Figure 5. Quantitative tracking results measured with forward-backward error in voxels over OCT volume count. Left: Overview of all tracking results. Right: Selected tracking results until tracking failure.

Figure 6. Automated slice selection on TV with SOFT algorithm. Top line: feature tracking on axial MIP. Lower lines: corresponding coronal and sagittal slices on which $c_t$ is located. Note sagittal view transecting the needle for t = {0, 1}.

Table 1. A summary of tracking schemes evaluated in this paper.

| # | Scheme | Description |
|---|--------|-------------|
| 1 | SIFT | Scale-Invariant Feature Transform uses extremal values of differences of a repeatedly Gaussian-filtered image as feature points. These points are described with a histogram of local image gradients [11]. |
| 2 | SURF | Speeded-Up Robust Features are inspired by SIFT. The scheme approximates the Gaussian filtering for computational efficiency and detects feature points by calculating the determinant of the Hessian matrix. Descriptors are obtained by using spatial wavelet responses [12]. |
| 3 | BRIEF | Binary Robust Independent Elementary Features use binary pixel tests in a smoothed image patch as feature descriptor. Because BRIEF lacks feature detection, we combine it with the SIFT detector [13]. |
| 4 | ORB | Oriented FAST and Rotated BRIEF combines the FAST [14] feature detector with an orientation component and oriented BRIEF feature descriptors [15]. |
| 5 | BRISK | Binary Robust Invariant Scalable Keypoints uses a novel FAST-based [14] feature detector and a bitstring descriptor from intensity values obtained by sampling each keypoint neighborhood [16]. |
| 6 | FREAK | Fast Retina Keypoints are inspired by the human visual system and compare image intensities over a retinal sampling pattern. Again, we combine it with the SIFT detector [17]. |

Table 2. Average execution time (in seconds) per OCT volume.

Table 2 shows the overall execution times of all tracking schemes, including MIP calculation. The first six feature-based schemes all have execution times of the same order of magnitude. BMATCH has the lowest execution time and is the only investigated scheme that is able to process the OCT volumes at the image acquisition rate of 26 Hz. SOFT has the longest execution time of 1.66 s. It should be noted that schemes 1-7 are implemented efficiently in OpenCV, whereas SOFT has not been optimized yet.