Analysis of Stitch-0-1-2-3-4.txm.nii

Author: Niels Jeppesen (niejep@dtu.dk)

The structure-tensor package we will be using here is a 2D and 3D structure tensor package for Python, implemented by Vedrana A. Dahl and Niels Jeppesen.

Installation

To run this notebook there are some prerequisites that must be installed in our Python environment. We generally recommend creating a new Python environment to run the notebooks in, for instance using conda create -n <env_name> python=3.7. This way we avoid conflicts with packages in our existing environment.

Install dependencies (for this notebook) using pip install numpy scipy scikit-image matplotlib nibabel tqdm structure-tensor. To start the notebook we will of course also need jupyter. Install the dxchange package from GitHub, see note below.

Now, let's go ahead and import the required modules. The structure_tensor_workers.py file should be in the notebook folder.

Load and display data

First, we'll load a sample of the data and some metadata. We will be using the following folder structure:
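As a minimal sketch, the volume can be loaded with nibabel, which is among the dependencies above; the path below is a hypothetical example.

```python
import numpy as np
import nibabel as nib

# Hypothetical location; adjust to your own folder structure.
nii_path = '../originals/Stitch-0-1-2-3-4.txm.nii'

img = nib.load(nii_path)
sample = np.asarray(img.dataobj)  # read the voxel data, preserving its dtype
print(sample.shape, sample.dtype)
```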

Let's have a quick look at the data to ensure it was loaded correctly.

Create histogram.
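A minimal sketch of the histogram with matplotlib, assuming the sample array from above:

```python
import matplotlib.pyplot as plt

# Intensity histogram of the loaded sample.
plt.figure(figsize=(8, 3))
plt.hist(sample.ravel(), bins=256)
plt.xlabel('Intensity')
plt.ylabel('Voxel count')
plt.show()
```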

Crop data

Some of the data may not be of interest to us, so let's crop out the area of interest. You can cut out any part of the data you want to analyze.

Let's have a look at the cropped area. The TXM reader can be a bit slow, so we will only look at YZ-slices for now. We may have to come back and adjust the cropping on the X-axis later.

Create new shape. Note: We rotate 90 degrees around the X-axis to match convention.
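A sketch of the crop and rotation, assuming X is the first axis; the crop bounds are hypothetical.

```python
# Hypothetical crop bounds; adjust to your region of interest.
x0, x1, y0, y1, z0, z1 = 0, 400, 100, 900, 100, 900

data = sample[x0:x1, y0:y1, z0:z1]
# Rotating in the plane of axes (1, 2) rotates about axis 0, i.e. the X-axis.
data = np.rot90(data, k=1, axes=(1, 2))
print(data.shape)
```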

Save as RAW

To work with the data, we will save it as a raw data file, which will allow us to access the data using numpy.memmap. Using a memory-mapped file is both convenient and fast, especially since we will be accessing the data many times from different processes.

We will be placing all our processed data in a temporary folder, as it can be re-created using this notebook if deleted. We will also use data in the temp folder as a cache, so we don't have to create the RAW volume if it already exists. Thus, if you change the volume cropping etc., be sure to delete the RAW volume from the temp folder and create a new one.
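A sketch of this caching pattern; the file name volume.raw is an assumption.

```python
import os

# Write the RAW file once, then memory-map it read-only afterwards.
raw_path = os.path.join(temp_folder, 'volume.raw')  # hypothetical file name
if not os.path.exists(raw_path):
    mm = np.memmap(raw_path, dtype=data.dtype, mode='w+', shape=data.shape)
    mm[:] = data
    mm.flush()
    del mm
data = np.memmap(raw_path, dtype=data.dtype, mode='r', shape=data.shape)
```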

Let's have a look at some slices from all three axes.

Important: Remember to look at the figures below to verify that the correct data is loaded into the data variable. If you've changed the slicing or the data you're working with, you may want to delete the RAW data cached in the temp_folder folder.

Choosing fiber threshold

To choose a good intensity threshold for the fibers we can use a histogram and choose the value manually. Alternatively, we can use a method such as Otsu's threshold. When choosing the threshold we make sure to include only inside-sample data. This makes it easier to separate foreground (fibers) and background inside the sample.
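For example, Otsu's threshold from scikit-image, restricted to inside-sample voxels; sample_mask is a hypothetical boolean mask marking the inside of the sample.

```python
from skimage.filters import threshold_otsu

# Compute the threshold only from voxels inside the sample.
fiber_threshold = threshold_otsu(data[sample_mask])
print(fiber_threshold)
```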

Structure tensor

We set $\sigma$ and $\rho$ based on the size of the fibers that we want to analyze. For more details see the related paper and StructureTensorFiberAnalysisDemo notebook.
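For illustration, hypothetical scale values; tune these to the fiber size in your data.

```python
# Hypothetical scales (in voxels), set from the fiber size.
sigma = 2.0  # noise scale
rho = 8.0    # integration scale
```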

Then, we set the block size (crop_size), which will determine the maximum size of the blocks we will partition the data into for the ST calculation. The maximum block size will depend on crop_size, the truncate value used for Gaussian filters (default is 4), and $\rho$ or $\sigma$ (usually $\rho$, because it is the largest).
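As a sketch, SciPy's Gaussian filters use a kernel radius of int(truncate * sigma + 0.5), so the padded block size can be estimated as follows; crop_size is a hypothetical value.

```python
truncate = 4
crop_size = 512  # hypothetical block size

# Padding added on each side of a block by the largest Gaussian kernel.
kernel_radius = int(max(sigma, rho) * truncate + 0.5)
max_block_size = crop_size + 2 * kernel_radius
print(kernel_radius, max_block_size)
```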

We also set the fiber_threshold and specify a list of devices to use for the calculation. The list determines how the blocks will be distributed between devices for calculations. If we have a dual GPU system supporting CUDA, we can specify ['cuda:0', 'cuda:1'], which will distribute blocks evenly between the two GPUs. ['cuda:0', 'cuda:1', 'cuda:1'] would move two thirds to GPU 1, while ['cuda:0', 'cuda:1', 'cpu'] will move one third of the blocks to the CPU. Remember to update the device list to match your system resources. Specifying CUDA devices that are not available will result in exceptions and/or undefined behaviour. If you don't have CUDA GPUs available, just set device = ['cpu'] and specify the number of processes later.

The class_vectors are used to segment the voxels based on the orientations of the ST eigenvectors. It is a list of unit vectors, which represent each of the fiber classes. The first vector is considered the primary orientation and will be used for calculating orientation metrics for each voxel. For more details see the related paper and StructureTensorFiberAnalysisDemo notebook.

Lastly, bins_per_degree determines the number of bins used per degree when aggregating the orientation metrics.
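Putting the remaining settings together, a hypothetical setup; all values are for illustration only.

```python
fiber_threshold = 85   # e.g. from the histogram or Otsu's method above
device = ['cuda:0']    # e.g. ['cuda:0', 'cuda:1'] or ['cpu']
class_vectors = np.array([
    [1.0, 0.0, 0.0],   # class 0: primary (UD) fiber direction
    [0.0, 1.0, 0.0],   # class 1: e.g. backing fibers
])
bins_per_degree = 2    # resolution of the aggregated orientation histograms
```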

Create valid data mask

To create the mask for valid data we create a mask as before, except that we erode the mask by an amount equal to the kernel radius. This way we "remove" voxels that are affected by voxels without values (black/zero intensity).
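A minimal sketch, assuming valid voxels are simply those with non-zero intensity:

```python
from scipy import ndimage

# Erode by the Gaussian kernel radius so voxels influenced by empty (zero)
# regions are excluded from the analysis.
valid_mask = ndimage.binary_erosion(data > 0, iterations=kernel_radius)
```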

Creating fiber mask

Now that we have a threshold and data mask, we combine them into a fiber mask, which is True for voxels which contain fibers.
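In code this is a simple logical combination:

```python
# Fiber voxels: inside the valid region and above the intensity threshold.
fiber_mask = valid_mask & (data > fiber_threshold)
```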

We can plot the mask on top of the data. Here, red marks fiber voxels, while blue marks non-fiber voxels: the background and voxels ignored for being close to the edge of the data.

Calculating structure tensor (in blocks)

We can use the get_crops function to partition the data into blocks (crops) with proper padding. However, we will actually only use len(crops) here, as we will use multiprocessing to distribute the blocks across multiple devices. We may include a function just for calculating the number of blocks/crops at a later point.

We will be using the structure_tensor_analysis_v1 function to calculate the structure tensor, $S$, and do the eigendecomposition for the blocks in parallel. We will be saving the following metrics to the disk using memory maps created below:

  1. vec_class contains a label for each voxel.
  2. eta_os contains a stiffness estimate, $\eta_O$, for each voxel.
  3. theta contains the angle, $\theta$, between vec and class 0, at each voxel.
  4. out_xy_angle contains the angle between vec and the XY-plane (out-of-plane orientation).
  5. in_xy_angle contains the rotation of vec in the XY-plane (in-plane orientation).

To calculate these metrics, the structure_tensor_analysis_v1 function uses the calculate_angles function, which is explained further in the StructureTensorFiberAnalysisDemo notebook. However, the metrics returned by structure_tensor_analysis_v1 have been aggregated (binned), as returning all the metrics for each block unaggregated would obviously consume large amounts of memory and be infeasible for large volumes. We will combine the aggregated data from each block afterwards.

In the code below we create memory-mapped files for the eigenvectors, along with three other metrics. The structure_tensor_analysis_v1 function will be using these to save the metrics for the volume straight to the disk, which may require a significant amount of disk space, but shouldn't result in memory issues. Besides the five types shown below, the function also supports saving $S$ and the eigenvalues. See structure_tensor_workers.py for details. In the example below we will be saving the results using 16-bit precision data types to save space. This is usually fine for visualization and most statistics. Saving the metrics to disk is optional. If you don't need the per voxel orientations for later, don't create the memory-maps and remove the entries from the init_args dictionary below. This will save you a lot of disk space and probably reduce the processing time.
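A sketch of creating such memory-maps; the file names, dtypes and shapes are assumptions for illustration.

```python
# 16-bit (or smaller) memory-mapped output volumes.
vec = np.memmap(os.path.join(temp_folder, 'vec.raw'), dtype=np.float16,
                mode='w+', shape=(3,) + data.shape)
vec_class = np.memmap(os.path.join(temp_folder, 'vec_class.raw'),
                      dtype=np.uint8, mode='w+', shape=data.shape)
theta = np.memmap(os.path.join(temp_folder, 'theta.raw'), dtype=np.float16,
                  mode='w+', shape=data.shape)
# ... and similarly for out_xy_angle and in_xy_angle.
```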

Now we're finally ready to perform the analysis. We will be using multiprocessing.Pool to distribute the work across multiple CPU cores. If specified in the device list, the CPU will offload the majority of the work to a GPU, otherwise it'll do the calculations itself. Here, we will create four processes for each device in the list (processes=4 * len(device)). As our device list contains four GPUs, we will be starting 16 processes, four for each GPU. Beware that this may require a significant amount of GPU memory. If you experience out of memory exceptions, either reduce crop_size and/or the number of processes per device. Change the number of processes to fit your system.

All shared variables are passed to the workers using the init_args dictionary and the init_worker function. This function is only called once per worker. When called, it will create all the memory-maps needed to read and write data during calculations, based on init_args. We will use the pool.imap_unordered function to tell the workers to perform the calculations for a block, by passing the structure_tensor_analysis_v1 function along with the index of the block which we want to analyze. The returned results are a tuple of aggregated metrics, as described earlier. For more details see the structure_tensor_analysis_v1 code. Below we will show how to combine the aggregated metrics and display them as histograms.
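A sketch of this pattern, assuming init_worker consumes the init_args dictionary and the analysis function takes a block index, as described above; see structure_tensor_workers.py for the actual signatures.

```python
import multiprocessing as mp
from tqdm import tqdm
from structure_tensor_workers import init_worker, structure_tensor_analysis_v1

# Four processes per device; each worker sets up its memory-maps once.
with mp.Pool(processes=4 * len(device), initializer=init_worker,
             initargs=(init_args,)) as pool:
    results = list(tqdm(pool.imap_unordered(structure_tensor_analysis_v1,
                                            range(len(crops))),
                        total=len(crops)))
```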

Another option is to save all the metrics to disk (using memory-maps as described earlier) with reasonable precision and perform the statistics directly on the full orientation volumes. This approach would be similar to what's done in the StructureTensorFiberAnalysisDemo notebook, except that there the data is kept in memory instead of on disk. If you use this approach you may simply ignore the aggregated data returned by structure_tensor_analysis_v1, but it will require you to use a significant amount of disk space to store all the orientation and segmentation data. If the volumes are very big, working with them may also be slow and memory intensive, even if you use memmap to access the data.

Histograms

The figures above are great for validating that we've done the calculations correctly and that our results are meaningful. However, even with this many figures we're only actually showing a very small fraction of the volume. To get a proper overview of the fiber orientations and classes we need to aggregate the information in a meaningful way. One way to do this is to use histograms. Since we have saved all the data to the memory maps, we just need to aggregate the data that they contain.

Note: To load the memory maps from disk later, simply use np.memmap, for example theta = np.memmap(path, dtype=dtype, shape=shape, mode='r').

By combining the threshold mask (indicating what is fiber) with the class labels, we can create a fiber class mask for each class.
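For example, assuming vec_class holds integer class labels:

```python
# One boolean mask per fiber class.
class_masks = [fiber_mask & (vec_class == i)
               for i in range(len(class_vectors))]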

Fiber fractions

Using these masks we calculate the fraction of the fibers belonging to each class.
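A sketch of the fraction calculation:

```python
# Fraction of fiber voxels assigned to each class.
fiber_count = fiber_mask.sum()
for i, mask in enumerate(class_masks):
    print(f'Class {i}: {mask.sum() / fiber_count:.3f}')
```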

Stiffness measure ($\eta_O$)

The estimated stiffness in the UD direction can be calculated.

Angle from X ($\theta$)

The histograms below show the distribution of the absolute angles between the fibers and the X-axis unit vector, along with the median angle. The resolution of the histograms is determined by bins_per_degree.

In-plane orientation (rotation in the XY-plane)

The histograms below show the distribution of the rotation around the Z-axis (in-XY-plane rotation), the mean and the standard deviation for each class.

Out-of-plane orientation (angle from the XY-plane)

The histograms below show the distribution of the out-of-plane angle, i.e. the angle from the XY-plane.

The class fractions and orientation distributions are useful for determining if the material follows production specifications. Here we've chosen a specific set of metrics to calculate and plot, but many other distributions can just as easily be obtained and plotted.

Slices

As we chose to save the eigenvectors for each voxel in the data volume to the disk using the vec memory map, we can actually access all the eigenvectors along with the original data if we like. We also have orientation metrics available through the theta, in_xy_angle and out_xy_angle memory-maps, but they were saved with low precision, intended only for visualization in other applications, and we will not be using these data in this notebook.

Let's grab some data and their ST eigenvectors. The vectors were saved as float16 to save space, but before we can plot them we have to change their type. CPUs generally do not support float16 natively, so using this data type for anything but storing large data sets is not advisable.
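For example, converting to float32 on read; this assumes Z is the first axis of data and the second axis of vec, and uses the slice indices mentioned in the note below.

```python
# Read selected Z-slices and cast from float16 to float32 before use.
z_indices = [40, 80]
data_slices = [np.float32(data[z]) for z in z_indices]
vec_slices = [np.float32(vec[:, z]) for z in z_indices]
```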

Note: We've chosen some specific slices from our data, such as Z-index 40 and 80. If your Z-dimension size is 80 or less, this will give an error. You can easily change which slices are shown in the figures below by changing/adding/removing the data_slices and vec_slices indices.

Let's have a look at the data and the first element of the eigenvectors.

Slices with angle information

Just as we did in StructureTensorFiberAnalysisDemo, we can use the calculate_angles function to calculate the segmentation and orientation metrics for the slices we just read from the volumes.

First we define some plotting functions, then we calculate vec_class, eta_os, theta, out_xy_angle and in_xy_angle for each of the slices and plot them. All the figures will be saved to the figures folder we created in the beginning. It is easy to create figures for other slices simply by changing which slices are extracted from the volumes.

NOTE: You can easily change which figures are made by commenting out the calls to fig_with_colorbar in the show_metrics function.

Effect of stitching on orientation

The stitching may affect the orientation of the fibers. Since the stitching has a certain pattern, we may be able to see this in the orientations.

In-XY-plane orientation

First, we'll examine the in-plane XY orientation for the slices above.

Out-of-XY-plane orientation

Where the backing fibers touch the UD bundles, the UD bundles are pushed slightly. This can be observed as well.

Overlay detected backing bundle edges

We overlay the detected backing bundle edges, based on orientation, on the nearest bundles visible in the data. The expectation is that the lines are placed close to the bundle edges.