Faster Multi-Object Segmentation using Parallel Quadratic Pseudo-Boolean Optimization, ICCV 2021 Paper
Author: Niels Jeppesen (niejep@dtu.dk)
This notebook is used to analyze the benchmark results from the ParallelNucleiSegmentationPart2.ipynb notebook. The benchmark tests the performance of three different QPBO implementations: K-QPBO, M-QPBO and P-QPBO. K-QPBO is the implementation found in the thinqpbo package, which is almost identical to the original implementation by Vladimir Kolmogorov. P-QPBO is our new parallel QPBO implementation and M-QPBO is our serial QPBO implementation.
import os
from glob import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
First we load the benchmark results from the CSV files. Once we've loaded the results we display the dataframe. Change the variables in the cell below to save figures or change directories.
save_figures = False
figure_dir = 'figures'
benchmark_dir = '../benchmark/nuclei_benchmarks/qpbo/'
benchmark_paths = glob(os.path.join(benchmark_dir, '*nuclei*.csv'))
benchmark_paths
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df_all = pd.concat(
    [pd.read_csv(p, index_col=0) for p in benchmark_paths],
    ignore_index=True)
df_all
To get an overview of the data we've loaded, we print the different configurations.
print('Classes:')
for n in df_all['Class'].unique().tolist():
    print(f'\t{n}')
print('CPU counts:')
for n in df_all['SystemCpuCount'].unique().tolist():
    print(f'\t{n}')
print('Images:', len(df_all['Name'].unique()))
For the purpose of plotting, we update the ShortName column values.
df = df_all.copy()
df.loc[df['Class'] == 'QPBOInt', 'ShortName'] = 'K-QPBO'
df.loc[df['Class'].str.startswith('QpboCap'), 'ShortName'] = 'M-QPBO'
df.loc[df['Class'].str.startswith('ParallelQpboCap'), 'ShortName'] = 'P-QPBO'
# np.str was removed in NumPy 1.24; use the built-in str instead.
df.loc[df['CpuCount'] != -1, 'ShortName'] += ' (' + df['CpuCount'].astype(str) + ')'
df['SystemCpuCount'] = df['SystemCpuCount'].astype(np.int16)
df['TotalTime'] = df['BuildTime'] + df['SolveTime'] + df['WeakPersistenciesTime']
We only want to work with results from a specific system and configuration, so we filter out the others. This should have no effect on the data included in the supplementary material.
mask = df['SystemCpu'].str.contains('Gold 6226R')
mask &= (~df['Class'].str.contains('CapInt') | df['Class'].str.contains('CapInt32'))
df = df[mask]
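The class mask is a little dense: it keeps rows whose class is not a CapInt variant at all, plus rows for the 32-bit CapInt variants. A minimal sketch of the logic on a toy frame (the 64-bit class name here is hypothetical, invented only to show a row being dropped):

```python
import pandas as pd

# Toy frame; only the first two class names appear in this benchmark,
# the 64-bit one is a hypothetical stand-in for a filtered-out variant.
toy = pd.DataFrame({'Class': [
    'QPBOInt',                                # not a CapInt class -> kept
    'QpboCapInt32ArcIdxUInt32NodeIdxUInt32',  # 32-bit CapInt -> kept
    'QpboCapInt64ArcIdxUInt32NodeIdxUInt32',  # other CapInt -> dropped
]})

# Keep rows that either contain no 'CapInt' at all, or contain 'CapInt32'.
mask = ~toy['Class'].str.contains('CapInt') | toy['Class'].str.contains('CapInt32')
kept = toy[mask]['Class'].tolist()
print(kept)  # -> ['QPBOInt', 'QpboCapInt32ArcIdxUInt32NodeIdxUInt32']
```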
The number of nuclei varies a lot between the images, and so does the size of the graphs.
df_graph_size = df.groupby('Name')[['NodeCount', 'EdgeCount']].first().reset_index()
df_graph_size.describe()
fig, ax = plt.subplots(1, 1, figsize=(5, 2))
dist = df_graph_size['NodeCount']
dist.hist(ax=ax, bins=50)
ax.axvline(dist.min(), c=plt.cm.Set1(4), ls=':', label=f'Min = {int(dist.min()):,}')
ax.axvline(dist.median(), c=plt.cm.Set1(0), ls='--', label=f'Med = {int(dist.median()):,}')
ax.axvline(dist.max(), c=plt.cm.Set1(2), ls=':', label=f'Max = {int(dist.max()):,}')
ax.legend(loc='upper center')
ax.set_xlabel('Nodes')
ax.grid(False)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, 'nodes_dist_2d.pdf'))
plt.show()
fig, ax = plt.subplots(1, 1, figsize=(5, 2))
dist = df_graph_size['EdgeCount']
dist.hist(ax=ax, bins=50)
ax.axvline(dist.min(), c=plt.cm.Set1(4), ls=':', label=f'Min = {int(dist.min()):,}')
ax.axvline(dist.median(), c=plt.cm.Set1(0), ls='--', label=f'Med = {int(dist.median()):,}')
ax.axvline(dist.max(), c=plt.cm.Set1(2), ls=':', label=f'Max = {int(dist.max()):,}')
ax.legend(loc='upper center')
ax.set_xlabel('Edges')
ax.grid(False)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, 'edges_dist_2d.pdf'))
plt.show()
We can now group the data to get an overview of the solve times for N1 and N2 for each algorithm and thread configuration. The results are used in the paper, where we report the minimum (best) solve time for each group.
df_an = df[df['CpuCount'] <= 16].reset_index(drop=True)
# This is perhaps not how we should calculate total time.
df_an['TotalTime'] = df_an['BuildTime'] + df_an['SolveTime'] + df_an['WeakPersistenciesTime']
df_group = df_an.groupby(['Class', 'SystemCpu', 'CpuCount'])
df_group[['SolveTime']].describe()
Group minimum times for each config.
df_group = df_an.groupby(['Name', 'Class', 'SystemCpu', 'CpuCount'])
df_min = df_group.min().reset_index()
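Each configuration was run several times per image, so taking the group minimum keeps the best (least noisy) repeat for each image/configuration pair. A toy sketch with made-up times, mirroring the notebook's column names:

```python
import pandas as pd

# Toy benchmark with repeated runs per (Name, Class); values are made up.
runs = pd.DataFrame({
    'Name':      ['img_a', 'img_a', 'img_a', 'img_b', 'img_b'],
    'Class':     ['QPBOInt', 'QPBOInt', 'QPBOInt', 'QPBOInt', 'QPBOInt'],
    'SolveTime': [1.30, 1.10, 1.25, 0.40, 0.38],
})

# The minimum over repeated runs is the best observed time per group.
best = runs.groupby(['Name', 'Class']).min().reset_index()
print(best['SolveTime'].tolist())  # -> [1.1, 0.38]
```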
Separate configs.
df_qpbo = df_min[df_min['Class'] == 'QPBOInt'].sort_values('Name').reset_index()
df_mqpbo = df_min[df_min['Class'] == 'QpboCapInt32ArcIdxUInt32NodeIdxUInt32'].sort_values('Name').reset_index()
dfs_pqpbo = {}
for k, g in df_min[df_min['Class'] == 'ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32'].groupby('CpuCount'):
    # Sort before resetting the index so that both positional order and
    # index labels line up with df_qpbo in the element-wise operations below.
    dfs_pqpbo[k] = g.sort_values('Name').reset_index(drop=True)
assert (np.array(df_qpbo['Name']) == np.array(df_mqpbo['Name'])).all()
for k in dfs_pqpbo:
    assert (np.array(df_qpbo['Name']) == np.array(dfs_pqpbo[k]['Name'])).all()
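These asserts matter because the speed-up computations divide the SolveTime columns element-wise, which is only meaningful if all frames list the same images in the same order. A toy sketch of the alignment step, using made-up frames:

```python
import pandas as pd

# Two hypothetical result frames covering the same images in different orders.
base = pd.DataFrame({'Name': ['b', 'a'], 'SolveTime': [2.0, 1.0]})
fast = pd.DataFrame({'Name': ['a', 'b'], 'SolveTime': [0.5, 0.5]})

# Sorting both by Name and resetting the index makes element-wise
# division line up row by row, as the asserts above verify.
base = base.sort_values('Name').reset_index(drop=True)
fast = fast.sort_values('Name').reset_index(drop=True)
assert (base['Name'] == fast['Name']).all()

rel = fast['SolveTime'] / base['SolveTime']
print(rel.tolist())  # -> [0.5, 0.25]
```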
To investigate the performance difference between the three QPBO implementations, we compute the relative speed-up on each image/task for M-QPBO and P-QPBO compared to K-QPBO.
df_qpbo['RelativeSolveTime'] = 1
df_mqpbo['RelativeSolveTime'] = df_mqpbo['SolveTime'] / df_qpbo['SolveTime']
for k in dfs_pqpbo:
    dfs_pqpbo[k]['RelativeSolveTime'] = dfs_pqpbo[k]['SolveTime'] / df_qpbo['SolveTime']
df_qpbo['DiffSolveTime'] = 0
df_mqpbo['DiffSolveTime'] = df_mqpbo['SolveTime'] - df_qpbo['SolveTime']
for k in dfs_pqpbo:
    dfs_pqpbo[k]['DiffSolveTime'] = dfs_pqpbo[k]['SolveTime'] - df_qpbo['SolveTime']
short_names = df_min['ShortName'].unique().tolist()
df_rel = df_mqpbo['RelativeSolveTime'].reset_index().copy().rename(columns={'RelativeSolveTime': short_names[-1]})
for i, k in enumerate(dfs_pqpbo):
    df_rel[short_names[i]] = dfs_pqpbo[k]['RelativeSolveTime']
df_rel.drop(columns='index', inplace=True)
We can plot a histogram of the speed-ups. However, it is a bit difficult to interpret due to the number of configurations.
ax = (1 / df_rel).plot.hist(bins=50, figsize=(15, 7), histtype='step')
ax.set_xlabel('Relative speed-up (times)')
ax.set_title('Relative speed-up compared to K-QPBO')
plt.show()
Plotting only three configurations makes it easier to read.
ax = (1 / df_rel.iloc[:, [0, 2, 3]]).plot.hist(bins=50, figsize=(15, 7), alpha=0.3)
ax.set_xlabel('Relative speed-up (times)')
ax.set_title('Relative speed-up compared to K-QPBO')
plt.show()
Boxplots are a nice way to display the vital information about the distributions for the different configurations. This figure is included in the paper.
ax = (1 / df_rel).plot.box(figsize=(5, 3))
ax.set_ylabel('Solve time speed-up (times)')
ax.axhline(1, label='K-QPBO', c='r', ls='--')
ax.grid(axis='y')
ax.legend()
ax.set_ylim(0, 6)
plt.xticks(rotation=25)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, 'qpbp_boxplot_nuclei_cap32.pdf'))
plt.show()
The plot shows that M-QPBO and P-QPBO provide a significant speed-up over K-QPBO for most of the images. However, the very small tasks (images with only a few nuclei) are of little interest, as all three implementations find the segmentation very quickly.
Information about distributions.
(1 / df_rel).describe()
We can do the same boxplots, but including only results for images with 16 or more nuclei. This figure is included in the paper.
nuclei_count = 16
mask_slow = (df_qpbo['NucleiCount'] >= nuclei_count)
df_rel = df_mqpbo['RelativeSolveTime'].reset_index().copy().rename(columns={'RelativeSolveTime': short_names[-1]})
for i, k in enumerate(dfs_pqpbo):
    df_rel[short_names[i]] = dfs_pqpbo[k]['RelativeSolveTime']
df_rel.drop(columns='index', inplace=True)
df_rel = df_rel[mask_slow]
print('Images:', len(df_rel))
ax = (1 / df_rel).plot.box(figsize=(5, 3))
ax.set_ylabel('Solve time speed-up (times)')
ax.axhline(1, label='K-QPBO', c='r', ls='--')
ax.grid(axis='y')
ax.legend()
ax.set_ylim(0, 6)
plt.xticks(rotation=25)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, f'qpbp_boxplot_nuclei_n{nuclei_count}_i{mask_slow.sum()}_cap32.pdf'))
ax.set_title(f'{mask_slow.sum()} images with at least {nuclei_count} nuclei (32-bit capacities)')
plt.show()
The plot shows that M-QPBO and P-QPBO provide a significant speed-up over K-QPBO for all images with 16 or more nuclei, except for one image when using P-QPBO(1).
Information about distributions.
(1 / df_rel).describe()
The actual reduction in the solve time for each image shows us the practical benefit (time saved) of P-QPBO, depending on the size of the tasks.
df_qpbo['DiffSolveTime'] = 0
df_mqpbo['DiffSolveTime'] = df_mqpbo['SolveTime'] - df_qpbo['SolveTime']
for k in dfs_pqpbo:
    dfs_pqpbo[k]['DiffSolveTime'] = dfs_pqpbo[k]['SolveTime'] - df_qpbo['SolveTime']
short_names = df_min['ShortName'].unique().tolist()
df_diff = df_mqpbo['DiffSolveTime'].reset_index().copy().rename(columns={'DiffSolveTime': short_names[-1]})
for i, k in enumerate(dfs_pqpbo):
    df_diff[short_names[i]] = dfs_pqpbo[k]['DiffSolveTime']
df_diff.drop(columns='index', inplace=True)
ax = df_diff.plot.hist(bins=50, figsize=(15, 7), histtype='step')
ax.set_xlabel('Difference (s)')
ax.set_title('Difference compared to K-QPBO')
plt.show()
ax = df_diff.plot.box(figsize=(15, 7))
ax.set_ylabel('Difference (s)')
ax.axhline(0, label='K-QPBO', c='r', ls='--')
ax.grid(axis='y')
outlier_idx = (dfs_pqpbo[1]['SolveTime'] - df_qpbo['SolveTime']).argmax()
ax.scatter(range(1, len(df_diff.columns) + 1), df_diff.iloc[outlier_idx], c='r', label=f'Image {outlier_idx}')
ax.legend()
plt.show()
Except for one outlier, we see that P-QPBO and M-QPBO can provide a large reduction in solve time in the best cases, while being functionally equivalent to K-QPBO in the worst cases. By functionally equivalent, we mean that the real-time difference is so small that it is irrelevant in almost all practical use-cases.
Image 421 is a bit special. It is the image with the most nuclei and the most overlap between the SLG objects. Its size makes it well suited for our fast M-QPBO and P-QPBO implementations; however, the density of the nuclei negatively impacts the bottom-up merging, which is particularly noticeable for P-QPBO(1).