# Analysis of load balancing results

Load balancing results from experiments with LeanMD on 960 cores (20 compute nodes) of the [Joliot-Curie (SKL Irene)](http://www-hpc.cea.fr/en/complexe/tgcc-JoliotCurie.htm) supercomputer.

The raw result files are organized in four directories (oct18, oct23, oct24, and oct24_2).
Each directory contains the results of one batch execution in the supercomputer.
Each batch is composed of 10 repetitions of a set of experiments.
Each set of experiments includes different load balancing algorithms and different problem sizes.
Each set is randomly ordered to avoid interference coming from a specific order of execution.
Each raw file contains the appended output of the application and its load balancer for all 10 repetitions.
The file names indicate the load balancer and the problem size.
For instance, `PackStealLB.240` means that the application was run with PackStealLB and the problem size parameter is 240.
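
Because a scenario name is the load balancer followed by the problem size after the last dot, it can be decoded with a single split. The helper below, `parse_scenario`, is a hypothetical illustration; it is not part of the parsing scripts used in this notebook.

```python
def parse_scenario(name: str) -> tuple:
    """Split a scenario name such as 'PackStealLB.240' into (load balancer, size)."""
    # Split on the last dot only, so the load balancer name is kept whole
    lb, size = name.rsplit('.', 1)
    return lb, int(size)


print(parse_scenario('PackStealLB.240'))  # ('PackStealLB', 240)
```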

Before running this notebook, make sure it sits next to the `supercomputer_results` directory (i.e., both are in the same parent directory).


## 1. Generation of CSV files

The code below calls three different scripts that generate CSV files containing total execution time, load balancing time, and number of task migrations based on the raw files. The total execution times and load balancing times are analyzed in detail, while the number of task migrations is only used to provide an idea of the behavior of the algorithms and is not detailed here.

In [None]:
# Total execution times
!python3 supercomputer_results/parse_results.py

In [None]:
# Load balancing times
!python3 supercomputer_results/parse_lb_times.py

In [None]:
# Number of migrations
!python3 supercomputer_results/parse_migs.py

## 2. Result analysis

First step: import the required packages for analysis.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats

### 2.1 Analysis of total execution times

#### 2.1.1 Read files and organize dataframes

In [None]:
raw_data = pd.read_csv('results.csv',sep=',')
raw_data     # prints the whole dataframe

In [None]:
# Sizes and load balancers for different categories of results (smaller and larger)
smaller_sizes = ['80', '120', '160']
parsing_lbs = ['DummyLB', 'PackDropLB', 'PackStealLB', 'DistributedLB', 'GreedyLB', 'RefineLB']
larger_sizes = ['240', '320']
extra_lbs = ['DummyLB', 'PackDropLB', 'PackStealLB']

parsed_data = pd.DataFrame() # empty dataframe

# Reorganizes raw data into parsed data
for size in smaller_sizes:
  for lb in parsing_lbs:
    scenario = lb + '.' + size # single scenario name
    parsing = pd.DataFrame(raw_data[[scenario]])
    parsing = parsing.rename(columns={scenario: 'Time'})
    parsing['Load Balancer'] = lb
    parsing['Problem size'] = size
    parsed_data = pd.concat([parsed_data, parsing])
for size in larger_sizes:
  for lb in extra_lbs:
    scenario = lb + '.' + size # single scenario name
    parsing = pd.DataFrame(raw_data[[scenario]])
    parsing = parsing.rename(columns={scenario: 'Time'})
    parsing['Load Balancer'] = lb
    parsing['Problem size'] = size
    parsed_data = pd.concat([parsed_data, parsing])

parsed_data
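
The two nested loops above reshape a wide table (one column per `LB.size` scenario) into a long table with one row per measurement. A minimal sketch of the same reshaping with `pandas.DataFrame.melt`, assuming `results.csv` has exactly that wide layout; `to_long` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def to_long(raw: pd.DataFrame, lbs, sizes) -> pd.DataFrame:
    """Reshape wide 'LB.size' columns into rows of (Time, Load Balancer, Problem size)."""
    # Same order as the notebook's loops: sizes in the outer loop, LBs in the inner one
    columns = [lb + '.' + size for size in sizes for lb in lbs]
    long = raw[columns].melt(var_name='Scenario', value_name='Time')
    # 'PackStealLB.240' -> 'PackStealLB', '240' (sizes stay strings, as in the notebook)
    parts = long['Scenario'].str.rsplit('.', n=1, expand=True)
    long['Load Balancer'] = parts[0]
    long['Problem size'] = parts[1]
    return long[['Time', 'Load Balancer', 'Problem size']]
```

With the notebook's variables, `pd.concat([to_long(raw_data, parsing_lbs, smaller_sizes), to_long(raw_data, extra_lbs, larger_sizes)], ignore_index=True)` should give the same rows as `parsed_data`; unlike the loops, `melt` produces a fresh 0..N-1 index instead of repeating the index of `raw_data`.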

In [None]:
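# Checks that the dataframe has the expected number of cells (DataFrame.size = rows x columns):
# 20 x (6 LBs x 3 smaller sizes + 3 LBs x 2 larger sizes) x 3 columns (Time, Load Balancer, Problem size)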
parsed_data.size == (20 * (6 * 3 + 3 * 2) * 3)

In [None]:
# Fixing LB names for later plots
sizes = smaller_sizes + larger_sizes
lbs = ['Baseline', 'GreedyLB', 'RefineLB', 'DistributedLB', 'PackDropLB', 'PackStealLB']
parsed_data.loc[parsed_data['Load Balancer'] == 'DummyLB', 'Load Balancer'] = 'Baseline'

In [None]:
# Averages for different problem sizes and load balancers
averages = parsed_data['Time'].groupby([parsed_data['Problem size'], parsed_data['Load Balancer']]).mean().unstack(level=0)
averages

In [None]:
# Computes speedups over the baseline

speedups = pd.DataFrame()

for size in sizes:
  speedups[size] = averages[size]['Baseline'] / averages[size]

# Reorganizes the speedups to a different dataframe

final = dict()
for lb in lbs:
  final[lb] = {}
  for size in sizes:
    final[lb][int(size)] = speedups[size][lb]
final

sps = pd.DataFrame(final)
sps
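
The speedup loop above divides, for each size, the Baseline mean by each load balancer's mean, so values above 1 mean the load balancer is faster than the Baseline (e.g., 40 s / 32 s = 1.25). The same table can be built without loops; a sketch, with `speedups_over_baseline` as a hypothetical helper and made-up averages:

```python
import pandas as pd


def speedups_over_baseline(averages: pd.DataFrame, baseline: str = 'Baseline') -> pd.DataFrame:
    """Speedup = Baseline mean time / LB mean time, for every LB (row) and size (column)."""
    # rdiv computes other / self; `other` is the Baseline row, aligned on the size columns
    sp = averages.rdiv(averages.loc[baseline], axis='columns')
    sp.columns = sp.columns.astype(int)  # numeric sizes for the x-axis of the line plot
    return sp.T.sort_index()  # rows: problem sizes, columns: load balancers (like `sps`)


toy = pd.DataFrame({'80': [40.0, 32.0], '120': [60.0, 75.0]},
                   index=['Baseline', 'PackStealLB'])
print(speedups_over_baseline(toy))
```

For this toy input, PackStealLB gets a speedup of 1.25 at size 80 and 0.8 at size 120. With the notebook's data, `speedups_over_baseline(averages)[lbs]` should reproduce `sps`, including NaN entries for the LBs that were not run at sizes 240 and 320.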

#### 2.1.2 Statistical Analysis

##### Distribution of origin

In [None]:
np.random.seed(2019)
for size in smaller_sizes:
  for lb in lbs:
    results = list(parsed_data[(parsed_data['Load Balancer'] == lb) & (parsed_data['Problem size'] == size)].Time)
    print('Results for size = ' + size + ' and LB = '+lb)
    print(stats.kstest(results, 'norm', args=(np.mean(results), np.std(results))))
for size in larger_sizes:
  for lb in ['Baseline', 'PackDropLB', 'PackStealLB']:
    results = list(parsed_data[(parsed_data['Load Balancer'] == lb) & (parsed_data['Problem size'] == size)].Time)
    print('Results for size = ' + size + ' and LB = '+lb)
    print(stats.kstest(results, 'norm', args=(np.mean(results), np.std(results))))

For none of the scenarios can we reject, at the 5% significance level, the null hypothesis that the results come from a normal distribution (i.e., no test returned a p-value under 0.05).
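
One caveat about the test above: `kstest` receives a mean and standard deviation estimated from the same sample, which makes its p-values conservative (too large), so not rejecting normality is weaker evidence than it looks; this is the situation the Lilliefors correction addresses. As a cross-check, the sketch below also runs the Shapiro-Wilk test (`scipy.stats.shapiro`), which is designed for normality with unknown mean and variance. This is a swapped-in check, not part of the original analysis, and `normality_pvalues` is a hypothetical helper run here on synthetic data.

```python
import numpy as np
from scipy import stats


def normality_pvalues(sample):
    """Return (K-S p-value with estimated parameters, Shapiro-Wilk p-value)."""
    x = np.asarray(sample, dtype=float)
    # Same call as the notebook: mean and std are estimated from the sample itself
    ks_p = stats.kstest(x, 'norm', args=(np.mean(x), np.std(x))).pvalue
    # Shapiro-Wilk is designed for normality with unknown mean and variance
    _, sw_p = stats.shapiro(x)
    return ks_p, sw_p


rng = np.random.default_rng(2019)
print(normality_pvalues(rng.normal(35.0, 1.5, size=20)))   # synthetic "normal" times
print(normality_pvalues(rng.exponential(1.0, size=500)))  # clearly non-normal sample
```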

##### Comparisons between Pack LBs and the baseline (No LB)

In [None]:
for lb in ['PackDropLB', 'PackStealLB']:
  for size in sizes:
    pack = list(parsed_data[(parsed_data['Load Balancer'] == lb) & (parsed_data['Problem size'] == size)].Time)
    baseline = list(parsed_data[(parsed_data['Load Balancer'] == 'Baseline') & (parsed_data['Problem size'] == size)].Time)
    # Welch's t-test (equal_var=False), since we do not assume equal variances between samples
    print('T-Test for comparison between the baseline and ' + lb + ' for size ' + size)
    print(stats.ttest_ind(baseline, pack, equal_var=False))

Since all p-values are below 0.05 (our 5% significance level), we reject the null hypothesis that the mean execution times are equal. In other words, the execution times of PackSteal and PackDrop differ from those of the Baseline.
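
For reference, Welch's test statistic is $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, with degrees of freedom given by the Welch-Satterthwaite approximation. The sketch below computes it by hand (`welch_t` is a hypothetical helper, and the samples are synthetic) so it can be checked against `scipy.stats.ttest_ind(..., equal_var=False)`. With the argument order used above (`baseline` first), a positive $t$ means the Baseline mean is larger, i.e., the Pack load balancer is faster.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's t statistic, Welch-Satterthwaite degrees of freedom, and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value from the t distribution
    return t, df, p


rng = np.random.default_rng(0)
baseline = rng.normal(40.0, 2.0, size=20)  # synthetic times, not measured data
pack = rng.normal(33.0, 1.0, size=20)
print(welch_t(baseline, pack))
print(stats.ttest_ind(baseline, pack, equal_var=False))
```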

##### Comparisons between PackDrop and PackSteal

In [None]:
for size in sizes:
  drop = list(parsed_data[(parsed_data['Load Balancer'] == 'PackDropLB') & (parsed_data['Problem size'] == size)].Time)
  steal = list(parsed_data[(parsed_data['Load Balancer'] == 'PackStealLB') & (parsed_data['Problem size'] == size)].Time)
  # Welch's t-test (equal_var=False), since we do not assume equal variances between samples
  print('T-Test for comparison between PackDrop and PackSteal for size = ' + size)
  print(stats.ttest_ind(drop, steal, equal_var=False))

In this case, we reject the null hypothesis of equal mean times for sizes 80, 240, and 320 (where PackStealLB performs better than PackDropLB) and for size 120 (where PackDropLB performs better than PackStealLB), but we cannot reject it for size 160 (p-value = 0.165 > 0.05).
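
Which load balancer is faster in each significant comparison can be read from the sign of the t statistic. A sketch of a hypothetical helper, `faster_of`, that returns the name of the load balancer with the lower mean time when Welch's test is significant at level `alpha`, and `None` otherwise (the sample times are made up):

```python
from scipy import stats


def faster_of(times_a, times_b, name_a, name_b, alpha=0.05):
    """Name of the LB with the lower mean time, or None if Welch's test is not significant."""
    res = stats.ttest_ind(times_a, times_b, equal_var=False)
    if res.pvalue >= alpha:
        return None  # equal means cannot be rejected at this significance level
    # t > 0 means mean(times_a) > mean(times_b), so B is the faster one
    return name_b if res.statistic > 0 else name_a


print(faster_of([10.0, 11.0, 10.5, 10.2, 10.8], [8.0, 8.5, 8.2, 8.1, 8.4],
                'PackDropLB', 'PackStealLB'))  # PackStealLB
```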

##### Other comparisons

Finally, we compare four specific cases whose results look similar:

1. PackDrop and Distributed for size 80.
2. Baseline and Distributed for size 120.
3. Baseline and Refine for size 120.
4. Baseline and Refine for size 160.

In [None]:
drop = list(parsed_data[(parsed_data['Load Balancer'] == 'PackDropLB') & (parsed_data['Problem size'] == '80')].Time)
dist = list(parsed_data[(parsed_data['Load Balancer'] == 'DistributedLB') & (parsed_data['Problem size'] == '80')].Time)
print('T-Test for comparison between PackDrop and Distributed for size = 80')
print(stats.ttest_ind(drop, dist, equal_var=False))

base = list(parsed_data[(parsed_data['Load Balancer'] == 'Baseline') & (parsed_data['Problem size'] == '120')].Time)
dist = list(parsed_data[(parsed_data['Load Balancer'] == 'DistributedLB') & (parsed_data['Problem size'] == '120')].Time)
print('T-Test for comparison between Baseline and Distributed for size = 120')
print(stats.ttest_ind(base, dist, equal_var=False))

base = list(parsed_data[(parsed_data['Load Balancer'] == 'Baseline') & (parsed_data['Problem size'] == '120')].Time)
refine = list(parsed_data[(parsed_data['Load Balancer'] == 'RefineLB') & (parsed_data['Problem size'] == '120')].Time)
print('T-Test for comparison between Baseline and Refine for size = 120')
print(stats.ttest_ind(base, refine, equal_var=False))

base = list(parsed_data[(parsed_data['Load Balancer'] == 'Baseline') & (parsed_data['Problem size'] == '160')].Time)
refine = list(parsed_data[(parsed_data['Load Balancer'] == 'RefineLB') & (parsed_data['Problem size'] == '160')].Time)
print('T-Test for comparison between Baseline and Refine for size = 160')
print(stats.ttest_ind(base, refine, equal_var=False))

We reject the null hypothesis in the comparison between PackDrop and Distributed (PackDrop performs better than Distributed), but we cannot reject it in any of the other three tests.
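
The four comparisons above repeat the same filtering code. A sketch of two hypothetical helpers, `lb_times` and `compare_lbs`, that factor it out, assuming the long-format `parsed_data` layout (columns `Time`, `Load Balancer`, `Problem size`, with sizes stored as strings); the toy data are made up:

```python
import pandas as pd
from scipy import stats


def lb_times(df: pd.DataFrame, lb: str, size: str) -> list:
    """Total execution times of one load balancer for one problem size."""
    mask = (df['Load Balancer'] == lb) & (df['Problem size'] == size)
    return df.loc[mask, 'Time'].tolist()


def compare_lbs(df: pd.DataFrame, lb_a: str, lb_b: str, size: str):
    """Welch's t-test between two load balancers for the same problem size."""
    return stats.ttest_ind(lb_times(df, lb_a, size), lb_times(df, lb_b, size), equal_var=False)


# Toy data (made-up numbers) just to show the call
toy = pd.DataFrame({'Time': [30.1, 30.4, 29.8, 33.0, 33.5, 32.7],
                    'Load Balancer': ['PackDropLB'] * 3 + ['DistributedLB'] * 3,
                    'Problem size': ['80'] * 6})
print(compare_lbs(toy, 'PackDropLB', 'DistributedLB', '80'))
```

With the notebook's data, each comparison becomes a single call such as `compare_lbs(parsed_data, 'Baseline', 'RefineLB', '160')`.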

#### 2.1.3 Visualization

##### Total execution times

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18
plt.rcParams['legend.fontsize'] = 18

base_palette = sns.color_palette('colorblind', n_colors=10, desat=0.8)
new_palette = [base_palette[4],
               base_palette[7],
               base_palette[3],
               base_palette[2],
               base_palette[1],
               base_palette[0]]

fig, ax = plt.subplots(figsize=(8,10))
ax.yaxis.grid(True)

sns.boxplot(y='Time', x='Load Balancer', 
            data=parsed_data[(parsed_data['Problem size'] == '80')], 
            width=0.5,
            palette=new_palette,
            order=lbs,
            saturation=1)

plt.xlabel('Load Balancing Algorithms', fontsize=18)
plt.xticks(rotation=15)
plt.ylabel('Total execution time (s)', fontsize=18)
plt.ylim(26, 42)

plt.savefig("plot-80.pdf", bbox_inches='tight')

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18
plt.rcParams['legend.fontsize'] = 18

fig, ax = plt.subplots(figsize=(8,10))
ax.yaxis.grid(True)

sns.boxplot(y='Time', x='Load Balancer', 
            data=parsed_data[(parsed_data['Problem size'] == '120')], 
            width=0.5,
            palette=new_palette,
            order=lbs,
            saturation=1)

plt.xlabel('Load Balancing Algorithms', fontsize=18)
plt.xticks(rotation=15)
plt.ylabel('Total execution time (s)', fontsize=18)
plt.ylim(45, 90)

plt.savefig("plot-120.pdf", bbox_inches='tight')

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18
plt.rcParams['legend.fontsize'] = 18

fig, ax = plt.subplots(figsize=(8,10))
ax.yaxis.grid(True)

sns.boxplot(y='Time', x='Load Balancer', 
            data=parsed_data[(parsed_data['Problem size'] == '160')], 
            width=0.5,
            palette=new_palette,
            order=lbs,
            saturation=1)

plt.xlabel('Load Balancing Algorithms', fontsize=18)
plt.xticks(rotation=15)
plt.ylabel('Total execution time (s)', fontsize=18)
plt.ylim(70, 150)

plt.savefig("plot-160.pdf", bbox_inches='tight')

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18
plt.rcParams['legend.fontsize'] = 18

shortened_palette = [new_palette[0], new_palette[4], new_palette[5]]
shortened_lbs = ['Baseline', 'PackDropLB', 'PackStealLB']

fig, ax = plt.subplots(figsize=(4,10))
ax.yaxis.grid(True)

sns.boxplot(y='Time', x='Load Balancer', 
            data=parsed_data[(parsed_data['Problem size'] == '240')], 
            width=0.5,
            palette=shortened_palette,
            order=shortened_lbs,
            saturation=1)

plt.xlabel('Load Balancing Algorithms', fontsize=18)
plt.xticks(rotation=15)
plt.ylabel('Total execution time (s)', fontsize=18)
plt.ylim(120, 240)

plt.savefig("plot-240.pdf", bbox_inches='tight')

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18
plt.rcParams['legend.fontsize'] = 18

fig, ax = plt.subplots(figsize=(4,10))
ax.yaxis.grid(True)

sns.boxplot(y='Time', x='Load Balancer', 
            data=parsed_data[(parsed_data['Problem size'] == '320')], 
            width=0.5,
            palette=shortened_palette,
            order=shortened_lbs,
            saturation=1)

plt.xlabel('Load Balancing Algorithms', fontsize=18)
plt.xticks(rotation=15)
plt.ylabel('Total execution time (s)', fontsize=18)
plt.ylim(220, 400)

plt.savefig("plot-320.pdf", bbox_inches='tight')

##### Speedups

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 16
plt.rcParams['ytick.labelsize'] = 16
plt.rcParams['legend.fontsize'] = 16

fig, ax = plt.subplots(figsize=(10,5))
ax.yaxis.grid(True)

sns.lineplot(data=sps, palette=new_palette, markers = True, linewidth=2)

plt.xlabel('Problem size (X parameter)', fontsize=18)
plt.ylabel('Speedup over Baseline', fontsize=18)
plt.xticks(ticks=[80,120,160,240,320])
plt.legend(loc='lower right', ncol=2)
plt.ylim(0.7, 1.5)

plt.savefig("plot-speedups.pdf", bbox_inches='tight')

### 2.2 Analysis of load balancing times

#### 2.2.1 Read files and organize dataframes

In [None]:
# Naming to use depending on the load balancing call
call_name = ['First', 'Second', 'Third']

# Iterates over the three LB time files and puts them all into one dataframe
parsed_call_data = pd.DataFrame() # empty dataframe

for i in range(3):
  raw_call_data = pd.read_csv('lb_times_step_' + str(i) + '.csv', sep=',')
  for size in smaller_sizes:
    for lb in parsing_lbs:
      scenario = lb + '.' + size # single scenario name
      parsing = pd.DataFrame(raw_call_data[[scenario]])
      parsing = parsing.rename(columns={scenario: 'Time'})
      parsing['Load Balancer'] = lb
      parsing['Problem size'] = size
      parsing['Step'] = call_name[i]
      parsed_call_data = pd.concat([parsed_call_data, parsing])
  for size in larger_sizes:
    for lb in extra_lbs:
      scenario = lb + '.' + size # single scenario name
      parsing = pd.DataFrame(raw_call_data[[scenario]])
      parsing = parsing.rename(columns={scenario: 'Time'})
      parsing['Load Balancer'] = lb
      parsing['Problem size'] = size
      parsing['Step'] = call_name[i]
      parsed_call_data = pd.concat([parsed_call_data, parsing])

parsed_call_data

In [None]:
# Fixes DummyLB's name
parsed_call_data.loc[parsed_call_data['Load Balancer'] == 'DummyLB', 'Load Balancer'] = 'Baseline'

# Averages for different problem sizes and load balancers
call_averages = parsed_call_data['Time'].groupby([parsed_call_data['Problem size'], parsed_call_data['Load Balancer']]).mean().unstack(level=0)
call_averages

In [None]:
# Averages for different problem sizes, load balancers, and call moments (first, second, or third step)

call_averages_per_step = parsed_call_data['Time'].groupby([parsed_call_data['Problem size'], parsed_call_data['Load Balancer'], parsed_call_data['Step']]).mean().unstack(level=0)
call_averages_per_step

#### 2.2.2 Statistical Analysis

##### Distributions of origin

In [None]:
np.random.seed(2019)
for size in smaller_sizes:
  print('Analysis for problem size: ' + size)
  for lb in lbs:
    for call in call_name:
      results = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == call)].Time)
      print('Results for size = ' + size + ', LB = ' + lb + ' and call = ' + call)
      print(stats.kstest(results, 'norm', args=(np.mean(results), np.std(results))))

  print(' ')
for size in larger_sizes:
  print('Analysis for problem size: ' + size)
  for lb in ['Baseline', 'PackDropLB', 'PackStealLB']:
    for call in call_name:
      results = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == call)].Time)
      print('Results for size = ' + size + ', LB = ' + lb + ' and call = ' + call)
      print(stats.kstest(results, 'norm', args=(np.mean(results), np.std(results))))
  print(' ')

Several results have p-values under 0.05, meaning that, for them, we can reject at the 5% significance level the null hypothesis that they come from a normal distribution.

Examples of cases that do not appear to follow a normal distribution:

- Size = 80: Baseline (second call), DistributedLB (second call), PackDropLB (third call)
- Size = 120: DistributedLB (first and second calls), PackDropLB (second and third calls), PackStealLB (first and second calls)
- Size = 160: DistributedLB (third call), PackDropLB (first and second calls), PackStealLB (second and third calls)
- Size = 240: PackDropLB (second call), PackStealLB (first and second calls)
- Size = 320: PackDropLB (first and second calls), PackStealLB (second call)
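
Instead of scanning the printed output, the p-values can be collected into a table and filtered, which makes a list like the one above reproducible. A sketch, assuming the long-format `parsed_call_data` layout; `ks_table` is a hypothetical helper that applies the notebook's K-S call to every group, and the test data are synthetic:

```python
import numpy as np
import pandas as pd
from scipy import stats


def ks_table(df: pd.DataFrame, group_cols, value_col: str = 'Time') -> pd.DataFrame:
    """K-S normality p-value (mean/std estimated per group) for every group."""
    rows = []
    for key, group in df.groupby(group_cols):
        key = key if isinstance(key, tuple) else (key,)  # normalize single-column keys
        x = group[value_col].to_numpy(dtype=float)
        # Same test as the notebook, applied group by group
        p = stats.kstest(x, 'norm', args=(np.mean(x), np.std(x))).pvalue
        rows.append((*key, p))
    return pd.DataFrame(rows, columns=[*group_cols, 'p-value'])
```

With the notebook's data, `table = ks_table(parsed_call_data, ['Problem size', 'Load Balancer', 'Step'])` followed by `table[table['p-value'] < 0.05]` should list the cases where normality is rejected.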

##### Comparisons between PackDrop and PackSteal

In [None]:
for size in sizes:
  for call in call_name:
    drop = list(parsed_call_data[(parsed_call_data['Load Balancer'] == 'PackDropLB') 
                                  & (parsed_call_data['Problem size'] == size)
                                  & (parsed_call_data['Step'] == call)].Time)
    steal = list(parsed_call_data[(parsed_call_data['Load Balancer'] == 'PackStealLB') 
                                   & (parsed_call_data['Problem size'] == size)
                                   & (parsed_call_data['Step'] == call)].Time)

    print('Mann-Whitney U test for comparison between PackDrop and PackSteal for size = ' + size + ' and call = ' + call)
    print(stats.mannwhitneyu(drop, steal, alternative='two-sided'))

In most cases, we reject the null hypothesis that the samples for PackDropLB and PackStealLB come from the same distribution (p-value < 0.05). For some cases with size = 320, however, we cannot reject it.
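
For intuition about the Mann-Whitney test: its U statistic for the first sample counts the pairs (one PackDropLB call, one PackStealLB call) in which the first time is larger, with ties counted as one half. Dividing U by $n_1 n_2$ therefore estimates, ties aside, the probability that a PackDropLB call takes longer than a PackStealLB call; values near 0.5 mean neither tends to be slower. A brute-force sketch (`u_statistic` is a hypothetical helper, with made-up times) whose result can be compared with `scipy.stats.mannwhitneyu(...).statistic`:

```python
import numpy as np
from scipy import stats


def u_statistic(x, y) -> float:
    """Mann-Whitney U for sample x: pairs with x > y, plus one half per tie."""
    x = np.asarray(x, dtype=float)[:, None]  # column vector
    y = np.asarray(y, dtype=float)[None, :]  # row vector; broadcasting forms all pairs
    return float(np.sum(x > y) + 0.5 * np.sum(x == y))


drop = [1.0, 3.0, 5.0]
steal = [2.0, 3.0, 4.0]
print(u_statistic(drop, steal))  # 4.5
print(stats.mannwhitneyu(drop, steal, alternative='two-sided').statistic)
```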

##### Comparison between steps

In [None]:
for size in smaller_sizes:
  print('Analysis for problem size: ' + size)
  for lb in lbs:
    first_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'First')].Time) 
    second_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Second')].Time)   
    third_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Third')].Time)      
    print('Results for LB = ' + lb)
    print('First vs. second call:', stats.wilcoxon(first_calls, second_calls))
    print('Second vs. third call:', stats.wilcoxon(second_calls, third_calls))
  print(' ')

for size in larger_sizes:
  print('Analysis for problem size: ' + size)
  for lb in ['Baseline', 'PackDropLB', 'PackStealLB']:
    first_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'First')].Time) 
    second_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Second')].Time)   
    third_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Third')].Time)      
    print('Results for LB = ' + lb)
    print('First vs. second call:', stats.wilcoxon(first_calls, second_calls))
    print('Second vs. third call:', stats.wilcoxon(second_calls, third_calls))
  print(' ')

For many cases (mostly the smaller problem sizes), we reject the null hypothesis (p-value < 0.05) that the run-by-run differences between the first and second, or between the second and third, load balancing calls have a median of zero. In other words, consecutive load balancing calls tend to take different amounts of time.

This is less the case for larger problem sizes and for comparisons between the second and the third load balancing call (p-values > 0.05). This seems to indicate that after the first load balancing call, the load balancers are showing a more stable behavior.
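
The Wilcoxon signed-rank test is a paired test: `stats.wilcoxon(first_calls, second_calls)` works on the per-run differences, so it assumes that position i in each list belongs to the same execution. That holds as long as the three `lb_times_step_*.csv` files list runs in the same order, which is an assumption about the parsing scripts. A sketch that makes the pairing explicit and also reports the median difference, whose sign tells which call was slower; `paired_step_test` is a hypothetical helper and the times are made up:

```python
import numpy as np
from scipy import stats


def paired_step_test(earlier, later):
    """Wilcoxon signed-rank test on run-by-run differences between two LB calls.

    Returns the median difference (positive: the earlier call was slower) and the p-value.
    """
    earlier = np.asarray(earlier, dtype=float)
    later = np.asarray(later, dtype=float)
    if earlier.shape != later.shape:
        raise ValueError('calls must be paired run by run (same length and order)')
    diff = earlier - later
    p = stats.wilcoxon(diff).pvalue  # equivalent to stats.wilcoxon(earlier, later)
    return float(np.median(diff)), float(p)


# Made-up times (s) of the first and second LB call in ten runs
first = [2.0, 2.5, 3.1, 2.8, 3.6, 2.2, 3.9, 2.7, 3.3, 4.4]
second = [1.0] * 10
print(paired_step_test(first, second))
```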

##### Total execution time - load balancing times

In [None]:
print(averages - 3*call_averages)

In [None]:
call_medians = parsed_call_data['Time'].groupby([parsed_call_data['Problem size'], parsed_call_data['Load Balancer']]).median().unstack(level=0)
medians = parsed_data['Time'].groupby([parsed_data['Problem size'], parsed_data['Load Balancer']]).median().unstack(level=0)

print(medians - 3*call_medians)
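
The two cells above subtract three times the per-call statistic from the total time. Because `call_averages` pools all three steps, `averages - 3*call_averages` equals the mean of the run-by-run subtraction (total minus the sum of its three calls) whenever each step has the same number of samples, by linearity of the mean. The median version is only an approximation, since the median of a sum is in general not the sum of the medians. A small numeric check with made-up numbers; `mean_and_median_gaps` is a hypothetical helper:

```python
import numpy as np


def mean_and_median_gaps(total, calls):
    """Compare run-by-run subtraction with the aggregate subtraction used above.

    total: total execution time of each run; calls: one row per run, one column per LB call.
    """
    total = np.asarray(total, dtype=float)
    calls = np.asarray(calls, dtype=float)
    per_run = total - calls.sum(axis=1)  # LB time subtracted run by run
    n_calls = calls.shape[1]
    mean_gap = per_run.mean() - (total.mean() - n_calls * calls.mean())  # ~0 by linearity
    median_gap = np.median(per_run) - (np.median(total) - n_calls * np.median(calls))
    return mean_gap, median_gap


# Made-up numbers: three runs, three LB calls each
print(mean_and_median_gaps([10.0, 20.0, 30.0], [[1, 1, 1], [1, 1, 7], [1, 1, 1]]))
```

Here the mean gap is zero (up to rounding) while the median gap is -6 s, so the median-based table should be read as a rough estimate.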

In [None]:
# Checking the execution times, each with its LB calls subtracted from it, to see if they follow normal distributions
for size in smaller_sizes:
  print('Analysis for problem size: ' + size)
  for lb in lbs:
    original_times = list(parsed_data[(parsed_data['Load Balancer'] == lb) 
                                      & (parsed_data['Problem size'] == size)].Time)
    first_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'First')].Time) 
    second_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Second')].Time)   
    third_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Third')].Time)      
    print('Results for LB = ' + lb)
    subtracted_times = [original_times[i] - (first_calls[i] + second_calls[i] + third_calls[i]) for i in range(len(original_times))]
    print(stats.kstest(subtracted_times, 'norm', args=(np.mean(subtracted_times), np.std(subtracted_times))))
  print(' ')

for size in larger_sizes:
  print('Analysis for problem size: ' + size)
  for lb in ['Baseline', 'PackDropLB', 'PackStealLB']:
    original_times = list(parsed_data[(parsed_data['Load Balancer'] == lb) 
                                      & (parsed_data['Problem size'] == size)].Time)
    first_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'First')].Time) 
    second_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Second')].Time)   
    third_calls = list(parsed_call_data[(parsed_call_data['Load Balancer'] == lb) 
                                      & (parsed_call_data['Problem size'] == size)
                                      & (parsed_call_data['Step'] == 'Third')].Time)      
    print('Results for LB = ' + lb)
    subtracted_times = [original_times[i] - (first_calls[i] + second_calls[i] + third_calls[i]) for i in range(len(original_times))]
    print(stats.kstest(subtracted_times, 'norm', args=(np.mean(subtracted_times), np.std(subtracted_times))))
  print(' ')

Since no test returned a p-value under 0.05, we cannot reject the null hypothesis that these subtracted times come from a normal distribution.

#### 2.2.3 Visualization

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18

first_m_calls = parsed_call_data[parsed_call_data['Step'] == 'First']

first_m_call_medians = first_m_calls['Time'].groupby(
    [first_m_calls['Problem size'], 
     first_m_calls['Load Balancer']]).median().unstack(level=0)
first_m = first_m_call_medians.reindex(['Baseline','GreedyLB','RefineLB','DistributedLB','PackDropLB','PackStealLB'])
first_m = first_m[sizes]

ax = first_m.T.plot(kind = 'bar', color=new_palette, figsize=(12,6), width=0.8)
ax.yaxis.grid(True)

ax.get_legend().remove()
plt.yscale('log')
plt.xlabel('Problem size (X parameter)', fontsize=18)
plt.xticks(rotation=0)
plt.ylabel('Median LB invocation time (s)', fontsize=18)
plt.ylim(0.001, 10)

plt.savefig("lbs-1st-median.pdf", bbox_inches='tight')

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18

second_m_calls = parsed_call_data[parsed_call_data['Step'] == 'Second']

second_call_medians = second_m_calls['Time'].groupby(
    [second_m_calls['Problem size'], 
     second_m_calls['Load Balancer']]).median().unstack(level=0)
second_m = second_call_medians.reindex(['Baseline','GreedyLB','RefineLB','DistributedLB','PackDropLB','PackStealLB'])
second_m = second_m[sizes]

ax = second_m.T.plot(kind = 'bar', color=new_palette, figsize=(12,6), width=0.8)
ax.yaxis.grid(True)

ax.get_legend().remove()
plt.yscale('log')
plt.xlabel('Problem size (X parameter)', fontsize=18)
plt.xticks(rotation=0)
plt.ylabel('Median LB invocation time (s)', fontsize=18)
plt.ylim(0.001, 10)

plt.savefig("lbs-2nd-median.pdf", bbox_inches='tight')

In [None]:
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18

third_m_calls = parsed_call_data[parsed_call_data['Step'] == 'Third']

third_call_medians = third_m_calls['Time'].groupby(
    [third_m_calls['Problem size'], 
     third_m_calls['Load Balancer']]).median().unstack(level=0)
third_m = third_call_medians.reindex(['Baseline','GreedyLB','RefineLB','DistributedLB','PackDropLB','PackStealLB'])
third_m = third_m[sizes]

ax = third_m.T.plot(kind = 'bar', color=new_palette, figsize=(12,6), width=0.8)
ax.yaxis.grid(True)

plt.legend(loc='lower right', bbox_to_anchor=[1.3,0.2], title='Load Balancer', title_fontsize=16, fontsize=16)
plt.yscale('log')
plt.xlabel('Problem size (X parameter)', fontsize=18)
plt.xticks(rotation=0)
plt.ylabel('Median LB invocation time (s)', fontsize=18)
plt.ylim(0.001, 10)

plt.savefig("lbs-3rd-median.pdf", bbox_inches='tight')