Simulation output analysis
Introduction
This notebook analyzes the results of the resource management experimental campaign of the article “Light-weight prediction for improving energy consumption in HPC platforms” (Sections 6.4 and 6.5), published at Euro-Par 2024. For the full context of this experiment, please refer to the article preprint, which is available at [hal long-term open-access link].
Summary of the experiment. 30 different 1-day workloads have been extracted at random points in time from the Marconi 100 trace. Each workload has been replayed in simulation with the Batsim simulator [thesis link] [long-term software heritage code permalink]. A constraint is set on the power that the whole platform can use during the first 3 hours of each simulation. This constraint is enforced by our EASY backfilling implementation with powercap support [gitlab code link] [software heritage long-term code permalink], which uses a prediction of each job's power consumption to make its decisions.
The goal of this notebook is to determine the impact of the job power predictor on the schedules produced by this scheduling algorithm. The notebook takes an aggregation of all the simulation executions as input. It outputs the image files used as Figures 4 and 5 of the article, and also provides additional analyses (images + short text analyses) that could not fit within the article's page limit.
Power predictor naming difference w.r.t. article
- `upper_bound` is the predictor named `naive` in the article. It assumes that all the nodes allocated to the job are at full power during the whole job execution. This is an upper bound on the job power consumption that can be used safely from the scheduler's point of view.
- `real_mean` is the predictor that uses the real mean power of each job (a perfect oracle, unfeasible in practice, but it shows the best we would get with a perfect predictor).
- `real_max` is the predictor that uses the real maximum power of each job (a perfect oracle, unfeasible in practice, but it shows the best we would get with a perfect predictor).
- `mean` is the history-based (light-weight) mean job power predictor described in Section 4.1 of the article.
- `max` is the history-based (light-weight) maximum job power predictor described in Section 4.1 of the article.
- `zero` assumes that all jobs consume 0 W. This is strictly equivalent to EASY backfilling without powercap support, and is used as the baseline for scheduling metrics.
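As an illustration, here is a minimal sketch of the `upper_bound` estimate, reconstructed from the description above (this is not the campaign code; the function name and `job_nb_nodes` are hypothetical, and the 2100 W default matches the `max_power_per_node` constant defined in the next section):

# Sketch of the upper_bound / naive job power estimate, reconstructed from
# the description above (NOT the campaign code).
# job_nb_nodes is a hypothetical input: the number of nodes the job requests.
upper_bound_power_estimate = function(job_nb_nodes, watts_per_node = 2100.0) {
  # every allocated node is assumed to draw its maximum power for the whole
  # job execution, hence a safe over-estimation
  job_nb_nodes * watts_per_node
}
upper_bound_power_estimate(16) # 33600 W for a hypothetical 16-node job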
Code to read and prepare data
set.seed(1)
suppressMessages(library(tidyverse))
suppressMessages(library(viridis))
library(knitr)
# data extracted from the analysis of the M100 real trace from 2022-01 to 2022-09
nb_nodes = 980
max_observed_total_power = 955080
max_power_per_node = 2100.0
min_power_per_node = 240.0
max_dynamic_power = max_observed_total_power - min_power_per_node * nb_nodes
# data from the simulation campaign definition
constrained_time_window_duration_seconds = 60*60*3 # 3 hours
# read input data, fix types, reorder predictor and split predictors in categories
data = read_csv(params$simulation_aggregated_output, show_col_types = FALSE) %>% mutate(
start_dt_s = as.factor(start_dt_s),
job_power_estimation_field = as.factor(job_power_estimation_field)
)
data$predictor_name = factor(data$predictor_name,
levels=c('upper_bound', 'max', 'real_max', 'real_mean', 'mean', 'zero'))
data = data %>% mutate(
predictor_metrics = ifelse(predictor_name %in% c('real_max', 'max'), 'max',
ifelse(predictor_name %in% c('real_mean', 'mean'), 'mean',
'naive'
)),
predictor_method = ifelse(predictor_name %in% c('mean', 'max'), 'predicted', 'real')
)
data$predictor_metrics = factor(data$predictor_metrics, levels=c('naive', 'max', 'mean'))
data$predictor_method = factor(data$predictor_method, levels=c('predicted', 'real'))
# compute scheduling metrics against their matching EASY baseline
data_nz = data %>% filter(predictor_name != 'zero')
data_z = data %>% filter(predictor_name == 'zero' &
powercap_dynamic_value_ratio == max(data$powercap_dynamic_value_ratio))
data_z_joinable = data_z %>% transmute(
start_dt_s = start_dt_s,
zero_mean_utilization = mean_utilization,
zero_max_utilization = max_utilization,
zero_mean_turnaround_time = mean_turnaround_time,
)
data_nz = inner_join(data_nz, data_z_joinable, by='start_dt_s') %>% mutate(
mean_turnaround_time_minus_zero = mean_turnaround_time - zero_mean_turnaround_time,
) %>% mutate(
mean_turnaround_time_increase_ratio = mean_turnaround_time_minus_zero / zero_mean_turnaround_time
)
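The rest of the notebook relies on the aggregated CSV containing at least the following columns. This defensive check is our addition: the column list is inferred from the code used throughout this notebook, not taken from the campaign definition.

# Columns this notebook relies on, inferred from the code in this notebook.
# The aggregated file may contain more columns than listed here.
needed_cols = c(
  "start_dt_s",                   # workload identifier (trace extraction start)
  "predictor_name",               # job power predictor used by the scheduler
  "job_power_estimation_field",
  "powercap_dynamic_value_ratio", # powercap, as a proportion of the dynamic power range
  "powercap_dynamic_watts",       # powercap, in watts
  "mean_utilization", "max_utilization",
  "mean_turnaround_time",         # seconds
  "mean_power",                   # watts, averaged over the constrained window
  "energy_from_powercap",         # joules
  "nb_seconds_above_powercap",
  "max_power_from_powercap"       # watts
)
stopifnot(all(needed_cols %in% colnames(data)))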
Consistency checks
This section inspects the simulation data to make sure the values are consistent with our expectations on the algorithm.
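A preliminary check of our own: we assume that the powercap in watts is the powercap ratio applied to the maximum dynamic power range. Since this relation is an assumption on our side (it is how we read the campaign definition), we verify it on the data instead of relying on it silently:

# Assumed relation: powercap_dynamic_watts == powercap_dynamic_value_ratio * max_dynamic_power.
# If this assumption is wrong, stop the notebook loudly.
stopifnot(all(abs(
  data$powercap_dynamic_watts -
  data$powercap_dynamic_value_ratio * max_dynamic_power
) < 1e-6 * max_dynamic_power))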
During the constrained time window, is the utilization proportional to the powercap value for each (predictor, workload)? It should be.
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_utilization / nb_nodes, color=predictor_name)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_abline(slope=1) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0.1,0.7,0.2), labels = scales::percent) +
scale_y_continuous(breaks=seq(0,1,0.2), labels = scales::percent) +
scale_color_viridis(discrete=TRUE) +
expand_limits(x=0) +
facet_wrap(vars(start_dt_s)) +
labs(
y="Utilization (proportion of nodes)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
## `geom_smooth()` using formula = 'y ~ x'
Conclusion: Yes, the utilization is almost perfectly proportional to the powercap for all (workload, predictor) pairs, until the utilization becomes saturated. On 5/30 workloads the `max` predictor slightly jumps from one linear trend to another. This is consistent with the first-fit policy of the scheduling algorithm: we think that EASY becomes able to execute a job whose power consumption is overestimated by `max`, and that EASY then cannot backfill smaller jobs since it thinks that there is not enough available power.
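To make this concrete, here is a minimal sketch of the power-feasibility test that gates backfilling decisions (all names are hypothetical; the actual implementation is in the EASY-with-powercap scheduler linked in the introduction). When a running job's power is overestimated, `available_power` shrinks and small jobs that would actually fit get rejected:

# Minimal sketch of the power-feasibility test used when backfilling a job.
# Hypothetical names; NOT the actual scheduler code.
can_backfill_powerwise = function(job_power_estimate,
                                  running_jobs_power_estimates,
                                  powercap_watts) {
  # power the scheduler THINKS is in use: the sum of the (possibly
  # overestimated) predictions of the currently running jobs
  estimated_used_power = sum(running_jobs_power_estimates)
  available_power = powercap_watts - estimated_used_power
  job_power_estimate <= available_power
}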
During the constrained time window, is the utilization roughly proportional to the powercap value for each predictor regardless of the workload?
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_utilization / nb_nodes, color=predictor_name)) +
geom_jitter(width=1/100, height=0) +
geom_smooth(method = "lm", se = FALSE) +
geom_abline(slope=1) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0,0.7,0.1), labels = scales::percent) +
scale_y_continuous(breaks=seq(0,1,0.2), labels = scales::percent) +
scale_color_viridis(discrete=TRUE) +
expand_limits(x=0) +
labs(
y="Utilization (proportion of nodes)",
x="Powercap value (proportion of the maximum dynamic power range). Shown with horizontal jitter."
)
## `geom_smooth()` using formula = 'y ~ x'
Conclusion: Yes, this is roughly proportional. For some workloads the utilization is saturated when using the `real_mean`/`mean` predictors for powercap values > 55 %.
Per-workload analysis
This section analyzes how the algorithm behaves on each workload. We believe that this is the most important analysis section of this notebook, as scheduling results must be looked at for each workload to make sense.
During the constrained time window, how much power is consumed on average for each (predictor, workload)?
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_power/max_dynamic_power, color=predictor_name)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_abline(slope=1) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0.1,0.7,0.2), labels = scales::percent) +
scale_y_continuous(breaks=seq(0,1,0.2), labels = scales::percent) +
scale_color_viridis(discrete=TRUE) +
expand_limits(x=0) +
facet_wrap(vars(start_dt_s)) +
labs(
y="Mean power (proportion of maximum observed M100 power)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
## `geom_smooth()` using formula = 'y ~ x'
Conclusions: The platform dynamic mean power consumption is linear in the powercap value (unless the platform is already saturated, in which case increasing the powercap has little effect), which is expected. Furthermore, we can clearly see that:
- This plot has the same shape as the corresponding utilization plot, which is expected. The main difference is that the maximum mean power consumption is around 70 % while the utilization goes up to 100 %.
- The `real_mean` > `real_max` > `upper_bound` predictor order in terms of mean power consumption holds for all workloads.
- The `mean` > `max` > `upper_bound` predictor order in terms of mean power consumption holds for all workloads.
- Using the `mean` history-based predictor instead of the real value `real_mean` (which cannot be used in practice as it is unknown at decision time, but which represents a perfect oracle estimator without error) has almost no impact on the power used during the constrained time window.
- Using the `max` history-based predictor instead of the real value `real_max` (same remark as above) decreases the mean power consumption. The decrease is very small on some workloads (e.g., almost no impact on workload 19389030), but quite strong on other workloads (e.g., on workload 10061708 with powercap=70 %, the mean power consumption moves from ~50 % with `real_max` to ~30 % with `max`).
How is the scheduling performance (as measured by mean turnaround time) impacted by each predictor, for all workloads?
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_turnaround_time_minus_zero, color=predictor_name)) +
geom_point() +
#geom_smooth(method = "lm", se = FALSE) +
geom_hline(yintercept=0) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0,0.7,0.2), labels = scales::percent) +
scale_y_continuous() +
scale_color_viridis(discrete=TRUE) +
facet_wrap(vars(start_dt_s)) +
expand_limits(x=0) +
labs(
y="Mean turnaround time difference against EASY without any powercap for each simulation (seconds)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
Problem: The scheduling performance is expected to be degraded on most instances since 1. this algorithm does not directly change the job ordering and 2. the lower the powercap, the lower the utilization. This means the mean turnaround time difference should be positive. Workload 18474670 is clearly an outlier here, as its scheduling performance has been greatly improved by the powercap on most instances. We think that in most instances of this workload the powercap prevented big jobs (in number of requested resources and in area) from being executed, which enabled a lot of small jobs to be executed, which in turn improved the mean turnaround time metric.
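A toy example (made-up numbers) of this effect: delaying one big job worsens its own turnaround time, but can improve the mean if many small jobs stop waiting behind it.

# Toy example with made-up turnaround times (seconds): one big job and nine
# small jobs. Delaying the big job unblocks the small ones.
mean(c(10000, rep(9000, 9))) # big job first:   mean turnaround = 9100 s
mean(c(19000, rep(500, 9)))  # big job delayed: mean turnaround = 2350 s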
Here is a look at the data without the outlier workload.
outlier_workload_start_dt_s = 18474670 # sched metrics are strongly better than EASY on it
data_nz %>%
filter(start_dt_s != outlier_workload_start_dt_s) %>%
ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_turnaround_time_minus_zero, color=predictor_name)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
geom_hline(yintercept=0) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0,0.7,0.2), labels = scales::percent) +
scale_y_continuous() +
scale_color_viridis(discrete=TRUE) +
facet_wrap(vars(start_dt_s)) +
expand_limits(x=0) +
labs(
y="Mean turnaround time difference against EASY without any powercap for each simulation (seconds)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
## `geom_smooth()` using formula = 'y ~ x'
Conclusions: The scheduling performance degradation is clearly linear in the powercap value on some workloads (e.g., 3079185 and 7934521), has a linear trend with noise on most workloads (e.g., 17539280), and is not linear on some workloads (e.g., 19389030 for predictors other than `upper_bound`). Additionally:
- The `mean` > `max` > `upper_bound` predictor order in terms of scheduling performance holds for most workloads.
- The `real_mean` > `real_max` > `upper_bound` predictor order in terms of scheduling performance holds for most workloads.
- The mean turnaround time difference metric spans the same range of values for most workloads, which means this metric can be aggregated over all workloads without a per-workload normalization step.
- Similarly to the mean power consumption during the constrained time window, using `mean` instead of `real_mean` seems to have very little impact on the mean turnaround time on most workloads.
- Using `max` instead of `real_max` has a small impact (a small performance degradation) on the mean turnaround time on most workloads.
Analysis aggregating all workloads together
While we think that the per-workload analysis is the most relevant, it obviously cannot fit in the 1.5 pages dedicated to the analysis in the article, as per-workload views of the data take a lot of space.
This section aggregates the results seen previously into smaller figures that can fit in the paper, and performs additional analyses on the whole dataset.
During the constrained time window, how far is the mean power compared to the powercap value for each predictor?
data_nz %>%
#filter(powercap_dynamic_value_ratio %in% powercap_ratios_values_to_show) %>%
mutate(powercap_label = sprintf("pcap=%g", powercap_dynamic_value_ratio)) %>%
ggplot() +
geom_hline(aes(yintercept=powercap_dynamic_value_ratio)) +
geom_boxplot(aes(y=mean_power/max_dynamic_power, fill=predictor_method, x=predictor_metrics)) +
theme_bw() +
theme(
legend.position=c(0.2, 0.9),
legend.direction='horizontal',
legend.title=element_blank(),
legend.background=element_rect(color='black'),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
) +
expand_limits(x=0) +
scale_y_continuous(breaks=seq(0,0.7,0.1)) +
facet_wrap(vars(powercap_label), nrow=1) +
labs(
y="Mean platform power consumption",
x="Job power estimator"
) +
scale_fill_grey(start=0.8, end=1)
Description. This figure is very similar to the per-workload mean power consumption plot of the previous section, but the per-workload mean power values are now aggregated into one boxplot per (powercap, predictor), and the predictor naming now uses the predicted metric (x axis) and whether it is the real value or the predicted one (fill color). Standard ggplot boxplots are used, which show the first (25 %), second (median, 50 %) and third (75 %) quartiles; values further than 1.5 × the interquartile range away from the box are shown as outlier points.
The final version seen in the article (Figure 4) is very similar, but for the sake of font readability only half of the powercap values are shown.
powercap_ratios_values_to_show = seq(0.1, 0.7, 0.1)
scale=0.9
width_scale=0.3
data_nz %>%
filter(powercap_dynamic_value_ratio %in% powercap_ratios_values_to_show) %>%
mutate(powercap_label = sprintf("pcap=%g", powercap_dynamic_value_ratio)) %>%
ggplot() +
geom_hline(aes(yintercept=powercap_dynamic_value_ratio), linewidth=width_scale) +
geom_boxplot(aes(y=mean_power/max_dynamic_power, fill=predictor_method, x=predictor_metrics), linewidth=width_scale, outlier.size=width_scale) +
theme_bw() +
theme(
legend.position=c(0.2, 0.9),
legend.direction='horizontal',
legend.title=element_blank(),
legend.background=element_rect(color='black'),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
) +
expand_limits(x=0) +
scale_y_continuous(breaks=seq(0,0.7,0.1)) +
facet_wrap(vars(powercap_label), nrow=1) +
labs(
y="Mean platform power consumption",
x="Job power estimator"
) +
scale_fill_grey(start=0.8, end=1)
Here is the code that produces the summarized power underutilization values seen in Section 6.5 of the article. The power difference to the powercap is normalized by the powercap value of each instance, so that aggregating the values makes sense (otherwise big powercap values would dominate the aggregation). The average value has been used in the article. Note that a negative underutilization ratio means that the mean power exceeds the powercap.
t = data_nz %>%
mutate(power_underutilization_ratio = (powercap_dynamic_watts - mean_power)/powercap_dynamic_watts) %>%
group_by(predictor_name) %>%
summarize(
average_power_underutilization_ratio = mean(power_underutilization_ratio),
median_power_underutilization_ratio = median(power_underutilization_ratio),
)
knitr::kable(t)
| predictor_name | average_power_underutilization_ratio | median_power_underutilization_ratio |
|---|---|---|
| upper_bound | 0.7392531 | 0.7381094 |
| max | 0.4356219 | 0.4404844 |
| real_max | 0.3457243 | 0.3519978 |
| real_mean | 0.0128441 | 0.0018967 |
| mean | -0.0254606 | -0.0322494 |
How is the scheduling performance degraded by each predictor?
Very similarly to the previous plot, here is how Figure 5 of the article is produced.
data_nz %>%
filter(start_dt_s != outlier_workload_start_dt_s) %>%
filter(powercap_dynamic_value_ratio %in% powercap_ratios_values_to_show) %>%
mutate(powercap_label = sprintf("pcap=%g", powercap_dynamic_value_ratio)) %>%
ggplot() +
geom_boxplot(aes(y=mean_turnaround_time_minus_zero, fill=predictor_method, x=predictor_metrics), linewidth=width_scale, outlier.size=width_scale) +
theme_bw() +
theme(
legend.position=c(0.16, 0.12),
legend.direction='horizontal',
legend.background=element_rect(color='black'),
legend.title=element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
) +
facet_wrap(vars(powercap_label), nrow=1) +
labs(
y="Mean turnaround time increase (s)",
x="Job power estimator"
) +
scale_fill_grey(start=0.8, end=1)
Here is the code that produces the summarized scheduling performance degradation values seen in Section 6.5 of the article. The scheduling performance has already been normalized by the performance of EASY without powercap for each instance. The `average_mtt_increase_ratio` value (average of the normalized mean turnaround time differences) has been used in the article.
t = data_nz %>%
filter(start_dt_s != outlier_workload_start_dt_s) %>%
group_by(predictor_name) %>%
summarize(
average_mtt_increase = mean(mean_turnaround_time_minus_zero),
average_mtt_increase_ratio = mean(mean_turnaround_time_increase_ratio),
median_mtt_increase = median(mean_turnaround_time_minus_zero),
)
knitr::kable(t)
| predictor_name | average_mtt_increase | average_mtt_increase_ratio | median_mtt_increase |
|---|---|---|---|
| upper_bound | 8691.197 | 0.1540434 | 8855.663 |
| max | 6109.214 | 0.1087719 | 6364.189 |
| real_max | 5331.574 | 0.0959661 | 5484.921 |
| real_mean | 3419.712 | 0.0625572 | 2996.683 |
| mean | 3373.779 | 0.0599860 | 3228.024 |
How much energy is consumed during the time window, compared to the energy that would be consumed by staying exactly at the powercap for the whole window duration?
data_nz %>% ggplot() +
geom_hline(yintercept=0) +
geom_violin(aes(x=predictor_name, y=energy_from_powercap / 1e9)) +
geom_jitter(aes(x=predictor_name, y=energy_from_powercap / 1e9), alpha=0.1) +
geom_boxplot(aes(x=predictor_name, y=energy_from_powercap / 1e9), width=0.025, outlier.shape=NA) +
theme_bw() +
labs(
x="Power predictor",
y="Distribution of the energy consumed (GJ)"
)
Conclusions: Energy values are consistent with the previous power plots. Only `mean` and `real_mean` used more energy than the powercap allows on the analyzed workloads, and `mean` frequently leads to more energy being used than the powercap allows.
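For reference, here is how we understand `energy_from_powercap` to be defined, reconstructed from the question above (an assumption on our side, not the campaign code; `power_per_second` is a hypothetical per-second platform power trace of the constrained window):

# Assumed definition of energy_from_powercap (joules): energy actually consumed
# during the constrained window, minus the energy budget of a platform sitting
# exactly at the powercap for the whole window.
energy_from_powercap_sketch = function(power_per_second, powercap_watts) {
  sum(power_per_second) - powercap_watts * length(power_per_second)
}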
How is the powercap exceeded during the time window for each predictor?
Whether the powercap has been exceeded or not has been computed for each second of each simulation.
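Under the same hypothetical per-second power trace as in the previous sketch, the metric plotted below would be computed as follows (our reconstruction, not the campaign code):

# Assumed computation of nb_seconds_above_powercap from a hypothetical
# per-second power trace of the constrained window.
nb_seconds_above_powercap_sketch = function(power_per_second, powercap_watts) {
  sum(power_per_second > powercap_watts)
}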
data_nz %>% ggplot() +
geom_hline(yintercept=0) +
geom_violin(aes(x=predictor_name, y=nb_seconds_above_powercap/constrained_time_window_duration_seconds)) +
geom_jitter(aes(x=predictor_name, y=nb_seconds_above_powercap/constrained_time_window_duration_seconds), alpha=0.1) +
geom_boxplot(aes(x=predictor_name, y=nb_seconds_above_powercap/constrained_time_window_duration_seconds), width=0.025, outlier.shape=NA) +
theme_bw() +
labs(
x="Power predictor",
y="Proportion of time above powercap"
) +
scale_y_continuous(labels = scales::percent)
Conclusions: Only `real_mean` and `mean` exceed the powercap on the analyzed workloads. They both exceed it frequently, but `mean` breaks the powercap more frequently than `real_mean`.
Here is the code that produces the summarized maximum instantaneous powercap break values seen in Section 6.5 of the article. The maximum instantaneous powercap break (in watts) is used (named `max_power_from_powercap` in the code). For each simulation, the powercap break (current power minus powercap) has been computed for each second during the time window, and `max_power_from_powercap` is the maximum of all these values. The value is normalized by the powercap so that the aggregation makes sense (otherwise the difference would be distorted and big powercap values would dominate the result). The average and median values have been used in the article.
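The following sketch mirrors that textual definition on the same hypothetical per-second power trace used earlier (our reconstruction, not the campaign code):

# Assumed computation of max_power_from_powercap (watts). Note that the value
# can be negative when the powercap is never reached, which is why the
# aggregation code below clamps it with pmax(..., 0).
max_power_from_powercap_sketch = function(power_per_second, powercap_watts) {
  max(power_per_second - powercap_watts)
}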
t = data_nz %>%
mutate(powercap_break = pmax(max_power_from_powercap, 0)) %>%
mutate(powercap_break_ratio = powercap_break / powercap_dynamic_watts) %>%
group_by(predictor_name) %>%
summarize(
mean_powercap_break_ratio = mean(powercap_break_ratio),
median_powercap_break_ratio = median(powercap_break_ratio),
)
knitr::kable(t)
| predictor_name | mean_powercap_break_ratio | median_powercap_break_ratio |
|---|---|---|
| upper_bound | 0.0000000 | 0.0000000 |
| max | 0.0000000 | 0.0000000 |
| real_max | 0.0000000 | 0.0000000 |
| real_mean | 0.1022596 | 0.0815692 |
| mean | 0.1472294 | 0.1419211 |
Similarly, here is the code that computes, for each predictor, in how many instances the powercap is exceeded. Predictors that never exceed the powercap are dropped by the inner join, which is why only `real_mean` and `mean` appear in the resulting table.
nb_simus = data_nz %>%
group_by(predictor_name) %>%
summarize(
total_count = n()
)
breaks = data_nz %>%
mutate(powercap_break = pmax(max_power_from_powercap, 0)) %>%
filter(powercap_break > 0) %>%
group_by(predictor_name) %>%
summarize(
break_count = n()
)
t = inner_join(nb_simus, breaks, by="predictor_name") %>%
mutate(break_ratio = break_count / total_count)
knitr::kable(t)
| predictor_name | total_count | break_count | break_ratio |
|---|---|---|---|
| real_mean | 390 | 372 | 0.9538462 |
| mean | 390 | 368 | 0.9435897 |
Erratum note: The first submitted version of the article states that `mean` breaks the powercap in 38 % of instances. That computation was wrong: `mean` breaks the powercap in 94 % of instances, and `real_mean` breaks the powercap in 95 % of instances.