This is a simplified version of the benchmarking and analysis campaign that can be executed quickly on a desktop computer. For the full report that was used to generate the figures in the paper, see viewmodel-data-analysis-results.html.

This report contains simplified data analysis for the paper “Incremental View Model Synchronization Using Partial Models” using a subset of the experiment configurations. It was created to facilitate experimenting with the benchmarking environment without running the full measurement campaign.

Running the benchmark

The measurements can be run in a fairly short time (under 15 minutes) on a desktop computer.

After extracting the benchmarking environment (hu.bme.mit.inf.viewmodel.benchmarks.product-linux.gtk.x86_64, hu.bme.mit.inf.viewmodel.benchmarks.product-macosx.cocoa.x86_64 or hu.bme.mit.inf.viewmodel.benchmarks.product-win32.win32.x86_64) for the appropriate operating system, the benchmark configuration short.json can be run as follows:

./eclipse -benchmarks short.json -vmargs -Xmx8g

The results are placed into ./results/short/benchmarks.log, which should be copied into the same folder as this .Rmd document under the name short_log.csv before knitting.
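
For example, the copy can be performed directly from R. This is a minimal sketch, assuming the results folder sits next to this document; adjust the source path to wherever the benchmark was actually run.

# Copy the benchmark log next to this document under the expected name.
# The source path is an assumption; adjust it to the actual results location.
file.copy('./results/short/benchmarks.log', './short_log.csv', overwrite = TRUE)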

The largest source models for both the Dependability and VirtualSwitch case studies are omitted, and only a single modification mix (Usual) is executed. Moreover, warm-up iterations are skipped and only a single iteration is performed for each experiment, which may increase noise substantially.

Loading the data

The rest of the analysis proceeds as a literate R script. First we load the tidyverse packages for data wrangling and plotting.

require(tidyverse)

The file short_log.csv is the concatenation of the log files produced by the benchmark configuration short.json.

log_path <- './short_log.csv'
full_log <- read_csv(log_path, col_types = cols(
  model = col_character(),
  transformationCase = col_character(),
  experiment = col_character(),
  modificationMix = col_character(),
  rerun = col_integer(),
  variable = col_character(),
  value = col_double()
))

We only measured using Train Benchmark models, so we can replace the model name with the scale factor in the logs, while still preserving all information.

trainbenchmark_log <- full_log %>%
  mutate(modelSize = as.integer(gsub('railway-batch-', '', model))) %>%
  select(-model) %>%
  separate(variable, c('checkpoint', 'category', 'variable'))
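
For example (an illustrative value only; the actual names come from the log), a Train Benchmark model name such as railway-batch-16 is reduced to its scale factor:

as.integer(gsub('railway-batch-', '', 'railway-batch-16'))
## [1] 16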

Basic statistics

We define some helper functions for converting string identifiers to factor variables later.

TransformationCaseFactor <- function (v) {
  factor(as.factor(v),
         levels = c('dependability', 'virtualSwitch'),
         labels = c('Dependability', 'VirtualSwitch'))
}

ModificationMixFactor <- function (v) {
  factor(as.factor(v),
         levels = c('modelQuery', 'execute', 'usual', 'petriNetSlow', 'virtualSwitchSlow',
                    'bothSlow', 'bothFast', 'createSwitch', 'createSegment',
                    'connectTrackElements', 'disconnectTrackElements', 'createRoute', 'removeRoute',
                    'addSwitchToRoute', 'removeSwitchFromRoute',
                    'setSwitchFailed', 'setSwitchOperational'),
         labels = c('Initial query', 'Initial transformation', '(A) Usual mix', '(B) Depend. stress mix',
                    '(C) VirtSw. stress mix', '(D) mix', '(E) mix', 'Create switch', 'Create segment',
                    'Connect track elements', 'Disconnect track elements', 'Create route', 'Remove route',
                    'Add switch to route', 'Remove switch from route',
                    'Set switch failed', 'Set switch operational'))
}

ExperimentFactor <- function (v) {
  factor(as.factor(v),
         levels = c('viewModel-physical', 'viatra-priorities', 'viatra'),
         labels = c('Our approach', 'Source-reactive VIATRA', 'Trace-reactive VIATRA'))
}
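
As a quick illustration with a hypothetical input vector (the identifiers themselves appear in the logs), the helpers map raw names to readable display labels:

ExperimentFactor(c('viewModel-physical', 'viatra'))
# a factor with the values 'Our approach' and 'Trace-reactive VIATRA'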

Source model statistics

We create a data frame containing the sizes of the source models (object, reference, and attribute counts). Each run prints the model sizes; however, we only use the output from the first batch run, because all source models with the same scale factor are identical.

source_model_statistics <- trainbenchmark_log %>%
  filter(experiment == 'viewModel-batch-physical' &
           modificationMix == 'none' &
           rerun == 0 &
           checkpoint == 'batch' &
           category == 'source') %>%
  group_by(modelSize, variable) %>%
  summarize(value = first(value)) %>%
  mutate(variable = paste0('source_', variable)) %>%
  spread(variable, value)

stat_df <- data.frame(source_model_statistics$modelSize, source_model_statistics$source_count, source_model_statistics$source_referenceCount, source_model_statistics$source_attributeCount)

colnames(stat_df) <- c("Scale factor", "Source objects", "Source references", "Source attributes")
knitr::kable(stat_df, format='markdown') %>% cat(sep = '\n')
| Scale factor | Source objects | Source references | Source attributes |
|---:|---:|---:|---:|
| 1 | 1014 | 3955 | 1865 |
| 2 | 2039 | 7958 | 3752 |
| 4 | 4565 | 17842 | 8403 |
| 8 | 12213 | 47810 | 22495 |
| 16 | 25259 | 98926 | 46524 |
| 32 | 49799 | 194960 | 91735 |
| 64 | 101697 | 398278 | 187327 |
| 128 | 207953 | 814472 | 383068 |

Target model statistics

Batch transformations

We perform the same analysis for the target models of the batch transformations.

target_model_statistics <- trainbenchmark_log %>%
  filter(experiment == 'viewModel-batch-physical' &
           modificationMix == 'none' &
           rerun == 0 &
           checkpoint == 'batch' &
           category == 'target') %>%
  group_by(modelSize, transformationCase, variable) %>%
  summarize(value = first(value)) %>%
  mutate(variable = paste0('target_', variable)) %>%
  spread(variable, value)

tstat_cases <- TransformationCaseFactor(target_model_statistics$transformationCase)
tstat_df <- data.frame(target_model_statistics$modelSize, tstat_cases, target_model_statistics$target_count, target_model_statistics$target_referenceCount, target_model_statistics$target_attributeCount)

colnames(tstat_df) <- c("Scale factor", "Case study", "Target objects", "Target references", "Target attributes")
knitr::kable(tstat_df, format='markdown') %>% cat(sep = '\n')
| Scale factor | Case study | Target objects | Target references | Target attributes |
|---:|:---|---:|---:|---:|
| 1 | Dependability | 2941 | 7840 | 2354 |
| 2 | Dependability | 5911 | 15760 | 4732 |
| 4 | Dependability | 13171 | 35120 | 10544 |
| 8 | Dependability | 35101 | 93600 | 28096 |
| 16 | VirtualSwitch | 495 | 325 | 495 |
| 32 | VirtualSwitch | 1040 | 708 | 1040 |
| 64 | VirtualSwitch | 2060 | 1370 | 2060 |
| 128 | VirtualSwitch | 4249 | 2841 | 4249 |

For sanity, we check that all transformations resulted in the same number of target model elements.

bad_batch_transformations <- trainbenchmark_log %>%
  filter(modificationMix == 'none' &
           checkpoint == 'batch' &
           category == 'target') %>%
  select(-c(modificationMix, checkpoint, category)) %>%
  mutate(variable = paste0('actual_', variable)) %>%
  spread(variable, value) %>%
  inner_join(target_model_statistics, by=c('transformationCase', 'modelSize')) %>%
  filter(actual_rootCount != target_rootCount |
           actual_count != target_count |
           actual_referenceCount != target_referenceCount |
           actual_attributeCount != target_attributeCount)

if (nrow(bad_batch_transformations) != 0) {
  print(bad_batch_transformations)
  stop("Unexpected batch transformation results")
} else {
  message("All correct")
}
## All correct

We can see that every batch transformation resulted in the expected number of target elements.

Now we count the variables and constraints in the partial models.

partial_size <- trainbenchmark_log %>%
  filter(modificationMix == 'none' &
           checkpoint == 'batch' & category == 'trace' &
           variable %in% c('variableCount', 'constraintCount') &
           rerun == 0) %>%
  select(c(transformationCase, variable, value, modelSize)) %>%
  spread(variable, value)

partial_cases <- TransformationCaseFactor(partial_size$transformationCase)
partial_df = data.frame(partial_size$modelSize, partial_cases, partial_size$variableCount, partial_size$constraintCount)

colnames(partial_df) <- c("Scale factor", "Case study", "Partial model variables", "Partial model constraints")
knitr::kable(partial_df, format='markdown') %>% cat(sep = '\n')
| Scale factor | Case study | Partial model variables | Partial model constraints |
|---:|:---|---:|---:|
| 1 | Dependability | 9023 | 16277 |
| 2 | Dependability | 18137 | 32719 |
| 4 | Dependability | 40413 | 72907 |
| 8 | Dependability | 107689 | 194285 |
| 16 | VirtualSwitch | 2135 | 3280 |
| 32 | VirtualSwitch | 4536 | 6992 |
| 64 | VirtualSwitch | 8920 | 13720 |
| 128 | VirtualSwitch | 18429 | 28360 |

Change-driven transformations

Different modification mixes produce different output models, so we also collect statistics for the target model after each modification mix separately.

incremental_target_model_statistics <- trainbenchmark_log %>%
  filter(experiment == 'viewModel-incremental-physical' &
           modificationMix != 'none' &
           rerun == 0 &
           checkpoint == 'after' &
           category == 'target') %>%
  group_by(modelSize, transformationCase, modificationMix, variable) %>%
  summarize(value = first(value)) %>%
  mutate(variable = paste0('target_', variable)) %>%
  spread(variable, value) %>%
  ungroup()

incremental_tstat_df <- data.frame(
  incremental_target_model_statistics$modelSize,
  TransformationCaseFactor(incremental_target_model_statistics$transformationCase),
  ModificationMixFactor(incremental_target_model_statistics$modificationMix),
  incremental_target_model_statistics$target_count,
  incremental_target_model_statistics$target_referenceCount,
  incremental_target_model_statistics$target_attributeCount) %>%
  arrange_at(c(2, 1, 3))

colnames(incremental_tstat_df) <- c("Scale factor", "Case study", "Modification mix", "Target objects", "Target references", "Target attributes")
knitr::kable(incremental_tstat_df, format='markdown') %>% cat(sep = '\n')
| Scale factor | Case study | Modification mix | Target objects | Target references | Target attributes |
|---:|:---|:---|---:|---:|---:|
| 1 | Dependability | (A) Usual mix | 2125 | 5480 | 1646 |
| 2 | Dependability | (A) Usual mix | 5459 | 14440 | 4336 |
| 4 | Dependability | (A) Usual mix | 12908 | 34340 | 10310 |
| 8 | Dependability | (A) Usual mix | 33788 | 89820 | 26962 |
| 16 | VirtualSwitch | (A) Usual mix | 505 | 324 | 505 |
| 32 | VirtualSwitch | (A) Usual mix | 1050 | 713 | 1050 |
| 64 | VirtualSwitch | (A) Usual mix | 2070 | 1373 | 2070 |
| 128 | VirtualSwitch | (A) Usual mix | 4259 | 2845 | 4259 |

We make sure that each experiment resulted in the same number of target model elements when executed with the same source model and modification mix.

incremental_target_model_statistics_by_experiment <- trainbenchmark_log %>%
  filter(modificationMix != 'none' &
           checkpoint == 'after' &
           category == 'target') %>%
  group_by(modelSize, transformationCase, modificationMix, experiment, rerun, variable) %>%
  summarize(value = first(value)) %>%
  mutate(variable = paste0('actual_', variable)) %>%
  spread(variable, value) %>%
  ungroup()

bad_incremental_experiments <- incremental_target_model_statistics_by_experiment %>%
  inner_join(incremental_target_model_statistics,
    by = c('modelSize', 'transformationCase', 'modificationMix')) %>%
  filter(actual_rootCount != target_rootCount |
           actual_count != target_count |
           actual_referenceCount != target_referenceCount |
           actual_attributeCount != target_attributeCount)


if (nrow(bad_incremental_experiments) != 0) {
  print(bad_incremental_experiments)
  stop("Unexpected incremental transformation results")
} else {
  message("All correct")
}
## All correct

We can see that every incremental transformation resulted in the expected number of target elements.

Execution time (RQ2 and RQ3)

Data wrangling

We define a helper function that strips the batch/incremental suffix from experiment names, so that execution times collected from the batch and incremental versions of the experiments can be combined.

RenameExperiment <- function(df) {
  df %>% mutate(experiment = gsub("-(batch|incremental)", "", experiment))
}
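
For example, with a hypothetical one-row data frame (the identifier matches the naming used in the log filters below), both the batch and incremental variants collapse to the same experiment name:

RenameExperiment(tibble(experiment = 'viewModel-batch-physical'))
# the experiment column becomes 'viewModel-physical'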

First execution (RQ2)

We will “formally” treat the model query and the first execution as two modification mixes, which simplifies our data frames. They will be shown on the same plots as the incremental synchronization times anyway.

We collect the execution time of the model query in the first (also known as “batch” in the logs) execution. This is relatively simple, as all experiments contain a modelQuery checkpoint.

modelQuery <- trainbenchmark_log %>%
  filter(modificationMix == 'none' &
           variable == 'duration' &
           checkpoint == 'modelQuery') %>%
  mutate(modificationMix='modelQuery') %>%
  RenameExperiment()

We extract the first execution time of the hand-written VIATRA transformations, too. Only the hand-written transformations have an execute checkpoint in the log.

execution_viatra <- trainbenchmark_log %>%
  filter(modificationMix == 'none' &
           variable == 'duration' &
           checkpoint == 'execute') %>%
  mutate(modificationMix='execute') %>%
  RenameExperiment()

Extracting the first run of the ViewModel transformations is a bit more involved, as different steps were logged separately. We simply sum their durations.

execution_viewmodel <- trainbenchmark_log %>%
  filter(modificationMix == 'none' &
           variable == 'duration' &
           checkpoint %in% c('pt2tExecute', 'pt2tRete', 's2ptExecute', 's2ptRete')) %>%
  group_by(transformationCase, experiment, rerun, category, variable, modelSize) %>%
  summarize(checkpoint='execute', modificationMix='execute', value=sum(value)) %>%
  ungroup() %>%
  RenameExperiment()

Incremental transformations (RQ3)

The execution time of change-driven synchronization is split between the modelModification (propagation of source model changes through the RETE net) and synchronization (firing of change-driven transformation rules) phases. We sum the two durations.

incremental <- trainbenchmark_log %>%
  filter(modificationMix != 'none' &
           variable == 'duration' &
           checkpoint %in% c('synchronization', 'modelModification')) %>%
  group_by(transformationCase, experiment, rerun, category, variable, modelSize, modificationMix) %>%
  summarize(checkpoint='synchronization', value=sum(value)) %>%
  ungroup() %>%
  RenameExperiment()

Putting it together

We bind the data frames from the previous two sections, and join them to the target and source model statistics. Then we prepare for creating the plots as follows:

  • The total model size is the geometric mean of the number of source and target model objects. This measure was selected because it can indicate scalability in both case studies considered (a small numeric check follows the code below).
  • Only the median of the execution times is kept.
  • Execution times smaller than 1 ms are replaced with 1 ms so that a log-log plot can be drawn. Zero execution times would lead to infinite values after taking their logarithm.

durations_plot <- rbind(incremental, modelQuery, execution_viatra, execution_viewmodel) %>%
  inner_join(target_model_statistics, by = c('transformationCase', 'modelSize')) %>%
  inner_join(source_model_statistics, by = c('modelSize')) %>%
  mutate(total_count = sqrt(source_count * target_count)) %>%
  group_by(modelSize, transformationCase, modificationMix, experiment) %>%
  summarize(total_count = median(total_count), value = median(value)) %>%
  mutate(value = ifelse(value < 1, 1, value))
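
As a quick numeric check of the size measure (a minimal sketch using the scale factor 1 Dependability model from the tables above, with 1014 source objects and 2941 target objects):

sqrt(1014 * 2941)
## [1] 1726.897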

We add some factor labels in order to generate appropriate legends in the plots.

durations_plot$modificationMix <- ModificationMixFactor(durations_plot$modificationMix)
durations_plot$transformationCase <- TransformationCaseFactor(durations_plot$transformationCase)
durations_plot$experiment <- ExperimentFactor(durations_plot$experiment)
durations_plot <- durations_plot %>%
  arrange(transformationCase, modelSize, modificationMix, experiment)

A large table can be assembled for viewing by spreading the three experiments side by side. Note that different modification mixes were evaluated on different virtual machines, so execution times are only comparable within a single modification mix.

durations_table <- durations_plot %>%
  spread(experiment, value) %>%
  arrange(modificationMix, transformationCase, modelSize)
colnames(durations_table)[1:4] <- c("Scale factor", "Case study", "Modification mix", "Total size")
knitr::kable(durations_table, format='markdown') %>% cat(sep = '\n')
| Scale factor | Case study | Modification mix | Total size | Our approach | Source-reactive VIATRA | Trace-reactive VIATRA |
|---:|:---|:---|---:|---:|---:|---:|
| 1 | Dependability | Initial query | 1726.897 | 336 | 53 | 54 |
| 2 | Dependability | Initial query | 3471.675 | 26 | 32 | 26 |
| 4 | Dependability | Initial query | 7754.071 | 37 | 37 | 43 |
| 8 | Dependability | Initial query | 20704.794 | 93 | 73 | 71 |
| 16 | VirtualSwitch | Initial query | 3535.987 | 815 | 679 | 655 |
| 32 | VirtualSwitch | Initial query | 7196.594 | 1360 | 1385 | 1600 |
| 64 | VirtualSwitch | Initial query | 14473.970 | 3175 | 3024 | 3234 |
| 128 | VirtualSwitch | Initial query | 29725.280 | 7915 | 7113 | 7241 |
| 1 | Dependability | Initial transformation | 1726.897 | 2349 | 32 | 45 |
| 2 | Dependability | Initial transformation | 3471.675 | 2061 | 67 | 50 |
| 4 | Dependability | Initial transformation | 7754.071 | 5932 | 76 | 103 |
| 8 | Dependability | Initial transformation | 20704.794 | 12109 | 200 | 203 |
| 16 | VirtualSwitch | Initial transformation | 3535.987 | 370 | 11 | 20 |
| 32 | VirtualSwitch | Initial transformation | 7196.594 | 738 | 22 | 28 |
| 64 | VirtualSwitch | Initial transformation | 14473.970 | 1182 | 50 | 68 |
| 128 | VirtualSwitch | Initial transformation | 29725.280 | 2347 | 129 | 143 |
| 1 | Dependability | (A) Usual mix | 1726.897 | 1316 | 22 | 26 |
| 2 | Dependability | (A) Usual mix | 3471.675 | 1410 | 14 | 15 |
| 4 | Dependability | (A) Usual mix | 7754.071 | 2902 | 17 | 15 |
| 8 | Dependability | (A) Usual mix | 20704.794 | 16761 | 57 | 56 |
| 16 | VirtualSwitch | (A) Usual mix | 3535.987 | 60 | 9 | 10 |
| 32 | VirtualSwitch | (A) Usual mix | 7196.594 | 60 | 8 | 9 |
| 64 | VirtualSwitch | (A) Usual mix | 14473.970 | 62 | 10 | 10 |
| 128 | VirtualSwitch | (A) Usual mix | 29725.280 | 75 | 8 | 10 |

Plots

Let us define some helper functions for making publication-quality plots.

scientific_10 <- function (x) {
  parse(text=gsub("1e", " 10^", scales::scientific_format()(x)))
}

DurationsPlot <- function (df) {
  ggplot(df, aes(x = total_count, y = value, color=experiment, shape=experiment)) +
    geom_point(size = 2) +
    geom_line() +
    scale_x_continuous(name = "Model size = sqrt(#source objects * #target object)",
                       trans = "log",
                       limits = c(1000, 100000),
                       breaks = c(1, 10, 100, 1000, 10000, 100000),
                       label=scientific_10) +
    scale_y_continuous(name = "Execution time (ms)",
                       trans = 'log',
                       limits = c(1, 150000),
                       breaks = c(1, 10, 100, 1000, 10000, 100000),
                       label=scientific_10) +
    facet_grid(transformationCase~modificationMix) +
    scale_color_brewer(type='qual', palette=6, name = "Transformation") +
    scale_shape_manual(values=c(1, 4, 3), name="Transformation") +
    theme_bw() +
    theme(legend.position='bottom', legend.box.spacing = unit(c(0, 0, 0, 0), 'cm'))
}

As only a single modification mix was run, the plots fit on a single figure.

durations_plot %>% filter(as.numeric(modificationMix) < 8) %>% DurationsPlot()

Instrumented first-run transformations (RQ1)

In order to analyze the first-run behavior in depth, ViewModel was instrumented to log each phase of the first-run execution separately. These phases are the RETE network construction and the transformation rule firing of the Source2PartialTarget (s2pt) and the PartialTarget2Target (pt2t) transformations.

As the RETE construction times are usually much shorter than the firing times, we add them to the firing times and compare the overall execution times of the Source2PartialTarget and PartialTarget2Target phases.

instrumented_plot <- trainbenchmark_log %>%
  filter(modificationMix == 'none' &
           variable == 'duration' &
           checkpoint %in% c('pt2tExecute', 'pt2tRete', 's2ptExecute', 's2ptRete')) %>%
  mutate(checkpoint = gsub('Execute|Rete', '', checkpoint)) %>%
  group_by(transformationCase, experiment, rerun, category, variable, modelSize, checkpoint, modificationMix) %>%
  summarize(value=sum(value)) %>%
  ungroup() %>%
  inner_join(target_model_statistics, by = c('transformationCase', 'modelSize')) %>%
  inner_join(source_model_statistics, by = c('modelSize')) %>%
  mutate(total_count = sqrt(source_count * target_count)) %>%
  group_by(modelSize, transformationCase, modificationMix, experiment, checkpoint) %>%
  summarize(total_count = median(total_count), value = median(value)) %>%
  mutate(value = ifelse(value < 1, 1, value)) %>%
  ungroup()

instrumented_plot$transformationCase <- TransformationCaseFactor(instrumented_plot$transformationCase)
instrumented_plot$checkpoint <- factor(as.factor(instrumented_plot$checkpoint),
                                    levels = c('s2pt', 'pt2t'),
                                    labels = c('S2PT', 'PT2T'))
show_instrumented_plot <- instrumented_plot %>%
  select(-c(modificationMix, experiment)) %>%
  spread(checkpoint, value)
colnames(show_instrumented_plot) <- c('Scale factor', 'Case study', 'Total size', 'PT2T duration', 'S2PT duration')

knitr::kable(show_instrumented_plot, format='markdown') %>% cat(sep = '\n')
| Scale factor | Case study | Total size | PT2T duration | S2PT duration |
|---:|:---|---:|---:|---:|
| 1 | Dependability | 1726.897 | 1536 | 813 |
| 2 | Dependability | 3471.675 | 1186 | 875 |
| 4 | Dependability | 7754.071 | 3945 | 1987 |
| 8 | Dependability | 20704.794 | 6943 | 5166 |
| 16 | VirtualSwitch | 3535.987 | 259 | 111 |
| 32 | VirtualSwitch | 7196.594 | 589 | 149 |
| 64 | VirtualSwitch | 14473.970 | 964 | 218 |
| 128 | VirtualSwitch | 29725.280 | 1970 | 377 |
ggplot(instrumented_plot, aes(x = total_count, y = value, color=checkpoint, shape=checkpoint)) +
  geom_point(size=3.5) +
  geom_line() +
  scale_x_continuous(name = "Model size",
                     trans = "log",
                     limits = c(1000, 100000),
                     breaks = c(1, 10, 100, 1000, 10000, 100000),
                     label=scientific_10) +
  scale_y_continuous(name = "Execution time (ms)",
                     trans = 'log',
                     limits = c(1, 150000),
                     breaks = c(1, 10, 100, 1000, 10000, 100000),
                     label=scientific_10) +
  facet_grid(transformationCase~.) +
  scale_color_brewer(type='qual', palette=6, name = "Execution step") +
  scale_shape_manual(values=c(1, 4), name="Execution step") +
  theme_bw() +
  theme(legend.position='bottom', legend.box.spacing = unit(c(0, 0, 0, 0), 'cm'))