Frequently Asked Questions
This page evolves continuously and can be modified directly on GitHub!
If you have questions or suggestions that are not addressed here, please feel free
to post them to the dedicated AE Google group.
Do I have to open source my software artifacts?
No, it is not strictly necessary, and you can
provide your software artifact as a binary.
However, if problems arise, reviewers may not be
able to fix them and will likely give you a negative score.
Is Artifact Evaluation blind or double-blind?
AE is a single-blind process, i.e. the authors' names are known to the evaluators
(there is no need to hide them since the papers are already accepted),
but the evaluators' names are not known to the authors.
The AE chair usually acts as a proxy between authors and evaluators
in case of questions and problems.
How to pack artifacts?
We do not have strict requirements at this stage. You can pack
your artifacts simply in a tarball, zip file, virtual machine image or Docker image.
You can also share artifacts via public services such as GitHub, GitLab and BitBucket.
Please see our submission guide
for more details.
Is it possible to provide remote access to a machine with pre-installed artifacts?
Only in exceptional cases, e.g. when rare hardware or proprietary software/benchmarks are required,
when the VM image is too large, or when you are not authorized to move artifacts outside your organization.
In such cases, you will need to send the access information
to the AE chairs via private email or SMS.
They will then pass this information to the evaluators.
Can I share commercial benchmarks or software with evaluators?
Please check the license of your benchmarks, data sets and software.
In case of any doubt, try to find a free alternative. In fact,
we strongly suggest you provide a small subset of free benchmarks
and data sets to simplify evaluation.
Note that we have a preliminary agreement with the EEMBC consortium
to let authors share their EEMBC benchmarks with the evaluators for Artifact Evaluation purposes.
Can I engage with the community to evaluate my artifacts?
Based on community feedback, we provide an extra option of open evaluation
to let the community validate artifacts that are publicly available
on GitHub, GitLab, BitBucket, etc., report issues and help authors
fix them.
Note that, in the end, these artifacts still go through the traditional
evaluation process via the AE committee. We successfully validated
this option at ADAPT'16
and CGO/PPoPP'17 AE!
How to automate, customize and port experiments?
From our past AE experience, the major difficulty that evaluators
face is the lack of a common and portable workflow framework
in systems research. This means that each year they have
to learn the ad-hoc scripts and formats of nearly
every artifact without being able to reuse that knowledge the following year
(see our public presentation about
CGO-PPoPP'17 AE issues).
Things get even worse if an evaluator would like to validate experiments
using a different compiler, tool, library, data set, operating system or hardware
rather than just replicating quickly outdated results using
VM and Docker images: our experience shows that most of the submitted scripts
are not easy to change, customize or port.
That is why we collaborate with the community
and ACM to develop a common experimental framework
(Collective Knowledge aka CK)
with a common JSON API and a portable package manager
which unifies the detection and installation of dependencies on Linux, macOS, Windows and Android.
You can see how CK workflows helped automate, crowdsource and visualize experiments in the
1st ACM ReQuEST-ASPLOS'18 tournament
to co-design a Pareto-efficient software/hardware stack for deep learning:
CK workflows,
ACM proceedings,
report and
reproducible results.
You can also reuse shared programs/benchmarks,
portable packages (code and data sets),
repositories with customizable workflows and other CK components,
and software detection plugins in your own workflows.
You can also check the CGO'17 article from the University of Cambridge
on "Software Prefetching for Indirect Memory Accesses",
with a CK-based experimental workflow and packages, which won a distinguished artifact award.
Please follow this guide
if you want to convert your artifacts and workflows to the CK format.
Note that you are not obliged to use CK.
However, if you are interested in giving it a try and helping evaluators,
the non-profit cTuning foundation
regularly helps users convert their workflows to the CK format while reusing already existing artifacts;
just contact them well before
submitting your artifacts. You can also join us at the ResCuE-HPC workshop
on Reproducible, Customizable and Portable Workflows for HPC at Supercomputing'18
to discuss how to introduce common workflows into systems research!
Do I have to make my artifacts public if they pass evaluation?
No, you don't have to, and it may be impossible in the case of some commercial artifacts.
Nevertheless, we encourage you to make your artifacts publicly available upon publication,
for example, by including them as "source materials" in the Digital Library
or sharing them in a permanent repository (required to receive the "available" badge)
as outlined in our vision
for collaborative and reproducible computer engineering.
Furthermore, if your artifacts are already publicly available at the time
of submission, you may benefit from the "public review" option, where you engage
directly with the community to discuss, evaluate and use your software. See such
examples here
(search for "example of public evaluation").
How to report and compare empirical results?
First of all, you should undoubtedly run empirical experiments more than once
(we still encounter many cases where researchers measure execution time only once)
and perform statistical analysis (i.e. do not just look at the average but at the distribution of values)!
There is no universal recipe for how many times you should repeat an empirical experiment,
since it heavily depends on the type of experiment, the platform and the environment.
You should then analyze the distribution of execution times, as shown in the figure below:
If you have more than one expected value (b), it means that there are several
run-time states on your machine which may be switching during your experiments
(such as adaptive frequency scaling), and you cannot reliably compare empirical results.
However, if there is only one expected value for a given experiment (a),
then you can use it to compare multiple experiments (for example during
autotuning as described
here).
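As a rough illustration (this is only a sketch; the repetition count, the 10% variation threshold and the ./my_benchmark binary are assumptions, not AE requirements), the following Python script repeats a measurement, reports the spread of execution times and warns when the variation is large enough to suggest more than one run-time state:

```python
# Sketch only: repeat a measurement and inspect the spread of execution times.
# The benchmark command, repetition count and 10% threshold are placeholders.
import statistics
import subprocess
import time

def measure_once(cmd):
    """Run the benchmark once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

def measure(cmd, repetitions=20):
    times = sorted(measure_once(cmd) for _ in range(repetitions))
    median = statistics.median(times)
    spread = (times[-1] - times[0]) / median
    # Very rough heuristic: a large gap between the fastest and slowest runs
    # relative to the median hints at several run-time states (e.g. frequency scaling).
    if spread > 0.10:
        print(f"WARNING: variation of {spread:.1%} across runs; "
              "check for adaptive frequency scaling or other run-time states")
    return median, times

if __name__ == "__main__":
    median, times = measure(["./my_benchmark"])  # hypothetical benchmark binary
    print(f"median = {median:.4f}s, min = {times[0]:.4f}s, max = {times[-1]:.4f}s")
```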
You should also report the variation of empirical results together with the expected values.
Furthermore, we strongly suggest that you pre-record results from your platform
and provide a script to automatically compare new results with the pre-recorded ones,
preferably using expected values. This will help evaluators avoid wasting time
trying to dig results out of "stdout" and validate them by hand.
For example, see how new results are visualized and compared against the pre-recorded ones
using the CK dashboard
in the CGO'17 distinguished artifact.
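If you do not use CK, even a small stand-alone comparison script helps; the sketch below (the JSON file names, their format and the 5% threshold are assumptions for illustration) compares newly measured expected values against pre-recorded ones and flags deviations:

```python
# Sketch only: compare new expected values against pre-recorded reference values.
# File names, the JSON format and the 5% threshold are placeholders.
import json

def compare(new_results_path, reference_path, threshold=0.05):
    with open(new_results_path) as f:
        new = json.load(f)       # e.g. {"benchmark-name": median_time_in_seconds, ...}
    with open(reference_path) as f:
        ref = json.load(f)
    for name, expected in ref.items():
        measured = new.get(name)
        if measured is None:
            print(f"{name}: MISSING in new results")
            continue
        deviation = abs(measured - expected) / expected
        status = "OK" if deviation <= threshold else "MISMATCH"
        print(f"{name}: expected {expected:.4f}, measured {measured:.4f}, "
              f"deviation {deviation:.1%} -> {status}")

if __name__ == "__main__":
    compare("new_results.json", "reference_results.json")  # hypothetical file names
```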
How to deal with numerical accuracy and instability?
If the accuracy of your results depends on a given machine, environment and optimizations
(for example, when optimizing BLAS, DNN, etc.), you should provide a script or plugin that automatically
reports any unexpected loss in accuracy (above a provided threshold) as well as any numerical instability.
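A minimal sketch of such a check is shown below (the file names, the JSON format and the threshold are assumptions; adapt them to your artifact):

```python
# Sketch only: fail loudly if numerical accuracy drops below a given threshold.
# Output/reference file names and their JSON format are placeholders.
import json
import sys

def check_accuracy(output_path, reference_path, threshold=1e-6):
    with open(output_path) as f:
        produced = json.load(f)    # values produced by the optimized code
    with open(reference_path) as f:
        reference = json.load(f)   # values from the baseline implementation
    worst = max(abs(p - r) for p, r in zip(produced, reference))
    if worst > threshold:
        print(f"FAIL: maximum absolute error {worst:.2e} exceeds threshold {threshold:.2e}")
        sys.exit(1)
    print(f"PASS: maximum absolute error {worst:.2e} is within threshold {threshold:.2e}")

if __name__ == "__main__":
    check_accuracy("output.json", "reference.json")  # hypothetical files
```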
How to validate models or algorithm scalability?
If you present a novel parallel algorithm or a predictive model which should scale
across a number of cores/processors/nodes, we suggest that you
provide an experimental workflow which can automatically detect the underlying topology
of the user's machine (or is at least configurable), validate the scalability of your model or algorithm,
and report any unexpected behavior. In the future, we expect to use public repositories
of knowledge where results will be automatically validated against the ones continuously shared
by the community (1, 2).
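As an illustration only (the benchmark command, the use of OMP_NUM_THREADS and the 0.5 efficiency threshold are assumptions, not requirements), such a workflow could look roughly like this:

```python
# Sketch only: detect the number of available cores, run a parallel benchmark
# with an increasing thread count and flag unexpected scaling behavior.
# The benchmark binary, OMP_NUM_THREADS usage and efficiency threshold are placeholders.
import os
import subprocess
import time

def run_with_threads(cmd, threads):
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, capture_output=True)
    return time.perf_counter() - start

def check_scalability(cmd, min_efficiency=0.5):
    max_threads = os.cpu_count() or 1   # basic topology detection; make it configurable if needed
    baseline = run_with_threads(cmd, 1)
    threads = 2
    while threads <= max_threads:
        t = run_with_threads(cmd, threads)
        efficiency = baseline / (t * threads)
        flag = "" if efficiency >= min_efficiency else "  <-- unexpected behavior"
        print(f"{threads:3d} threads: {t:.3f}s, parallel efficiency {efficiency:.2f}{flag}")
        threads *= 2

if __name__ == "__main__":
    check_scalability(["./my_parallel_benchmark"])  # hypothetical parallel benchmark
```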
Is there any page limit for my Artifact Evaluation Appendix?
There is no page limit for the AE Appendix at the time of submission for Artifact Evaluation.
However, there is currently a 2-page limit for the AE Appendix in camera-ready CGO and PPoPP papers.
There is no page limit for the AE Appendix in camera-ready SC papers. We also expect
that there will be no page limits for AE Appendices in the journals willing to participate
in our AE initiative.