Porting DB12 to Python3: Analysis of the scores
Description
DB12 was originally written in Python2 to estimate the power of a given CPU for running HEP applications. However, Python2 has not been maintained since January 2020, so we decided to port the code to Python3, which includes several optimizations.
In October 2021, we ported DB12 to Python3.9, but the optimizations brought by the language introduced discrepancies in the norm score, which is a critical component in evaluating the power of CPUs. We built an analysis tool to mitigate these discrepancies.
Current situation
The current analysis assesses the impact of the changes on the norm score and proposes three different solutions to resolve the issue:
- Find a single constant value that transforms a Python3 score into a Python2 one: 1.18 seems reasonable.
  - Pros: simple
  - Cons: not accurate; it fits well with scores computed on Intel, but not so well with scores computed on AMD
- Find one constant value per processor type (Intel/AMD): 1.16 fits well with scores computed on Intel, while 1.4 fits well with scores computed on AMD.
  - Pros: accurate
  - Cons: need to maintain a table and update it with new types of processors, and possibly also with new versions of Python
- Compute a simple linear regression:
  - Pros: accurate
  - Cons: need to run it on many examples to get an accurate model
To keep it simple and accurate, we chose to apply the second solution: one constant value per processor type.
These constants are part of the code and are located in `src/db12/factors.json`.
These values will need to be updated over time, as CPUs and Python evolve.
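As an illustration, a correction of this kind might be applied as follows. This is a minimal sketch that assumes `factors.json` maps a vendor keyword to a multiplicative factor and that the Python3 score is divided by it; the actual file layout and convention in `src/db12/factors.json` may differ, and the vendor-detection heuristic is hypothetical.

```python
import json


def correct_score(raw_score, cpu_model, factors_path="src/db12/factors.json"):
    """Convert a Python3 DB12 score into a Python2-equivalent one.

    Assumes factors.json maps a vendor keyword (e.g. "intel", "amd")
    to a multiplicative factor; the real schema may differ.
    """
    with open(factors_path) as f:
        factors = json.load(f)

    # Guess the vendor from the CPU model string (hypothetical heuristic)
    model = cpu_model.lower()
    vendor = "intel" if "intel" in model else "amd" if "amd" in model else None

    # Fall back to 1.0 (no correction) for unknown vendors
    factor = factors.get(vendor, 1.0)

    # Assumed convention: divide the Python3 score by the factor
    # to obtain the Python2-equivalent score.
    return raw_score / factor
```

For example, `correct_score(25.0, "Intel(R) Xeon(R) CPU E5-2650 v2")` would divide the raw score by the Intel factor.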
If you need an accurate DB12 norm score for a Python version or a CPU that has not been taken into account,
then you have to run the analysis with new data, following the steps below.
Run the analysis
Execute the Jupyter Notebook:
jupyter notebook DB12Analysis.ipynb
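As a rough illustration of what such an analysis can look like, the sketch below fits one constant per processor type by least squares (the second solution above). The CSV file name and column names (`cpu_vendor`, `score_py2`, `score_py3`) are assumptions, not necessarily the format used by `DB12Analysis.ipynb`.

```python
import pandas as pd

# Hypothetical input: one row per machine benchmarked with both interpreters
# (the column names below are assumptions, not the notebook's actual format).
df = pd.read_csv("scores.csv")  # columns: cpu_vendor, score_py2, score_py3

# Least-squares estimate of the constant k such that score_py3 ≈ k * score_py2,
# computed separately for each processor type.
for vendor, grp in df.groupby("cpu_vendor"):
    k = (grp["score_py3"] * grp["score_py2"]).sum() / (grp["score_py2"] ** 2).sum()
    print(f"{vendor}: factor ≈ {k:.2f}")
```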
Include new data using DIRAC
- Install a DIRAC client:
lb-dirac
lhcb-proxy-init
- Go to `resources/tools` and submit jobs:
./submit.sh <number of jobs> <list of sites>
- Once the jobs are done, add their IDs to `jobIDs.csv` and run the following to get the results:
python getDB12Scores.py
- You will obtain `results.json`; you can sort and pretty-print it with `jq` for a better look (or inspect it programmatically, as sketched after this list):
jq -S . results.json > results_sorted_<date>.json
- Finally, you can remove the jobs that are still pending in the queues.
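If you want to inspect the results programmatically rather than with `jq`, a short sketch along these lines can help. It assumes `results.json` maps a job ID to a record containing a CPU identifier and a norm score; the keys `CPU` and `NormScore` below are assumptions, so check the actual output written by `getDB12Scores.py`.

```python
import json
from collections import defaultdict
from statistics import mean

# Assumed structure: {"<job id>": {"CPU": "...", "NormScore": <float>}, ...}
with open("results.json") as f:
    results = json.load(f)

# Group the norm scores by CPU model
scores_by_cpu = defaultdict(list)
for record in results.values():
    scores_by_cpu[record["CPU"]].append(record["NormScore"])

# Print the mean norm score per CPU model
for cpu, scores in sorted(scores_by_cpu.items()):
    print(f"{cpu}: mean norm score {mean(scores):.2f} over {len(scores)} jobs")
```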