{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# DB12 Analysis: Port to Python 3" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import json\n", "import pandas as pd\n", "import seaborn as sns\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import stats\n", "import math\n", "from sklearn.model_selection import train_test_split, learning_curve\n", "from sklearn.linear_model import ElasticNet, LinearRegression, Ridge, RidgeCV\n", "from sklearn import metrics" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Problem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "DB12 was written for Python 2 and was ported to Python 3 in October 21.\n", "The CI detected discrepancies between the DB12 results obtained with Python 2 and those obtained with Python 3.\n", "In this notebook, we study the potential impact of these discrepancies on the execution of jobs and propose new solutions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Source of the discrepancies:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Porting DB12 to Python 3 involved:\n", "- using `range` instead of `xrange`: in practice, `range` in Python 3 is the equivalent of `xrange` in Python 2.\n", "- using integers that behave like the Python 2 `long` instead of the old `int`: Python 3 no longer supports the old `int`, and its new `int` is equivalent to the Python 2 `long`. Operations involving the new `int` are slower.\n", "- and probably many other small optimizations...\n", "\n", "Pinpointing the source of the discrepancies would require thorough profiling and would probably not help us resolve the issue.\n", "First, let's assess the impact of such a change." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Impact" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Hypotheses:\n", "- $H_0$: DB12 scores obtained with `python3` are not significantly different from those obtained with `python2`.\n", "- $H_a$: The scores are significantly different.\n", "\n", "We submitted 5 jobs, each computing DB12 20 times (10 with `python2` and 10 with `python3`, with a 5-minute interval between runs), on many different sites, and we collected data about the scores and the environments (the programs are available in `resources`).\n", "\n", "```bash\n", "# On a DIRAC client\n", "./submit.sh\n", "python getDB12Scores.py\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#plt.style.use('tex')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false, "slideshow": { "slide_type": "notes" } }, "outputs": [ { "data": { "text/html": [ "
\n", "| | score | python-version | os | cpu-model | nb-cores | cpu-mhz | load-avg | ce | iteration | job-id | cpu-family |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 8823 | 17.099863 | 3 | CentOS Linux release 7.9.2009 (Core) | AMD EPYC 7282 16-Core Processor | 64 | 2800.000 | 75.97 | lapp-ce06.in2p3.fr | 3 | 542946731 | AMD |\n",
"| 8843 | 11.627907 | 2 | CentOS Linux release 7.9.2009 (Core) | AMD EPYC 7282 16-Core Processor | 64 | 2800.000 | 75.97 | lapp-ce06.in2p3.fr | 7 | 542946731 | AMD |\n",
"| 8842 | 15.595758 | 3 | CentOS Linux release 7.9.2009 (Core) | AMD EPYC 7282 16-Core Processor | 64 | 2800.000 | 75.97 | lapp-ce06.in2p3.fr | 7 | 542946731 | AMD |\n",
"| 8841 | 17.556180 | 3 | CentOS Linux release 7.9.2009 (Core) | AMD EPYC 7282 16-Core Processor | 64 | 2800.000 | 75.97 | lapp-ce06.in2p3.fr | 8 | 542946731 | AMD |\n",
"| 8840 | 11.973180 | 2 | CentOS Linux release 7.9.2009 (Core) | AMD EPYC 7282 16-Core Processor | 64 | 2800.000 | 75.97 | lapp-ce06.in2p3.fr | 0 | 542946731 | AMD |\n",
"| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
"| 2347 | 10.490978 | 2 | CentOS Linux release 7.7.1908 (Core) | Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 64 | 2631.152 | 57.34 | ce03.cmsaf.mit.edu | 2 | 534019954 | Intel |\n",
"| 2346 | 11.927481 | 3 | CentOS Linux release 7.7.1908 (Core) | Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 64 | 2631.152 | 57.34 | ce03.cmsaf.mit.edu | 3 | 534019954 | Intel |\n",
"| 2345 | 11.938873 | 3 | CentOS Linux release 7.7.1908 (Core) | Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 64 | 2631.152 | 57.34 | ce03.cmsaf.mit.edu | 9 | 534019954 | Intel |\n",
"| 5484 | 11.190689 | 2 | CentOS Linux release 7.9.2009 (Core) | Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 16 | 2095.074 | 5.97 | ce515.cern.ch | 9 | 534019731 | Intel |\n",
"| 5483 | 13.206550 | 3 | CentOS Linux release 7.9.2009 (Core) | Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz | 16 | 2095.074 | 5.97 | ce515.cern.ch | 4 | 534019731 | Intel |\n",
"\n",
"11880 rows × 11 columns\n", "