
Published March 3, 2021 | Version v1
Conference paper Open

High Performance Correctly Rounded Math Libraries for 32-bit Floating Point Representations

  • 1. Rutgers University

Description


Abstract:

We present the artifact for the accepted paper, High Performance
Correctly Rounded Math Libraries for 32-bit Floating Point
Representations. The artifact provides all necessary source code as
well as a testing harness to show the correctness and speedup of the
proposed math library, RLibm-32. The readme documentation describes how
to install all necessary tools and explains the steps to run the
experiments. We provide scripts to automatically execute each part of
the experiment. Because fully evaluating the entire artifact requires
at least 4 days of work (even with some parallelism), we also provide
a list of scripts for the "SHORT" version of the artifact
evaluation. The short version requires roughly 2 hours to
complete. We highly recommend using the short version instead of the
full version. However, for completeness, we also describe how to
completely evaluate the artifact towards the bottom of the readme
file.


_________________________________________________________________________________________________

* Important note:

Completely evaluating everything in this artifact will take several
days (4+ days). There are a total of 4 phases in our evaluation:

    Computing reduced intervals
    Generating polynomials
    Checking for correctness
    Testing speedup

Thus, in this artifact, we provide a "short" version of each phase of
evaluation. The short version will take roughly 2 hours to
complete. We HIGHLY recommend evaluating the artifact using the short
version.

We also provide methods to perform a full evaluation of the 3rd and 4th
phases (checking for correctness and testing speedup) without having
to generate the polynomials. However, checking for correctness
requires roughly 24 hours to complete and testing the speedup of
RLibm-32 requires roughly 8-10 hours to complete.

In conclusion, we strongly recommend evaluating the artifact using the
short version.


_________________________________________________________________________________________________

* Hardware recommendation:

All of our evaluations were performed on a machine with 2.10GHz Intel
Xeon Gold 6230R and 187GB of RAM running Ubuntu 18.04. However, the
artifact should run on most modern machines with at least
16GB of memory. To run the "short" version, we recommend disk space of
at least 40GB to be safe. To run the "full" version, we recommend 
disk space of at least 200GB.
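
The recommendations above can be checked before starting. The following is a minimal preflight sketch, assuming a Linux host with /proc/meminfo; the helper names (at_least, total_ram_gb, free_disk_gb) are illustrative and not part of the artifact. The thresholds are the ones stated above.

```shell
#!/bin/sh
# Preflight sketch: compare memory and free disk space with the
# recommendations above. Helper names are illustrative only.

# at_least <have> <need>: succeed when have >= need (integers, same unit).
at_least() {
    [ "$1" -ge "$2" ]
}

# Total RAM in GB, rounded, read from /proc/meminfo (Linux only).
# Prints 0 when the value cannot be determined.
total_ram_gb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
    else
        echo 0
    fi
}

# Free disk space in GB for the file system holding directory $1.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

ram=$(total_ram_gb)
disk=$(free_disk_gb .)
at_least "$ram" 16   || echo "warning: ${ram}GB RAM detected, 16GB recommended"
at_least "$disk" 40  || echo "warning: ${disk}GB free disk, 40GB recommended for the short version"
at_least "$disk" 200 || echo "note: ${disk}GB free disk, 200GB recommended for the full version"
```

Run it from the directory where the artifact will live, since free disk space is measured for the current directory. The messages are advisory only.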

* Software recommendation:

We tested our artifact on Ubuntu-16.04 and Ubuntu-18.04. We provide
installation instructions to install all pre-requisite software.


_________________________________________________________________________________________________

* INSTALLATION GUIDE:

There are two methods to install the artifact: (1) Installing docker
and downloading the docker image for the artifact or (2) Manual
installation. We recommend using the provided docker image to
expedite the installation process.

** DOWNLOADING PREBUILT DOCKER IMAGE

   We have prebuilt a docker image and hosted it on Docker Hub.

(1) Install docker if not already installed by following the
installation documentation in this link:
https://docs.docker.com/install/

    We recommend installing docker and evaluating our artifact on
    a machine with Ubuntu. Although docker can be used with
    Windows or MacOS, docker will run on top of a Linux VM there,
    significantly slowing down all processes. All docker commands
    may require "sudo" permission. If you get any errors when
    trying to use docker, prefix all commands with "sudo".

(2) Download the docker image

    $ docker pull jpl169/pldi2021artifact

    The docker image is roughly 5.61GB in size

(3) Run the docker image
    
    $ sudo docker run -it jpl169/pldi2021artifact

    You will be placed in the PLDI2021Artifact directory. You
    are now ready to evaluate the artifact. Go down to the
    EVALUATION GUIDE section for artifact evaluation.


(*) NOTE ON WORKING WITH DOCKER (COPYING FILES)

    Later in the EVALUATION GUIDE, you may want to copy files
    from the docker container to your host machine. This will
    allow you to compare or inspect files more easily. To copy files
    from the container to the host machine, follow the steps below:

    (a) Remember the <path> and <file name> you wish to copy
    (b) Exit from the docker container by using the command

        $ exit

    (c) Find the ID of the container you were in

        $ docker container ls -a

    (d) Copy file from the host terminal:

        $ docker cp <container id>:<path>/<filename> <destination>

    For example, if you want to copy "Example.pdf" in the "/home/foo"
    directory from the container with ID 12345 to your host machine's
    current directory, the command will look like
        
        $ docker cp 12345:/home/foo/Example.pdf .

    (e) If you want to go back into the container, use the command

        $ docker start <container id>
        $ docker attach <container id>
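
The copy in step (d) can be wrapped in a small helper. This is a sketch only: copy_from_container and the DRY_RUN variable are hypothetical names introduced here, not part of the artifact or of docker. With DRY_RUN=1 the helper prints the command instead of running it, which makes it safe to try on a machine without docker.

```shell
# copy_from_container <container id> <path inside container> [destination]
# Hypothetical helper wrapping the "docker cp" call from step (d).
# The destination defaults to the current directory.
copy_from_container() {
    cid=$1
    src=$2
    dest=${3:-.}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed.
        echo "docker cp $cid:$src $dest"
    else
        docker cp "$cid:$src" "$dest"
    fi
}

# Reproduce the example above without touching docker.
DRY_RUN=1
copy_from_container 12345 /home/foo/Example.pdf .
```

Set DRY_RUN=0 (or unset it) to perform the actual copy from the host terminal.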


** MANUAL INSTALLATION

(1) To install and evaluate the artifact manually, first install a list of prerequisites:

    $ sudo apt-get update
    $ sudo apt-get install build-essential cmake git libgmp3-dev libmpfr-dev zlib1g zlib1g-dev wget python python3 python3-pip parallel
    $ sudo python3 -m pip install matplotlib

(1.1) To prevent GNU parallel from printing its citation notice on every run, run the following command and agree to its terms:
    $ parallel --citation
    -> type 'will cite'

(2) Download and install soplex 4.0.1:
    
    Please download soplex-4.0.1 from https://soplex.zib.de/index.php#download
    ** Make sure that you're downloading version 4.0.1 **

    $ tar -xvf soplex-4.0.1.tar
    $ cd soplex-4.0.1
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
    $ cd ../..

(3) Download and compile SoftPosit:
    $ git clone https://gitlab.com/cerlane/SoftPosit.git
    $ cd SoftPosit
    $ git checkout 983e821dd30b9da5467bcbea2895fdbacc1ce264
    $ cd build/Linux-x86_64-GCC/
    $ make
    $ cd ../../..

(4) Download and install Intel oneAPI HPC Toolkit

    * You can access it through this site:
      https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html

    (a) Select the appropriate operating system
    (b) Select "Web & Local" distribution option
    (c) Select Online installer
    (d) On the right hand side (gray background) if you scroll
    down, it will show the steps to install. 

    If your OS is Linux-based, then you can use the commands:

    $ wget https://registrationcenter-download.intel.com/akdlm/irc_nas/17427/l_HPCKit_p_2021.1.0.2684.sh

    $ bash l_HPCKit_p_2021.1.0.2684.sh

    Follow the instructions. The installer will guide you through
    installing the Intel compiler.
    
    (a) Make sure to install "Intel® oneAPI DPC++/C++ Compiler &
    Intel® C++ Compiler Classic." Choose to not install any other
    components
    (b) Make sure to remember the Installation directory
    (c) If it shows you any warning about requiring the "Base
    toolkit" you can choose to ignore it.

    The installation will take roughly 5~10 minutes.

    Once installation is complete, run the script to set the environment variables:
    $ cd <path to intel oneAPI directory>
    $ . setvars.sh

(5) Set the following environment variables:
    $ export SOFTPOSITPATH=<path to SoftPosit directory>
    $ export SOPLEXPATH=<path to soplex-4.0.1 directory>
    $ export ICCPATH=<path to Intel oneAPI directory>
    
    * If you did not change the installation path while installing
      the Intel compiler, then the path to the Intel oneAPI directory
      will most likely end with "intel/oneapi"
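
A quick way to confirm that the three variables from step (5) are usable is to check that each one is set and names an existing directory. The sketch below does only that; check_dir_var is an illustrative helper name, and the check does not verify that the directories actually contain SoftPosit, soplex, or the Intel compiler.

```shell
# check_dir_var <NAME>: succeed when the variable called NAME is set and
# points to an existing directory. Illustrative helper, not part of the
# artifact.
check_dir_var() {
    # Indirect lookup of the variable whose name is in $1.
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "$1 is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$1=$value is not a directory"
        return 1
    fi
    echo "$1=$value"
}

for name in SOFTPOSITPATH SOPLEXPATH ICCPATH; do
    check_dir_var "$name" || echo "  -> fix $name before running the evaluation scripts"
done
```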

(6.a) Download the PLDI2021 artifact from GitHub
    $ git clone https://github.com/jpl169/PLDI2021Artifact
    $ cd PLDI2021Artifact

(6.b) Or download the PLDI2021 artifact from Zenodo and untar the file:
    - Zenodo page: https://doi.org/10.5281/zenodo.4579410
    $ tar -xvf PLDI2021Artifact.tar
    $ cd PLDI2021Artifact

(7) Installation is complete. Go to EVALUATION GUIDE section.


_________________________________________________________________________________________________

* EVALUATION GUIDE FOR THE ARTIFACT

    If you have followed one of the two installation methods in
    the INSTALLATION GUIDE section, you should be in the
    PLDI2021Artifact directory. As mentioned before, we provide
    both a "short" version and a "long" version of the artifact
    evaluation. We highly recommend trying the "short" version, as
    the "long" version will take several days to complete. We
    first describe how to evaluate our artifact using the "short"
    version. Then we explain the "long" version.


_________________________________________________________________________________________________

* EVALUATION GUIDE (SHORT VERSION)

(1) Computing reduced intervals

    In this part, we compute the reduced intervals necessary to
    generate polynomials for sinh(x) and cosh(x) in the float
    representation. To execute this step, run the command

    $ ./runFloatIntervalGen_Short.sh

    This process should take roughly 30 minutes. Once this process
    finishes, you will find a new folder "intervals" containing two
    files, FloatCoshData and FloatSinhData, which are necessary for
    the next step.

(2) Generating polynomials (Corresponding to Table 2)

    In this part, we generate the polynomials based on the reduced
    intervals we computed in the previous step. To execute this
    step, run the command
        
    $ ./runFloatFunctionGen_Short.sh

    This process should take roughly 15 to 30 minutes. Once this
    process finishes, you will find two files, "Sinh.h" and
    "Cosh.h", in the directory include/float_headers. You can check
    these files against the reference header files in
    include-reference/float_headers. Although each value in the
    array may slightly differ, the size of the array should be
    exactly the same.
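
The size comparison described above can be approximated mechanically. The sketch below counts commas in each header as a rough proxy for the number of array entries. This rests on an assumption: that the generated headers store their values as comma-separated C array initializers. The README does not describe the header layout, so treat a mismatch as a prompt to inspect the files by hand. The helper names count_entries and same_size are illustrative.

```shell
# count_entries <header>: number of commas in the file, used as a rough
# proxy for the number of array entries.
# ASSUMPTION: values are stored as comma-separated C array initializers.
count_entries() {
    # The arithmetic expansion strips any padding that wc adds.
    echo $(( $(tr -cd ',' < "$1" | wc -c) ))
}

# same_size <generated> <reference>: succeed when both counts match.
same_size() {
    [ "$(count_entries "$1")" -eq "$(count_entries "$2")" ]
}

for h in Sinh.h Cosh.h; do
    gen=include/float_headers/$h
    ref=include-reference/float_headers/$h
    if [ -f "$gen" ] && [ -f "$ref" ]; then
        if same_size "$gen" "$ref"; then
            echo "$h: entry counts match"
        else
            echo "$h: entry counts differ"
        fi
    else
        echo "$h: generated or reference header not found"
    fi
done
```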

(3) Checking the correctness of RLibm-32 (Corresponding to Table 1)

    In this part, we check the correctness of RLibm-32 functions. To
    expedite the process, we select one million uniformly
    distributed input points. Thus, the exact result and number
    will be different from the result in the paper. To check the
    correctness of math library functions for float, run the
    command

    $ ./runFloatLibTest_reference_Short.sh

    This process should take roughly 1 minute. Once this process
    finishes, it will print out whether RLibm-32, Glibc, and Intel's
    math library produce correct results. For all functions, you
    should see the message: "RLibm-32 returns correct result for all
    inputs." You should also see that for a number of functions,
    Glibc and Intel's float math library do not produce correct
    results. Because we only test 1 million inputs, the
    number of inputs that produce wrong results will be different
    from the paper.

    To check the correctness of math library functions for
    posit32, run the command

    $ ./runPosit32LibTest_reference_short.sh

    This process should take roughly 1 minute.

    Again, for each function, you should see the message: "RLibm-32
    returns correct result for all inputs." Because we do not
    test all inputs, the output will also show that both Glibc and
    Intel's double library produce correct results, unlike what is
    reported in the paper.
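
When the test output is saved to a file, the success message quoted above can be counted rather than read line by line. The sketch below assumes the output was captured with something like "./runFloatLibTest_reference_Short.sh 2>&1 | tee float_short.log"; the log file name and the count_correct helper are illustrative, and the sample log lines are made up, since the exact output layout is not shown in this README.

```shell
# count_correct <log>: number of lines carrying RLibm-32's success message.
count_correct() {
    grep -c "RLibm-32 returns correct result for all inputs" "$1"
}

# Demonstration on a made-up log; the real line layout may differ.
log=$(mktemp)
printf '%s\n' \
    "func_a: RLibm-32 returns correct result for all inputs." \
    "func_b: RLibm-32 returns correct result for all inputs." \
    "func_b: message from another library" > "$log"
count_correct "$log"    # prints 2
rm -f "$log"
```

The printed count should equal the number of functions the script tested.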

(4) Speedup of RLibm-32 (Corresponding to Figure 3 and Line 1134)

    * NOTE: Since the submission of the paper, Intel has
    completely changed the toolkit for installing Intel's
    compiler. Thus, the speedup result may differ from what's
    reported in the paper. We will update the result in the final
    version accordingly.

    In this part, we check the speedup of RLibm-32 against Glibc and
    Intel's math library. To expedite the process, we again select one
    million uniformly distributed input points. To check the
    speed of RLibm-32's float functions, run the command

    $ ./runFloatOverheadTest_reference_Short.sh

    This process should take roughly 1 minute.

    Once this process finishes, it will create two files:
    (a) floatAgainstGlibcShort.pdf
    (b) floatAgainstIntelShort.pdf

    These two graphs correspond to Figure 3(a) and Figure 3(b),
    respectively. Although the numbers may differ from the paper, the
    bars should still be above 1x speedup in most cases, and the
    average should be above 1x speedup in all cases.

    To check the speedup of RLibm-32's posit32 functions, run the
    command

    $ ./runPosit32OverheadTest_reference_Short.sh

    This process should take roughly 1 minute.

    Once this process finishes, it will print the speedup of
    RLibm-32 against Glibc and Intel's double math library. The
    speedup will be different because it only tests 1 million
    points, but it should be roughly 1x ~ 1.15x.
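
The short-version steps above can be chained so that each one is timed and logged. The following is a sketch under two assumptions that this README does not state: that each script exits with a non-zero status on failure, and that redirecting its output to a file is harmless. The names run_phase, run_short_evaluation, and LOGFILE are introduced here for illustration. Because the correctness and speedup results are printed by the scripts, they end up in the log file rather than on the terminal.

```shell
# run_phase <command> [args...]: run one step, append its output to
# $LOGFILE (default: evaluation.log), and report the elapsed wall-clock
# time. Returns the command's own exit status.
run_phase() {
    start=$(date +%s)
    "$@" >> "${LOGFILE:-evaluation.log}" 2>&1
    status=$?
    end=$(date +%s)
    echo "$1: exit $status after $((end - start))s"
    return "$status"
}

# run_short_evaluation: the short-version scripts in README order,
# stopping at the first failure. Call it from the PLDI2021Artifact
# directory.
run_short_evaluation() {
    run_phase ./runFloatIntervalGen_Short.sh &&
    run_phase ./runFloatFunctionGen_Short.sh &&
    run_phase ./runFloatLibTest_reference_Short.sh &&
    run_phase ./runPosit32LibTest_reference_short.sh &&
    run_phase ./runFloatOverheadTest_reference_Short.sh &&
    run_phase ./runPosit32OverheadTest_reference_Short.sh
}
```

Defining the functions runs nothing; call run_short_evaluation explicitly and read evaluation.log afterwards.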

 

_________________________________________________________________________________________________

* EVALUATION GUIDE (LONG VERSION)

    * NOTE: Before beginning the long version, PLEASE understand
    that the whole process will take roughly 4 days or longer. If
    you do not wish to spend so much time, consider doing the
    SHORT version of the evaluation.

(1) Computing reduced intervals for float functions

    This process computes the reduced intervals for all 10 of
    RLibm-32's float functions. To execute this phase, run the
    command

    $ ./runFloatIntervalGen.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 2. If you would like
    to increase the parallelism, use the "-j" flag:

    $ ./runFloatIntervalGen.sh -j <# jobs>

    However, since this phase is I/O heavy, we recommend limiting
    parallelism to 4.

    This process will take roughly 12-13 hours. Once this process
    finishes, you will find a new folder "intervals" containing 10
    files, which are needed to generate the polynomials for each
    float function.
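
One way to choose a value for "-j" is to take the number of available CPUs and cap it at the recommended limit of 4. The sketch below only prints the resulting command; min_int and cpu_count are illustrative helper names, and using the CPU count as the starting point is an assumption made here, not a recommendation from the artifact.

```shell
# min_int <a> <b>: print the smaller of two integers.
min_int() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# cpu_count: number of online CPUs; falls back to 1 when undetectable.
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Cap the job count at 4, the limit recommended for this I/O-heavy phase.
njobs=$(min_int "$(cpu_count)" 4)
echo "./runFloatIntervalGen.sh -j $njobs"
```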

(2) Generating polynomials for float functions (Corresponding to Table 2)

    In this part, we generate the polynomials based on the reduced
    intervals we computed in the previous step. To execute this
    step, run the command
        
    $ ./runFloatFunctionGen.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 2. If you would like
    to increase the parallelism, use the "-j" flag:

    $ ./runFloatFunctionGen.sh -j <# jobs>

    However, since this phase is I/O heavy, we recommend limiting
    parallelism to 4.

    This process should take roughly 5-6 hours. Once this process
    finishes, you will find header files in
    include/float_headers. You can check these files against the
    reference header files in
    include-reference/float_headers. Although each value in the
    array may slightly differ, the size of the array should be
    exactly the same.

(3) Computing reduced intervals for posit32 functions

    This process computes the reduced intervals for all 3 of RLibm-32's
    posit32 functions. To execute this phase, run the command

    $ ./runPosit32IntervalGen.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 2. If you would like
    to increase the parallelism, use the "-j" flag:

    $ ./runPosit32IntervalGen.sh -j <# jobs>

    However, since there are only 3 jobs in total, the maximum should
    be 3.

    This process will take roughly 12-13 hours. Once this process
    finishes, the "intervals" folder will contain 3 posit32 data
    files, which are needed to generate the polynomials for each
    posit32 function.

(4) Generating polynomials for posit32 functions (Corresponding to Table 2)

    In this part, we generate the polynomials based on the reduced
    intervals we computed in the previous step. To execute this
    step, run the command
        
    $ ./runPosit32FunctionGen.sh

    This script will automatically run the underlying scripts
    in parallel. By default, the parallelism is 3. If you would like
    to change the parallelism, use the "-j" flag:

    $ ./runPosit32FunctionGen.sh -j <# jobs>

    However, since there are only 3 jobs in total, the maximum should
    be 3.

    This process should take roughly 1 hour. Once this
    process finishes, you will find header files in
    include/posit32_headers. You can check these files against the
    reference header files in
    include-reference/posit32_headers. Although each value in the
    array may slightly differ, the size of the array should be
    exactly the same.

(5) Checking the correctness of RLibm-32 (Corresponding to Table 1)

    In this phase, we check the correctness of RLibm-32 functions.
    Which scripts you should use depends on whether you generated
    the header files in steps (1)-(4).

    (a) If you did generate the polynomials, you can check the
    correctness of the generated polynomials.

    (b) If you did not generate the polynomials, you can check the
    correctness of the reference polynomials that we generated.

    To check the correctness of the polynomials YOU GENERATED
    for float, run the command: 

    $ ./runFloatLibTest.sh

    To check the correctness of the REFERENCE polynomials for
    float, run the command:

    $ ./runFloatLibTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 4. If you would like
    to increase/decrease the parallelism, use the "-j" flag.

    This process should take roughly 12 hours. Once this process
    finishes, it will print out whether RLibm-32, Glibc, and Intel's
    math library produce correct results. For all functions, you
    should see the message: "RLibm-32 returns correct result for all
    inputs." You should also see that for a number of functions,
    Glibc and Intel's float math library do not produce correct
    results.

    * Note: Because our oracles for Sinpi and Cospi are
    exceptionally slow, we only check the results for inputs from
    x = 0 to 2.0 for Sinpi and Cospi. This is sufficient to
    extrapolate the results of Sinpi and Cospi to all other inputs,
    due to the properties of Sinpi and Cospi (both are periodic in
    x with period 2).

    To check the correctness of the polynomials YOU GENERATED for
    posit32, run the command:

    $ ./runPosit32LibTest.sh

    To check the correctness of the REFERENCE polynomials for
    posit32, run the command:

    $ ./runPosit32LibTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 3. If you would like
    to increase/decrease the parallelism, use the "-j" flag.  Note
    that the script only runs 3 jobs. So the maximum parallelism
    is 3.

    This process should take roughly 10 hours. Once this process
    finishes, it will print out whether RLibm-32, Glibc, and Intel's
    math library produce correct results. For all functions, you
    should see the message: "RLibm-32 returns correct result for all
    inputs." You should also see that for a number of functions,
    Glibc and Intel's double math library do not produce correct
    results.

(6) Speedup of RLibm-32 (Corresponding to Figure 3 and Line 1134)

    * NOTE: Since the submission of the paper, Intel has
    completely changed the toolkit for installing Intel's
    compiler. Thus, the speedup result may differ from what's
    reported in the paper. We will update the result in the final
    version accordingly.

    In this part, we check the speedup of RLibm-32 against Intel's and
    Glibc's math libraries. Which scripts you should use depends on
    whether you generated the header files in steps (1)-(4).

    (a) If you did generate the polynomials, you can check the speedup
    of the generated polynomials.

    (b) If you did not generate the polynomials, you can check the
    speedup of the reference polynomials that we generated.

    To check the speedup of the polynomials YOU GENERATED for
    float, run the command:

    $ ./runFloatOverheadTest.sh

    To check the speedup of the REFERENCE polynomials for float,
    run the command:

    $ ./runFloatOverheadTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 1. If you would like
    to increase/decrease the parallelism, use the "-j" flag. We
    recommend keeping the parallelism low, at roughly 2-4.

    This process should take roughly 5-6 hours. Once this process
    finishes, it will create two files:
    (a) floatAgainstGlibc.pdf
    (b) floatAgainstIntel.pdf

    These two graphs correspond to Figure 3(a) and Figure 3(b),
    respectively. Although the numbers may differ from the paper, the
    bars should still be above 1x speedup in most cases, and the
    average should be above 1x speedup in all cases.

    To check the speedup of the polynomials YOU GENERATED for
    posit32, run the command:

    $ ./runPosit32OverheadTest.sh

    To check the speedup of the REFERENCE polynomials for posit32,
    run the command:

    $ ./runPosit32OverheadTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 1. If you would like
    to increase/decrease the parallelism, use the "-j" flag. We
    recommend keeping the parallelism low, at roughly 2-3.

    This process should take roughly 2 hours. Once this process
    finishes, it will print the speedup of RLibm-32 against Glibc
    and Intel's double math library. The speedup may be different
    from what is reported in the paper, but it should be roughly 1x
    ~ 1.15x.
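
To read the printed numbers, it helps to recall how a speedup ratio is formed. The sketch below uses the usual definition, baseline time divided by RLibm-32 time, which is assumed here rather than taken from the artifact's scripts. The speedup helper is illustrative and the timings are made up, not measurements.

```shell
# speedup <baseline time> <RLibm-32 time>: ratio with two decimals.
# Values above 1.00 mean RLibm-32 was faster. Both times must use the
# same unit.
speedup() {
    awk -v base="$1" -v ours="$2" 'BEGIN { printf "%.2f\n", base / ours }'
}

# Made-up timings for illustration only.
speedup 150 100    # prints 1.50
```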
