Published April 11, 2021 | Version V3
Conference paper Open

High Performance Correctly Rounded Math Libraries for 32-bit Floating Point Representations

  • 1. Rutgers University

Description

There are two files in this archive.

PLDI2021Artifact.tar: This is the artifact submission version. It includes 10 float functions and 3 posit32 functions. If you would like to evaluate exactly what was there when the artifact was submitted for consideration, please use this file.

rlibm-32.tar: This is the modified and extended version after the artifact was submitted. It includes 10 float functions and 8 posit32 functions.

 

Instructions for rlibm-32:

RLIBM-32 is both a math library that provides correctly rounded results for all inputs and a set of tools used to generate the correct polynomials. The techniques behind the tools will appear at PLDI 2021. Currently, RLIBM-32 supports a number of elementary functions for the float and posit32 representations.

### List of float functions supported by RLIBM-32

1. log(x), log2(x), log10(x)

2. exp(x), exp2(x), exp10(x)

3. sinh(x), cosh(x)

4. sinpi(x), cospi(x)

 

### List of posit32 functions supported by RLIBM-32

1. log(x), log2(x), log10(x)

2. exp(x), exp2(x), exp10(x)

3. sinh(x), cosh(x)

 

# Getting started with RLIBM-32:
There are various prerequisites for using the RLIBM-32 math library, testing it, or generating polynomials.
We describe the prerequisites in each section. Alternatively, we have set up a docker image that already contains the prerequisites and has the environment variables set up.

### Using Docker Image
1. Install docker if not already installed by following the installation documentation in this link: https://docs.docker.com/install/

2. Download the docker image
docker pull jpl169/rlibm-32
* The docker image is roughly 6GB in size

3. Run the docker image
sudo docker run -it jpl169/rlibm-32

### Manual Installation
In each section (using the math library, testing, generating polynomials) we list the prerequisites and how to set them up.

 

# How to build and use RLIBM-32 math library

### Prerequisite

If you want to compile the math library for posit32, you have to install SoftPosit. Please follow the instructions from https://gitlab.com/cerlane/SoftPosit.

 

### Setup

1. Create an environment variable SOFTPOSITPATH that points to the directory of SoftPosit:

export SOFTPOSITPATH=<path-to-softposit-directory>

  

2. Build the math library

  1. If you want to build all the math libraries, use the default make rule from the root directory:

  cd <path-to-rlibm-32>

  make

 

  2. If you want to build the math library for each representation separately, you can use these make rules:

  cd <path-to-rlibm-32>

  make floatmlib

  make posit32mlib

  

### Usage

The math library will be located in the `lib` directory.

  * floatMathLib.a : math library for float

  * posit32MathLib.a : math library for posit32.

 

The header files for each library are located in the `include` directory:

  * `float_math.h` : header for float math library

  * `posit32_math.h` : header for posit32 math library
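After `make` finishes, a quick check can confirm that the libraries and headers listed above exist. The following is a minimal sketch in POSIX shell; the helper `missing_outputs` and the `RLIBM32_ROOT` variable are hypothetical names used for illustration and are not part of RLIBM-32.

```shell
#!/bin/sh
# Hypothetical helper (not part of RLIBM-32): print every expected
# build output that is missing under the checkout rooted at $1.
missing_outputs() {
    root=$1
    for f in lib/floatMathLib.a lib/posit32MathLib.a \
             include/float_math.h include/posit32_math.h; do
        [ -f "$root/$f" ] || echo "$f"
    done
}

# RLIBM32_ROOT is an assumed variable; default to the current directory.
missing_outputs "${RLIBM32_ROOT:-.}"
```

An empty output means that all four files were found.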

 

You can use our library in your code much as you would use the standard math library, except that our function names start with "rlibm_":

test.cpp: 

#include "float_math.h"

int main() {

  float result = rlibm_cospi(1.5f);

  return 0;

}

 

To build the program, include the math library in the compilation command:

g++ test.cpp -I<path-to-rlibm-32>/include/ <path-to-rlibm-32>/lib/floatMathLib.a -lm -o test

 

Currently, RLIBM-32 uses some functions from the default math library for range reduction (i.e., to decompose a floating point value into its integral and fractional parts), so make sure to include the `-lm` flag.

 

# Testing Correctness of RLIBM-32

### Prerequisite

To run the testing script to check for correctness of RLIBM-32, you need to have installed MPFR and SoftPosit. SoftPosit can be installed via the instructions from https://gitlab.com/cerlane/SoftPosit.

 

### Setup

1. Create an environment variable SOFTPOSITPATH that points to the directory of SoftPosit:

export SOFTPOSITPATH=<path-to-softposit-directory>

  

2. Build the math library

cd <path-to-rlibm-32>

make

 

### Testing

1. To test the correctness of RLIBM-32's float functions, use the following command:

cd <path-to-rlibm-32>

cd rlibmCorrectnessTest/float/

./runAllParallel.sh -j <parallelism>

 

* Because the testing harness relies on the MPFR math library to compute the correct result, the scripts can take hours to complete. In the extreme case (sinpi(x)), a script can take up to 24 hours to complete. Since there are a total of 10 float functions, we recommend a parallelism of at least 4.

 

* Once the testing harness is complete, the results will be stored in `rlibmCorrectnessTest/float/Results/rlibm/` directory. 

 

2. To test the correctness of RLIBM-32's posit32 functions, use the following command:

cd <path-to-rlibm-32>

cd rlibmCorrectnessTest/posit32/

./runAllParallel.sh -j <parallelism>

 

* Because the testing harness relies on the MPFR math library to compute the correct result, the scripts can take hours to complete. In the extreme case (exp10(x)), a script can take up to 12 hours to complete. Since there are a total of 8 posit32 functions, we recommend a parallelism of at least 4.

 

* Once the testing harness is complete, the results will be stored in `rlibmCorrectnessTest/posit32/Results/rlibm/` directory. 
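One way to pick a value for `-j` is to cap it at both the number of functions under test and the number of available CPUs. The sketch below assumes that each function is tested by one job, which the text above does not confirm. `pick_jobs` is a hypothetical helper, and `getconf _NPROCESSORS_ONLN` is a widely available but non-POSIX query.

```shell
#!/bin/sh
# Hypothetical helper: pick_jobs NUM_FUNCTIONS NUM_CPUS
# Prints the smaller of the two values.
pick_jobs() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Number of online CPUs; fall back to 1 if the query is unsupported.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# 10 float functions and 8 posit32 functions, as listed in this document.
echo "float:   ./runAllParallel.sh -j $(pick_jobs 10 "$cpus")"
echo "posit32: ./runAllParallel.sh -j $(pick_jobs 8 "$cpus")"
```

The script only prints the suggested command lines; it does not start the tests.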

 

# Testing Performance of Various Math Libraries

### Prerequisite

To run the testing script to check for performance, we also recommend installing the Intel compiler (icc) via [this site](https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html).

 

1. Select the appropriate operating system

2. Select "Web & Local" distribution option

3. Select Online installer

4. On the right hand side (gray background) if you scroll down, it will show the steps to install. 

  * If your OS is Linux-based, then you might use the command:

wget https://registrationcenter-download.intel.com/akdlm/irc_nas/17427/l_HPCKit_p_2021.1.0.2684.sh

bash l_HPCKit_p_2021.1.0.2684.sh

 

5. Follow the instructions. The installer will guide you through installing the Intel compiler.

  a. Make sure to install "Intel® oneAPI DPC++/C++ Compiler & Intel® C++ Compiler Classic." You can choose to not install any other components

  b. Make sure to remember the Installation directory

  c. If it shows you any warning about requiring the "Base toolkit" you can choose to ignore it.

    * The installation will take roughly 5~10 minutes.

 

6. Once installation is complete, run script to set variables:

cd <path to intel oneAPI directory>

. setvars.sh

 

### Setup

 

1. Create an environment variable ICCPATH that points to the Intel oneAPI directory. If you did not change the installation path while installing the Intel compiler, then the path to the Intel oneAPI directory will most likely end with "intel/oneapi":

export ICCPATH=<path to Intel oneAPI directory>

 

2. Build the math library

cd <path-to-rlibm-32>

make

 

3. To run the testing harness, we must first generate files containing oracle results. To generate oracle files for 32-bit float functions, 

export ORACLEPATH=<path to directory where you want to store oracle files for float functions>

cd <path to rlibm-32 directory>

make

cd GenerateOracleFiles/float

make

./runAll.sh

 

  * This step creates a number of <function name>Oracle files inside `ORACLEPATH`. Each oracle file is 16GB (4 bytes * 2^32) and there are 10 functions, which requires a total of 160GB. This step will take roughly 1 hour.

 

4. To generate oracle files for posit32 functions,

export ORACLEPOSITPATH=<path to directory where you want to store oracle files for posit32 functions>

cd <path to rlibm-32 directory>

make

cd GenerateOracleFiles/posit32

make

./runAll.sh

 

  * This step creates a number of <function name>Oracle files inside `ORACLEPOSITPATH`. Each oracle file is 16GB (4 bytes * 2^32) and there are 8 functions, which requires a total of 128GB. This step will take roughly 1 hour.
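The sizes quoted in steps 3 and 4 can be reproduced with shell arithmetic, and the free space of the target directory can be checked before starting. This is a minimal sketch: `total_gib`, `free_gib`, and `has_space` are hypothetical helpers, and sizes are computed in GiB (2^30 bytes).

```shell
#!/bin/sh
# One oracle file holds a 4-byte result for each of the 2^32 inputs.
oracle_bytes=$((4 * (1 << 32)))

# total_gib N: size in GiB of N oracle files.
total_gib() {
    echo $(( $1 * $oracle_bytes / (1 << 30) ))
}

# free_gib DIR: free space in GiB on the file system holding DIR,
# read from the "Available" column of POSIX "df -P -k" (KiB units).
free_gib() {
    df -P -k "$1" | awk 'NR == 2 { print int($4 / (1024 * 1024)) }'
}

# has_space DIR GIB: succeed if DIR has at least GIB free.
has_space() {
    [ "$(free_gib "$1")" -ge "$2" ]
}

echo "10 oracle files: $(total_gib 10) GiB"   # 160
echo "8 oracle files:  $(total_gib 8) GiB"    # 128

# Check the float oracle directory, or the current directory if unset.
dir=${ORACLEPATH:-.}
if has_space "$dir" "$(total_gib 10)"; then
    echo "enough free space in $dir"
else
    echo "less than $(total_gib 10) GiB free in $dir"
fi
```

The same check applies to `ORACLEPOSITPATH` with the smaller total.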

 

### Testing

* To run a comprehensive testing suite, which tests the performance and correctness of glibc, intel, CR-LIBM, MetaLibm, and RLIBM-32 for float and posit32 functions, use the pre-assembled testing scripts:

cd <path to rlibm-32 directory>

./runTestFloat.sh

./runTestPosit.sh

 

* Each test will output two lines of results.

  1. The first line reports the number of cycles required to compute the function for all 2^32 inputs. Thus, to compute the average, divide the reported number by 2^32.

  2. The second line reports the number of inputs that produce wrong results.

 

* Individual testing configuration (glibc, intel, CR-LIBM, MetaLibm, *or* RLIBM-32) is stored in its own directory in `testing/float/` (for float functions) or `testing/posit32/` (for posit32 functions). For example, if you want to test the correctness and performance of RLIBM-32's float functions built with gcc, you can use the following commands:

cd <path to rlibm-32 directory>

cd testing/float/glibc_rlibm_O3_flags

./runAll.sh
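The averaging described above (total cycles divided by 2^32) can be done with `awk`. In this sketch, `avg_cycles` and `speedup` are hypothetical helpers, the input totals are made-up numbers rather than measurements, and computing speedup as baseline cycles divided by RLIBM-32 cycles is a common convention that is assumed here.

```shell
#!/bin/sh
# avg_cycles TOTAL: average cycles per input, given the total cycle
# count reported for all 2^32 (= 4294967296) inputs.
avg_cycles() {
    LC_ALL=C awk -v total="$1" 'BEGIN { printf "%.2f\n", total / 4294967296 }'
}

# speedup BASELINE OURS: ratio of two total cycle counts
# (assumed convention: baseline divided by RLIBM-32).
speedup() {
    LC_ALL=C awk -v base="$1" -v ours="$2" 'BEGIN { printf "%.2f\n", base / ours }'
}

# Made-up totals, for illustration only (not measured results).
avg_cycles 85899345920               # prints 20.00
speedup 128849018880 85899345920     # prints 1.50
```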

 

# How to use RLIBM-32 Tool to Generate Polynomials

### Prerequisite

1. *SoftPosit:* Please follow the instructions from the [SoftPosit GitLab](https://gitlab.com/cerlane/SoftPosit).

2. *Soplex 4.0.1:* Please download soplex-4.0.1 from https://soplex.zib.de/index.php#download

* Make sure that you're downloading version 4.0.1

$ tar -xvf soplex-4.0.1.tar

$ cd soplex-4.0.1

$ mkdir build

$ cd build

$ cmake ..

$ make

$ cd ../..

 

3. The MPFR and zlib libraries (required by Soplex). On Ubuntu systems, they can be installed with:

sudo apt-get install build-essential libgmp3-dev libmpfr-dev zlib1g zlib1g-dev

 

### Setup

Set environment variables that point to SoftPosit and Soplex:

$ export SOFTPOSITPATH=<path to SoftPosit directory>

$ export SOPLEXPATH=<path to soplex-4.0.1 directory>
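Because the build relies on these variables, it can help to verify them before running `make`. The following is a minimal sketch in POSIX shell; `check_dir_var` is a hypothetical helper and not part of RLIBM-32.

```shell
#!/bin/sh
# Hypothetical helper: report whether the environment variable named
# by $1 is set and points to an existing directory.
check_dir_var() {
    name=$1
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name does not point to a directory: $value"
        return 1
    fi
    echo "$name ok: $value"
}

status=0
for var in SOFTPOSITPATH SOPLEXPATH; do
    check_dir_var "$var" || status=1
done

if [ "$status" -ne 0 ]; then
    echo "fix the variables above before running make"
fi
```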

 

### Generating Polynomials

There are two steps in generating polynomials: (1) generating files containing reduced inputs and reduced intervals, and (2) based on those files, generating polynomials whose results satisfy the reduced input/interval constraints. Please note that the files containing reduced inputs and intervals are large. In extreme cases (i.e., exp10(x) for posit32), this file can be as large as 80GB. Additionally, generating both the files and the polynomials can take several hours (up to 24 hours for each function).

 

1. If you have more than 500GB of space, then you can use the existing script that automatically generates the files and polynomial. To produce the polynomials for float, use the following command:

cd <path to rlibm-32 directory>

./floatIntervalGen.sh

./floatFunctionGen.sh

 

The first script generates the files containing reduced inputs and intervals and puts them into the `intervals` directory. The second script generates correct polynomials for each function based on the generated files. The coefficients of the polynomials are saved into header files (`*.h`) in the `functiongen/float` directory.

 

To produce the polynomials for posit32, use the following command:

cd <path to rlibm-32 directory>

./posit32IntervalGen.sh

./posit32FunctionGen.sh

 

The first script generates the files containing reduced inputs and intervals and puts them into the `intervals` directory. The second script generates correct polynomials for each function based on the generated files. The coefficients of the polynomials are saved into header files (`*.h`) in the `functiongen/posit32` directory.

 

2. If you would like to generate polynomials for each function separately (to save space, etc.), then follow the instructions below. We use generating the polynomials for ln(x) for the float type as an example. Adjust the commands accordingly for other functions/types:

  a. Generate file containing reduced inputs/intervals information:

    cd <path to rlibm-32 directory>

    cd IntervalGen/float

    make

    ./Log FloatLogData

 

  b. Once the process finishes, use `FloatLogData` to generate the polynomial:

    cd <path to rlibm-32 directory>

    cd functiongen/float/

    make

    ./Log ../../IntervalGen/float/FloatLogData Log.log Log.h

 

   * The program requires three arguments: (1) the reduced inputs/intervals file, (2) a filename where the program writes logging information, and (3) a filename for the header file where the polynomial coefficients are stored.

   * Once the process finishes, `Log.h` can be found in `functiongen/float` and it will contain coefficients of the polynomials.
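The two per-function steps above can be collected into a small dry-run helper that prints the commands for a given function without running them. This is only a sketch: `gen_commands` is a hypothetical helper, and it assumes that the binaries and data files of other functions follow the same naming pattern as the Log example, which this document confirms only for Log.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that generate
# the polynomial for one float function, starting from the RLIBM-32
# root directory. Naming pattern is assumed from the Log example.
gen_commands() {
    func=$1
    data="Float${func}Data"
    printf '%s\n' \
        "cd IntervalGen/float && make && ./$func $data" \
        "cd ../../functiongen/float && make && ./$func ../../IntervalGen/float/$data $func.log $func.h"
}

gen_commands Log
```

Printing the commands first makes it easy to review them, given that each step can run for hours and produce very large files.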

 

____________________________________________________________________________________

 

Instructions for PLDI2021Artifact:


Abstract:

We present the artifact for the accepted paper, High Performance
Correctly Rounded Math Libraries for 32-bit Floating Point
Representations. The artifact provides all necessary source code as
well as a testing harness to show the correctness and speedup of the
proposed math library, RLibm-32. The readme documentation describes how
to install all necessary tools and explains the steps to run the
experiments. We provide scripts to automatically execute each part of
the experiment. As fully evaluating the entire artifact will require
at least 4 days of work (even with some parallelism), we also provide
a list of scripts for the "SHORT" version of the artifact
evaluation. The short version will require roughly 2 hours to
complete. We highly recommend using the short version instead of the
full version. However, for completeness, we also describe how to
completely evaluate the artifact towards the bottom of the readme
file.


_________________________________________________________________________________________________

* Important note:

Completely evaluating everything in this artifact will take several
days (4+ days). There are a total of 4 phases in our evaluation:

    Computing reduced intervals
    Generating polynomial
    Checking for correctness
    Testing speedup

Thus, in this artifact, we provide a "short" version of each phase of
evaluation. The short version will take roughly 2 hours to
complete. We HIGHLY HIGHLY recommend you to evaluate the artifact
using the short version.

We also provide methods to perform a full evaluation of the 3rd and
4th phases (checking for correctness and testing speedup) without
requiring you to generate polynomials. However, checking for
correctness requires roughly 24 hours to complete and testing the
speedup of RLibm-32 requires roughly 8-10 hours to complete.

In conclusion, we strongly recommend you to evaluate the artifact
using the short version.


_________________________________________________________________________________________________

* Hardware recommendation:

All of our evaluations were performed on a machine with 2.10GHz Intel
Xeon Gold 6230R and 187GB of RAM running Ubuntu 18.04. However, the
artifact should be executable on most modern machines with at least
16GB of memory. To run the "short" version, we recommend disk space of
at least 40GB to be safe. To run the "full" version, we recommend 
disk space of at least 200GB.

* Software recommendation:

We tested our artifact on Ubuntu-16.04 and Ubuntu-18.04. We provide
installation instructions to install all pre-requisite software.


_________________________________________________________________________________________________

* INSTALLATION GUIDE:

There are two methods to install the artifact: (1) Installing docker
and downloading the docker image for the artifact or (2) Manual
installation. We recommend using the docker image provided to
expedite the installation process.

** DOWNLOADING PREBUILT DOCKER IMAGE

   We have prebuilt a docker image and hosted it in the docker hub.

(1) Install docker if not already installed by following the
installation documentation in this link:
https://docs.docker.com/install/

    We recommend installing docker and evaluating our artifact on
    a machine with Ubuntu. Although docker can be used with
    Windows or MacOS, docker will run on top of a linux VM,
    significantly slowing down all processes. All docker commands
    may require "sudo" permission. If you get any errors when
    trying to use docker, prefix all commands with the "sudo"
    command.

(2) Download the docker image

    $ docker pull jpl169/pldi2021artifact

    The docker image is roughly 5.61GB in size

(3) Run the docker image
    
    $ sudo docker run -it jpl169/pldi2021artifact

    You will be placed in the PLDI2021Artifact directory. You
    are now ready to evaluate the artifact. Go down to the
    EVALUATION GUIDE section for artifact evaluation.


(*) NOTE ON WORKING WITH DOCKER (COPYING FILES)

    Later in the EVALUATION GUIDE, you may want to copy files
    from the docker container to your host machine. This will
    allow you to compare or inspect files more easily. To copy files
    from container to host machine, follow the steps below:

    (a) Remember the <path> and <file name> you wish to copy
    (b) exit from docker container by using the command

        $ exit

    (c) Find the ID of the container you were in

        $ docker container ls -a

    (d) Copy file from the host terminal:

        $ docker cp <container id>:<path>/<filename> <destination>

    For example, if you want to copy "Example.pdf" in the "/home/foo"
    directory from the container with ID 12345 to your host machine's
    current directory, the command will look like
        
        $ docker cp 12345:/home/foo/Example.pdf .

    (e) If you want to go back into the container, use the command

        $ docker start <container id>
        $ docker attach <container id>


** MANUAL INSTALLATION

(1) To install and evaluate the artifact manually, first install a list of prerequisites:

    $ sudo apt-get update
    $ sudo apt-get install build-essential cmake git libgmp3-dev libmpfr-dev zlib1g zlib1g-dev wget python python3 python3-pip parallel
    $ sudo python3 -m pip install matplotlib

(1.1) To prevent GNU parallel from repeatedly printing its citation notice, run the following command and agree to its terms:
    $ parallel --citation
    -> type 'will cite'

(2) Download and install soplex 4.0.1:
    
    Please download soplex-4.0.1 from https://soplex.zib.de/index.php#download
    ** Make sure that you're downloading version 4.0.1 **

    $ tar -xvf soplex-4.0.1.tar
    $ cd soplex-4.0.1
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
    $ cd ../..

(3) Download and compile SoftPosit :
    $ git clone https://gitlab.com/cerlane/SoftPosit.git
    $ cd SoftPosit
    $ git checkout 983e821dd30b9da5467bcbea2895fdbacc1ce264
    $ cd build/Linux-x86_64-GCC/
    $ make
    $ cd ../../..

(4) Download and install Intel oneAPI HPC Toolkit

    * You can access it through this site:
      https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html

    (a) Select the appropriate operating system
    (b) Select "Web & Local" distribution option
    (c) Select Online installer
    (d) On the right hand side (gray background) if you scroll
    down, it will show the steps to install. 

    If your OS is Linux-based, then you might use the command:

    $ wget https://registrationcenter-download.intel.com/akdlm/irc_nas/17427/l_HPCKit_p_2021.1.0.2684.sh

    $ bash l_HPCKit_p_2021.1.0.2684.sh

    Follow the instructions. The installer will guide you through
    installing the Intel compiler.
    
    (a) Make sure to install "Intel® oneAPI DPC++/C++ Compiler &
    Intel® C++ Compiler Classic." Choose to not install any other
    components
    (b) Make sure to remember the Installation directory
    (c) If it shows you any warning about requiring the "Base
    toolkit" you can choose to ignore it.

    The installation will take roughly 5~10 minutes.

    Once installation is complete, run script to set variables:
    $ cd <path to intel oneAPI directory>
    $ . setvars.sh

(5) Set the following environment variables:
    $ export SOFTPOSITPATH=<path to SoftPosit directory>
    $ export SOPLEXPATH=<path to soplex-4.0.1 directory>
    $ export ICCPATH=<path to Intel oneAPI directory>
    
    * If you did not change the installation path while installing 
      Intel compiler, then the path to Intel oneAPI directory will 
      most likely end with "intel/oneapi"

(6.a) Download PLDI21 artifact from GitHub
    $ git clone https://github.com/jpl169/PLDI2021Artifact
    $ cd PLDI2021Artifact

(6.b) Or Download PLDI2021 artifact from Zenodo and untar the file:
    - Zenodo page: https://doi.org/10.5281/zenodo.4579410
    $ tar -xvf PLDI2021Artifact.tar
    $ cd PLDI2021Artifact

(7) Installation is complete. Go to EVALUATION GUIDE section.


_________________________________________________________________________________________________

* EVALUATION GUIDE FOR THE ARTIFACT

    If you have followed one of the two installation methods in
    the INSTALLATION GUIDE section, you should be in the
    PLDI2021Artifact directory. As mentioned before, we provide
    both "short" version and "long" version of artifact
    evaluation. We highly recommend trying the "short" version, as
    the "long" version will take several days to complete. We
    first describe how to evaluate our artifact using the "short"
    version. Then we will explain the "long" version.


_________________________________________________________________________________________________

* EVALUATION GUIDE (SHORT VERSION)

(1) Computing reduced intervals

    In this part, we will compute reduced intervals necessary to
    generate polynomials of sinh(x) and cosh(x) for float
    representation. To execute this step, run the command

    $ ./runFloatIntervalGen_Short.sh

    This process should take roughly 30min. Once this process
    finishes, you will find that there is a new folder "intervals"
    which contains two files, FloatCoshData and FloatSinhData
    which are necessary for the next step

(2) Generating polynomial (Corresponding to Table 2)

    In this part, we generate the polynomials based on the reduced
    intervals we computed in the previous step. To execute this
    step, run the command
        
    $ ./runFloatFunctionGen_Short.sh

    This process should take roughly 15min ~ 30min. Once this
    process finishes, you will find that there are two files
    "Sinh.h" and "Cosh.h" in the directory
    include/float_headers. You can check the file against the
    reference header files in
    include-reference/float_headers. Although each value in the
    array may slightly differ, the size of the array should be
    exactly the same.

(3) Checking the correctness of RLibm-32 (Corresponding to Table 1)

    In this part, we check the correctness of RLibm-32 functions. To
    expedite the process, we select one million uniformly
    distributed input points. Thus, the exact result and number
    will be different from the result in the paper. To check the
    correctness of math library functions for float, run the
    command

    $ ./runFloatLibTest_reference_Short.sh

    This process should take roughly 1 minute. Once this process
    finishes, it will print out whether the RLibm-32, GLibc, and Intel
    math libraries produce correct results. For all functions, you
    should see the message: "RLibm-32 returns correct result for all
    inputs." You should also see that for a number of functions,
    GLibc and Intel's float math libraries do not produce correct
    results. Because we only tested with 1 million inputs, the
    number of inputs that produce wrong results will be different
    from the paper.

    To check the correctness of math library functions for
    posit32, run the command

    $ ./runPosit32LibTest_reference_short.sh

    This process should take roughly 1 minute

    Again, for each function, you should see the message: "RLibm-32
    returns correct result for all inputs." Because we did not
    test all inputs, it will also show that both GLibc and Intel's
    double libraries produce correct results as well, unlike what is
    reported in the paper.

(4) Speedup of RLibm-32 (Corresponding to Figure 3 and Line 1134)

    * NOTE: Since the submission of the paper, Intel has
    completely changed the toolkit for installing Intel's
    compiler. Thus, the speedup result may differ from what's
    reported in the paper. We will update the result in the final
    version accordingly.

    In this part, we check the speedup of RLibm-32 against Glibc and
    Intel math library. To expedite the process, we select one
    million uniformly distributed input points again. To check the
    speed of RLibm-32's float functions, run the command

    $ ./runFloatOverheadTest_reference_Short.sh

    This process should take roughly 1 minute

    Once this process finishes, it will create two files, 
    (a) floatAgainstGlibcShort.pdf
    (b) floatAgainstIntelShort.pdf
    
    These two graphs correspond to Figure 3(a) and Figure 3(b),
    respectively. Although the numbers may look different, it
    should still be the case that the bars are above 1x speedup in
    most cases, and average is above 1x speedup in all cases.

    To check the speedup of RLibm-32's posit32 functions, run the
    command

    $ ./runPosit32OverheadTest_reference_Short.sh

    This process should take roughly 1 minute

    Once this process finishes, it will print the speedup of
    RLibm-32 against Glibc and Intel's double math library. The
    speedup will be different because it only tests 1 million
    points, but speedup should be roughly 1x ~ 1.15x.

 

_________________________________________________________________________________________________

* EVALUATION GUIDE (LONG VERSION)

    * NOTE: Before beginning the long version, PLEASE understand
    that the whole process will take roughly 4 days or longer. If
    you do not wish to spend so much time, consider doing the
    SHORT version of the evaluation

(1) Computing reduced intervals for float functions

    This process computes the reduced intervals for all 10 of
    RLibm-32's float functions. To execute this phase, run the
    command

    $ ./runFloatIntervalGen.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 2. If you would like
    to increase the parallelism, use the "-j" flag:

    $ ./runFloatIntervalGen.sh -j <# jobs>

    However, since this phase is I/O heavy, we recommend limiting
    parallelism to 4.

    This process will take roughly 12-13 hours. Once this process
    finishes, you will find that there is a new folder "intervals"
    which contains 10 files, which are necessary files to generate
    polynomials for each float function.

(2) Generating polynomial for float functions (Corresponding to Table 2)

    In this part, we generate the polynomials based on the reduced
    intervals we computed in the previous step. To execute this
    step, run the command
        
    $ ./runFloatFunctionGen.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 2. If you would like
    to increase the parallelism, use the "-j" flag:

    $ ./runFloatFunctionGen.sh -j <# jobs>

    However, since this phase is I/O heavy, we recommend limiting
    parallelism to 4.

    This process should take roughly 5-6 hours. Once this process
    finishes, you will find that there are header files in
    include/float_headers. You can check the file against the
    reference header files in
    include-reference/float_headers. Although each value in the
    array may slightly differ, the size of the array should be
    exactly the same.

(3) Computing reduced intervals for posit32 functions

    This process computes the reduced intervals for all 3 of RLibm-32's
    posit32 functions. To execute this phase, run the command

    $ ./runPosit32IntervalGen.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 2. If you would like
    to increase the parallelism, use the "-j" flag:

    $ ./runPosit32IntervalGen.sh -j <# jobs>

    However, since there are only 3 jobs total, the maximum should
    be 3.

    This process will take roughly 12-13 hours. Once this process
    finishes, you will find that there is a new folder "intervals"
    which contains 3 posit32 data files, which are necessary files
    to generate polynomials for each posit32 function.

(4) Generating polynomial for posit32 functions (Corresponding to Table 2)

    In this part, we generate the polynomials based on the reduced
    intervals we computed in the previous step. To execute this
    step, run the command
        
    $ ./runPosit32FunctionGen.sh

    This script will automatically run the underlying scripts 
    in parallel. By default, the parallelism is 3. If you would like
    to change the parallelism, use the "-j" flag:

    $ ./runPosit32FunctionGen.sh -j <# jobs>

    However, since there are only 3 jobs total, the maximum should
    be 3.

    This process should take roughly 1 hour. Once this
    process finishes, you will find that there are header files in
    include/posit32_headers. You can check the file against the
    reference header files in
    include-reference/posit32_headers. Although each value in the
    array may slightly differ, the size of the array should be
    exactly the same.

(5) Checking the correctness of RLibm-32 (Corresponding to Table 1)

    In this phase, we check the correctness of RLibm-32 functions.
    Depending on whether you generated the header files in the
    previous steps (1)-(4), you should use different scripts.

    (a) If you did generate the polynomials, you can check 
    correctness of the generated polynomials.

    (b) If you did not generate the polynomials, you can check
    correctness of the reference polynomial that we generated.

    To check the correctness of the polynomials YOU GENERATED
    for float, run the command: 

    $ ./runFloatLibTest.sh

    To check the correctness of the REFERENCE polynomial for
    float, run the command:

    $ ./runFloatLibTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 4. If you would like
    to increase/decrease the parallelism, use the "-j" flag.

    This process should take roughly 12 hours. Once this process
    finishes, it will print out whether the RLibm-32, GLibc, and Intel
    math libraries produce correct results. For all functions, you
    should see the message: "RLibm-32 returns correct result for all
    inputs." You should also see that for a number of functions,
    GLibc and Intel's float math libraries do not produce correct
    results.

    * Note: Because our oracle for Sinpi and Cospi is
    exceptionally slow, we only check the result of inputs x = 0
    to 2.0 for Sinpi and Cospi. This is sufficient to extrapolate
    the result of Sinpi and Cospi to all other inputs, due to the
    properties of Sinpi and Cospi.

    To check the correctness of the polynomials YOU GENERATED for
    posit32, run the command:

    $ ./runPosit32LibTest.sh

    To check the correctness of the REFERENCE polynomial for
    posit32, run the command:

    $ ./runPosit32LibTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 3. If you would like
    to increase/decrease the parallelism, use the "-j" flag.  Note
    that the script only runs 3 jobs. So the maximum parallelism
    is 3.

    This process should take roughly 10 hours. Once this process
    finishes, it will print out whether the RLibm-32, GLibc, and Intel
    math libraries produce correct results. For all functions, you
    should see the message: "RLibm-32 returns correct result for all
    inputs." You should also see that for a number of functions,
    GLibc and Intel's double math libraries do not produce correct
    results.

(6) Speedup of RLibm-32 (Corresponding to Figure 3 and Line 1134)

    * NOTE: Since the submission of the paper, Intel has
    completely changed the toolkit for installing Intel's
    compiler. Thus, the speedup result may differ from what's
    reported in the paper. We will update the result in the final
    version accordingly.

    In this part, we check the speedup of RLibm-32 against Intel and
    Glibc math library. Depending on whether you generated the
    header files in the previous steps (1)-(4), you should use
    different scripts.

    (a) If you did generate the polynomials, you can check speedup
    of the generated polynomials.

    (b) If you did not generate the polynomials, you can check
    speedup of the reference polynomial that we generated.

    To check the speedup of the polynomials YOU GENERATED for
    float, run the command:

    $ ./runFloatOverheadTest.sh

    To check the speedup of the REFERENCE polynomials for float,
    run the command:

    $ ./runFloatOverheadTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 1. If you would like
    to increase/decrease the parallelism, use the "-j" flag. We
    recommend to keep the parallelism low, to roughly 2-4.

    This process should take roughly 5-6 hours. Once this process
    finishes, it will create two files,
    (a) floatAgainstGlibc.pdf
    (b) floatAgainstIntel.pdf
    
    These two graphs correspond to Figure 3(a) and Figure 3(b),
    respectively. Although the numbers may look different, it
    should still be the case that the bars are above 1x speedup in
    most cases, and average is above 1x speedup in all cases.

    To check the speedup of the polynomials YOU GENERATED for
    posit32, run the command:

    $ ./runPosit32OverheadTest.sh

    To check the speedup of the REFERENCE polynomial for posit32,
    run the command:

    $ ./runPosit32OverheadTest_reference.sh

    This script will automatically run the underlying scripts in
    parallel. By default, the parallelism is 1. If you would like
    to increase/decrease the parallelism, use the "-j" flag. We
    recommend to keep the parallelism low, to roughly 2-3.

    This process should take roughly 2 hours. Once this process
    finishes, it will print the speedup of RLibm-32 against Glibc
    and Intel's double math library. The speedup may be different
    from what is reported in the paper, but the speedup should be
    roughly 1x ~ 1.15x.

Files (92.6 MB)

md5:dca8d559ac6713ca9cb11a93239be717 (4.8 MB)
md5:cd77974db0fb39d290a4c803c370328d (87.8 MB)