High Performance Correctly Rounded Math Libraries for 32-bit Floating Point Representations
Description
There are two files in this archive.
PLDI2021Artifact.tar: This is the artifact submission version. It includes 10 float functions and 3 posit32 functions. If you would like to evaluate exactly what was there when the artifact was submitted for consideration, please use this file.
rlibm-32.tar: This is the modified and extended version after the artifact was submitted. It includes 10 float functions and 8 posit32 functions.
Instructions for rlibm-32:
RLIBM-32 is both a math library that provides correctly rounded results for all inputs and a set of tools used to generate the correct polynomials. The techniques behind the tools will appear at PLDI 2021. Currently, RLIBM-32 supports a number of elementary functions for float and posit32 representations.
### List of float functions supported by RLIBM-32
1. log(x), log2(x), log10(x)
2. exp(x), exp2(x), exp10(x)
3. sinh(x), cosh(x)
4. sinpi(x), cospi(x)
### List of posit32 functions supported by RLIBM-32
1. log(x), log2(x), log10(x)
2. exp(x), exp2(x), exp10(x)
3. sinh(x), cosh(x)
# Getting started with RLIBM-32:
There are various prerequisites for using the RLIBM-32 math library, testing it, or generating polynomials.
We describe the prerequisites in each section. Alternatively, we have set up a docker image that already contains the prerequisites and has the environment variables set up.
### Using Docker Image
1. Install docker if not already installed by following the installation documentation in this link: https://docs.docker.com/install/
2. Download the docker image
docker pull jpl169/rlibm-32
* The docker image is roughly 6GB in size
3. Run the docker image
sudo docker run -it jpl169/rlibm-32
### Manual Installation
In each section (using math library, testing, generating polynomial) we list the pre-requisites and how to set up.
# How to build and use RLIBM-32 math library
### Prerequisite
If you want to compile the math library for posit32, you have to install SoftPosit. Please follow the instructions from https://gitlab.com/cerlane/SoftPosit.
### Setup
1. Create an environment variable SOFTPOSITPATH that points to the directory of SoftPosit:
export SOFTPOSITPATH=<path-to-softposit-directory>
2. Build the math library
1. If you want to build all the math libraries, simply run make from the root directory
cd <path-to-rlibm-32>
make
2. If you want to build the math library for each representation separately, you can use these make rules
cd <path-to-rlibm-32>
make floatmlib
make posit32mlib
### Usage
The math library will be located in the `lib` directory.
* floatMathLib.a : math library for float
* posit32MathLib.a : math library for posit32.
The header files for each library are located in the `include` directory:
* `float_math.h` : header for float math library
* `posit32_math.h` : header for posit32 math library
You can use our library in the code similar to how standard math library is used, except our function names start with "rlibm_":
test.cpp:
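#include <cstdio>
#include "float_math.h"
int main() {
    // Evaluate cospi(x) = cos(pi * x) at x = 1.5 using the RLIBM-32 float library.
    float result = rlibm_cospi(1.5f);
    std::printf("%g\n", result);
    return 0;
}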
To build the program, include the math library in the compilation command:
g++ test.cpp -I<path-to-rlibm-32>/include/ <path-to-rlibm-32>/lib/floatMathLib.a -lm -o test
Currently, RLIBM-32 uses some functions from the default math library for range reduction (i.e., to decompose a floating point value into its integral part and fractional part), so make sure to include the `-lm` flag.
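To make the range-reduction remark concrete, the sketch below splits a float into its integral and fractional parts with `std::modf` from the standard math library. Which libm routine RLIBM-32 actually calls is not stated here, so this is only an illustration; `Parts` and `split` are hypothetical names.

```cpp
#include <cmath>

// Hypothetical helper type: the two pieces of a decomposed float.
struct Parts {
    float integral;    // whole-number part, carries the sign of x
    float fractional;  // remainder with |fractional| < 1, same sign as x
};

// Decompose x so that x == integral + fractional.
// std::modf is a standard libm function, which is why -lm is needed
// when linking code that relies on this kind of decomposition.
Parts split(float x) {
    float integral = 0.0f;
    float fractional = std::modf(x, &integral);
    return {integral, fractional};
}
```

For example, `split(2.75f)` yields an integral part of 2 and a fractional part of 0.75, and both parts keep the sign of a negative input.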
# Testing Correctness of RLIBM-32
### Prerequisite
To run the testing script to check for correctness of RLIBM-32, you need to have installed MPFR and SoftPosit. SoftPosit can be installed via the instructions from https://gitlab.com/cerlane/SoftPosit.
### Setup
1. Create an environment variable SOFTPOSITPATH that points to the directory of SoftPosit:
export SOFTPOSITPATH=<path-to-softposit-directory>
2. Build the math library
cd <path-to-rlibm-32>
make
### Testing
1. To test the correctness of RLIBM-32's float functions, use the following command:
cd <path-to-rlibm-32>
cd rlibmCorrectnessTest/float/
./runAllParallel.sh -j <parallelism>
* Because the testing harness relies on the MPFR math library to compute the correct result, the scripts can take hours to complete. In the extreme case (sinpi(x)), it can take up to 24 hours to complete. Since there are a total of 10 float functions, we recommend parallelism of at least 4.
* Once the testing harness is complete, the results will be stored in `rlibmCorrectnessTest/float/Results/rlibm/` directory.
2. To test the correctness of RLIBM-32's posit32 functions, use the following command:
cd <path-to-rlibm-32>
cd rlibmCorrectnessTest/posit32/
./runAllParallel.sh -j <parallelism>
* Because the testing harness relies on the MPFR math library to compute the correct result, the scripts can take hours to complete. In the extreme case (exp10(x)), it can take up to 12 hours to complete. Since there are a total of 8 posit32 functions, we recommend parallelism of at least 4.
* Once the testing harness is complete, the results will be stored in `rlibmCorrectnessTest/posit32/Results/rlibm/` directory.
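The long running times come from checking every one of the 2^32 bit patterns of a 32-bit input against an oracle. The sketch below shows the general shape of such a loop; it is not the artifact's harness. The names `float_from_bits`, `bits_from_float`, `same_result`, and `count_mismatches` are hypothetical, the `stride` parameter is an assumption added so the loop can be sampled quickly, and the reference function is supplied by the caller (the actual scripts use MPFR).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float; every float input has exactly one pattern.
float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Inverse of float_from_bits.
std::uint32_t bits_from_float(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Two results agree if both are NaN or if they are bit-for-bit identical
// (so +0.0 and -0.0 count as different results).
bool same_result(float a, float b) {
    if (std::isnan(a) && std::isnan(b)) return true;
    return bits_from_float(a) == bits_from_float(b);
}

// Walk the 32-bit input space in steps of `stride` and count the inputs
// where `candidate` disagrees with `reference`. stride == 1 is exhaustive.
template <typename F, typename G>
std::uint64_t count_mismatches(F candidate, G reference, std::uint64_t stride) {
    if (stride == 0) stride = 1;  // avoid an infinite loop
    std::uint64_t wrong = 0;
    for (std::uint64_t b = 0; b < (1ULL << 32); b += stride) {
        float x = float_from_bits(static_cast<std::uint32_t>(b));
        if (!same_result(candidate(x), reference(x))) ++wrong;
    }
    return wrong;
}
```

With `stride` set to 1 the loop visits all 2^32 inputs, which is what makes a slow oracle dominate the total running time.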
# Testing Performance of Various Math Libraries
### Prerequisite
To run the testing script to check for performance, we recommend also installing the Intel compiler (icc) via [this site](https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html).
1. Select the appropriate operating system
2. Select "Web & Local" distribution option
3. Select Online installer
4. On the right hand side (gray background) if you scroll down, it will show the steps to install.
* If your OS is Linux-based, then you might use the commands:
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/17427/l_HPCKit_p_2021.1.0.2684.sh
bash l_HPCKit_p_2021.1.0.2684.sh
5. Follow the instructions. The installer will guide you through installing the Intel compiler.
a. Make sure to install "Intel® oneAPI DPC++/C++ Compiler & Intel® C++ Compiler Classic." You can choose to not install any other components
b. Make sure to remember the Installation directory
c. If it shows you any warning about requiring the "Base toolkit" you can choose to ignore it.
* The installation will take roughly 5~10 minutes.
6. Once installation is complete, run script to set variables:
cd <path to intel oneAPI directory>
. setvars.sh
### Setup
1. Create an environment variable ICCPATH that points to the Intel oneAPI directory. If you did not change the installation path while installing the Intel compiler, then the path to the Intel oneAPI directory will most likely end with "intel/oneapi":
export ICCPATH=<path to Intel oneAPI directory>
2. Build the math library
cd <path-to-rlibm-32>
make
3. To run the testing harness, we must first generate files containing oracle results. To generate oracle files for 32-bit float functions,
export ORACLEPATH=<path to directory where you want to store oracle files for float functions>
cd <path to rlibm-32 directory>
make
cd GenerateOracleFiles/float
make
./runAll.sh
* This step creates a number of <function name>Oracle files inside `ORACLEPATH`. Each oracle file is 16GB (4 bytes * 2^32), and there are 10 functions, which requires a total of 160GB. This step will take roughly 1 hour.
4. To generate oracle files for 32-bit posit32 functions,
export ORACLEPOSITPATH=<path to directory where you want to store oracle files for posit32 functions>
cd <path to rlibm-32 directory>
make
cd GenerateOracleFiles/posit32
make
./runAll.sh
* This step creates a number of <function name>Oracle files inside `ORACLEPOSITPATH`. Each oracle file is 16GB (4 bytes * 2^32), and there are 8 functions, which requires a total of 128GB. This step will take roughly 1 hour.
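The disk-space figures above follow from simple arithmetic: one 4-byte result for each of the 2^32 inputs. The sketch below reproduces that calculation so it can be adapted to a different number of functions. It assumes "GB" is read as 2^30 bytes, and the names `oracle_file_gib` and `oracle_total_gib` are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kInputs = 1ULL << 32;     // one entry per 32-bit input
constexpr std::uint64_t kBytesPerResult = 4;      // each stored result is 4 bytes
constexpr std::uint64_t kBytesPerGiB = 1ULL << 30;

// Size of a single oracle file in GiB.
constexpr std::uint64_t oracle_file_gib() {
    return kInputs * kBytesPerResult / kBytesPerGiB;
}

// Total space needed for `functions` oracle files, in GiB.
constexpr std::uint64_t oracle_total_gib(std::uint64_t functions) {
    return functions * oracle_file_gib();
}

static_assert(oracle_file_gib() == 16, "4 bytes * 2^32 inputs is 16 GiB");
```

Under these assumptions, 10 float functions need 160GB and 8 posit32 functions need 128GB, so generating both sets requires close to 300GB of free space.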
### Testing
* To run a comprehensive testing suite, which tests the performance and correctness of glibc, intel, CR-LIBM, MetaLibm, and RLIBM-32, use the pre-assembled testing scripts (`runTestFloat.sh` for float functions and `runTestPosit.sh` for posit32 functions):
cd <path to rlibm-32 directory>
./runTestFloat.sh
./runTestPosit.sh
* Each test will output two lines of results.
1. The first line reports the number of cycles required to compute the function for all 2^32 inputs. Thus, to compute the average number of cycles per input, divide the reported number by 2^32.
2. The second line reports the number of inputs that produce wrong results.
* Individual testing configuration (glibc, intel, CR-LIBM, MetaLibm, *or* RLIBM-32) is stored in its own directory in `testing/float/` (for float functions) or `testing/posit32/` (for posit32 functions). For example, if you want to test the correctness and performance of RLIBM-32's float functions built with gcc, you can use the following commands:
cd <path to rlibm-32 directory>
cd testing/float/glibc_rlibm_O3_flags
./runAll.sh
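As a small worked example of interpreting the two output lines described above, the sketch below converts a reported total cycle count into an average per input and a wrong-result count into a fraction of all inputs. The function names `average_cycles` and `wrong_fraction` are hypothetical, and the exact text format of the test output is not assumed here; the caller passes in the two numbers.

```cpp
#include <cstdint>

// Number of distinct 32-bit inputs, 2^32, as a double.
constexpr double kInputCount = 4294967296.0;

// Average cycles per call, given the total reported on the first output line.
double average_cycles(std::uint64_t total_cycles) {
    return static_cast<double>(total_cycles) / kInputCount;
}

// Fraction of all inputs that produced a wrong result, given the count
// reported on the second output line.
double wrong_fraction(std::uint64_t wrong_results) {
    return static_cast<double>(wrong_results) / kInputCount;
}
```

For instance, a reported total of 2^36 cycles corresponds to an average of 16 cycles per input.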
# How to use RLIBM-32 Tool to Generate Polynomials
### Prerequisite
1. *SoftPosit:* Please follow the instructions from the [SoftPosit GitLab](https://gitlab.com/cerlane/SoftPosit).
2. *Soplex 4.0.1:* Please download soplex-4.0.1 from https://soplex.zib.de/index.php#download
* Make sure that you're downloading version 4.0.1
$ tar -xvf soplex-4.0.1.tar
$ cd soplex-4.0.1
$ mkdir build
$ cd build
$ cmake ..
$ make
$ cd ../..
3. The MPFR library and the zlib library (required by SoPlex). On Ubuntu systems, they can be installed with:
sudo apt-get install build-essential libgmp3-dev libmpfr-dev zlib1g zlib1g-dev
### Setup
Set environment variables to SoftPosit and Soplex:
$ export SOFTPOSITPATH=<path to SoftPosit directory>
$ export SOPLEXPATH=<path to soplex-4.0.1 directory>
### Generating Polynomials
There are two steps in generating polynomials: (1) generating files containing the reduced inputs and reduced intervals, and (2) based on those files, generating polynomials whose results satisfy the reduced input/interval constraints. Please note that the files containing reduced inputs and intervals are large. In extreme cases (i.e., exp10(x) for posit32), a file can be as large as ~80GB. Additionally, generating both the files and the polynomials can take several hours (up to 24 hours for each function).
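To illustrate what it means for a polynomial to satisfy the reduced input/interval constraints, the sketch below evaluates a candidate polynomial at each reduced input and checks that the value falls inside the corresponding interval. This is only a checker under assumed data structures: the actual tool searches for the coefficients with an LP solver (SoPlex), and the `Constraint` layout, `horner`, and `satisfies_all` are hypothetical names that do not reflect the on-disk file format.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory form of one constraint:
// the polynomial's value at x must lie in [lo, hi].
struct Constraint {
    double x;
    double lo;
    double hi;
};

// Evaluate c[0] + c[1]*x + c[2]*x^2 + ... using Horner's rule.
double horner(const std::vector<double>& coeffs, double x) {
    double acc = 0.0;
    for (std::size_t i = coeffs.size(); i-- > 0;) {
        acc = acc * x + coeffs[i];
    }
    return acc;
}

// True if the polynomial lands inside the interval for every reduced input.
bool satisfies_all(const std::vector<double>& coeffs,
                   const std::vector<Constraint>& constraints) {
    for (const Constraint& c : constraints) {
        double y = horner(coeffs, c.x);
        if (y < c.lo || y > c.hi) return false;
    }
    return true;
}
```

A polynomial that passes this check for every constraint produces a value inside the required interval for each reduced input in the file.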
1. If you have more than 500GB of space, then you can use the existing scripts that automatically generate the files and polynomials. To produce the polynomials for float, use the following commands:
cd <path to rlibm-32 directory>
./floatIntervalGen.sh
./floatFunctionGen.sh
The first script generates the files containing reduced inputs and intervals and puts them into the `intervals` directory. The second script generates correct polynomials for each function based on the generated files. The coefficients of the polynomials are saved into header files (`*.h`) in the `functiongen/float` directory.
To produce the polynomials for posit32, use the following commands:
cd <path to rlibm-32 directory>
./posit32IntervalGen.sh
./posit32FunctionGen.sh
The first script generates the files containing reduced inputs and intervals and puts them into the `intervals` directory. The second script generates correct polynomials for each function based on the generated files. The coefficients of the polynomials are saved into header files (`*.h`) in the `functiongen/posit32` directory.
2. If you would like to generate polynomials for each function separately (to save space, etc) then follow the next instructions. We will use an example of generating the polynomials for ln(x) for float type. Other functions/types should be adjusted accordingly:
a. Generate file containing reduced inputs/intervals information:
cd <path to rlibm-32 directory>
cd IntervalGen/float
make
./Log FloatLogData
b. Once the process finishes, use `FloatLogData` to generate the polynomial:
cd <path to rlibm-32 directory>
cd functiongen/float/
make
./Log ../../IntervalGen/float/FloatLogData Log.log Log.h
* The program requires three arguments. (1) The reduced inputs/intervals file, (2) A filename where the program writes logging information, and (3) A filename for the header file where the polynomial coefficients are stored.
* Once the process finishes, `Log.h` can be found in `functiongen/float` and it will contain coefficients of the polynomials.
____________________________________________________________________________________
Instructions for PLDI2021Artifact:
Abstract:
We present the artifact for the accepted paper, High Performance
Correctly Rounded Math Libraries for 32-bit Floating Point
Representations. The artifact provides all necessary source code as
well as testing harness to show the correctness and speedup of the
proposed math library, RLibm-32. The readme documentation describes how
to install all necessary tools and explains the steps to run
experiments. We provide scripts to automatically execute each part of
the experiment. Since fully evaluating the entire artifact will require
at least 4 days of work (even with some parallelism), we also provide
a list of scripts for the "SHORT" version of the artifact
evaluation. The short version will require roughly 2 hours to
complete. We highly recommend using the short version instead of the
full version. However, for completeness, we also describe how to
completely evaluate the artifact towards the bottom of the readme
file.
_________________________________________________________________________________________________
* Important note:
Completely evaluating everything in this artifact will take several
days (4+ days). There are a total of 4 phases in our evaluation:
Computing reduced intervals
Generating polynomial
Checking for correctness
Testing speedup
Thus, in this artifact, we provide a "short" version of each phase of
evaluation. The short version will take roughly 2 hours to
complete. We HIGHLY recommend that you evaluate the artifact
using the short version.
We also provide methods to perform full evaluation of the 3rd and 4th
phase (checking for correctness and testing speedup) without actually
requiring to generate polynomials. However, checking for correctness
requires roughly 24 hours to complete and testing the speedup of
RLibm-32 requires roughly 8-10 hours to complete.
In conclusion, we strongly recommend that you evaluate the artifact
using the short version.
_________________________________________________________________________________________________
* Hardware recommendation:
All of our evaluations were performed on a machine with 2.10GHz Intel
Xeon Gold 6230R and 187GB of RAM running Ubuntu 18.04. However, the
artifact should be executable on most modern machines with at least
16GB of memory. To run the "short" version, we recommend disk space of
at least 40GB to be safe. To run the "full" version, we recommend
disk space of at least 200GB.
* Software recommendation:
We tested our artifact on Ubuntu-16.04 and Ubuntu-18.04. We provide
installation instructions to install all pre-requisite software.
_________________________________________________________________________________________________
* INSTALLATION GUIDE:
There are two methods to install the artifact: (1) Installing docker
and downloading the docker image for the artifact or (2) Manual
installation. We recommend using the docker image provided to
expedite the installation process.
** DOWNLOADING PREBUILT DOCKER IMAGE
We have prebuilt a docker image and hosted it in the docker hub.
(1) Install docker if not already installed by following the
installation documentation in this link:
https://docs.docker.com/install/
We recommend installing docker and evaluating our artifact on
a machine with Ubuntu. Although docker can be used with
Windows or MacOS, docker will run on top of a linux VM,
significantly slowing down all processes. All docker commands
may require "sudo" permission. If you get any errors when
trying to use docker, prefix all commands with the "sudo"
command.
(2) Download the docker image
$ docker pull jpl169/pldi2021artifact
The docker image is roughly 5.61GB in size
(3) Run the docker image
$ sudo docker run -it jpl169/pldi2021artifact
You will be placed in the PLDI2021Artifact directory. You
are now ready to evaluate the artifact. Go down to the
EVALUATION GUIDE section for artifact evaluation.
(*) NOTE ON WORKING WITH DOCKER (COPYING FILES)
Later in the EVALUATION GUIDE, you may want to copy files
from the docker container to your host machine. This will
allow you to compare or inspect files easier. To copy files
from container to host machine, follow the steps below:
(a) Remember the <path> and <file name> you wish to copy
(b) exit from docker container by using the command
$ exit
(c) Find the ID of the container you were in
$ docker container ls -a
(d) Copy file from the host terminal:
$ docker cp <container id>:<path>/<filename> <destination>
For example, if you want to copy "Example.pdf" in the "/home/foo"
directory from the container Id 12345 to your host machine's
current directory, the command will look like
$ docker cp 12345:/home/foo/Example.pdf .
(e) If you want to go back into the container, use the command
$ docker start <container id>
$ docker attach <container id>
** MANUAL INSTALLATION
(1) To install and evaluate the artifact manually, first install a list of prerequisites:
$ sudo apt-get update
$ sudo apt-get install build-essential cmake git libgmp3-dev libmpfr-dev zlib1g zlib1g-dev wget python python3 python3-pip parallel
$ sudo python3 -m pip install matplotlib
(1.1) To prevent GNU parallel from printing its citation notice on every run, run the following command once and accept the terms:
$ parallel --citation
-> type 'will cite'
(2) Download and install soplex 4.0.1:
Please download soplex-4.0.1 from https://soplex.zib.de/index.php#download
** Make sure that you're downloading version 4.0.1 **
$ tar -xvf soplex-4.0.1.tar
$ cd soplex-4.0.1
$ mkdir build
$ cd build
$ cmake ..
$ make
$ cd ../..
(3) Download and compile SoftPosit :
$ git clone https://gitlab.com/cerlane/SoftPosit.git
$ cd SoftPosit
$ git checkout 983e821dd30b9da5467bcbea2895fdbacc1ce264
$ cd build/Linux-x86_64-GCC/
$ make
$ cd ../../..
(4) Download and install Intel oneAPI HPC Toolkit
* You can access it through this site:
https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html
(a) Select the appropriate operating system
(b) Select "Web & Local" distribution option
(c) Select Online installer
(d) On the right hand side (gray background) if you scroll
down, it will show the steps to install.
If your OS is Linux-based, then you might use the commands:
$ wget https://registrationcenter-download.intel.com/akdlm/irc_nas/17427/l_HPCKit_p_2021.1.0.2684.sh
$ bash l_HPCKit_p_2021.1.0.2684.sh
Follow the instructions. The installer will guide you through
installing the Intel compiler.
(a) Make sure to install "Intel® oneAPI DPC++/C++ Compiler &
Intel® C++ Compiler Classic." Choose to not install any other
components
(b) Make sure to remember the Installation directory
(c) If it shows you any warning about requiring the "Base
toolkit" you can choose to ignore it.
The installation will take roughly 5~10 minutes.
Once installation is complete, run script to set variables:
$ cd <path to intel oneAPI directory>
$ . setvars.sh
(5) Set the following environment variables:
$ export SOFTPOSITPATH=<path to SoftPosit directory>
$ export SOPLEXPATH=<path to soplex-4.0.1 directory>
$ export ICCPATH=<path to Intel oneAPI directory>
* If you did not change the installation path while installing
Intel compiler, then the path to Intel oneAPI directory will
most likely end with "intel/oneapi"
(6.a) Download PLDI21 artifact from github
$ git clone https://github.com/jpl169/PLDI2021Artifact
$ cd PLDI2021Artifact
(6.b) Or Download PLDI2021 artifact from Zenodo and untar the file:
- Zenodo page: https://doi.org/10.5281/zenodo.4579410
$ tar -xvf PLDI2021Artifact.tar
$ cd PLDI2021Artifact
(7) Installation is complete. Go to EVALUATION GUIDE section.
_________________________________________________________________________________________________
* EVALUATION GUIDE FOR THE ARTIFACT
If you have followed one of the two installation methods in
the INSTALLATION GUIDE section, you should be in the
PLDI2021Artifact directory. As mentioned before, we provide
both "short" version and "long" version of artifact
evaluation. We highly recommend trying the "short" version, as
the "long" version will take several days to complete. We
first describe how to evaluate our artifact using the "short"
version. Then we will explain the "long" version.
_________________________________________________________________________________________________
* EVALUATION GUIDE (SHORT VERSION)
(1) Computing reduced intervals
In this part, we will compute reduced intervals necessary to
generate polynomials of sinh(x) and cosh(x) for float
representation. To execute this step, run the command
$ ./runFloatIntervalGen_Short.sh
This process should take roughly 30min. Once this process
finishes, you will find that there is a new folder "intervals"
which contains two files, FloatCoshData and FloatSinhData
which are necessary for the next step
(2) Generating polynomial (Corresponding to Table 2)
In this part, we generate the polynomials based on the reduced
intervals we computed in the previous step. To execute this
step, run the command
$ ./runFloatFunctionGen_Short.sh
This process should take roughly 15min ~ 30min. Once this
process finishes, you will find that there are two files
"Sinh.h" and "Cosh.h" in the directory
include/float_headers. You can check the file against the
reference header files in
include-reference/float_headers. Although each value in the
array may slightly differ, the size of the array should be
exactly the same.
(3) Checking the correctness of RLibm-32 (Corresponding to Table 1)
In this part, we check the correctness of RLibm-32 functions. To
expedite the process, we select one million uniformly
distributed input points. Thus, the exact result and number
will be different from the result in the paper. To check the
correctness of math library functions for float, run the
command
$ ./runFloatLibTest_reference_Short.sh
This process should take roughly 1 minute. Once this process
finishes, it will print out whether the RLibm-32, GLibc, and Intel
math libraries produce correct results. For all functions, you
should see the message: "RLibm-32 returns correct result for all
inputs." You should also see that for a number of functions,
GLibc and Intel's float math libraries do not produce correct
results. Because we only tested with 1 million inputs, the
number of inputs that produce wrong results will be different
from the paper.
To check the correctness of math library functions for
posit32, run the command
$ ./runPosit32LibTest_reference_short.sh
This process should take roughly 1 minute
Again, for each function, you should see the message: "RLibm-32
returns correct result for all inputs." Because we did not
test all inputs, it will also show that both GLibc and Intel's
double libraries produce correct results as well, unlike what is
reported in the paper.
(4) Speedup of RLibm-32 (Corresponding to Figure 3 and Line 1134)
* NOTE: Since the submission of the paper, Intel has
completely changed the toolkit for installing Intel's
compiler. Thus, the speedup result may differ from what's
reported in the paper. We will update the result in the final
version accordingly.
In this part, we check the speedup of RLibm-32 against Glibc and
Intel math library. To expedite the process, we select one
million uniformly distributed input points again. To check the
speed of RLibm-32's float functions, run the command
$ ./runFloatOverheadTest_reference_Short.sh
This process should take roughly 1 minute
Once this process finishes, it will create two files,
(a) floatAgainstGlibcShort.pdf
(b) floatAgainstIntelShort.pdf
These two graphs correspond to Figure 3(a) and Figure 3(b),
respectively. Although the numbers may look different, it
should still be the case that the bars are above 1x speedup in
most cases, and average is above 1x speedup in all cases.
To check the speedup of RLibm-32's posit32 functions, run the
command
$ ./runPosit32OverheadTest_reference_Short.sh
This process should take roughly 1 minute
Once this process finishes, it will print the speedup of
RLibm-32 against Glibc and Intel's double math library. The
speedup will be different because it only tests 1 million
points, but speedup should be roughly 1x ~ 1.15x.
_________________________________________________________________________________________________
* EVALUATION GUIDE (LONG VERSION)
* NOTE: Before beginning the long version, PLEASE understand
that the whole process will take roughly 4 days or longer. If
you do not wish to spend so much time, consider doing the
SHORT version of the evaluation
(1) Computing reduced intervals for float functions
This process computes the reduced intervals for all 10
RLibm-32's float functions. To execute this phase, run the
command
$ ./runFloatIntervalGen.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 2. If you would like
to increase the parallelism, use the "-j" flag:
$ ./runFloatIntervalGen.sh -j <# jobs>
However, since this phase is I/O heavy, we recommend limiting
parallelism to 4.
This process will take roughly 12-13 hours. Once this process
finishes, you will find that there is a new folder "intervals"
which contains 10 files, which are necessary files to generate
polynomials for each float function.
(2) Generating polynomial for float functions (Corresponding to Table 2)
In this part, we generate the polynomials based on the reduced
intervals we computed in the previous step. To execute this
step, run the command
$ ./runFloatFunctionGen.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 2. If you would like
to increase the parallelism, use the "-j" flag:
$ ./runFloatFunctionGen.sh -j <# jobs>
However, since this phase is I/O heavy, we recommend limiting
parallelism to 4.
This process should take roughly 5-6 hours. Once this process
finishes, you will find that there are header files in
include/float_headers. You can check the file against the
reference header files in
include-reference/float_headers. Although each value in the
array may slightly differ, the size of the array should be
exactly the same.
(3) Computing reduced intervals for posit32 functions
This process computes the reduced intervals for all 3 RLibm-32's
posit32 functions. To execute this phase, run the command
$ ./runPosit32IntervalGen.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 2. If you would like
to increase the parallelism, use the "-j" flag:
$ ./runPosit32IntervalGen.sh -j <# jobs>
However, since there are only 3 jobs total, the maximum should
be 3.
This process will take roughly 12-13 hours. Once this process
finishes, you will find that there is a new folder "intervals"
which contains 3 posit32 data files, which are necessary files
to generate polynomials for each posit32 function.
(4) Generating polynomial for posit32 functions (Corresponding to Table 2)
In this part, we generate the polynomials based on the reduced
intervals we computed in the previous step. To execute this
step, run the command
$ ./runPosit32FunctionGen.sh
This script will automatically run the underlying scripts
in parallel. By default, the parallelism is 3. If you would like
to increase the parallelism, use the "-j" flag:
$ ./runPosit32FunctionGen.sh -j <# jobs>
However, since there are only 3 jobs total, the maximum should
be 3.
This process should take roughly 1 hour. Once this
process finishes, you will find that there are header files in
include/posit32_headers. You can check the file against the
reference header files in
include-reference/posit32_headers. Although each value in the
array may slightly differ, the size of the array should be
exactly the same.
(5) Checking the correctness of RLibm-32 (Corresponding to Table 1)
In this phase, we check the correctness of RLibm-32 functions.
Depending on whether you generated the header files in the
previous (1)-(4) step, you should use different scripts.
(a) If you did generate the polynomials, you can check
correctness of the generated polynomials.
(b) If you did not generate the polynomials, you can check
correctness of the reference polynomial that we generated.
To check the correctness of the polynomials YOU GENERATED
for float, run the command:
$ ./runFloatLibTest.sh
To check the correctness of the REFERENCE polynomial for
float, run the command:
$ ./runFloatLibTest_reference.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 4. If you would like
to increase/decrease the parallelism, use the "-j" flag.
This process should take roughly 12 hours. Once this process
finishes, it will print out whether the RLibm-32, GLibc, and Intel
math libraries produce correct results. For all functions, you
should see the message: "RLibm-32 returns correct result for all
inputs." You should also see that for a number of functions,
GLibc and Intel's float math libraries do not produce correct
results.
* Note: Because our oracle for Sinpi and Cospi are
exceptionally slow, we only check the result of inputs x = 0
to 2.0 for Sinpi and Cospi. This is sufficient to extrapolate
the result of Sinpi and Cospi to all other inputs, due to the
properties of Sinpi and Cospi.
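One way to see the property behind this note: sinpi and cospi have period 2, sinpi is odd, and cospi is even, so any finite input maps to a point in [0, 2) without rounding error. The sketch below shows that mapping. It is an illustration of the mathematical identity only, not a description of how the library or the test harness is implemented; `Reduced` and `reduce_period2` are hypothetical names.

```cpp
#include <cmath>

// Hypothetical result type: r is in [0, 2), and
//   sinpi(x) == sin_sign * sinpi(r)
//   cospi(x) == cospi(r)
struct Reduced {
    float r;
    float sin_sign;
};

// Map a finite x to its representative in [0, 2).
// std::fmod computes an exact remainder, so no rounding error is introduced.
Reduced reduce_period2(float x) {
    float sin_sign = std::signbit(x) ? -1.0f : 1.0f;  // sinpi is odd, cospi is even
    float r = std::fmod(std::fabs(x), 2.0f);          // period of sinpi/cospi is 2
    return {r, sin_sign};
}
```

For example, 5.5 maps to 1.5, and any float of magnitude 2^24 or larger is an even integer and therefore maps to 0.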
To check the correctness of the polynomials YOU GENERATED for
posit32, run the command:
$ ./runPosit32LibTest.sh
To check the correctness of the REFERENCE polynomial for
posit32, run the command:
$ ./runPosit32LibTest_reference.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 3. If you would like
to increase/decrease the parallelism, use the "-j" flag. Note
that the script only runs 3 jobs. So the maximum parallelism
is 3.
This process should take roughly 10 hours. Once this process
finishes, it will print out whether the RLibm-32, GLibc, and Intel
math libraries produce correct results. For all functions, you
should see the message: "RLibm-32 returns correct result for all
inputs." You should also see that for a number of functions,
GLibc and Intel's double math libraries do not produce correct
results.
(6) Speedup of RLibm-32 (Corresponding to Figure 3 and Line 1134)
* NOTE: Since the submission of the paper, Intel has
completely changed the toolkit for installing Intel's
compiler. Thus, the speedup result may differ from what's
reported in the paper. We will update the result in the final
version accordingly.
In this part, we check the speedup of RLibm-32 against Intel and
Glibc math library. Depending on whether you generated the
header files in the previous (1)-(4) step, you should use
different scripts.
(a) If you did generate the polynomials, you can check speedup
of the generated polynomials.
(b) If you did not generate the polynomials, you can check
speedup of the reference polynomial that we generated.
To check the speedup of the polynomials YOU GENERATED for
float, run the command:
$ ./runFloatOverheadTest.sh
To check the speedup of the REFERENCE polynomials for float,
run the command:
$ ./runFloatOverheadTest_reference.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 1. If you would like
to increase/decrease the parallelism, use the "-j" flag. We
recommend keeping the parallelism low, roughly 2-4.
This process should take roughly 5-6 hours. Once this process
finishes, it will create two files,
(a) floatAgainstGlibc.pdf
(b) floatAgainstIntel.pdf
These two graphs correspond to Figure 3(a) and Figure 3(b),
respectively. Although the numbers may look different, it
should still be the case that the bars are above 1x speedup in
most cases, and average is above 1x speedup in all cases.
To check the speedup of the polynomials YOU GENERATED for
posit32, run the command:
$ ./runPosit32OverheadTest.sh
To check the speedup of the REFERENCE polynomial for posit32,
run the command:
$ ./runPosit32OverheadTest_reference.sh
This script will automatically run the underlying scripts in
parallel. By default, the parallelism is 1. If you would like
to increase/decrease the parallelism, use the "-j" flag. We
recommend keeping the parallelism low, roughly 2-3.
This process should take roughly 2 hours. Once this process
finishes, it will print the speedup of RLibm-32 against Glibc
and Intel's double math library. The speedup may be different
from what is reported in the paper, but speedup should be roughly 1x
~ 1.15x.