One Polynomial Approximation to Produce Correctly Rounded Results of an Elementary Function for Multiple Representations and Rounding Modes

doi:10.5281/zenodo.5576679

Published October 5, 2021 | Version v2

Conference paper Open

1. Yale University
2. Nagarakatte

Abstract:

We present the artifact for the accepted paper, One Polynomial Approximation to Produce Correctly Rounded Results of an Elementary Function for Multiple Representations and Rounding Modes. We describe the list of claims made by the paper, the installation instructions, and the evaluation instructions. To ease installation, we provide a docker image with all required software already installed. Additionally, we provide complete instructions to manually install the artifact on Ubuntu 20.04.

_________________________________________________________________________________________________

Table of Contents:

1. Hardware recommendation
2. Claims made from the paper
3. Installation Guide
4. EVALUATION GUIDE FOR THE ARTIFACT
(1) Generating the polynomials for ourLibm (Claim 1)
(2) Check the correctness of ourLibm and mainstream math libraries (Claim 2)
(3) Checking the performance of ourLibm (Claim 3)
5. More details on generating polynomials
6. How to use ourLibm!

_________________________________________________________________________________________________

1. Hardware recommendation:

All of our evaluations were performed on a machine with a 2.10GHz Intel Xeon Gold 6130 processor and 32GB of RAM, running Ubuntu-18.04. However, we expect the artifact to be functional on modern machines with 16GB of RAM, running Ubuntu-18.04 or later. For ease of installation, we recommend running the evaluation on Ubuntu-20.04. The docker image we provide also uses Ubuntu-20.04.

_________________________________________________________________________________________________

2. Claims made from the paper:

All claims to be evaluated in the artifact are found in Section 5. Experimental Evaluation.

(1) Our technique successfully generates polynomials that produce correctly rounded results for 161 different FP representations with all standard rounding modes (claimed at the beginning of Section 6.2).

(2) Mainstream math libraries, on the other hand, do not produce correctly rounded results for all inputs, for all standard rounding modes, for float and tensorfloat32 types, for several elementary functions (claimed in Table 3 and Table 4).

(3) On average, OurLibm is faster than glibc's Libm, Intel's Libm, and CR-LIBM, while only 5% slower than RLibm-32 (claimed in Figure 14).

_________________________________________________________________________________________________

3. INSTALLATION GUIDE:

There are two methods to install the artifact: (1) installing docker and downloading the docker image for the artifact, or (2) manual installation. We recommend using the provided docker image to expedite the installation process.

** DOWNLOADING PREBUILT DOCKER IMAGE

We have prebuilt a docker image and hosted it in the docker hub.

(1) Install docker if not already installed by following the installation documentation in this link: https://docs.docker.com/install/

We recommend installing docker and evaluating our artifact on a machine with Ubuntu. Although docker can be used on Windows or macOS, on those systems it may run on top of a Linux VM.

(2) Download the prebuilt docker image by using the command:

$ sudo docker pull jpl169/popl2022artifact

The docker image is roughly 5.5GB.

(3) Run the docker image:

$ sudo docker run -it jpl169/popl2022artifact

(4) If running the docker image on macOS, increase docker's memory resource. You can increase memory resource by going to docker->preference->Resources. 12GB of memory usage should be more than enough.

(5) Installation is complete. Go to EVALUATION GUIDE section.

** MANUAL INSTALLATION
Note: We show the manual installation process on Ubuntu 20.04. We chose Ubuntu 20.04 because gcc version 10 can be installed easily via "apt install". We use gcc version 10 only to test the correctness and performance of glibc's libm; any gcc version suffices to generate polynomials. For artifact evaluation, we strongly suggest Ubuntu 20.04 when installing manually; on other systems, gcc version 10 has to be installed manually, and we do not provide the steps for that. If your machine does not have Ubuntu 20.04 installed, you can use docker with a base image of Ubuntu 20.04. In that case, omit the "sudo" command, since you are automatically logged in as root. Still, the best method for artifact evaluation is to use the provided docker image.

(1) To install and evaluate the artifact manually, first install a list of prerequisites:

    $ sudo apt update
    $ sudo apt install -yq --no-install-recommends apt-utils
    $ sudo apt install -yq build-essential parallel cmake git libgmp3-dev libmpfr-dev zlib1g zlib1g-dev bc wget python3 python3-pip gcc-10 g++-10 ncurses-term
    $ sudo python3 -m pip install --upgrade pip
    $ sudo python3 -m pip install matplotlib

(2) Download and install soplex 4.0.1:

    Please download soplex-4.0.1 from https://soplex.zib.de/index.php#download.
    Note: Make sure the version you download is soplex-4.0.1.

    $ tar -xvf soplex-4.0.1.tgz
    $ cd soplex-4.0.1
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
    $ cd ../..

Now, set the environment variable:
$ export SOPLEXPATH=<path to soplex folder>

(3) Install Intel OneApi (to get Intel classic compiler: icc)

    1. Go to the link: https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html
    2. Select the appropriate operating system (Linux)
    3. Select "Web & Local" distribution option
    4. Select Online installer
    5. On the right hand side (gray background), scroll down to see the installation steps.
    On a Linux-based system, they will be:
    $ wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18211/l_HPCKit_p_2021.4.0.3347.sh
    $ sudo bash l_HPCKit_p_2021.4.0.3347.sh

    6. Follow the instructions. The installer will guide you through installing the Intel compiler.
        a. Make sure to install "Intel® oneAPI DPC++/C++ Compiler & Intel® C++ Compiler Classic." You can choose to not install any other components
        b. Make sure to remember the Installation directory
        c. If it shows you any warning about requiring the "Base toolkit" you can choose to ignore it.
            * The installation will take roughly 5~10 minutes.
    7. Once installation is complete, run script to set variables:
    $ cd <path to intel oneAPI directory> (e.g., /opt/intel/oneapi)
    $ source setvars.sh

    8. Set the environment variable:
    $ export ICCPATH=<path to Intel oneAPI directory> (e.g., export ICCPATH=/opt/intel/oneapi/)

(4) Download the POPL2022 artifact from Zenodo and untar the file:
    - Zenodo page: https://doi.org/10.5281/zenodo.5550564
    $ tar -xvf POPL2022Artifact.tar
    $ cd POPL2022Artifact
    $ make

(5) Installation is complete. Go to EVALUATION GUIDE section.

_________________________________________________________________________________________________

4. EVALUATION GUIDE FOR THE ARTIFACT

If you have followed one of the two installation methods in the INSTALLATION GUIDE section, you should be in the POPL2022Artifact directory. If you are evaluating the artifact using the provided docker image, you will automatically be placed into the POPL2022Artifact directory when you run the docker image.

(1) Generating the polynomials for ourLibm (Claim 1):

To test our technique in generating the polynomials, run the bash script,

$ ./GenerationTest.sh

This script will take roughly 7 minutes. It gives an example of how to generate a polynomial that produces the correct results of log2(x) for all inputs in [1, 2), for several FP representations, for all standard FP rounding modes. Once the script is finished, it should print a message similar to "successfully created..." in green text. If it prints a message similar to "incorrect polynomial ..." in red text, then the test failed.

    * The files needed to generate the polynomial are in the folder "smallGenerationTest." You can modify the files in there to generate polynomials of other elementary functions.
    - The "smallGenerationTest/GenerateOracleFiles" directory contains source code to generate the RNO result correctly.
    - The "smallGenerationTest/IntervalGen" directory contains source code to compute the reduced interval based on the range reduction strategy chosen to use with the log2(x) function.
    - The "smallGenerationTest/functiongen" directory contains source code to generate a polynomial that satisfies the intervals produced in the previous step.
    - The "smallGenerationTest/source" directory contains the source code of the elementary function implementation.
    - The "smallGenerationTest/correct_test" contains the source code to test the correctness of the created implementation.

    * As you can see, the root directory of POPL2022Artifact also contains similar directories (i.e., GenerateOracleFiles, etc.). The files in these directories show how to generate elementary functions that produce correct results for all inputs. However, running them will take at least 3-4 hours for each elementary function. Hence, for the purposes of artifact evaluation, we show how to create polynomials for a small input domain.

(2) Check the correctness of ourLibm and mainstream math libraries (Claim 2):

    * To test the correctness of ourLibm, run the bash script,

$ ./CorrTestRLibmAll.sh

This script will take roughly 15 minutes. This script tests whether ourLibm produces the correctly rounded results for uniformly sampled FP inputs with different numbers of exponent bits (and mantissa bits), for all standard rounding modes. If the testing is successful, you should see "check" in green text for each elementary function and for representations with different exponent bits. Otherwise, it will print "incorrect" in red text, which signifies that the test has failed.

We also set up a test script to run this test for ALL possible FP inputs. However, we do not recommend running it for artifact evaluation purposes, as it will take more than 48 hours, even when run in parallel:
$ ./CorrTestRLibmAll_Full.sh

* To test the correctness of mainstream math libraries, run the bash script,

$ ./CorrTestMlibs.sh

    This script will take roughly 15 minutes. This script tests whether different math libraries produce correctly rounded results. This test corresponds to Table 2 and Table 3. The output is written to the terminal. Once the test is complete, you should see two tables that resemble Table 2 and Table 3 in the submission pdf, respectively.
    For the test corresponding to Table 2, we uniformly sample inputs out of the 4 billion float inputs. Because the inputs are sampled, the result may look different from the pdf. However, whenever the pdf reports "checkmark," our test should also report "o." If the pdf shows "N/A", then our test shows blank spaces. If the pdf reports "x", then our test may report "o" or "x", since the sample may not contain an input that exposes the incorrect result. Additionally, because the test is set up to run in parallel, the rows in our output may appear in a different order than in the pdf.
    For the test corresponding to Table 3, we test the correctness against all inputs (2^19 inputs). Hence, the result should match Table 3 exactly. If the pdf reports "checkmark", our test reports "o". If the pdf reports "N/A", then our test shows blank spaces. If the pdf reports "x", then our test reports "x". Again, because the test is set up to run in parallel, the rows in our output may appear in a different order than in the pdf.

We also set up a test script to run this test for ALL possible float inputs. We do not recommend running this script for artifact evaluation purposes, as it will take 3-4 hours to complete, but it is worth running if there is enough time.

$ ./CorrTestMlibs_Full.sh

(3) Checking the performance of ourLibm (Claim 3):

    - To test the speedup of ourLibm, run the bash script,

    $ ./PerfTest.sh

This script will take roughly 15 minutes. This script uniformly samples some inputs to compare the performance. Once the script terminates, you will see the following files:

    (a) SpeedupOverGlibc.pdf - This file shows the graph presented in Figure 14 (a)
    (b) SpeedupOverIntel.pdf - This file shows the graph presented in Figure 14 (b)
    (c) SpeedupOverCrlibm.pdf - This file shows the graph presented in Figure 14 (c)
    (d) SpeedupOverRlibm32.pdf - This file shows the graph presented in Figure 14 (d)

- If you're using docker to evaluate the artifact, you can copy the PDFs from the container to your host machine using the following commands:

    $ exit
    $ sudo docker container ls -a (Find the correct container id)
    $ sudo docker cp <container id>:/home/POPL2022Artifact/SpeedupOverGlibc.pdf .
    $ sudo docker cp <container id>:/home/POPL2022Artifact/SpeedupOverIntel.pdf .
    $ sudo docker cp <container id>:/home/POPL2022Artifact/SpeedupOverCrlibm.pdf .
    $ sudo docker cp <container id>:/home/POPL2022Artifact/SpeedupOverRlibm32.pdf .

- To go back into docker container, use the following commands:

$ sudo docker start <container id>
$ sudo docker attach <container id>

To evaluate the result, compare the graphs from the pdf against Figure 14. The speedup may differ greatly between hardware architectures and specific configurations. Additionally, since the inputs are uniformly sampled for the artifact evaluation, the speedup may be different from what is shown in Figure 14. However, it should still be the case that, on average, ourLibm shows a speedup (bars higher than 1.0) over glibc's libm, Intel's libm, and CR-LIBM, and a slowdown (bars lower than 1.0) relative to RLibm-32.
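
To make the bar heights concrete, here is a minimal Python sketch of how a speedup value relates to two timings. It assumes the plotted speedup is the baseline library's time divided by ourLibm's time; the function name and the timing numbers are hypothetical and do not come from PerfTest.sh.

```python
def speedup(baseline_seconds: float, ourlibm_seconds: float) -> float:
    """Speedup of ourLibm over a baseline (assumed definition: time ratio)."""
    if ourlibm_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / ourlibm_seconds


# Hypothetical timings, for illustration only.
print(speedup(2.0, 1.0))    # baseline takes twice as long: bar above 1.0
print(speedup(1.0, 1.05))   # ourLibm takes 5% longer: bar slightly below 1.0
```

Under this definition, a library that is 5% slower than the baseline shows up as a bar at roughly 0.95 rather than at 0.05.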

_________________________________________________________________________________________________

5. More details on generating polynomials

* In general, generating a polynomial that produces the correctly rounded results for a given domain of inputs consists of three steps.
(Step 1) Computing the rno result : CalcResultsInRNO function in Fig. 12
(Step 2) Computing the odd intervals and the reduced intervals : CalcOddIntervals function in Fig. 12 and the first part of RLibmPolyGen function shown in Algorithm 1. The details of RLibmPolyGen are not shown in the paper.
(Step 3) Generating the polynomials based on the reduced intervals : The second half of RLibmPolyGen function shown in Algorithm 1.
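
To illustrate what Step 1 produces, the following is a minimal Python sketch of rounding to odd, assuming "rno" denotes the round-to-odd mode (the "float34RO" naming in the artifact suggests this). It is not the artifact's oracle code: the function name round_to_odd, the use of exact fractions, and the unbounded exponent range are illustrative assumptions, and only positive values are handled.

```python
from fractions import Fraction


def round_to_odd(x: Fraction, precision: int) -> Fraction:
    """Round a positive rational to `precision` significand bits, round-to-odd.

    Illustrative sketch: no exponent limits, subnormals, zero, or negatives.
    """
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Spacing between neighbouring values with `precision` significand bits.
    ulp = Fraction(2) ** (e - precision + 1)
    scaled = x / ulp
    q = scaled.numerator // scaled.denominator  # truncate towards zero
    # Inexact results must end in an odd significand; exact ones are kept.
    if scaled.denominator != 1 and q % 2 == 0:
        q += 1
    return q * ulp


print(round_to_odd(Fraction(17, 16), 4))  # inexact, truncated significand even
print(round_to_odd(Fraction(19, 16), 4))  # inexact, truncated significand odd
```

Both example inputs land on the same value, whose last significand bit is odd. That odd bit records that the real value was not exactly representable, which is the property that lets a result computed with a few extra bits be rounded again to a narrower representation.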

* The directory "smallGenerationTest" shows how to execute each of the three steps using the artifact. The source code for each step is organized into separate directories:
(Step 1) GenerateOracleFiles : Compute rno result and store into oracle_file
(Step 2) IntervalGen : Compute reduced intervals using oracle_file and store into interval_file
(Step 3) functiongen : Use interval_files to generate polynomials and generate a table of coefficients
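
The data flow between the three steps can be sketched in Python as below. The argument order follows the command-line usage listed for each step in this section. The file names, the function name, and the assumption that each built executable sits inside its own step directory are hypothetical. GenerationTest.sh already automates the real process; this sketch only builds the command lines and does not run them.

```python
from pathlib import Path


def build_generation_commands(root, oracle_file, interval_file,
                              log_file, header_file, log2_pieces):
    """Return the three command lines, one per step, as argument lists."""
    root = Path(root)
    return [
        # Step 1: write the rno results to the oracle file.
        [str(root / "GenerateOracleFiles" / "Log2Small"), oracle_file],
        # Step 2: read the oracle file, write the interval file.
        [str(root / "IntervalGen" / "Log2Small"), interval_file, oracle_file],
        # Step 3: read the interval file, write the log and header files.
        [str(root / "functiongen" / "Log2Small"), interval_file, log_file,
         header_file, str(log2_pieces)],
    ]


# Hypothetical file names, for illustration only.
commands = build_generation_commands(
    "smallGenerationTest", "oracle_file", "interval_file",
    "log.txt", "log2.h", 0)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) once the executables are built. The output file of one step is the input file of the next, so the three commands must run in this order.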

* We now describe what must be provided to perform each of the steps using the files in the smallGenerationTest directory. More details can be found in the comments of each file, prefixed with "TODO".

(Step 1) GenerateOracleFiles/Log2Small.c
   * The function "ComputeOracleResult" must provide a way to compute the correctly rounded rno result of f(x) given any input. Currently, this function produces correctly rounded rno results for log2(x).
   * In the "main" function, the RunTestHelper must be provided with the domain of inputs using the bit-patterns of the 32-bit float representation. For example, [0x3f800000, 0x40000000) means an input domain of [1, 2)
   * How to execute the program and the list of command line arguments:
       ./Log2Small <file to store rno results>
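
The mapping between 32-bit float bit patterns and values can be checked with a short Python sketch. The helper names are hypothetical and the standard struct module does the conversion; the artifact itself is written in C/C++ and does not use this code.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_bits(value: float) -> int:
    """Return the 32-bit pattern of a value stored as single precision."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


lo, hi = 0x3F800000, 0x40000000
print(bits_to_float(lo), bits_to_float(hi))  # end points of the domain
print(hi - lo)                               # number of float inputs in the domain
```

Because positive floats are ordered the same way as their bit patterns, iterating over the integers from 0x3f800000 up to (but excluding) 0x40000000 visits every float in [1, 2) exactly once.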

(Step 2) IntervalGen/Log2Small.cpp
   In general there are 5-6 things that must be provided to correctly generate reduced intervals. These functions need to be customized for the specific f(x) function. If a component is incorrect, it is common to see an infinite loop.
   * "ComputeSpecialCase" function : Given an input "x", determine whether f(x) is a special case value. Specifically, it is important to correctly determine whether f(x) is infinity or NaN, or whether the odd interval is a singleton interval. If it is a special case, store the correct result in "res" and return true. Otherwise, return false.
   * "RangeReduction" function : The implementation of the range reduction function. If not using range reduction strategy, simply return "x"
   * "OutputCompensation" function: The implementation of the output compensation function. If not using range reduction strategy, simply return "yp"
   * "GuessInitialLbUb" function: To accurately compute the reduced interval, we need an initial point in the reduced interval. This initial point, when used with the output compensation function, must result in a value within the rounding interval. The initial point can be computed using the function that the polynomial is approximating. In most cases, that would be f(x) itself, unless the range reduction strategy performs function transformation (e.g., the range reduction strategy we use for log2(x)). Provide the initial guess in the variable "double initialGuess."
       - Note : If the initialGuess misses the rounding interval by a few ulps, don't worry too much. The bulk of the code in "GuessInitialLbUb" tries to wiggle around initialGuess to find a point that will result in the rounding interval.
   * "SpecCaseRedInt" function : This function is for very special cases, where you need special computation to compute the reduced interval. In other words, you can use "SpecCaseRedInt" to implement your own algorithm to identify the reduced intervals. If you leave it as it is, the program will use a naive method to identify the reduced intervals, which is sufficient most of the time.
   * "main" function : Make sure to provide the input domain. This must match exactly with the domain provided during Step 1.
   * How to execute the program and the list of command line arguments:
       ./Log2Small <file to store interval information> <name of file containing rno results>
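
To show how a range reduction function and an output compensation function fit together, here is a minimal Python sketch for log2(x). It uses the textbook split x = m * 2^e with m in [1, 2), which is NOT the strategy used in the artifact (that one performs a function transformation, as noted above). The function signatures are hypothetical, and math.log2 stands in for the generated polynomial.

```python
import math


def range_reduction(x: float):
    """Split a positive x into (m, e) with x = m * 2**e and m in [1, 2)."""
    m, e = math.frexp(x)      # frexp gives x = m * 2**e with m in [0.5, 1)
    return m * 2.0, e - 1


def output_compensation(yp: float, e: int) -> float:
    """Rebuild log2(x) from yp ~ log2(m), using log2(x) = e + log2(m)."""
    return e + yp


x = 10.0
m, e = range_reduction(x)
yp = math.log2(m)             # stand-in for the polynomial evaluated at m
print(output_compensation(yp, e), math.log2(x))
```

The polynomial only ever sees the reduced input m, so the reduced intervals computed in Step 2 constrain the value yp rather than the final result.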

(Step 3) functiongen/Log2Small.cpp
   * The "power" vector describes the shape of the polynomial. For example, {1, 2, 3, 4, ... 8} indicates that we want to create a polynomial of the form c_1 x + c_2 x^2 + ... + c_8 x^8. Depending on the function you want to approximate, you may want to change the shape of the polynomial. For example,
       - e^x : {0, 1, 2, 3, 4, ...}
       - sin(x) : {1, 3, 5, ...}
       - Note : functiongen will automatically try to generate the lowest degree polynomial, even if you specify many more terms in the vector.
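
The relationship between a "power" vector and the polynomial it describes can be sketched in Python as follows. The function name is hypothetical, and the coefficients in the example are truncated Taylor coefficients used as placeholders; they are not coefficients generated by functiongen.

```python
import math


def eval_poly(powers, coeffs, x):
    """Evaluate sum(c_i * x**p_i) for the exponents listed in `powers`."""
    if len(powers) != len(coeffs):
        raise ValueError("need exactly one coefficient per power")
    return sum(c * x ** p for p, c in zip(powers, coeffs))


# Odd-only shape {1, 3, 5}, as suggested for sin(x); placeholder coefficients.
sin_powers = [1, 3, 5]
sin_coeffs = [1.0, -1.0 / 6.0, 1.0 / 120.0]
print(eval_poly(sin_powers, sin_coeffs, 0.1), math.sin(0.1))
```

Restricting the shape to the exponents a function actually needs (odd powers for an odd function, for example) keeps the number of coefficients to solve for small.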

* How to execute the program and the list of command line arguments:
./Log2Small <file with interval info.> <log file output> <header file output> <size of piecewise polynomial in logscale, i.e., N creates 2^N polynomials>

* If the header file has values in the array, it means the polynomial was successfully generated. If the array is empty, then the polynomial generation was unsuccessful. You can check the log file to see what went wrong.

* Note : If you use "GenerationTest.sh" to automate the process for you, the header file will be located in "smallGenerationTest/include/float34RO_headers." Also, it may surprise you that it prints out a bunch of messages with "res" and "orc" alternating. This is because the script uses a pre-built template to check the correctness of the generated polynomial, assuming that you're trying to generate log2(x). If you are generating a different function, make sure to fix the implementation and the testing script. Or, you can simply kill the process when you see "res" and "orc" printing out.

(Final) Once all three steps are done, the <header file output> should contain a 2-D array. Each inner array represents one polynomial of the piecewise polynomial. Each value in an inner array is a coefficient. The coefficients correspond to the terms specified in the "power" vector.

(Creating implementation) Using the generated polynomial, you can combine the range reduction function, output compensation function, and the special case functions to implement f(x)!
* The implementation of log2(x) for our small sample is in smallGenerationTest/source/log2small.c
* Edit this file accordingly

(Testing) Make sure that you edit the testing script in smallGenerationTest/correct_test as well.
* The main file to change is LibTestHelper.h. This script uses the generated oracle file to compare against the implementation. Make sure that it iterates through the correct input domain. Otherwise, you may see unexpected results.

(See how we generated the functions) The POPL2022Artifact directory has its own GenerateOracleFiles, IntervalGen, and functiongen directories. There you can find the exact configuration we used to generate the polynomials. Be aware that using this code directly will take roughly 24 hours to generate a polynomial.

_________________________________________________________________________________________________

6. How to use ourLibm!

To compile OurLibm, go to the "POPL2022Artifact" directory and type,
$ make
This will create the static library lib/float34ROMathLib.a.

To use ourLibm functions, include the "float34RO_math.h" file found in the "include" directory.

To see a sample program that uses OurLibm, please look into the "sample" directory. It is a simple program that computes "e^0.005 / sinpi(0.75)". The makefile shows how to link the library and its dependency. Note that we rely on "-lm", the compiler's default math library, to extract exponents, etc. We do not use the default math library's elementary functions internally.
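
As a rough sanity check for the sample program's output, the same expression can be evaluated in double precision with Python's standard math module. This sketch is independent of OurLibm and says nothing about correct rounding; it assumes that sinpi(x) means sin(pi * x), and the helper name is hypothetical.

```python
import math


def sinpi(x: float) -> float:
    """Double-precision stand-in for sinpi, assuming sinpi(x) = sin(pi * x)."""
    return math.sin(math.pi * x)


reference = math.exp(0.005) / sinpi(0.75)
print(reference)
```

The value printed by the sample program should agree with this reference to roughly single-precision accuracy; small differences in the last digits are expected because the two computations use different precisions.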

Files (25.1 MB)

POPL2022Artifact.tar (25.1 MB), md5:ae8072b052e945ee392b2ef941511683
