Published November 23, 2023 | Version v1
Dataset Open

Determining non-significant bits on a C++ implementation of the LeNet-5 convolutional neural network to be used for storing error correcting codes to protect weights and biases. Robustness assessment of the network after integrating the proposed codes.

Description

The architecture of the LeNet-5 convolutional neural network (CNN) was defined by LeCun in his paper "Gradient-based learning applied to document recognition" (https://ieeexplore.ieee.org/document/726791) to classify images of handwritten digits (MNIST dataset).

This architecture has been customized to use the Rectified Linear Unit (ReLU) as the activation function instead of the Sigmoid.

It consists of the following layers:

  • conv1: Convolution 2D, 1 input channel (28x28), 3 output channels (28x28), kernel size 5, stride 1, padding 2.
  • relu1: Rectified Linear Unit (3@28x28).
  • max1: Subsampling by max pooling (3@14x14).
  • conv2: Convolution 2D, 3 input channels (14x14), 6 output channels (14x14), kernel size 5, stride 1, padding 2.
  • relu2: Rectified Linear Unit (6@14x14).
  • max2: Subsampling by max pooling (6@7x7).
  • fc1: Fully connected (294, 147).
  • fc2: Fully connected (147, 10).

The fault hypotheses for this work include the occurrence of:

  • S0/S1: multiple adjacent stuck-at-0 and stuck-at-1 faults to determine the least significant bits of weights and biases that could be used to store the proposed error correcting codes.
  • BF: single, double, and triple bit-flip faults to assess the robustness of the considered CNN.

These faults target the memory cells containing all the parameters of the CNN:

  • w: weights (float32)
  • b: biases (float32)

All the images (10000) from the MNIST dataset have been used as workload.

The weights and biases of the LeNet-5 architecture have been protected using six different error correcting codes that have been deployed in the least significant bits of these elements.

The parity-check matrices (H = [P | I]) that define these ECCs are:

  • SEC(32, 26) (Hamming) under a classic policy (see methodology below):

        11010010001000011101101000 100000

        10101001000100011011010100 010000

        01100100100010010110110010 001000

        00011100010001001110001101 000100

        00000011110000100001111011 000010

        00000000001111100000000111 000001

  • SEC(23, 18) (Hamming) under a conservative policy (see methodology below):

        111100001111000000 10000

        110011101000111000 01000

        101011010100100110 00100

        010110110010010101 00010

        001101110001001011 00001

  • SEC(13, 9) (Hamming) under an aggressive policy (see methodology below):

        110111000 1000

        101100110 0100

        011010101 0010

        111001011 0001

  • DEC(32, 21) (low redundancy and reduced overhead DEC) under a classic policy (see methodology below):

        111000011001010010000 10000000000

        110110000011101000000 01000000000

        101011000110000010001 00100000000

        100101101000110001000 00010000000

        011010101100100000100 00001000000

        010101010100001001010 00000100000

        001100110010010100100 00000010000

        000011110001000110010 00000001000

        000000001111001101001 00000000100

        000000000000111100111 00000000010

        000000000000000011111 00000000001

  • DEC(28, 18) (low redundancy and reduced overhead DEC) under a conservative policy (see methodology below):

        111111000000000000 1000000000

        110100111100000000 0100000000

        110000100011110000 0010000000

        001110010011001100 0001000000

        101100001010101010 0000100000

        010001001101010110 0000010000

        001011000101101001 0000001000

        101000011000110101 0000000100

        010001110000011011 0000000010

        000010100110000111 0000000001

  • DEC(17, 9) (low redundancy and reduced overhead DEC) under an aggressive policy (see methodology below):

        111110000 10000000

        111001100 01000000

        110101010 00100000

        101010110 00010000

        101101001 00001000

        100110101 00000100

        100011011 00000010

        110000111 00000001

This dataset contains the raw data obtained from:

  • running exhaustive fault injection campaigns for multiple stuck-at faults of increasing multiplicity in the least significant bits of all weights and biases (simultaneously), for all the images in the workload.
  • running statistical fault injection campaigns for single, double, and triple bit-flip faults, randomly targeting the considered locations and images in the workload.

Files information

  • no_ecc folder: Results obtained for the original (not protected) version of the CNN.
    • golden_run.csv: Prediction obtained for all the images considered in the workload in the absence of faults (Golden Run). This is intended to act as an oracle to determine the impact of injected faults.
    • sampling_SBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for single bit-flip faults.
    • sampling_DBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for double bit-flip faults.
    • sampling_TBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for triple bit-flip faults.
    • locating_sensitive_bits folder: Prediction obtained for all the images considered in the workload in the presence of stuck-at-0/stuck-at-1 faults that simultaneously target the N least significant bits of all weights and biases. There is one file for each combination of type of fault and range of targeted bits. Files for bits in the range [11, 0] are not included, as they yield exactly the same results as the Golden Run (these faults do not alter the behaviour of the network).
  • sec/classic, sec/conservative, and sec/aggressive folders: They contain the results obtained for the CNN protected by SEC(32, 26), SEC(23, 18), and SEC(13, 9), respectively.
    • golden_run.csv: Prediction obtained for all the images considered in the workload in the absence of faults (Golden Run). This is intended to act as an oracle to determine the impact of injected faults. It must be noted that this file may differ from the golden_run.csv file for the original version of the CNN, as deploying the ECC in the weights and biases may have affected the behaviour of the network.
    • sampling_SBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for single bit-flip faults. They should all be tolerated by the definition of the ECC.
    • sampling_DBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for double bit-flip faults. They could be more harmful than for the unprotected version of the CNN, as the ECC may erroneously flip correct bits.
  • dec/classic, dec/conservative, and dec/aggressive folders: They contain the results obtained for the CNN protected by DEC(32, 21), DEC(28, 18), and DEC(17, 9), respectively.
    • golden_run.csv: Prediction obtained for all the images considered in the workload in the absence of faults (Golden Run). This is intended to act as an oracle to determine the impact of injected faults. It must be noted that this file may differ from the golden_run.csv file for the original version of the CNN, as deploying the ECC in the weights and biases may have affected the behaviour of the network.
    • sampling_DBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for double bit-flip faults. They should all be tolerated by the definition of the ECC.
    • sampling_TBF_10000.csv: Prediction obtained for running 10000 statistical fault injection experiments for triple bit-flip faults. They could be more harmful than for the unprotected version of the CNN, as the ECC may erroneously flip correct bits.

Methodology information

First, the CNN was used to classify all the images of the workload in the absence of faults to obtain a reference for determining the impact of faults. This is the golden_run.csv file.

To locate non-significant bits in weights and biases, fault injection experiments were executed targeting all elements of all parameters of the CNN using the following procedure:

  • The initial mask targeted only the least significant bit.
  • Until the mask targets all bits of the elements (32 bits as they are single-precision floating point values):
    • Affect the bits (setting them to 0 or 1 in case of stuck-at-0 or stuck-at-1 faults) identified by the mask for all elements of all parameters.
    • Classify all the images of the workload in the presence of this fault. The obtained output was stored in a given .csv file.
    • Remove the fault from the CNN by restoring the affected bits to their previous values.
    • Add the next adjacent bit to the mask, so it targets an additional least significant bit.

The analysis of the obtained results may help in determining which bits can be used to store an ECC:

  • which bits never affect the behaviour of the CNN, as the predicted classification is exactly the same as in the absence of faults.
  • which bits mildly affect the behaviour of the CNN: although the predicted classifications differ from those in the absence of faults, the accuracy of the network is barely affected.
  • which bits greatly affect the behaviour of the CNN, as the accuracy of the network is significantly affected.

Accordingly, three different policies have been identified for deploying an ECC using these bits:

  • Classic policy: The ECC protects as many bits as possible.
  • Conservative policy: The ECC protects all those bits that may affect the prediction of the network.
  • Aggressive policy: The ECC protects only those bits that significantly affect the accuracy of the network.

After designing and deploying a single ECC and a double ECC for each of the identified policies, fault injection experiments were executed to verify their behaviour in the presence of faults.

Single-error correcting codes were tested against single bit-flips (all of which should be tolerated) and against double bit-flips (a correct bit could be erroneously flipped); analogously, double-error correcting codes were tested against double bit-flips (tolerated) and triple bit-flips (possible miscorrection).

Due to the heavy computational load of the decoders, statistical injection was used to run the required fault injection campaigns with a sample size (number of experiments) of 10000.

Each experiment consisted of:

  • Randomly selecting the image to process, and the parameter, element, and bits (mask) to be targeted by the fault.
  • Affecting the bits (inverting them) identified by the mask.
  • Classifying the selected image of the workload in the presence of this fault. The obtained output was stored in a given .csv file.
  • Removing the fault from the CNN by restoring the affected bits to their previous values.

List of variables (Name : Description (Possible values))

  • IMGID: Integer number identifying the considered image (0-9999).
  • TENSORID: Integer number identifying the parameter affected by the fault (0 - No fault, 1 - conv1.w, 2 - conv1.b, 3 - conv2.w, 4 - conv2.b, 5 - fc1.w, 6 - fc1.b, 7 - fc2.w, 8 - fc2.b).
  • ELEMID: Integer number identifying the element of the parameter affected by the fault (-1 - No fault, [0-2] - conv1.b, [0-74] - conv1.w, [0-5] - conv2.b, [0-149] - conv2.w, [0-146] - fc1.b, [0-43217] - fc1.w, [0-9] - fc2.b, [0-1469] - fc2.w).
  • MASK: 8-digit hexadecimal number identifying the bits affected by the fault (00000000 - No fault, FFFFFFFF - all 32 bits faulty).
  • FAULT: String identifying the type of fault (NF - No fault, BF - bit-flip, S0 - Stuck-at-0, S1 - Stuck-at-1).
  • SOFTMAX: 10 decimal numbers obtained after applying the softmax function to the raw output. They represent the probability of the image belonging to each classification category.
  • PRED: Integer number representing the category predicted for the processed image.
  • LABEL: Integer number representing the actual category of the processed image.

Files

MiniLenetCSECDEDFaultInjection.zip (28.5 MB, md5:83e317105f43935f478edc06afe9e36f)

Additional details

Funding

Ministerio de Ciencia, Innovación y Universidades
Dependable-enough FPGA-Accelerated DNNs for Automotive Systems (DEFADAS) PID2020-120271RB-I00
Agencia Estatal de Investigación
Dependable-enough FPGA-Accelerated DNNs for Automotive Systems (DEFADAS) PID2020-120271RB-I00

Dates

Collected
2023-10-05