Published December 15, 2023 | Version v1.0
Dataset | Open

LibvDiff-Dataset

  • 1. Institute of Information Engineering
  • 2. University of Chinese Academy of Sciences

Description

We collected the source code of 9 widely used open-source software (OSS) projects from GitHub [13] and their official websites, as Table 1 shows. These OSS can be grouped by functionality, such as document formatting and compression. We compiled the source code into binaries with GCC v9.4.0 under different compilation options, covering 4 architectures (ARM, X86, X64, PPC) and 4 optimization levels (O0, O1, O2, O3), i.e. 16 configurations per version. In total, we obtained 168 distinct versions across all OSS, resulting in 2688 (168 × 16) binaries, as shown in Table 1.
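The binary count follows from the compilation matrix: every version is built once for each (architecture, optimization level) pair, giving 4 × 4 = 16 builds per version. The Python sketch below only reproduces that arithmetic; the helper names are hypothetical, and it does not invoke GCC or reproduce the authors' build scripts.

```python
from itertools import product

# Compilation settings stated in the description (all built with GCC v9.4.0).
ARCHITECTURES = ["ARM", "X86", "X64", "PPC"]
OPT_LEVELS = ["O0", "O1", "O2", "O3"]


def build_configs():
    """Return every (architecture, optimization level) pair used per version."""
    return list(product(ARCHITECTURES, OPT_LEVELS))


def expected_binary_count(num_versions: int) -> int:
    """One binary per version per compilation configuration."""
    return num_versions * len(build_configs())


if __name__ == "__main__":
    print(len(build_configs()))         # 16 configurations
    print(expected_binary_count(168))   # 2688 binaries
```

Enumerating the pairs explicitly, rather than hard-coding 16, keeps the count correct if an architecture or optimization level is added or dropped.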

In addition to the binary dataset, we provide an extra archive containing pre-generated features and a dataset example for a quick start.

  • dataset_features_example.tar.gz: features that we have already generated, so that the evaluation can be run quickly.
  • OSS_version_dataset.tar.gz: the binary dataset described in our paper (9 OSS, 168 versions, 2688 binaries); more details can be found in Sec. 6 of our paper.
  • The source code of LibvDiff is available at heritage
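Both archives are gzip-compressed tarballs. A minimal sketch for unpacking them with Python's standard `tarfile` module, assuming the archives have already been downloaded locally; `safe_extract` is a hypothetical helper, and the path check is a defensive choice rather than something the dataset requires.

```python
import os
import tarfile
from typing import List


def safe_extract(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Raises ValueError if any member would be written outside dest_dir.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Reject absolute paths and "../" entries that resolve outside dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons also provide the "data" filter (blocks unsafe links and special files).
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
    return [m.name for m in members]


# Hypothetical usage; adjust paths to wherever the archives were saved.
# safe_extract("OSS_version_dataset.tar.gz", "data/")
```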

The structure of the dataset is organized as follows:

- OSS   # e.g. freetype
   |-- Lib of OSS   # e.g. libfreetype
       |-- architecture   # e.g. ARM
           |-- optimization   # e.g. O0
               |-- version   # e.g. VER-2-4-1
                   |-- binary   # e.g. freetype-2.4.0
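Because every binary sits exactly six directory levels below the dataset root, its compilation settings can be read directly from its path. A sketch assuming the archive has been extracted so that the OSS directories (e.g. `freetype`) sit directly under `root`; `BinaryEntry` and `iter_binaries` are hypothetical names, not part of the released tooling.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class BinaryEntry(NamedTuple):
    oss: str            # e.g. freetype
    library: str        # e.g. libfreetype
    architecture: str   # e.g. ARM
    optimization: str   # e.g. O0
    version: str        # e.g. VER-2-4-1
    binary: str         # e.g. freetype-2.4.0
    path: Path


def iter_binaries(root: str) -> Iterator[BinaryEntry]:
    """Yield one entry per file found exactly six levels below root.

    Assumes the layout OSS/Lib/architecture/optimization/version/binary.
    """
    root_path = Path(root)
    for path in sorted(root_path.glob("*/*/*/*/*/*")):
        if path.is_file():
            parts = path.relative_to(root_path).parts
            yield BinaryEntry(*parts, path)
```

For instance, `[e.path for e in iter_binaries(root) if e.architecture == "ARM" and e.optimization == "O0"]` would collect every ARM/O0 build, which is a convenient starting point for cross-architecture or cross-optimization comparisons.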

Files (1.6 GB)

  • md5:2e3a11fe571c9c177f9174dedbaa6039 (724.7 MB)
  • md5:44aaf83435aa67296b2cbdfe3a0fb9c8 (838.3 MB)
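After downloading, the archives can be checked against the published MD5 checksums. The record does not say which checksum belongs to which archive, so this sketch simply tests whether a file's digest matches either value; the function names are hypothetical.

```python
import hashlib

# Checksums as listed on this record; the file-to-checksum mapping is not shown,
# so a downloaded archive is accepted if it matches either value.
PUBLISHED_MD5 = {
    "2e3a11fe571c9c177f9174dedbaa6039",
    "44aaf83435aa67296b2cbdfe3a0fb9c8",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str) -> bool:
    """True if the file's MD5 digest equals one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```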