LibvDiff-Dataset
Description
We collected the source code of 9 widely used open-source software (OSS) projects from GitHub [13] and their official websites, as Table 1 shows. These OSS can be grouped by functionality, such as document formatting and compression. We compiled the source code into binaries with GCC v9.4.0 under 16 compilation configurations: 4 architectures (ARM, X86, X64, PPC) combined with 4 optimization levels (O0, O1, O2, O3). In total, we obtained 168 distinct versions across all OSS, resulting in 2688 (168 * 16) binaries, as shown in Table 1.
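The binary count follows directly from the compilation matrix. The short Python sketch below enumerates the 16 architecture and optimization combinations and reproduces the total; the constant and function names are illustrative and are not part of the LibvDiff code base.

```python
from itertools import product

# Values taken from the description above.
ARCHITECTURES = ["ARM", "X86", "X64", "PPC"]
OPT_LEVELS = ["O0", "O1", "O2", "O3"]
NUM_VERSIONS = 168  # distinct OSS versions across the 9 projects


def build_configurations():
    """Return every (architecture, optimization level) pair."""
    return list(product(ARCHITECTURES, OPT_LEVELS))


configs = build_configurations()
total_binaries = NUM_VERSIONS * len(configs)

print(len(configs))     # 16
print(total_binaries)   # 2688
```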
In addition to the binary dataset, we provide an extra archive with pre-generated features and a dataset example for a quick start.
- dataset_features_example.tar.gz: includes features that we have already generated, so they can be evaluated quickly.
- OSS_version_dataset.tar.gz: includes the binary dataset described in our paper, covering 9 OSS, 168 versions, and 2688 binaries. More details can be found in our paper (Sec 6).

The source code of LibvDiff is available at heritage
The structure of the dataset is organized as follows:
- OSS # e.g. freetype
|-- Lib of OSS # e.g. libfreetype
|-- architecture # e.g. ARM
|-- optimization # e.g. O0
|-- version # e.g. VER-2-4-1
|-- binary # e.g. freetype-2.4.0
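The layout above can be turned into a simple index of binaries. The following is a minimal sketch, assuming the archive has been extracted so that every binary sits exactly six levels below the chosen root directory (OSS / library / architecture / optimization / version / binary). The `BinaryEntry` type and `iter_binaries` helper are hypothetical names introduced for illustration only.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class BinaryEntry(NamedTuple):
    """One binary plus the metadata encoded in its directory path."""
    oss: str
    lib: str
    arch: str
    opt: str
    version: str
    path: Path


def iter_binaries(root: Path) -> Iterator[BinaryEntry]:
    """Yield an entry for each file located six levels below `root`.

    Assumes the layout OSS/lib/architecture/optimization/version/binary;
    files at any other depth are skipped.
    """
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) != 6:
            continue
        oss, lib, arch, opt, version, _binary = parts
        yield BinaryEntry(oss, lib, arch, opt, version, path)
```

For example, `[e for e in iter_binaries(Path("dataset")) if e.arch == "ARM" and e.opt == "O0"]` would select the ARM binaries built at O0, where `dataset` stands for wherever the archive was extracted.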
Files
Total size: 1.6 GB

| Checksum | Size |
|---|---|
| md5:2e3a11fe571c9c177f9174dedbaa6039 | 724.7 MB |
| md5:44aaf83435aa67296b2cbdfe3a0fb9c8 | 838.3 MB |
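After downloading, the archives can be checked against the MD5 checksums listed above. The sketch below uses only Python's standard `hashlib` module. Because the listing does not state which checksum belongs to which archive, the check only tests membership in the set of listed values; the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing. The mapping from checksum
# to archive name is not given there, so they are kept as a plain set.
KNOWN_MD5 = {
    "2e3a11fe571c9c177f9174dedbaa6039",
    "44aaf83435aa67296b2cbdfe3a0fb9c8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_checksum(path):
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of_file(path) in KNOWN_MD5
```

Calling `is_listed_checksum(Path("OSS_version_dataset.tar.gz"))` on a complete download should return `True`; a `False` result suggests a truncated or corrupted file.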