Joint Autoregressive and Graph Models for Software and Developer Social Networks

Hazra Rima; Aggarwal Hardik; Goyal Pawan; Mukherjee Animesh; Chakrabarti Soumen

This zip archive contains three CSV files and one folder. The dataset covers the ten most recent distributions.

  • developer_attributes.csv: This file has seven columns. "distro" (str) is the distribution name, "source" (str) is the source package name, and "person_id" (str) identifies the developer. "closes" (int), "high" (int), "medium" (int), and "low" (int) are the features.
  • source_bugs.csv: This file has three columns. "distro" (str) is the distribution name, "source" (str) is the source package name, and "bug_count" (int) is the number of bugs the source package has in that distribution.
  • source_sizes.csv: This file has three columns. "distro" (str) is the distribution name, "source" (str) is the source package name, and "size" (int) is the size of the package.
  • Dependency folder: This folder contains ten dependency lists. Each file has two columns, "start" (str) and "target" (str), both of which are source package names. Each row is read as: the "start" source package depends on the "target" source package.
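
The column layout described above can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset itself. It assumes that every file is comma-separated with a header row matching the column names listed above, and that the dependency lists use the same format. The helper names (read_rows, build_dependency_map) and the sample values (example-distro, pkg-a, and so on) are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict


def read_rows(handle, int_columns=()):
    """Parse a CSV with a header row into a list of dicts.

    Columns named in int_columns are cast from str to int.
    """
    rows = []
    for row in csv.DictReader(handle):
        for col in int_columns:
            row[col] = int(row[col])
        rows.append(row)
    return rows


def build_dependency_map(handle):
    """Map each "start" package to the set of "target" packages it depends on."""
    deps = defaultdict(set)
    for row in csv.DictReader(handle):
        deps[row["start"]].add(row["target"])
    return dict(deps)


# Tiny in-memory samples with made-up values, standing in for the real files.
# Assumption: the real files are comma-separated and start with a header row.
bugs_csv = io.StringIO(
    "distro,source,bug_count\n"
    "example-distro,pkg-a,3\n"
    "example-distro,pkg-b,0\n"
)
deps_csv = io.StringIO(
    "start,target\n"
    "pkg-a,pkg-b\n"
    "pkg-a,pkg-c\n"
)

bugs = read_rows(bugs_csv, int_columns=("bug_count",))
deps = build_dependency_map(deps_csv)
```

To use the real data, replace the io.StringIO samples with file handles, for example open("source_bugs.csv", newline=""). For developer_attributes.csv, the integer columns would be "closes", "high", "medium", and "low"; for source_sizes.csv, "size".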

Here is the arXiv version of our paper:

Here is the portal link:

