TSE_2022_Cross Project Code Context Prediction for Software Development Tasks
Description
This dataset contains 3,469 code context models (Code Context Models.zip) and 14,510 code context patterns (Topological Patterns.zip) collected from three projects: Platform, PDE, and ECF.
Code Context Models
Each directory corresponds to one working period and contains 10 files:
1. interaction events.txt: the interaction events during this working period
2. event timestamps.txt: the timestamps of this working period, including:
- start datetime of the first event
- start datetime of the last event
- end datetime of the last event
- duration of interaction events (in seconds)
3. code elements_extracted.txt: the code elements the developer accesses during this working period, extracted from interaction events (e.g., org.eclipse.ecf.provider.irc.ui/src<org.eclipse.ecf.internal.irc.ui.wizards{IRCConnectWizardPage.java[IRCConnectWizardPage~getConnectID)
4. code elements_resolved.txt: the resolved code elements (e.g., 'org.eclipse.ecf.internal.irc.ui.wizards.IRCConnectWizardPage[getConnectID')
5. repository urls.txt: the urls of the git repositories identified from the code elements
6. code elements_total.txt: all the code elements extracted from the code snapshots
7. code context model.txt: the code context model built from the code elements in [4]. The file has the following format:
vertices:
A@#c0
B@#f0
edges:
c0@#f0@#declare
Here A and B are the code elements, and c0 and f0 are their ids (c: class, f: function). The relation between them is 'declare'.
8. code context model_total.txt: the code context model with the code elements in [6]
9. stereotype roles_total.txt: the stereotype roles of all the code elements in [6]
10. code context model_abstract.txt: the abstract code context model, in which the code elements are abstracted as stereotype roles
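The vertices/edges layout described for code context model.txt can be read with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: the function name parse_code_context_model is hypothetical, and it assumes that '@#' is the only field separator, that every edge line has exactly three fields, and that the first id on an edge line is the source and the second is the target.

```python
def parse_code_context_model(text):
    """Parse the 'vertices:' / 'edges:' layout into a dict and a list."""
    vertices = {}  # vertex id (e.g. 'c0') -> code element name
    edges = []     # (first id, second id, relation) tuples
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "vertices:":
            section = "vertices"
        elif line == "edges:":
            section = "edges"
        elif section == "vertices":
            # Split on the last '@#' so the id is always the final field.
            name, vertex_id = line.rsplit("@#", 1)
            vertices[vertex_id] = name
        elif section == "edges":
            # Assumption: exactly three '@#'-separated fields per edge line.
            first_id, second_id, relation = line.split("@#")
            edges.append((first_id, second_id, relation))
    return vertices, edges


# The example from the description of file 7.
sample_model = """vertices:
A@#c0
B@#f0
edges:
c0@#f0@#declare
"""
model_vertices, model_edges = parse_code_context_model(sample_model)
```

To load a real file, read its contents as text and pass them to the function; the encoding of the files is not stated in this description, so it may need to be specified explicitly.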
Topological Patterns
There are 9, 8, and 7 pattern groups mined from Platform, PDE, and ECF, respectively. Each pattern group is named 'project_X_MinSupp', where 'X' denotes the X-th cluster of code context models and 'MinSupp' is the minimum support, set to 0.02. The information about the X-th cluster can be found in the paper's appendix. Each pattern group contains multiple patterns. For example:
t # 25
v 0 DATA_PROVIDER
v 1 COLLABORATOR-CONSTRUCTOR
e 0 1 declare
12
The id of this pattern is 25. It has two vertices, 0 and 1, which correspond to DATA_PROVIDER and COLLABORATOR-CONSTRUCTOR respectively, and a 'declare' edge from DATA_PROVIDER to COLLABORATOR-CONSTRUCTOR. The final line gives the number of occurrences: this pattern occurs 12 times across all the abstract code context models.
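The pattern layout above can be parsed in a similar way. The following is a minimal sketch under stated assumptions: the function name parse_patterns and the dictionary keys are hypothetical, fields are assumed to be whitespace-separated, vertex labels are assumed to contain no spaces, and a line holding a single integer is assumed to be the occurrence count of the pattern that precedes it.

```python
def parse_patterns(text):
    """Parse 't # id' / 'v' / 'e' / count lines into a list of dicts."""
    patterns = []
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "t":
            # 't # 25' starts a new pattern whose id is the third field.
            current = {"id": int(parts[2]), "vertices": {},
                       "edges": [], "occurrences": None}
            patterns.append(current)
        elif current is None:
            continue  # ignore anything before the first 't' line
        elif parts[0] == "v":
            # 'v 0 DATA_PROVIDER': vertex index and stereotype role.
            current["vertices"][int(parts[1])] = parts[2]
        elif parts[0] == "e":
            # 'e 0 1 declare': edge from vertex 0 to vertex 1 with a relation.
            current["edges"].append((int(parts[1]), int(parts[2]), parts[3]))
        elif len(parts) == 1 and parts[0].isdigit():
            # Assumption: a bare integer is the occurrence count.
            current["occurrences"] = int(parts[0])
    return patterns


# The example pattern shown above.
sample_patterns = """t # 25
v 0 DATA_PROVIDER
v 1 COLLABORATOR-CONSTRUCTOR
e 0 1 declare
12
"""
parsed_patterns = parse_patterns(sample_patterns)
```

Keeping vertices as an index-to-role mapping makes it straightforward to translate each edge back into a pair of stereotype roles.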