Published April 3, 2015 | Version v1
Dataset Open

apimining

Description

Manual migration of a large software project is a tedious, time-consuming, and error-prone task. To reduce human effort in this task, there are semi-automatic approaches and tools that help with language migration. These tools/methods require users to define migration rules/mappings between the corresponding Application Programming Interfaces (APIs) used in the two languages, and they expect programmers to specify such API mappings manually. Because there is usually a large number of API mappings, and new ones are introduced over time, existing tools can support only a subset of the needed mappings. As a result, missing API mappings in such tools reduce the quality of the migrated code.

To reduce the manual effort of defining API mapping rules for language migration, some approaches automatically mine such mappings from a corpus of the libraries' client code that already exists in corresponding versions in both languages. These approaches share the principle that corresponding APIs and their usages in the two languages have textually similar names and similar calling structures and parameters. That principle and the heuristics built on it do not always hold, because a library in the target language may use a quite different framework and different API names than the library in the source language. A method call in the source API might be replaced by a field access or by multiple statements in the target API, and even when it is replaced by a method call, the set of parameters might differ. Existing mining approaches also focus only on mapping API methods/classes rather than entire API usages.
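To make the name-similarity principle concrete, here is a minimal sketch of such a heuristic. It is an illustration only, not the algorithm of any particular tool: the character-level measure (Python's `difflib.SequenceMatcher`), the function names `name_similarity` and `heuristic_match`, and the 0.6 threshold are all assumptions. It matches Java `add` to C# `Add`, but finds no match for Java's `java.util.List.size()` method, whose C# counterpart is the `List<T>.Count` property, because the two names share no characters.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] between two API names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def heuristic_match(java_name, csharp_names, threshold=0.6):
    """Return the most textually similar C# name, or None if it is below the
    (hypothetical) similarity threshold."""
    best = max(csharp_names, key=lambda n: name_similarity(java_name, n))
    return best if name_similarity(java_name, best) >= threshold else None


if __name__ == "__main__":
    print(heuristic_match("add", ["Add", "Count"]))   # textually similar: matched
    print(heuristic_match("size", ["Add", "Count"]))  # size -> Count is missed
```

A purely textual rule like this cannot recover mappings whose names differ, or where a call becomes a property access. That limitation is the one the paragraph above describes.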

To address those challenges, we propose StaMiner, a data-driven model that statistically learns the mappings between entire API usages from a corpus of corresponding methods in the client code of the APIs in two languages. Rather than relying on heuristics as the deterministic methods above do, StaMiner is a statistical model that learns from the mappings in such a corpus. Currently, we focus on API mappings between Java and C#.

For each method, StaMiner builds a Groum, a graph representation of the API usages in the method. From the method's Groum, it extracts the connected sub-groums that represent API usages involving one or multiple objects, and for each of those sub-groums it extracts the usage symbols. The symbol sequences of all sub-groums are then concatenated, separated by commas, into a single sequence called the sentence for the method. Within the sentence, the sub-groum sequences are ordered by where each sub-groum's first object appears in the code. The sentence for the corresponding C# method is built in the same way.

All pairs of sentences for corresponding methods are then used as input to a symbol-to-symbol alignment algorithm based on Expectation-Maximization (EM). StaMiner then extends the resulting symbol-to-symbol alignments to compute alignments between sequences of symbols in the two languages; the aligned sequences, together with their scores, represent the mappings of API usages. Finally, aligned sequences with scores above a threshold are presented in either sequence or graph form.
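The sentence construction and EM alignment steps described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not StaMiner's implementation. The `(first_object_position, symbols)` input format and the `Type.method` symbol spelling are hypothetical. The EM step is a standard IBM Model 1-style aligner, swapped in for illustration and run without a NULL symbol. The threshold is applied per symbol rather than per sequence, and the extension from symbol alignments to sequence alignments is not reproduced.

```python
from collections import defaultdict


def build_sentence(subgroums):
    """Join per-sub-groum symbol sequences into one method 'sentence'.

    `subgroums` is a hypothetical list of (first_object_position, symbols)
    tuples. Sequences are ordered by the position of each sub-groum's first
    object and separated by ',' tokens.
    """
    sentence = []
    for i, (_, symbols) in enumerate(sorted(subgroums, key=lambda sg: sg[0])):
        if i > 0:
            sentence.append(",")
        sentence.extend(symbols)
    return sentence


def em_align(pairs, iterations=10):
    """IBM Model 1-style EM over (java_sentence, csharp_sentence) pairs.

    Returns t[(c, j)], an estimate of P(C# symbol c | Java symbol j).
    """
    cs_vocab = {c for _, cs in pairs for c in cs}
    uniform = 1.0 / len(cs_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform probabilities
    for _ in range(iterations):
        count = defaultdict(float)  # E-step: expected (c, j) alignment counts
        total = defaultdict(float)  # expected counts per Java symbol
        for js, cs in pairs:
            for c in cs:
                norm = sum(t[(c, j)] for j in js)
                for j in js:
                    frac = t[(c, j)] / norm
                    count[(c, j)] += frac
                    total[j] += frac
        # M-step: renormalize expected counts into conditional probabilities.
        t = defaultdict(float, {(c, j): n / total[j] for (c, j), n in count.items()})
    return t


def best_alignments(t, threshold=0.5):
    """Keep each Java symbol's most probable C# symbol if its score exceeds
    the (hypothetical) threshold; the ',' separator is skipped."""
    best = {}
    for (c, j), p in t.items():
        if j != "," and p > best.get(j, ("", 0.0))[1]:
            best[j] = (c, p)
    return {j: cp for j, cp in best.items() if cp[1] > threshold}


if __name__ == "__main__":
    # Toy corpus (hypothetical): each pair is (Java sentence, C# sentence).
    java = [["ArrayList.new", "ArrayList.add"],
            ["ArrayList.new", "ArrayList.size"],
            ["ArrayList.add", "ArrayList.size"]]
    csharp = [["List.new", "List.Add"],
              ["List.new", "List.Count"],
              ["List.Add", "List.Count"]]
    mapping = best_alignments(em_align(list(zip(java, csharp))))
    for j, (c, p) in sorted(mapping.items()):
        print(f"{j} -> {c} ({p:.3f})")
```

On the toy corpus, ten EM iterations pair `ArrayList.size` with `List.Count` with a probability close to 1 from co-occurrence alone, even though the two names share no characters. That is the motivation for a statistical rather than name-based approach.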

The contributions of our work include:

 1. StaMiner: a novel statistical learning method for mining API usage mappings between two languages that, unlike prior approaches, maps entire API usages;
 2. A tool to mine API usage mappings in Java and C#;
 3. An empirical evaluation showing StaMiner's accuracy in mining API usages and supporting language migration.

Files

README.txt

Files (219.5 kB)

Checksum                                  Size
md5:e3a59ca9f7956b07b9669205859bddc6      36.0 kB
md5:cd43b95908b483ceae4258d7e9ea9811      77.1 kB
md5:bdd5d6dad0be10fb22c449f1d0e72a51      4.3 kB
md5:dc14c78e7e065c35c8002d5d1a800972      4.2 kB
md5:7086e81c4644108e3a934cf8a9c7795b      4.8 kB
md5:218d623dd985b9f87f563782baeb32a4      70.5 kB
md5:2d6041c3ae107d2975bef895e51d2eb2      12.9 kB
md5:79d4f0b1f31c32c5deeb2b8510d4bbe5      9.6 kB
md5:95d6b7906a297c0b0b2178cfc40256a3      228 bytes