There is a newer version of the record available.

Published January 31, 2014 | Version v1
Dataset Restricted

FIRE14 Detection of SOurce COde Re-use

  • 1. Universitat Politècnica de València
  • 2. Universidad Autonoma Metropolitana


This data was used for the PAN shared task on source code re-use detection at FIRE2014. 

Please find the task description at


For the training phase we provide an annotated corpus in which the files keep their programming language extensions. The annotation states whether a source code has been re-used and, if so, what its source is.

  • The collection consists of source codes written in Java and C.
  • Re-use is committed in both programming languages but ONLY at the monolingual level.
  • The Java collection contains 259 source codes from to
  • The C collection contains 79 source codes from 000.c to 078.c.
  • Relevance judgements represent re-use in both directions (a→b and b→a).
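
Because the judgements cover both directions, a detected pair can be compared against them regardless of order. The following is a minimal sketch of that idea, assuming the judgements have already been loaded as pairs of file names; the on-disk format of the judgement file is not described here, and the pairs used below are hypothetical.

```python
def symmetric_pairs(pairs):
    """Close a collection of (a, b) pairs under direction: (a, b) implies (b, a)."""
    closed = set()
    for a, b in pairs:
        closed.add((a, b))
        closed.add((b, a))
    return closed


def is_judged_reuse(candidate, judged_pairs):
    """Check a detected pair against the judgements, ignoring direction."""
    return tuple(candidate) in symmetric_pairs(judged_pairs)


# Hypothetical judgement: only one direction is listed explicitly.
judged = [("000.c", "005.c")]
print(is_judged_reuse(("005.c", "000.c"), judged))
```

Storing each pair once as an unordered set would work equally well; the explicit two-direction form simply mirrors how the judgements are described above.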

In the test phase, the only annotation provided in the corpus is the programming language extension of each file.

  • The test corpus is divided by programming language (C/C++ and Java), so no pre-processing is needed to identify the programming language of the source codes.
  • Each programming language folder contains 6 folders (A1, A2, B1, B2, C1 and C2), each of which contains a specific scenario with monolingual re-use.
  • There is no re-use between scenarios, so you only need to look for re-used cases among the source code files inside each folder.
  • Each file name consists of the name of the scenario to which the file belongs plus an identifier. For example, file "B10021" belongs to scenario B1 and its identifier number is 0021.
  • Re-use cannot exist between source codes that belong to different scenarios. For example, you do not have to submit a re-used case between files "B10021" and "B20013": the first belongs to scenario B1 and the second to B2.
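
The naming convention above can be used to restrict comparisons to files from the same scenario. The sketch below is one possible way to do this, not part of the task tooling: it assumes that a scenario name is one uppercase letter followed by one digit, that the identifier is the remaining run of digits, and that any file extension can be ignored. The function names and the file "B10007" are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations
from pathlib import PurePath

# Assumed layout: scenario (letter + digit) followed by a numeric identifier.
NAME_PATTERN = re.compile(r"^([A-Z]\d)(\d+)$")


def parse_name(file_name):
    """Split a name such as 'B10021' into ('B1', '0021'), ignoring any extension."""
    stem = PurePath(file_name).stem
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return match.group(1), match.group(2)


def candidate_pairs(file_names):
    """Return all unordered file pairs that share a scenario."""
    by_scenario = defaultdict(list)
    for name in file_names:
        scenario, _identifier = parse_name(name)
        by_scenario[scenario].append(name)
    pairs = []
    for scenario in sorted(by_scenario):
        pairs.extend(combinations(sorted(by_scenario[scenario]), 2))
    return pairs


print(parse_name("B10021"))
print(candidate_pairs(["B10021", "B20013", "B10007"]))
```

With the three example names, the only candidate pair is the one inside B1; "B20013" is never compared with the B1 files.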



The record is publicly accessible, but files are restricted to users with access.

Access to the files is granted on request. For a request to be accepted, please send us a brief description of your intended use of this data and which institutions are involved.