Thesis Open Access
In big data infrastructures, Persistent Identifiers (PIDs) are widely used to identify digital
content and research data. A typical example of a PID is the Digital Object Identifier (DOI). In
a data-centric application (such as a scientific workflow), it is often necessary to fetch different
data objects from multiple locations. When a workflow published by the community is reproduced,
the data objects involved often have PIDs. In this project, we investigated how to
optimize the fetching and sharing of DOI-identified objects using an information-centric networking
paradigm such as Named Data Networking (NDN). To achieve that goal, we first
presented an approach for integrating PIDs with NDN networks.
NDN identifies digital objects by their names and also routes them based on those names.
In addition, we proposed an approach for optimizing the NDN network’s performance using
application-level knowledge, such as the size, number, and order of the requested objects. We
investigated the effect of ordering a group of objects in ascending or descending order according
to their sizes before requesting them one by one. The results showed that the order of the
requests can dramatically influence the performance of fetching objects from NDN networks.
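
The two ideas above, naming DOI-identified objects for an NDN network and ordering a group of requests by object size, can be illustrated with a minimal sketch. It is not the implementation used in this work: the `/doi` name prefix, the percent-encoded name layout, the `DataObject` type, and the function names are assumptions made only for illustration, and no real NDN library is called.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class DataObject:
    doi: str         # e.g. "10.1234/abcd" (a made-up DOI used for illustration)
    size_bytes: int  # object size, assumed to be known to the application


def doi_to_ndn_name(doi: str, prefix: str = "/doi") -> str:
    """Map a DOI to a hierarchical NDN-style name (hypothetical scheme)."""
    # A DOI has the form "<registrant prefix>/<suffix>"; split on the first "/".
    registrant, _, suffix = doi.partition("/")
    # Percent-encode each part so a "/" inside the suffix adds no extra component.
    return f"{prefix}/{quote(registrant, safe='')}/{quote(suffix, safe='')}"


def request_order(objects, descending=False):
    """Return the names in the order they would be requested, sorted by size."""
    ranked = sorted(objects, key=lambda o: o.size_bytes, reverse=descending)
    return [doi_to_ndn_name(o.doi) for o in ranked]
```

Calling `request_order(objects)` gives the ascending (smallest first) order and `request_order(objects, descending=True)` the opposite one; a client would then issue one request per name in that order and measure the total fetch time.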