Published March 31, 2023 | Version v1
The Graph Analysis toolbox is a set of software components for the computation, processing and analysis of graph collections.

Although the toolbox can be used with any type of data source, it is especially oriented toward the processing and analysis of large corpora of scientific documents. Starting from a given corpus of scientific documents (projects, papers, patents, etc.), the tools can be used to generate graphs describing the semantic similarity relations between documents; to infer graphs connecting authors according to different types of relations (research affinity, cooperation, co-citation, etc.); and to infer graphs associated with other metadata in the corpus (affiliation, funding organizations, etc.). Additional analysis tools for node profiling and centrality measures can be used to identify the main features of documents, authors or other metadata based on the structure of their connections in the graph.
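
As a rough illustration of the kind of processing described above, the following sketch builds a semantic similarity graph from document vectors and computes a simple centrality measure. It is not the toolbox's own code or API: the function names (cosine, similarity_graph, degree_centrality), the toy document vectors and the threshold value are all assumptions made for this example, and only the Python standard library is used.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold=0.5):
    """Return {(doc_a, doc_b): weight} for every pair whose cosine
    similarity is at least `threshold` (hypothetical helper)."""
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        weight = cosine(vectors[a], vectors[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


def degree_centrality(nodes, edges):
    """Fraction of the other nodes that each node is connected to."""
    n = len(nodes)
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: (d / (n - 1) if n > 1 else 0.0) for node, d in degree.items()}


# Toy corpus: each document is represented by an assumed topic vector.
docs = {
    "paper_1": [0.9, 0.1, 0.0],
    "paper_2": [0.8, 0.2, 0.0],
    "patent_1": [0.0, 0.1, 0.9],
    "project_1": [0.7, 0.0, 0.3],
}

edges = similarity_graph(docs, threshold=0.5)
centrality = degree_centrality(docs, edges)
```

With these toy vectors, the three documents that share the first topic end up connected to each other, while patent_1 remains isolated and therefore has zero degree centrality. In a real corpus the vectors would typically come from a topic model or an embedding model, and the threshold would need tuning to keep the graph sparse.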

The key modules of the toolbox are:

1. Data import from files or SQL / Neo4J databases, for the generation or enrichment of graphs.

2. Generation and processing of graphs.

3. Management and processing of structured collections of graphs (named supergraphs).
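
To make the idea of a structured collection of graphs more concrete, the sketch below shows one possible way to hold several named graphs together and to derive an author graph from document metadata. The classes Graph and GraphCollection, the function coauthorship_graph and the sample authorship data are hypothetical and chosen only for illustration; they do not reflect the toolbox's actual supergraph implementation.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Graph:
    """Minimal undirected weighted graph (illustrative only)."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (a, b) with a < b -> weight

    def add_edge(self, a, b, weight=1.0):
        self.nodes.update((a, b))
        key = tuple(sorted((a, b)))
        # Repeated edges accumulate weight.
        self.edges[key] = self.edges.get(key, 0.0) + weight


@dataclass
class GraphCollection:
    """A named collection of graphs, standing in for a supergraph."""
    graphs: dict = field(default_factory=dict)

    def add(self, name, graph):
        self.graphs[name] = graph

    def names(self):
        return sorted(self.graphs)


def coauthorship_graph(authorship):
    """Connect authors who share a document; the edge weight counts
    the number of shared documents."""
    graph = Graph()
    for authors in authorship.values():
        graph.nodes.update(authors)
        for a, b in combinations(sorted(set(authors)), 2):
            graph.add_edge(a, b)
    return graph


# Assumed metadata: document identifier -> list of author names.
authorship = {
    "paper_1": ["alice", "bob"],
    "paper_2": ["alice", "bob", "carol"],
    "patent_1": ["dave"],
}

collection = GraphCollection()
collection.add("coauthors", coauthorship_graph(authorship))
```

Keeping each graph under a name makes it straightforward to add further graphs over the same corpus (for example, one per metadata field) and to process them together.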

The results of the analysis can be stored in formats that facilitate their visualization through applications such as the IntelComp-native graph visualization component developed by Work Package 4, and other external applications like Gephi or software modules like Halo.

Moreover, a complete Python application has been developed. All of the software's functionality is available through a terminal-based interface, which allows both integration into automated, scripted processes and user-friendly navigation through a menu hierarchy.


