A Survey on Graph Database Management Techniques for Huge Unstructured Data

ABSTRACT


INTRODUCTION
Today, user data is growing rapidly because of data-generating sources such as social media networks, and the rapid adoption of smartphones and handheld devices accelerates data creation further. Processing this data becomes more difficult every day as the number of users of digital data and networks multiplies [1]. Traditional databases cannot process such volumes without added complexity when real-time responses are required, whereas a graph database models each entity in a graph, which speeds up processing. One use case for graph databases is content-based data filtering. Graph databases provide better performance and data consistency; hence many researchers are considering graph models [2].
To handle the problem of storing huge data, many researchers have presented the concept of graphs and graph storage, in which graphs are employed to model huge data with a complicated design. Every graph consists of nodes, properties, and edges that express the relationships among the nodes. A graph database of connected data also offers a significant option for dealing with structured, semi-structured, and unstructured data [3]. A graph database offers very fast query responses, often within milliseconds. Today, graph databases are widely used in retail, social networking, healthcare, communication, and other online solutions. Operations such as create, read, update, and delete are available in a graph database system. The drawback of these systems is that they are inherently more expensive than traditional methods [4]. This survey paper discusses the concepts of graph databases and reviews existing research on computational techniques for data management. Section 2 discusses basic conceptual aspects of graph databases, modeling, computational techniques, and comparisons of techniques. Section 3 provides a literature review of recent research in graph database management and graph database computational techniques. Section 4 presents the research gap in recent work on graph databases. Section 5 describes the future research lineup, and finally, Section 6 concludes the paper.

GRAPH DATA
In recent years, the way the Internet and mobile communication are used for varied needs and applications has led ordinary users, academicians, and researchers to rethink how to store the huge volume of data generated every day, every hour, and every minute. This need for storage and retrieval of data and information brought back the concepts of graphs and graph models [4], [5].
Graphs are used to model complicated structures. A graph is a collection of nodes and the edges that express the relationships between them. The nodes are called entities, and these entities can be related in many ways depending on the type of application. A connection between entities is called a relationship. The attributes attached to entities and relationships are called labels. In a graph-like structure, data is stored in nodes, and these nodes have properties. Relationships also carry properties and connect one node to another.
The example shown in Figure 1 demonstrates the relationship between two animals. The figure identifies two entities, Thing-1 and Thing-2, which exhibit properties such as animal type (dog and cat), name, and relationship. The representation states that Thing-1 and Thing-2 are a dog and a cat, named cute and handsome respectively. Finally, the relationship between the dog and the cat is stated through their common animal category.
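The Figure 1 example can be written down as a small property graph. The following Python fragment is only a minimal sketch under stated assumptions: the class names `Node` and `Relationship`, the identifiers `thing-1` and `thing-2`, and the relationship type `SAME_CATEGORY` are illustrative names introduced here and do not come from any surveyed system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, carrying a label and free-form properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes, with its own properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


# The two entities of Figure 1 (identifiers are illustrative).
thing_1 = Node("thing-1", "Animal", {"type": "dog", "name": "cute"})
thing_2 = Node("thing-2", "Animal", {"type": "cat", "name": "handsome"})

# Assumed relationship type: both entities belong to the same category.
same_category = Relationship("thing-1", "thing-2", "SAME_CATEGORY",
                             {"category": "animal"})

nodes = {n.node_id: n for n in (thing_1, thing_2)}
relationships = [same_category]


def describe(rel: Relationship) -> str:
    """Render a relationship as a readable one-line description."""
    src, dst = nodes[rel.source], nodes[rel.target]
    return (f"{src.properties['name']} ({src.properties['type']}) "
            f"-[{rel.rel_type}]-> "
            f"{dst.properties['name']} ({dst.properties['type']})")


print(describe(same_category))
```

Both nodes and the relationship hold their own property dictionaries, which mirrors the statement that properties may be attached to entities as well as to the connections between them.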

Graph Databases
A graph database system has four units, namely create, read, update, and delete, which can be used in designing a graph data model. Index-free adjacency is needed to obtain high-performance graph traversal. A graph database uses an adjacency representation in which each node maintains direct relationships with its adjacent nodes. A graph database exposes a single data structure, the graph, and it has no join operation, because connections are stored directly as edges. The graph stores data in nodes that have relationships, and the data follows the property graph model. A graph-oriented database is a specialized NoSQL database in which relationships among the nodes are stored and managed generically. Built-in support for relationships makes traversal much faster for multidimensional, interconnected datasets; hence it is suitable for online transaction processing (OLTP). For the same reason, organic, ground-up products like Neo4j offer multifold performance benefits in comparison to multilayer abstractions over traditional technologies such as relational databases (RDB) and object-oriented databases (OODB). It also simplifies the complexity of design and implementation [5], [6].
Graph databases are quickly making inroads into real life from research laboratories; many enterprises such as Twitter, Facebook, and Google adopted them years ago. With recent technology, not only scientific data but also the web and many other kinds of data can be modeled as a graph. This helps to overcome limitations of an RDBMS, such as a predefined schema, and to process complex queries in milliseconds. In particular, the lack of a schema allows developers to gain high productivity, besides providing the capability to process complex multilevel queries in real time. E-commerce sites and users benefit from the easy processing of product recommendations. Machine learning algorithms are used most in applications such as these, where big data analytics is used by the global top 100 companies. Bug localization is another application area worth mentioning for the use of graph databases. Overall, there is a variety of domains where graph data modeling can be applied to revolutionize the user experience [7].
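To make the four units and adjacency-based traversal concrete, the following Python fragment gives a minimal sketch. It assumes an undirected graph held in memory; the class `AdjacencyGraph` and its method names are hypothetical and are not the API of any product discussed in this survey.

```python
from collections import deque


class AdjacencyGraph:
    """Toy graph store: each node keeps direct references to its neighbors."""

    def __init__(self):
        self._props = {}     # node id -> property dict
        self._adjacent = {}  # node id -> set of neighbor ids

    def create_node(self, node_id, **props):
        self._props[node_id] = dict(props)
        self._adjacent.setdefault(node_id, set())

    def create_edge(self, a, b):
        self._adjacent[a].add(b)
        self._adjacent[b].add(a)

    def read_node(self, node_id):
        return dict(self._props[node_id])

    def update_node(self, node_id, **props):
        self._props[node_id].update(props)

    def delete_node(self, node_id):
        # Remove the node and detach it from every neighbor.
        for neighbor in self._adjacent.pop(node_id):
            if neighbor != node_id:
                self._adjacent[neighbor].discard(node_id)
        del self._props[node_id]

    def traverse(self, start, max_hops):
        """Breadth-first walk that follows neighbor sets directly,
        without consulting a global index. Returns {node id: hop count}."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if seen[current] == max_hops:
                continue
            for neighbor in self._adjacent[current]:
                if neighbor not in seen:
                    seen[neighbor] = seen[current] + 1
                    queue.append(neighbor)
        return seen


g = AdjacencyGraph()
for name in ("a", "b", "c", "d"):
    g.create_node(name, kind="user")
g.create_edge("a", "b")
g.create_edge("b", "c")
g.create_edge("c", "d")
g.update_node("a", kind="admin")

within_two = g.traverse("a", max_hops=2)
print(sorted(within_two.items()))

g.delete_node("b")
after_delete = g.traverse("a", max_hops=2)
print(sorted(after_delete.items()))
```

Each traversal step costs time proportional to the number of neighbors of the current node, not to the size of the whole graph, which is the property the adjacency representation is meant to provide.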

Existing types of Graph Database Models
In the recent past, many tools have been developed using the graph database concept, for example Neo4j and Sparksee [8]. Tools such as Oracle Spatial and Graph, OQGraph, and ArangoDB are designed as an abstraction over the underlying architecture of relational databases such as MySQL and Oracle [9]. Until now, there is no industry standard [10], and moreover many of them are designed to suit a particular domain [11]. In the in-memory model, scalability is limited because memory holds the content [12]. Another source of inefficiency in these models is the horizontal scaling and layering mechanism. A new paradigm is required for handling extensive data; very few models are designed to support parallelism as well as OLTP. The early models lack a standard query language, application programming interface, and protocols of the kind found in conventional models, such as SQL, JDBC, and REST. Lately, Gremlin and SPARQL are gaining consensus, but adoption is slow.

Neo4j (Neo Technology)
Neo4j is a disk-based transactional graph database, billed as the "World leading graph database." It was first released in 2007. Besides Java, Neo4j also supports other languages such as Python for graph operations. Neo4j is an open-source project [7] available in a GPLv3 Community edition, with Advanced and Enterprise editions available under both the AGPLv3 and a commercial license. Neo4j is well suited to enterprise deployment: it scales to billions of nodes and relationships in a network. Neo4j performs all operations that modify data within a transaction. In Neo4j, both nodes and relationships can contain properties. Neo4j manages graphs and is optimized for graph structures instead of tables. It is a more expressive type of graph database, while remaining similar to other graph databases. Neo4j is among the most popular graph databases today [8].

HyperGraphDB
HyperGraphDB is an open-source database that supports hypergraphs. A hypergraph [8] differs from a normal graph because an edge can point to other edges. It is used for modeling graph data in various fields. It supports online querying with an API written in Java. It is based on the HyperGraphDB model, a universal data model for highly complex, large-scale knowledge applications. It has graph-oriented storage and customizable indexing. In this graph database, a hyperedge is easy to convert into a tuple. It is a distributed, graph-oriented database [8][9][10][11][12][13][14].
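The two properties mentioned above, edges that point to other edges and hyperedges that convert easily to tuples, can be sketched in a few lines of Python. This is an illustrative model only: the class `HyperGraph`, its integer handles, and the example values are assumptions made here and do not reproduce the Java API of the database itself.

```python
from itertools import count


class HyperGraph:
    """Toy hypergraph: every atom has a value and a tuple of targets."""

    def __init__(self):
        self._handles = count(1)
        self.values = {}   # handle -> value
        self.targets = {}  # handle -> tuple of target handles

    def add_node(self, value):
        """A plain node is an atom with no targets."""
        return self._add(value, ())

    def add_link(self, value, *targets):
        """A hyperedge may join any number of atoms, including other links."""
        missing = [t for t in targets if t not in self.values]
        if missing:
            raise KeyError(f"unknown target handles: {missing}")
        return self._add(value, tuple(targets))

    def _add(self, value, targets):
        handle = next(self._handles)
        self.values[handle] = value
        self.targets[handle] = targets
        return handle

    def as_tuple(self, handle):
        """Flatten an atom into a (possibly nested) tuple."""
        if not self.targets[handle]:
            return self.values[handle]
        return (self.values[handle],) + tuple(
            self.as_tuple(t) for t in self.targets[handle])


hg = HyperGraph()
alice, bob, carol = (hg.add_node(n) for n in ("alice", "bob", "carol"))
meeting = hg.add_link("meeting", alice, bob, carol)   # joins three nodes
minutes = hg.add_node("minutes")
recorded = hg.add_link("recorded_in", meeting, minutes)  # points to a link

print(hg.as_tuple(meeting))
print(hg.as_tuple(recorded))
```

Because a link can only target atoms that already exist, the structure stays acyclic in this sketch and the recursive conversion to tuples always terminates.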

DEX
DEX [15] is described as a very efficient, bitmap-based graph database written in C++. It was first released in 2008. It makes graph querying possible in different kinds of networks, for tasks such as social network analysis and pattern recognition. It is also known as a high-performance graph database for large graphs and is useful for most NoSQL applications. The latest version of DEX supports both Java and .NET programming. It is portable and requires only a single JAR file for execution. DEX is called the fourth most popular graph database today [3], [6].
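The general idea behind a bitmap-based representation can be illustrated as follows. This Python sketch is an assumption-laden toy, not the internal design of DEX: it stores each node's neighbor set as the bits of an integer, so that set operations such as finding common neighbors reduce to bitwise operations. The class `BitmapGraph` and the sample names are hypothetical.

```python
class BitmapGraph:
    """Stores each node's neighbor set as the bits of a Python integer."""

    def __init__(self, node_names):
        self.names = list(node_names)
        self.index = {name: i for i, name in enumerate(self.names)}
        self.neighbors = [0] * len(self.names)  # one bitmap per node

    def add_edge(self, a, b):
        i, j = self.index[a], self.index[b]
        self.neighbors[i] |= 1 << j
        self.neighbors[j] |= 1 << i

    def _decode(self, bitmap):
        """Translate set bits back into node names."""
        return [name for i, name in enumerate(self.names) if (bitmap >> i) & 1]

    def common_neighbors(self, a, b):
        """Intersection of two neighbor sets via a single bitwise AND."""
        return self._decode(
            self.neighbors[self.index[a]] & self.neighbors[self.index[b]])

    def degree(self, a):
        """Number of neighbors, counted as the number of set bits."""
        return bin(self.neighbors[self.index[a]]).count("1")


social = BitmapGraph(["ann", "bob", "cid", "dee"])
social.add_edge("ann", "bob")
social.add_edge("ann", "cid")
social.add_edge("dee", "bob")
social.add_edge("dee", "cid")

print(social.common_neighbors("ann", "dee"))
print(social.degree("ann"))
```

Production systems typically use compressed bitmaps rather than plain integers, but the set-algebra view of adjacency is the same.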

Trinity
Trinity is a distributed graph system [9] built over a memory cloud. The memory cloud is a globally addressable in-memory key-value store over a cluster of machines. It provides fast data access for large datasets. It is a large-graph processing system that provides fast graph exploration and parallel computing for large datasets. It also provides high throughput on large graphs that have a billion nodes.
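A globally addressable key-value store over several machines can be pictured with the following Python sketch. It is a single-process simulation under explicit assumptions: the class `MemoryCloud`, the CRC32-based placement rule, and the function `explore` are illustrative choices made here and are not taken from Trinity.

```python
import zlib


class MemoryCloud:
    """Simulates a key-value store partitioned over several machines."""

    def __init__(self, machine_count):
        # Each dict stands in for the memory of one machine.
        self.machines = [dict() for _ in range(machine_count)]

    def _owner(self, key):
        # Assumed placement rule: a deterministic hash of the key
        # selects the machine that holds it.
        return zlib.crc32(key.encode("utf-8")) % len(self.machines)

    def put(self, key, value):
        self.machines[self._owner(key)][key] = value

    def get(self, key):
        return self.machines[self._owner(key)][key]


def explore(cloud, start, hops):
    """Collect node ids reachable within `hops` steps, fetching each
    adjacency list by key regardless of which machine stores it."""
    visited = {start}
    frontier = {start}
    for _ in range(hops):
        next_frontier = set()
        for node_id in frontier:
            for neighbor in cloud.get(node_id):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return visited


cloud = MemoryCloud(machine_count=3)
cloud.put("n1", ["n2", "n3"])
cloud.put("n2", ["n4"])
cloud.put("n3", [])
cloud.put("n4", [])

print(sorted(explore(cloud, "n1", hops=1)))
print(sorted(explore(cloud, "n1", hops=2)))
```

The caller addresses every node by its key alone; which machine answers is decided inside the store, which is what "globally addressable" means in this sketch.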

ISSN: 2088-8708 
A Survey on Graph Database Management Techniques for Huge Unstructured Data (Patil N. S.) 1143

Infinite Graph (Objectivity)
Infinite Graph is produced by an organization called Objectivity, a company that develops database technologies supporting large-scale object persistence and relationship analytics. Infinite Graph is a distributed graph database written in Java and based on a graph-like structure. It can be called a cloud-enabled graph database. It is designed to handle very high throughput. It is a single graph database distributed across multiple machines. A lock server handles lock requests from database applications. It is capable of dealing with complex relationships that require multiple hops. It provides graph-wise indexes on multiple key fields and also provides high query performance [7], [10].

Titan
Titan [9] was introduced in 2012. It is an open-source project written in Java. The main benefit of using Titan is its scalability: it supports very large graphs and scales with the number of machines in a cluster, as well as with the number of concurrent users and the size of the graph. It provides batch graph processing with the Hadoop framework and answers complex queries in milliseconds. It consists of three main components:
a. Native Blueprints implementation
b. Gremlin query language
c. Rexster server
It follows the property graph model and supports Gremlin, a graph traversal query language. It also offers an optimized disk representation for efficient use of storage and fast data access. Applications can interact with Titan in mainly two ways:
a. Calling the Java-language APIs related to Titan, which include its native API implementation.
b. Using TinkerPop stack utilities, such as the Gremlin query language, built atop Blueprints.

Recent research survey
Research in the domain of graph data is classified into ten categories by considering IEEE Xplore journals. The categorization is given below.

Data Store Efficiency
Several issues arise in improving the efficiency of data storage. Among these, data compression is necessary to store more data. Data standardization may also play a greater role in mapping and translating the data for cloud storage. For large graphs, supergraph search is required to select data graph features. The recent work done in this category is given in Table 1.

| Author | Issue considered | Method adopted | Result |
|---|---|---|---|
| Yuan et al. [10] | Graph feature mining | Query grouping mechanism | Achieved better, faster, and lightweight filtering |
| Bei et al. [11] | Graph search | Distributed graph searching mechanism | Achieved distributed graph database |
| Goldberg et al. [12] | Problem of fragment identification | Heuristic mechanism | Achieves optimized running time |

Graph Indexing Method
Graph data modeling has been improved for different kinds of data. Table 3 summarizes the work performed on graph data modeling and graph-based management systems.

| Author | Issue considered | Method adopted | Result |
|---|---|---|---|
| Dongoran et al. [13] | Data modeling | Index construction, database filtering, sub-graph matching | Achieved more path length, more indexing time |
| Kang et al. [14] | Dynamic graph storage and management | Graph-based database management system | Robust in handling outliers |

Sub-graph matching method
This part summarizes research ideas presented by many researchers in data querying, sub-graph matching, and related topics. Recent works on better sub-graph matching are presented, and the works shown in Table 4 give an idea of the various graph data techniques.

| Author | Issue considered | Method adopted | Result |
|---|---|---|---|
| Giugno and Shasha [15] | Graph querying | Regular expression graph query language that combines Xpath and Smart; hash-based fingerprinting | Performs well for small query graphs on large graph databases (in the thousands |
| Bröcheler et al. [16] | Sub-graph matching | Probabilistic method to estimate probabilities; partition algorithm for creating index | Works efficiently, answering 778M edge real-world SN in under one second |
| Bröcheler et al. [17] | Approximate sub-graph matching | PMATCH algorithm | Efficient and scales to over a billion edges |
| Bröcheler et al. [18] | Sub-graph matching; long-tailed degree distributions | Delicious social bookmarking service | Faster than static cost models for warm caches |
| Hong et al. [19] | Set similarity | Set similarity pruning and structure-based pruning; dominating-set-based sub-graph matching; inverted pattern lattice and structural signature buckets are designed | Outperforms state-of-the-art methods by an order of magnitude |
| Hoksza and Jelínek [20] | Protein-protein interface (PPI) identification | Knowledge-based approach using Neo4j for mining protein graphs | In comparison to Microsoft SQL Server, Neo4j is a viable option for small sub-graph query types |
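For readers unfamiliar with the underlying problem, the following Python sketch shows plain backtracking sub-graph matching on undirected graphs. It is a textbook baseline under stated assumptions and is not the algorithm of any work listed in Table 4: it finds every injective mapping of pattern nodes to target nodes that preserves all pattern edges (extra edges in the target are allowed), and the function name `find_subgraph_matches` is hypothetical.

```python
def find_subgraph_matches(pattern, target):
    """Yield mappings {pattern node: target node} that preserve every
    pattern edge. Both graphs are undirected adjacency dictionaries of
    the form {node: set of neighbors}."""
    pattern_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(pattern_nodes):
            yield dict(mapping)
            return
        p = pattern_nodes[len(mapping)]
        for candidate in target:
            if candidate in mapping.values():
                continue  # keep the mapping injective
            if len(target[candidate]) < len(pattern[p]):
                continue  # degree filter: candidate has too few neighbors
            # Every already-mapped neighbor of p must be adjacent
            # to the candidate in the target graph.
            if all(mapping[q] in target[candidate]
                   for q in pattern[p] if q in mapping):
                mapping[p] = candidate
                yield from extend(mapping)
                del mapping[p]  # backtrack

    yield from extend({})


# Pattern: a triangle. Target: a triangle 1-2-3 with a pendant node 4.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
target_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

matches = list(find_subgraph_matches(triangle, target_graph))
print(len(matches))
print(matches[0])
```

The search is exponential in the worst case, which is why the surveyed works rely on indexing, pruning, and probabilistic estimates to cut down the candidate space before or during matching.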

Semantic
Recent works that address the semantic approach, including semantic query data processing and data analysis in a graph database, are given in Table 5.

| Author | Issue considered | Method adopted | Result |
|---|---|---|---|
| [34] | Graph modeling for mobiles | Implementation of an extractor module (in Java language) | Reverse engineering from iOS platform to Android platform |
| Leida and Chu [35] | Distributed SPARQL query answering over RDF data streams | Business process monitoring domain for query workload balancing | Approach for efficient and scalable query processing over RDF graphs distributed over a local data grid |
| Mordinyi et al. [36] | Efficient data store that is capable of versioning and querying local and common concepts | NoSQL graph database | Outperforms ontology stores and match solutions relying on relational databases |
| Balboni et al. [37] | Evolution analysis | Natural language processing engines to build temporal graph database | Got large amount of open source documents |
| Wu and Chen [38] | Frequent sub-graph mining | By normalizing the incidence matrix | Achieved higher speed and efficiency |
| John et al. [39] | Learning process enhancement against population | Natural language processing | Enhanced learner-centered online learning experience |
| Xu and Luo [40] | Expression-driven sketch graph matching for face recognition | Multi-layer grammatical face model | Recognition rates were improved, especially for the smiling and screaming faces whose line-edge maps are greatly distorted |
| Figueira and Libkin [41] | Querying graphs | Parikh automata | Real-life querying |

Social Networking
Recent ideas on graph data generated by social networks are presented in Table 6.


Figure 1. Example of graph data
Figure 2. Units of graph database system

Table 6. Work for social networking graph database
Int J Elec & Comp Eng, Vol. 8, No. 2, April 2018: 1140-1149

A popular notion is "if you can see whiteboard, you can graph." Being a high-level abstraction over the network model database, it reduces the coding effort to one-tenth; it is a key technology used in rapid application development (RAD).

Table 1. Work for data store efficiency

Much significant work has been performed to facilitate isomorphism and similarity queries, to build efficient graph database systems, and to accelerate graph similarity search. Table 2 lists some of the chosen work on database indexing methods.

Table 2. Work for database indexing method

Table 5. Work for graph database semantic