Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing Clustering

The BRKSS Architecture is based upon shared nothing clustering, which can scale up to a large number of computers, increase their speed and maintain the workload. The architecture comprises a console along with a CPU that also acts as a buffer and stores information on the processing of transactions when a batch enters the system. This console is connected to a p-port switch, which in turn is connected to c clusters through their respective hubs. The architecture can be used for personal databases and, through a router, for online databases such as the cloud. It uses load balancing, moving transactions among the nodes within the clusters so that the overhead of any particular node is minimised. In this paper we have simulated the working of the BRKSS architecture using JDK 1.7 with NetBeans 8.0.2, and compared performance parameters such as turnaround time, throughput and waiting time with the existing hierarchical clustering model.


Maximum Possible Number of Clusters and Nodes
Here, the number of clusters formed and the number of nodes depend upon the number of ports in the switch. Two ports of the switch are used for connecting to the console and the router. Suppose that 'd' is the number of nodes in each cluster. Table-1 gives the maximum number of clusters that could be formed; switches of up to 64 ports are shown, and this could be extended depending on how large the data warehouse is.
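Under the port budget above (two ports reserved for the console and the router, one port per cluster hub), the capacity of a p-port switch can be sketched as follows. The relation c = p − 2 is our reading of the text, not a formula stated explicitly, and the 8-node cluster in the example is illustrative:

```java
public class SwitchCapacity {
    // Maximum number of clusters for a p-port switch: two ports are
    // reserved for the console and the router, and each cluster's hub
    // occupies one of the remaining ports (our assumption).
    static int maxClusters(int ports) {
        return ports - 2;
    }

    // Total nodes reachable when each cluster holds d nodes.
    static int maxNodes(int ports, int d) {
        return maxClusters(ports) * d;
    }

    public static void main(String[] args) {
        System.out.println(maxClusters(64)); // 62 clusters on a 64-port switch
        System.out.println(maxNodes(64, 8)); // 496 nodes with 8 nodes per hub
    }
}
```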

II. PROPOSED ALGORITHM
To overcome the limitations of load balancing in shared nothing clustering, inter-query parallelism has been implemented in the proposed algorithm, where many diverse queries or transactions are executed in parallel with one another on many processors. This not only increases the throughput but also scales up the system.

The steps of the algorithm are stated below:
Step-1 : Consider the number of transactions entering into the system in a batch mode.
[Suppose there are 'm' transactions in a batch]
Step-2 : Check the number of clusters.
[Suppose 'c' is the number of clusters]
Step-3 : Calculate the maximum value for each cluster (max_c) and for each node (max_n):
max_c = m/c
max_n = max_c/d
where max_c = 0 and max_n = 0 initially, and d is the number of nodes in a cluster.
Step-4 : Distribute all the transactions evenly across the clusters based upon the max_c value and across the nodes based upon the max_n value.
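Steps 1-4 amount to an even, two-level division of the batch. A minimal sketch, where the batch size and cluster shape are illustrative values, not figures from the paper:

```java
public class BatchDistribution {
    // Step-3: per-cluster share of an m-transaction batch over c clusters.
    static int maxC(int m, int c) {
        return m / c;
    }

    // Step-3: per-node share when each cluster has d nodes.
    static int maxN(int m, int c, int d) {
        return maxC(m, c) / d;
    }

    public static void main(String[] args) {
        int m = 60000, c = 5, d = 4; // illustrative batch and cluster shape
        System.out.println(maxC(m, c));    // 12000 transactions per cluster
        System.out.println(maxN(m, c, d)); // 3000 transactions per node
    }
}
```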

Node Based
Step-5 : Now calculate max_q = max_n/10, where max_q is the number of transactions that enter the MLFQ each time for execution, and also calculate rem_n = max_n − max_q for each node, where rem_n is the remaining number of transactions of that node.
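Continuing the same illustrative numbers (a per-node share of 3000), Step-5 works out as:

```java
public class NodeFeed {
    // Step-5: max_q transactions are admitted into the node's MLFQ each time.
    static int maxQ(int maxN) {
        return maxN / 10;
    }

    // rem_n: transactions still waiting at the node after one admission.
    static int remN(int maxN) {
        return maxN - maxQ(maxN);
    }

    public static void main(String[] args) {
        System.out.println(maxQ(3000)); // 300 transactions enter the MLFQ
        System.out.println(remN(3000)); // 2700 remain at the node
    }
}
```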
Step-6 : Now, for node based load balancing, perform MLFQ scheduling in each node.
Step-6 (a) : Allocate a ready queue to the processor of each node and split the ready queue into 'q' queues.
Step-6 (b) : Assign the highest priority to q_0, as q_0 is the first queue, and the lowest priority to q_n, as q_n is the last queue.
Step-6 (c) : Perform Round Robin Scheduling from q_0 to q_(n-1) and FCFS in q_n.
Step-6 (d) : Follow the MLFQ rules while performing the scheduling. Considering two jobs A and B entering into the queue, apply the following rules:
Rule-1 : If Priority (A) > Priority (B), A runs (B doesn't).

Rule-2 : If Priority (A) = Priority (B), A and B both run in RRS.
Rule-3 : When a job enters the system, it is placed at the highest priority, that is, the topmost queue.
Rule-4 : Once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced, that is, it moves down one queue. This is called Gaming Tolerance.
Rule-5 : After some time period S, move all the jobs in the system to the topmost queue. This is also known as Priority Boost. The above rules are applicable for a transaction or a query as well.
Step-6 (e) : At the end of each transaction, take up a new one from rem_n.
Step-6 (f) : After a time interval t_z, the status regarding the number of executed transactions and the remaining transactions will be sent to the buffer from each node of a cluster.
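The per-node scheduler of Step-6 can be sketched as below. The number of queues, the time quantum and the boost period S are illustrative values, and a job's allotment at a level is taken to be one quantum; the bottom queue runs each job to completion, matching the FCFS behaviour of Step-6 (c):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class Mlfq {
    static class Job {
        final String name;
        int remaining, level = 0;
        Job(String name, int remaining) { this.name = name; this.remaining = remaining; }
    }

    final List<Deque<Job>> queues = new ArrayList<>();
    final int quantum, boostPeriod;
    int clock = 0, nextBoost;

    Mlfq(int levels, int quantum, int boostPeriod) {
        for (int i = 0; i < levels; i++) queues.add(new ArrayDeque<>());
        this.quantum = quantum;
        this.boostPeriod = boostPeriod;
        this.nextBoost = boostPeriod;
    }

    // Rule-3: a new job enters at the topmost (highest priority) queue.
    void submit(Job j) { j.level = 0; queues.get(0).add(j); }

    // Run until every job finishes; returns the completion order.
    List<String> run() {
        List<String> done = new ArrayList<>();
        while (true) {
            int lvl = 0;
            while (lvl < queues.size() && queues.get(lvl).isEmpty()) lvl++;
            if (lvl == queues.size()) return done;
            // Rules 1-2: pick from the highest non-empty queue, round robin within it.
            Job j = queues.get(lvl).poll();
            // Bottom queue is FCFS (run to completion); others get one quantum.
            int slice = (lvl == queues.size() - 1) ? j.remaining
                                                   : Math.min(quantum, j.remaining);
            j.remaining -= slice;
            clock += slice;
            if (j.remaining == 0) {
                done.add(j.name);
            } else {
                // Rule-4 (Gaming Tolerance): allotment used, move down one queue.
                j.level = Math.min(j.level + 1, queues.size() - 1);
                queues.get(j.level).add(j);
            }
            // Rule-5 (Priority Boost): after period S, move every job back to the top.
            if (clock >= nextBoost) {
                nextBoost += boostPeriod;
                for (int i = 1; i < queues.size(); i++)
                    while (!queues.get(i).isEmpty()) submit(queues.get(i).poll());
            }
        }
    }

    public static void main(String[] args) {
        Mlfq m = new Mlfq(3, 2, 100);
        m.submit(new Job("A", 3));
        m.submit(new Job("B", 2));
        System.out.println(m.run()); // [B, A] for these illustrative jobs
    }
}
```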
Step-7 : If the value of rem_n does not become 0 within time t_z, perform node based load balancing through the push migration approach.
Step-7 (a) : After receiving the status, check in the buffer:
(i) If rem_n = max_n/2 in all the nodes, the situation is stable; continue with the execution and move to Step-9.
(ii) If rem_n > max_n/2 in all the nodes, give them more time to reach the stable situation and then move to Step-9.
(iii) If rem_n = max_n/2 in half of the nodes and rem_n > max_n/2 in the other half, allow some time for execution so that most of the nodes reach either rem_n < max_n/2 or rem_n = max_n/2, then move to Step-9.
(iv) If in most of the nodes rem_n is much less than max_n/2 and in a few nodes rem_n = max_n/2, continue with the execution and afterwards move to Step-9.
(v) If rem_n is much less than max_n/2 in most of the nodes and in some nodes rem_n > max_n/2, start performing load balancing.
Step-7 (b) : When condition 7 (a) (v) occurs in the node(s), send a signal to the console through the switch.
Step-7 (c) : The console in return will send an instruction to the node(s) to submit the remaining transactions rem_n.
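The buffer check of Step-7 (a) can be sketched as a predicate over the per-node rem_n values. The paper does not quantify "much less than max_n/2", so the quarter threshold below is an assumption made for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class NodeBalancer {
    // Step-7 (a): decide whether node based load balancing is needed.
    // "Much less than max_n/2" is assumed to mean below max_n/4 here.
    static boolean needsBalancing(List<Integer> rem, int maxN) {
        int half = maxN / 2, quarter = maxN / 4;
        long nearlyDone = rem.stream().filter(r -> r < quarter).count();
        long overloaded = rem.stream().filter(r -> r > half).count();
        // Condition (v): most nodes nearly done while some node is still above half.
        return overloaded > 0 && nearlyDone > rem.size() / 2;
    }

    public static void main(String[] args) {
        // Three nodes nearly done, one still above max_n/2 -> balance.
        System.out.println(needsBalancing(Arrays.asList(200, 300, 400, 1900), 3000));
        // All nodes exactly at max_n/2 -> stable, keep executing.
        System.out.println(needsBalancing(Arrays.asList(1500, 1500, 1500, 1500), 3000));
    }
}
```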
Step-9 : With the end of all the transactions, a new max_c will enter again; repeat the above steps.

Cluster Based
If after time t_y the console does not get any information regarding a particular cluster, it will assume that a failover has occurred in that cluster. The console will then perform cluster based load balancing to shift the load of the failed cluster to the rest of the active clusters.
Step-10 : After the time interval t_y, the console will check the executed transactions max_e and the remaining transactions rem_c for each cluster, and a copy of the rem_c transactions will be sent to the buffer:
rem_c = max_c − max_e
Step-11 : Perform cluster based load balancing through the push migration approach when a cluster failover takes place.
Step-11 (a) : After time t_y, check in the buffer:
(i) If rem_c = max_c/2 in all the active clusters, the situation is stable; continue with the execution and wait for condition 11 (a) (v) to occur.
(ii) If rem_c > max_c/2 in all the active clusters, give them more time to reach the stable situation and wait for condition 11 (a) (v) to occur.
(iii) If rem_c = max_c/2 in half of the active clusters and rem_c > max_c/2 in the other half, allow some time for execution so that most of the clusters reach either rem_c < max_c/2 or rem_c = max_c/2, and wait for condition 11 (a) (v) to occur.
(iv) If in most of the active clusters rem_c is much less than max_c/2 and in a few active clusters rem_c = max_c/2, continue with the execution and wait for condition 11 (a) (v) to occur.
(v) If rem_c is much less than max_c/2 in all the active clusters, perform load balancing.
Step-11 (b) : Redistribute rem_c of the failed cluster into the other active clusters such that the condition rem_c <= max_c/2 remains satisfied in the active clusters.
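Step-11 (b) can be sketched as follows. The greedy fill order and the null result when the active clusters cannot absorb the whole load are our assumptions, not behaviour stated in the text:

```java
import java.util.Arrays;

public class ClusterFailover {
    // Step-11 (b): spread the failed cluster's remaining transactions over
    // the active clusters without pushing any of them past max_c/2.
    // Returns the updated rem_c values, or null if capacity is insufficient.
    static int[] redistribute(int[] rem, int failedLoad, int maxC) {
        int[] out = Arrays.copyOf(rem, rem.length);
        int half = maxC / 2;
        for (int i = 0; i < out.length && failedLoad > 0; i++) {
            int room = half - out[i]; // headroom before rem_c exceeds max_c/2
            int take = Math.min(room, failedLoad);
            if (take > 0) {
                out[i] += take;
                failedLoad -= take;
            }
        }
        return failedLoad == 0 ? out : null;
    }

    public static void main(String[] args) {
        // Three active clusters far below max_c/2 absorb a 4000-transaction failover.
        int[] after = redistribute(new int[]{1000, 2000, 500}, 4000, 12000);
        System.out.println(Arrays.toString(after)); // [5000, 2000, 500]
    }
}
```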
Step-13 : At the end of all the transactions, a new batch will enter again; repeat the above steps. At each t_z interval, a status report about the nodes will be sent to the console. The console will get information about the remaining transactions of each node (rem_n) and will decide whether continued execution or load balancing is required.

After the third iteration, rem_n in d1 and d2 is much less than max_n/2, and in d4 rem_n is stable, but in d3 rem_n > max_n/2, so node based load balancing is performed. Here, 1,500 transactions are taken away from d3, making it stable, and that load is placed into either d1 or d2. While performing the above iterations, the status of all the nodes and their clusters goes to the console and is updated on a regular basis.

Cluster Based Load Balancing
The console receives information at the time interval t_y about the executed number of transactions, that is, max_e, and a copy of all the remaining transactions rem_c. So, when a failover of any cluster occurs, the console sends the unexecuted copy of the transactions of the failed cluster to the other clusters. At the end of the t_y1 interval, the console holds the status of max_e and a copy of rem_c until the t_y2 execution ends successfully. After that it holds a copy of rem_c and the status of max_e until the t_y3 execution ends successfully.
In t_y2, a failover occurs, and the console, which holds the value of rem_c from the t_y1 interval, distributes it to the other active clusters until they themselves reach a value much less than max_c/2.

III. SIMULATION OF THE ALGORITHM AND RESULT ANALYSIS
The BRKSS algorithm has been simulated using JDK 1.7 with NetBeans 8.0.2, and the database has been maintained in MySQL. The algorithm takes the following user inputs: the number of clusters, the number of nodes, and the user queries, which may be numerous at a particular time period. User queries are the transactions that determine the performance of a DW. The output is obtained for turnaround time, waiting time and throughput for a given set of inputs, and the result is compared with the existing pseudo mesh schema.
We have discussed the results for three cases, which are shown in Table 3.1, Table 3.2 and Table 3.3. The comparative result analysis of the proposed model and the existing hierarchical clustering model is also displayed graphically.
As discussed, this architecture is based upon shared nothing clustering, which can scale up to a large number of computers, increase their speed and maintain the workload. To support this, the proposed algorithm has been simulated, and the results show that the performance of BRKSS is better than that of the existing algorithm.

IV. CONCLUSION
The simulation of the BRKSS algorithm has given positive results in its favour when compared with the existing hierarchical clustering algorithm in terms of turnaround time, throughput and waiting time. Moreover, the results are consistent across different permutations and combinations of nodes and clusters.