A Proposed DDS Enabled Model for Data Warehouses with Real Time Updates

ABSTRACT


INTRODUCTION
In today's internet era, the role of electronic information cannot be ignored. Effective processing of this information helps managers in their daily decision making. According to Inmon, a data warehouse is a subject-oriented, nonvolatile, time-variant and integrated collection of data, and these attributes distinguish it from operational databases [1]. A data warehouse enables its users, generally decision makers such as managers, to make better and faster decisions during the strategic planning of an enterprise, provided that the data it supplies is efficient and informative. The need for automated warehouses is also increasing. If automation is done correctly, it reduces cost and effort and, most importantly, reduces the human errors that lead to incorrect decisions and hence to inefficient use of the data warehouse. Several researchers have proposed Enterprise Resource Planning (ERP) as a solution. With ERP, an enterprise can collect, store, interpret and manage data from many business processes. If successfully implemented, ERP results in efficient utilization of resources and efficient management, which leads to better decisions. However, implementing ERP requires significant investment, and a failed implementation may cause significant financial losses. Challenges identified in the literature include business process complexity, a proper understanding of the organization's needs, and the availability of skilled staff. Around 66 to 70 organizations which implemented ERP failed to reap its benefits [2]. The data warehouse can be thought of as one of the important components without which a decision support system is very hard to realize. Nowadays the policy makers and decision makers of enterprises have started deploying data warehouses and focusing on maintaining them.
Industries such as retail, telecommunication, healthcare and manufacturing maintain data warehouses to support their decisions or, in simple words, to improve their decision-making capabilities [1] [5]. Improved decision making increases the accuracy of future forecasts and improves business earnings.
Preparation of a data warehouse passes through three major operations: extract, transform and load, together known as ETL. Data collected from different databases may contain errors and anomalies. If such data is loaded directly into the data warehouse and decisions are taken on it, the output will misrepresent the organization's performance. All three ETL operations therefore need to be performed so that the data is cleaned before it is loaded into the data warehouse.
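The three ETL operations can be illustrated with a minimal Python sketch. The record layout, the field names (`customer`, `amount`), the two sources and the cleaning rules are assumptions made purely for illustration; they are not part of the proposed model.

```python
from typing import Dict, Iterable, List

# Hypothetical raw records pulled from two operational sources.
SOURCE_A = [{"customer": " Alice ", "amount": "120.50"},
            {"customer": "Bob", "amount": "n/a"}]   # anomaly: non-numeric amount
SOURCE_B = [{"customer": "alice", "amount": "30"},
            {"customer": "", "amount": "15"}]       # anomaly: missing customer


def extract(*sources: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Extract: collect raw records from every source into one list."""
    return [dict(rec) for src in sources for rec in src]


def transform(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Transform: normalise names and drop records with errors or anomalies."""
    clean = []
    for rec in records:
        name = rec["customer"].strip().lower()
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # skip records whose amount is not numeric
        if not name:
            continue  # skip records without a customer name
        clean.append({"customer": name, "amount": amount})
    return clean


def load(records: List[Dict[str, object]], warehouse: list) -> list:
    """Load: append the cleaned records to an in-memory warehouse table."""
    warehouse.extend(records)
    return warehouse


warehouse = load(transform(extract(SOURCE_A, SOURCE_B)), [])
```

In this sketch only the two well-formed records reach the warehouse; the two anomalous records are filtered out during the transform step, which is the point made above about not loading uncleaned data.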
The rest of the paper is organized as follows. Section II contains the literature survey. Section III contains the proposed method. Section IV contains the proposed algorithm and expected outcomes. Section V contains the conclusion and future research directions.

SYSTEMATIC LITERATURE REVIEW PROCESS
In one paper, the authors conducted a survey of a telecommunication company that has a small warehouse of scratch cards and SIM cards. The whole process was carried out through manual entry in Excel sheets. The aim of the survey was to find the processes or procedures that can be automated. Once this step is completed successfully, the next step is to choose a software program, which is selected according to the needs of the enterprise and must be able to handle large amounts of data. Automation of the warehouse helps in controlling the movement and storage of products and enhances security. The authors applied the FIFO (first in, first out) concept in their automation.
Nur Hani et al., in their paper titled "User Requirement Analysis in Data Warehouse Design: A Review", discuss the various analysis approaches that focus on the role of user requirements in data warehouse design. User requirement approaches can be classified into four broad categories: data-driven, user-driven, goal-driven and mixed-driven. Researchers made this classification in order to identify the role of user requirements, but it remains very difficult for a data warehouse designer to determine which technique to select when designing a data warehouse [4]. The authors also discuss the strengths and weaknesses of these four categories. According to the authors, the most critical phase in data warehouse development is requirement analysis. The papers [6] [7] also report that 80% of DW projects fail to fulfil business objectives because of variation among end users. Some researchers also note that decision makers tend to neglect the requirement analysis phase [9] and are more concerned with technical aspects [8]. If IT staff fail to understand the requirements, or if there is miscommunication between IT staff and policy makers or decision makers, the result is a poor data warehouse design, which ultimately causes the data warehouse to miss its objectives [10].
The first approach is the data-driven approach, which some research papers refer to as the supply-driven approach [1] [7]. In this approach the database administrator plays a very important role. Transactional data is analyzed and a logical schema is built. This approach generally eliminates the need for user involvement.
The second approach is the user-driven approach [11]. This approach works bottom up, and the project manager plays the key role. The project manager is responsible for documenting all the requirements of the different business users. This documented information is then integrated with the data warehouse.
The third approach is the goal-driven approach, in which top-level management plays an important role. The management or policy makers decide the goal priorities. Based on these goals, the data warehouse is expected to answer how far the goals have been achieved [12].
The fourth approach is the mixed-driven approach, which has been developed to strengthen requirement analysis.
Winter & Strauch [13] proposed an approach that requires two things: identifying the end users who play a lead role in decision making in the organization, and an application that can connect the data warehouse to information. The end users decide their organizational requirements in order of priority. Before these requirements are converted into information and finally mapped to the data warehouse, the requirement process is iterated until the end users are satisfied with its outcome.
The data-driven approach has several strengths: data availability determines the design of the data warehouse, and the schemas generated with this approach are known for their stability [4]. However, this approach almost ignores the involvement of end users. Some researchers also agree that it is very difficult to perform the ETL process on large data sources in order to generate relevant information. In the user-driven approach the end user gets priority, and this approach is highly appreciated by end users. Its limitation is that it is very difficult to satisfy all end-user requirements by mapping them to the warehouse. Authors agree that requirements engineering must be performed to ensure a smooth process.
Shao et al. [14] studied the structure of real-time data warehouses and proposed a structure based on a double mirror replication mechanism and multi-level caches.
Nowadays a data warehouse cannot be considered in isolation for decision making. In today's competitive world, making policies and decisions requires both the organization's own data and data from the outside world (competitors). Security is also a major concern nowadays, and the use of good encryption techniques or algorithms is a long-established solution. Authors have compared the techniques proposed in various articles on the basis of security parameters such as encrypted data, audit control, extendibility, platform independence model security, transformation, creation of PSM, QVT support, and integration of multiplatform data.
Certain issues must be taken care of while designing a data warehouse. Data warehouses are decisional information artifacts that are embedded in the organizations that create and maintain them. Therefore, their contents must be highly supportive of the decision-making activity of organizations. The decision-making activity, in turn, is tightly coupled to the goals that an organization sets for itself. But the approaches discussed above do not take into account the larger organizational context in which the DW is to function. The open issues are as follows.
a. How can we ensure correct requirements, that is, the correct query set that the data warehouse is supposed to answer?
b. The notions of goals and scenarios need to be re-examined for data-oriented systems.
c. The requirements engineering problem for data warehouse systems is the inverse of that for functional systems: the former is aimed at the discovery of data and de-emphasizes functionality, whereas the latter aims to discover the functionality of systems and de-emphasizes data discovery. This shift in emphasis demands a re-examination of the notions of goals and scenarios for data warehouse systems.
d. How can an actor (stakeholder) be sure that the facts provided by the data warehouse meet the expectations of the decision-making process and contribute to its success?
e. Requirements engineering for data warehouses must consider both the software engineering view and the information systems view. It is well known that a data warehouse can be looked upon from the organizational and from the technical perspectives. The former looks upon the warehouse as embedded in an organization and considers the manner in which it supports organizational tasks. The latter deals with issues of data warehouse contents, their structure and so on. The organizational view of the data warehouse corresponds to the information systems perspective of requirements engineering, whereas the technical view corresponds to the software engineering view. None of the approaches to data warehouse development discusses development from both points of view.
f. Who should be involved in the requirement identification phase?
g. How can contradictions between the expectations of the various stakeholders and the designed data warehouse be avoided?
h. There is a general lack of specific guidance for the requirements elicitation process for identifying data warehouse contents. A number of authors have proposed adapting traditional requirements engineering approaches to the specific context of data warehouse development, but these approaches lack specific guidance for requirements elicitation [15], [16], [17]. For example, the proposal of [Fab03] to build a framework for DW requirements engineering provides pointers to RE approaches that may be applicable, but it does not establish their feasibility and does not consider any detailed technical solutions.
i. The requirements elicitation process lacks automation. None of the approaches automates the application of the requirements elicitation process. Few CASE tools for DW conceptual design have been implemented. In ADAPT and in GOLD, the conceptual schema is drawn directly by the designer, and no active support for requirements elicitation is provided.

PROPOSED WORK
This section discusses the proposed work.

Initialization Phase
The first step is to identify what the actors (business experts, analysts, stakeholders, project managers and so on) correctly expect from the data warehouse. To establish these expectations and so support correct decisions, a formal discussion (which can take place online or otherwise) is proposed. All actors, including business experts, analysts, stakeholders, project managers and any other important management or decision-making persons, should take part in this discussion and put forward their expectations in the form of a draft document. The draft document is then verified by the technical experts (software engineers, DBAs and so on) to ensure that the expectations are valid for a data warehouse built from the available data sources. If the technical team finds invalid expectations, that is, expectations for which the data sources contain no supporting facts, these are considered for elimination. Otherwise the final draft is prepared and sent to every expert involved in the discussion phase. The final draft is in effect the query set that the designed data warehouse software is expected to answer.
The mapping engine will contain a program, designed by the software team, that maps the query-set requirements to the data sources in order to create the data warehouse. The mapping engine will also contain an intelligent program, the Deviation Detection System (DDS), which is responsible for triggered updates.
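The verification and mapping steps of the initialization phase can be sketched as follows. This is only an illustration under stated assumptions: the source names (`sales_db`, `crm_db`), the fact names and the representation of an expectation as a set of required facts are all hypothetical and are not prescribed by the proposed model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical catalogue: the facts (attributes) each data source can supply.
DATA_SOURCES: Dict[str, Set[str]] = {
    "sales_db": {"product", "region", "revenue"},
    "crm_db": {"customer", "region", "complaints"},
}

# Hypothetical draft document: each expectation lists the facts it needs.
DRAFT: Dict[str, Set[str]] = {
    "revenue by region": {"region", "revenue"},
    "complaints by customer": {"customer", "complaints"},
    "competitor pricing": {"competitor", "price"},  # no source holds these facts
}


def verify_draft(draft: Dict[str, Set[str]],
                 sources: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Split the draft into the final query set and the eliminated expectations."""
    available = set().union(*sources.values())
    final = {q: facts for q, facts in draft.items() if facts <= available}
    eliminated = [q for q in draft if q not in final]
    return final, eliminated


def map_queries(final: Dict[str, Set[str]],
                sources: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Map every fact needed by each query to the sources that hold it."""
    return {
        q: {fact: sorted(s for s, facts in sources.items() if fact in facts)
            for fact in sorted(needed)}
        for q, needed in final.items()
    }


final_draft, eliminated = verify_draft(DRAFT, DATA_SOURCES)
mapping = map_queries(final_draft, DATA_SOURCES)
```

Here the expectation with no supporting facts in any source is eliminated, and the remaining query set is mapped fact by fact to the sources that can answer it.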

Update Phase
Once the first phase is completed successfully, that is, the requirements or query sets are mapped to the data sources and the data warehouse is built, the warehouse is ready for its users to ask queries and receive accurate answers. Updates must also be handled: the data sources continuously receive records, and these records, after the ETL process, should be loaded into the data warehouse to ensure accuracy in decision making. The proposed model includes two types of update, periodic and triggered. The data warehouse is virtually divided into two parts, partition-1 and partition-2. Partition-1 contains the current records, which were loaded into the data warehouse by the mapping engine from the data sources after the ETL process. Partition-2 contains the historical records, or records up to a certain period (i.e. the information from before a periodic update takes place).
The mapping engine updates partition-1 of the data warehouse after a period of time defined in the mapping engine program. There is also a provision for triggered updates, which take place when DDS detects a deviation from the expected pattern. DDS, an intelligent program embedded in the mapping engine, continuously monitors the pattern of the data arriving from the data sources. When DDS detects a deviation ≥ Tdev, it sets FLAG = Triggered Update, which results in an immediate update of partition-2 from partition-1; simultaneously an alert is generated and sent to the users to catch their attention.
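The triggered update can be sketched in Python as follows. The paper does not specify how the deviation is measured, so this sketch assumes a relative deviation of the newest observation from the mean of an expected pattern; the threshold value `T_DEV = 0.5`, the `Warehouse` class and the function names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List

T_DEV = 0.5  # hypothetical threshold Tdev: 50 % relative deviation


@dataclass
class Warehouse:
    partition_1: List[float] = field(default_factory=list)  # current records
    partition_2: List[float] = field(default_factory=list)  # records users query
    flag: str = "NONE"
    alerts: List[str] = field(default_factory=list)


def deviation(expected: List[float], observed: float) -> float:
    """Assumed measure: relative distance of a value from the expected mean."""
    baseline = mean(expected)
    if baseline == 0:
        return 0.0 if observed == 0 else float("inf")
    return abs(observed - baseline) / abs(baseline)


def dds_monitor(dw: Warehouse, expected: List[float], observed: float) -> bool:
    """Record the new value; trigger an immediate update if deviation >= T_DEV."""
    dw.partition_1.append(observed)
    if deviation(expected, observed) >= T_DEV:
        dw.flag = "TRIGGERED_UPDATE"
        dw.partition_2 = list(dw.partition_1)  # refresh partition-2 from partition-1
        dw.alerts.append(f"deviation detected for value {observed}")
        return True
    return False


dw = Warehouse()
pattern = [100.0, 102.0, 98.0]              # hypothetical expected pattern
first = dds_monitor(dw, pattern, 105.0)     # small deviation: no trigger
second = dds_monitor(dw, pattern, 180.0)    # large deviation: triggered update
```

In the usage lines, the first observation stays within the threshold and only reaches partition-1, while the second exceeds it, so partition-2 is refreshed from partition-1 and an alert is recorded.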

PROPOSED ALGORITHM
The proposed algorithm is as follows:
    message "authentication successful";
    message "ask queries" from DW_Partition 2;
} else {
    message "authentication unsuccessful" or "try again";
    go back to login_screen;
}

RESULTS AND OBSERVATIONS
The following are some of the queries (issues) that the proposed model is able to answer.
Query 1) How can correct requirements be ensured so that the expectations of every actor are met? Through the verification and validation of requirements and expectations during the initialization phase.
Query 2) How can contradictions between the expectations of the various stakeholders and the designed data warehouse be avoided? By involving database designers and software experts in the initialization phase. Their involvement ensures that the requirements are verified, that is, that the query set is mapped approximately correctly to the required data warehouse so that it meets the expectations of its users.
Query 3) How does the model improve REST alignment? If REST alignment is not done efficiently, it leads to defective development of the data warehouse, and the effort is simply wasted. Misalignment leads to disappointment, because the data warehouse software is unable to answer what the experts expect of it or to provide the supporting facts on which a forecast or decision could be based. The participation of management experts, top officials, analysts, advisors and any other experts, along with the technical experts, during the initialization phase ensures approximately accurate alignment.
Query 4) Does the proposed model include a provision for real-time data warehouse updates? Yes. The proposed model includes two types of update: the periodic update, performed once the elapsed time is ≥ Tperiod, and the triggered update. When the mapping engine module, with its built-in Deviation Detection System (DDS), detects an unexpected pattern, it immediately raises an alarm, the update flag is set, and the data warehouse is updated immediately.
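The choice between the two update types can be summarised in a small decision function. This is a sketch under assumptions: the concrete values of `T_PERIOD` and `T_DEV`, the function name, and the rule that a triggered update takes priority over a periodic one are all illustrative choices not stated in the model.

```python
T_PERIOD = 3600.0  # hypothetical Tperiod: seconds between regular updates
T_DEV = 0.5        # hypothetical Tdev: deviation threshold


def choose_update(elapsed_seconds: float, deviation: float) -> str:
    """Return which update, if any, the mapping engine should run now."""
    if deviation >= T_DEV:
        return "triggered"  # unexpected pattern: update immediately and alert
    if elapsed_seconds >= T_PERIOD:
        return "periodic"   # regular refresh after the configured period
    return "none"


decisions = [
    choose_update(10.0, 0.9),    # large deviation shortly after the last update
    choose_update(4000.0, 0.1),  # period elapsed, pattern as expected
    choose_update(10.0, 0.1),    # nothing to do yet
]
```

Checking the deviation first reflects the intent that an unexpected pattern should cause an immediate update regardless of how much of the period has elapsed.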

CONCLUSION
Increasing dependency on the digital world has given electronic information an important role. If electronic information is processed effectively, it helps in decision making and in better forecasting. Data warehouse construction involves certain issues that need to be resolved, or minimized where they are difficult to eliminate. Issues may arise from inconsistency of data, conflicts between logic, cost, user acceptance, REST alignment and so on. In this paper an approach named Deviation Detection System (DDS) has been proposed. The DDS approach tries to solve the above-mentioned issues to a certain extent (the extent may vary with the needs of each organization, but improved observations can be made). From Table 2 it can be clearly observed that the proposed algorithm yields improved observations.