Automatic Data Interpretation in Accounting Information Systems Based on Ontology

Financial transactions are recorded in accounting journals based on the evidence of the transaction. There are several kinds of transaction evidence, such as invoices, receipts, notes, memos and others. The invoice, as one kind of transaction evidence, takes many forms and contains a variety of information. The information contained in the invoice is identified based on rules. Identifiable information includes: invoice date, supplier name, invoice number, product ID, product name, quantity of product and total price. In this paper, we propose an accounting ontology and an Indonesian accounting dictionary that can be used in intelligent accounting systems. The accounting ontology provides an overview of account mapping within an organization. The accounting dictionary helps in determining the account names used in accounting journals. The accounting journal is created automatically based on the identification of accounting evidence. We ran a simulation on 160 pieces of Indonesian accounting evidence, with a precision of 86.67%, a recall of 92.86% and an F-measure of 89.66%.


Introduction
In accounting, every financial transaction is recorded based on the evidence of the transaction. The evidence of a transaction can be an invoice, receipt, note, letter of intent, electricity bill, telephone bill, etc. In an accounting information system, entering transaction data into the system is a routine activity undertaken to accommodate the data [1]. In a large company with high business complexity, data input activities require considerable time. All economic transactions and accounting events should be reflected as they occurred and be classified into the proper accounting elements and statements [2].
Invoices, as one kind of transaction evidence, have very diverse formats. There are already a number of studies on invoice format recognition that can be used to speed up entering invoice data into the accounting system. An invoice is a special type of form that contains certain regularities in structure and content that aid in processing. Invoices also contain certain regions holding information that must be extracted from the document, and most of what is printed on an invoice is vital information that needs to be recovered [3]. Form and invoice analysis systems have to be fast, accurate and as independent of human intervention as possible. The variation of information between documents makes the processing task difficult. Table extraction and processing has been a subject of interest in recent years [4].
An invoice contains entities that need to be recognized as the basis for processing its information, such as invoice number, invoice date, product name, unit price, quantity, etc. These entities can be recognized through several methods. David A. Kosiba et al. identified invoices by combining textual and graphical processing: they analyzed line intersections and line features in the document and searched for likely keywords such as item number, quantity, total, etc. T.A. Bayer et al. [5] proposed a generic system for processing invoices that detects the location where an item is expected and then uses this information for syntax-driven extraction at that location.
However, the next step, recording the transactions in the accounting journal, still takes much time. Therefore, a new model is required to ensure that accounting records can be created automatically based on the evidence of the transaction.
The method used in this paper is based on an ontological approach. The ontology approach is a method to describe the concept of knowledge in a domain, which contains the types, properties, and relationships of concepts within that domain [6]. In [7], an ontology model was created based on structural relationships between tasks and components. In [8], a new approach was proposed to construct a knowledge base by combining an ontology knowledge model with a knowledge base algorithm. In [9], ontology knowledge was used to describe and draw semantic relationships that represent the complexity of data relationships in a learning environment.
Ontologies in the field of accounting have been developed by previous researchers [10], [11]. Florin (2007) built an accounting ontology for the dissemination of knowledge in the field of accounting. That ontology emphasizes the process of collecting accounting knowledge sourced from experts, from accounting scholarship (accounting books) and from policies related to accounting. Florin states that an ontology is a collection of objects, properties of the objects and relationships between objects in a specific domain. The objects in question include concepts, entities, events, actions and processes. An ontology presents terms that make it possible to describe the knowledge of a particular domain, the meaning of each accepted term and the relationships between the terms. The accounting ontology built by Florin does not explain the technical use of accounting knowledge.
The research in this paper concerns the development of an accounting ontology used as a basis for mapping accounting transactions. It can also be used to create an accounting journal automatically. Accounting journals are created automatically based on the previously built accounting ontology and on Indonesian accounting evidence.

Research Method
The method used in this paper is based on text recognition to identify the entities contained in the evidence of accounting transactions. Rules are created to identify these entities, as shown in Figure 1.
a. Creating Some Rules. Evidence of accounting transactions is analyzed to obtain recognizable patterns. The rules are then made with the heuristic method based on these patterns. The rules are built to recognize the keywords contained in the transaction evidence.
b. Entities Identification. Entities in the evidence of accounting transactions are identified based on the rules made before. These entities are the basis for accounting records (accounting journals). Evidence of accounting transactions is transformed from image into text through the Optical Character Recognition (OCR) process.
c. Transaction Identification. Transaction identification is performed to determine the transaction type based on the transaction evidence. The type of transaction can be cash receipt, cash disbursement, sales or purchase.
d. Indonesian Accounting Dictionary. The Indonesian accounting dictionary contains the account groupings of Indonesian accounting. The dictionary includes keywords that characterize each account. These keywords serve as the basis for determining the account used in the accounting journal.
e. Accounting Ontology. The accounting ontology is built to help create accounting journals automatically. It describes the mapping of accounts in accounting records.
f. Create Accounting Journal. The final result of accounting data interpretation is an automatically generated accounting journal based on the accounting ontology and accounting dictionary built previously.
C. Alippi et al. [12] built an automatic document classification system based on the analysis of the graphical information present on documents (i.e., logos and trademarks). V. Gupta et al. [13] created rules to tag business invoices, but limited to a single type of invoice that contains no tables. Boumedyen et al. [14] built an accounting ontology as a first step towards an organizational accounting repository that allows domain knowledge to be stored and disseminated. They developed a concept hierarchy that can be used to simplify and better understand the accounting system.
However, this paper not only classifies documents but also focuses on defining the accounting journal entries that need to be made for a financial transaction.

Development of Accounting Ontology
The ontology is a representation vocabulary, often specialized to some domain or subject matter. More precisely, it is not the vocabulary as such that qualifies as an ontology, but the conceptualizations that the terms in the vocabulary are intended to capture. The term ontology is sometimes used to refer to a body of knowledge describing some domain, typically a commonsense knowledge domain, using a representation vocabulary [15].
Development of an accounting ontology is an interactive process that requires collaboration between information technology and accounting experts. Zuniga (2001) states that an ontology in information systems is a formal language designed to represent a particular domain of knowledge. It is also designed for one or more specific uses that arise in connection with managing and computerizing as much information as possible, an effort that characterizes today's business as well as educational environments [16].
An ontology of accounting transactions makes accounting transactions easier to understand. An accounting transaction is a business event that has an influence on a company's financial statements. Accounting transactions are recorded in the accounting records of a company.
Examples of accounting transactions are as follows:
- Receive cash payments from customers
- Cash sales to customers
- Sales on credit to customers
- Employee salary payments
Any recording of an accounting transaction has to follow the rules of the accounting equation, which states that the value of the assets equals the liabilities plus the company's capital. The accounting equation is the basis on which the double-entry system is performed. The accounting equation is:
Assets = Liabilities + Capital
The ontology is formalized to provide an overview of the constructed ontology structure and to explain the relationships between elements of the accounting ontology. The structure of the accounting ontology proposed in this study provides an overview of the mapping and relationships among accounts in accounting. Figure 2 shows the proposed ontology structure of accounting. There are 2 classes, namely account group and account header, and there are individuals, namely account name. The class account group describes the division of account groups in accounting. In the accounting domain there are five main groups, namely assets, liabilities, capital, income and expenses. The class account header is a subclass of the class account group. The class account header contains the groupings of accounts for each account group. The accounting ontology makes it possible to map the accounts contained in accounting. Transactions are recorded in accounting journals based on the accounts in an organization's accounting. Accounts in accounting are classified into five groups, i.e. assets, liabilities, equity, income and expense.
Based on the understanding of the accounting equation and accounting cycle as well as the types of accounting transactions, Figure 3 shows the accounting ontology built using Protégé 5.1.0. This accounting ontology is used to help create the accounting journal automatically for the financial transactions that occur.
Figure 3 shows that the ontology is centered on the account, which is divided into 5 (five) groups, namely assets, liabilities, equity, income and expenses. Each group is then broken down to the account name level to be used in the creation of an accounting journal. Table 1 outlines the levels built in the accounting ontology. For the creation of accounting journals, the account names used are the accounts at level 3, level 4 or level 5. The level used is the highest level in each group of accounts.
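The level structure described here can be pictured as a tree whose deepest nodes are the account names used in a journal. The following is only a minimal Python sketch, assuming a nested dictionary as a stand-in for the ontology built in Protégé. The level numbering (the root account as level 1) and every header and account name below are hypothetical and are not taken from Table 1.

```python
# Hypothetical fragment of the hierarchy: account -> group -> header -> account name.
# An empty dict marks the deepest node of a branch, i.e. a usable account name.
ONTOLOGY = {
    "account": {
        "assets": {
            "current assets": {"cash": {}, "accounts receivable": {}},
            "fixed assets": {"equipment": {}},
        },
        "liabilities": {"current liabilities": {"accounts payable": {}}},
        "equity": {"owner capital": {}},
        "income": {"sales": {}},
        "expenses": {"operating expenses": {"electricity cost": {}}},
    }
}


def journal_accounts(tree, level=1, path=()):
    """Yield (name, level, path) for the deepest node of every branch."""
    for name, children in tree.items():
        if children:
            yield from journal_accounts(children, level + 1, path + (name,))
        else:
            yield name, level, path


accounts = {name: (level, path) for name, level, path in journal_accounts(ONTOLOGY)}
for name, (level, path) in accounts.items():
    print(f"level {level}: {' > '.join(path + (name,))}")
```

In this sketch the branch depth decides which level supplies the account name, which mirrors the statement that the highest available level of each group is used.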

Rules Used to Identify Entities
The rules for identifying keywords in accounting transactions are developed with the heuristic method. The heuristic method is used to ignore some possible solutions without having to trace the whole string in full, which reduces the number of searches needed to find a solution in a string. The entity recognition process is conducted on certain types of transaction evidence, such as invoices in the various models commonly used in Indonesia and written in the Indonesian language.
The invoices analyzed include 160 (one hundred and sixty) kinds of invoice forms in the Indonesian language. The invoices are analyzed to create rules that identify the entities in the invoices.
Several rules are defined to identify the entities contained in a sales invoice automatically. The rules are applied as an algorithm to classify a string. The following rules are designed to identify the entities on sales invoices.

Invoice Date Rule
A rule was defined to identify the invoice date. In Indonesian invoices, there were some types of

Supplier Name Rule
The supplier name rule was designed to determine the supplier name in an invoice. This rule identifies the supplier name based on a dataset of supplier names: the strings in the invoice are compared with the dataset of supplier names.
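The comparison against the supplier dataset can be sketched as a simple lookup. This is a minimal Python illustration, not the system's actual implementation: the function name `find_supplier`, the case-insensitive substring matching and the two supplier names are assumptions made for the example.

```python
# Hypothetical supplier dataset; the real dataset is not listed in the paper.
SUPPLIERS = ["PT Contoh Sejahtera", "CV Maju Bersama"]


def find_supplier(invoice_text, suppliers=SUPPLIERS):
    """Return the first known supplier whose name occurs in the OCR text."""
    # Collapse whitespace and ignore case so OCR spacing differences do not matter.
    normalized = " ".join(invoice_text.lower().split())
    for supplier in suppliers:
        if supplier.lower() in normalized:
            return supplier
    return None


print(find_supplier("FAKTUR\nPT  Contoh   Sejahtera\nJl. Merdeka 1"))
```

Exact substring matching is the simplest choice; OCR output with character errors would need an approximate comparison instead.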

Invoice Number Rule
The invoice number rule was designed to determine the invoice number in an invoice. This rule identifies the invoice number based on a dataset of invoice number formats. The dataset of invoice number formats is organized by supplier name: the first line is the supplier name, and the next line is the invoice number format of that supplier. Examples:

Product ID Rule
The product ID rule was designed to determine the product ID in an invoice. This rule identifies the product ID based on a dataset of product IDs, which was constructed from the product data used in this study. Examples: p001, b001, b002, b003, ipos04, beng, se01, se02, se03, BR-019823.

Product Name Rule
The product name rule was designed to determine the product name in an invoice. This rule identifies the product name based on a dataset of product names, which was constructed from the product data used in this study. Examples: beras, pepsodent, computer, meja komputer1 d20, meja makan, meja belajar.

Quantity Rule
The quantity rule was designed to determine the quantity of each product in an invoice. This rule identifies the quantity as the smallest value (number) found on each product item line. For now, this rule ignores any discount that appears on a product item line.

Total Price Rule
The total price rule was designed to determine the total price of each product in an invoice. This rule identifies the total price as the biggest value (number) found on each product item line. Once the quantity of each product is known, the total price is also used to find the unit price of each product and to obtain the total invoice value.
In addition to invoices, both purchase invoices and sales invoices, rules are also made for other forms of proof of transaction, i.e. proof of electricity payment and proof of phone payment.
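The quantity and total price rules can be illustrated on a single product item line. The Python sketch below is an assumption-laden example rather than the paper's code: it assumes amounts use "." as thousands separator and "," as decimal mark, it only considers tokens that are entirely numeric (so product IDs such as b001 are skipped), and the function names and sample line are hypothetical.

```python
import re

# A token counts as a number only if it is entirely digits with optional separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def parse_number(token):
    """Read an amount, assuming '.' groups thousands and ',' marks decimals."""
    return float(token.replace(".", "").replace(",", "."))


def parse_item_line(line):
    """Apply the rules: smallest number = quantity, biggest number = total price."""
    values = [parse_number(t) for t in line.split() if NUMBER.fullmatch(t)]
    if len(values) < 2 or min(values) == 0:
        return None  # not enough numeric information on this line
    quantity, total = min(values), max(values)
    return {"quantity": quantity, "total_price": total, "unit_price": total / quantity}


print(parse_item_line("b001 beras 5 12.000 60.000"))
```

As the text notes, a discount column is not handled: a discount value smaller than the quantity would be misread as the quantity by this sketch as well.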

Proof of Electricity Payment Rule
There are 3 (three) entities in proof of electricity payment recognized by the system, namely (1) customer ID, (2) month and year of the payment period, and (3) total payment value.
The rule for recognizing the customer ID searches the proof of electricity payment for a customer ID keyword, i.e. one of the words {id pel, idpel, idpelanggan, id pelanggan}. The rule is written as follows:
1. String ∈ {id pel, idpel, idpelanggan, id pelanggan}
2. ID => ID is a numeric after string
The rule for recognizing the month and year of the payment period searches the proof of electricity payment for the keyword 'bl/th'. The string after 'bl/th' is the value of the month and year of payment. The rule is written as follows:
1. String = {"bl/th"}
2. Month and year => month and year is the string after string
The rule for recognizing the total value of the electricity payment looks for the keyword 'total bayar' on the proof of electricity payment. The numeric found after the words 'total bayar' is the total value of the payment. The rule is written as follows:
1. String = {"total bayar"}
2. Total payment => total payment is a numeric after string
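The three electricity payment rules map naturally onto keyword-anchored regular expressions. The Python sketch below is one possible reading of those rules: the optional colon and the optional "Rp" currency prefix are assumptions about the OCR text, and the sample receipt values are invented for illustration.

```python
import re

# Keywords come from the rules; the optional ':' and 'Rp' prefix are assumptions.
CUSTOMER_ID = re.compile(r"id\s?pel(?:anggan)?\s*:?\s*(\d+)", re.IGNORECASE)
PERIOD = re.compile(r"bl/th\s*:?\s*(\S+)", re.IGNORECASE)
TOTAL = re.compile(r"total bayar\s*:?\s*(?:rp\.?\s*)?(\d+(?:[.,]\d+)*)", re.IGNORECASE)


def first_group(pattern, text):
    """Return the value captured after the keyword, or None if it is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_electricity_payment(text):
    """Extract customer ID, payment period and total payment from OCR text."""
    return {
        "customer_id": first_group(CUSTOMER_ID, text),
        "period": first_group(PERIOD, text),
        "total_payment": first_group(TOTAL, text),
    }


sample = "IDPEL : 512345678901\nBL/TH : JAN18\nTOTAL BAYAR : Rp 152.500"
print(extract_electricity_payment(sample))
```

The values are returned as strings so that the caller can decide how to interpret separators in the amount.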

Proof of Phone Payment Rule
On proof of phone payment, the only recognized entity is the total value of the phone payment. The rule to recognize this value is based on the keyword 'total' in the proof of phone payment.
The rule to recognize the total value of phone payments is written as follows:
1. String = {"total"}
2. Total payment => total payment is a numeric after string
Algorithms were then built from these entity identification rules. One algorithm was created for every rule that has been built, for example the algorithm to identify the invoice number.

Transaction Identification
The rules used to recognize entities in a proof of transaction are applied after transaction identification. Transaction identification is performed to determine the transaction type based on the transaction evidence. In this paper, the transaction evidence analyzed is limited to purchase invoices, sales invoices, electricity payments and phone payments.
The type of the transaction evidence is recognized through a set of rules. These rules are executed the first time the proof of transaction is analyzed, and each proof of transaction receives a value from them. Once the type of the transaction proof is known based on the highest value, the next step is to identify the keywords on the transaction evidence by using the entity recognition rules. The values obtained from the rules executed on the transaction proof are the basis for recording the accounting journal.
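Choosing the transaction type by highest value can be sketched as keyword scoring. The paper does not list its type rules here, so the Python example below is only an assumed illustration: the electricity keywords are those used by the entity rules, while the keyword lists for the other types and the one-point-per-keyword scoring are hypothetical.

```python
# Electricity keywords follow the entity rules; all other keywords are hypothetical.
TYPE_KEYWORDS = {
    "electricity payment": ["idpel", "id pel", "bl/th", "total bayar"],
    "phone payment": ["telepon", "telkom"],
    "purchase invoice": ["faktur pembelian", "purchase invoice"],
    "sales invoice": ["faktur penjualan", "sales invoice"],
}


def score_types(text):
    """Give each transaction type one point per keyword found in the text."""
    lowered = text.lower()
    return {
        kind: sum(keyword in lowered for keyword in keywords)
        for kind, keywords in TYPE_KEYWORDS.items()
    }


def identify_transaction(text):
    """Return the type with the highest value, or None if nothing matched."""
    scores = score_types(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify_transaction("IDPEL : 123 BL/TH JAN18 TOTAL BAYAR 152.500"))
```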

Development of Indonesian Accounting Dictionary
The Indonesian accounting dictionary is a list of account names commonly used in Indonesian accounting, each with a positive (+) or negative (-) sign. Each account name is followed by keywords that characterize the account and form the basis for identifying it. These keywords are found in the evidence of financial transactions. The positive (+) sign specifies that the account is recorded in the debit column of the accounting journal when the value of the account increases and in the credit column when the value decreases. The negative (-) sign specifies that the account is recorded in the credit column when the value of the account increases and in the debit column when the value decreases.
The structure of the Indonesian accounting dictionary is as follows:

[account name];[+/-];[keywords]
For example, the account name electricity cost has the keywords {electricity, cost, PLN} and a positive sign. PLN stands for Perusahaan Listrik Negara, the National Electricity Company. These keywords can be found in Indonesian electricity bills. We give a positive sign because electricity cost is recorded in the debit column when the value of this account increases. In the Indonesian accounting dictionary, this account is recorded as below:
electricitycost;+;[electricity;cost;PLN]
In this study, we compiled 150 lines of accounts in the Indonesian accounting dictionary. The Indonesian accounting dictionary is a useful guide for an intelligent accounting system.
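A dictionary line in the format [account name];[+/-];[keywords] can be read with a few lines of Python. This is a minimal sketch: the function names and the lower-casing of keywords are assumptions, while the sample line is the electricity cost entry from the text.

```python
def parse_dictionary_line(line):
    """Split '[account name];[+/-];[keywords]' into its three parts."""
    name, sign, keywords = line.strip().split(";", 2)
    if sign not in ("+", "-"):
        raise ValueError(f"unexpected sign {sign!r} in line: {line!r}")
    return {
        "account": name,
        "sign": sign,
        # Keywords sit between square brackets and are separated by ';'.
        "keywords": [k.strip().lower() for k in keywords.strip("[]").split(";") if k.strip()],
    }


def journal_column(sign, increasing=True):
    """'+' accounts are debited on increase; '-' accounts are credited on increase."""
    if sign == "+":
        return "debit" if increasing else "credit"
    return "credit" if increasing else "debit"


entry = parse_dictionary_line("electricitycost;+;[electricity;cost;PLN]")
print(entry, journal_column(entry["sign"]))
```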

Results and Analysis
Result
Any financial transaction that occurs in a company should be recorded in an accounting journal. The journal entry is recorded once the transaction has occurred, based on the transaction evidence. The evidence of financial transactions may take the form of invoices, memos, receipts or other official forms, whether issued internally by the company or by external parties.
The accounting journal is the basis for the company to prepare its financial statements at the end of each accounting period. Any financial transaction recorded in the accounting journal must follow the prevailing recording rules of the accounting domain. One of the rules on which accounting journals are based is the accounting equation, which states that the amount of assets must always equal the total liabilities of the company plus the amount of the company's equity.
Recording an accounting journal from a proof of transaction takes a long time when the data is entered into the accounting system by hand. The intelligent accounting system is able to translate the text contained in each transaction proof so that it can perform the classification automatically.
The accounting ontology that has been built helps automate accounting journal entries. Based on this ontology, the accounting journal entry process can be mapped. Automatic journal entry is also supported by the Indonesian accounting dictionary that has been built. This dictionary contains a list of accounts used in Indonesia, each with a symbol that specifies its accounting journal entry column. The algorithm created to generate an automated accounting journal is as follows:
1. Enter each account name from the dictionary into an array
2. Check the characteristics of each account in the dictionary against the proof of transaction
3. Rate each account for each of the attributes found
4. Specify the account that has the largest feature value as the main account
5. Specify the account that has the second largest feature value as the companion account
6. Create a journal based on the main account and the companion account
7. If the main account has a positive (+) sign in the dictionary, place it in the debit section of the accounting journal; if it has a negative (-) sign in the dictionary, place it in the credit section of the accounting journal
8. Place the companion account on the side opposite to the main account
The algorithm was applied to recognize the entities in invoices. Invoices have various forms and contain a variety of information. In this paper, we analyzed one hundred and sixty kinds of invoices. These invoices were taken from companies in several big cities in Indonesia, such as Jakarta, Bandung, Yogyakarta and Semarang. We chose these cities because they are centers of the economy in Indonesia. Figure 4 displays one of the invoice forms analyzed. The OCR process is carried out on the invoice to produce a string. The string generated by OCR is then analyzed using the algorithms that have been built.
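The eight-step journal generation algorithm listed in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's code: the feature value is taken to be the number of dictionary keywords found in the text, the cash and accountspayable entries and their keywords are hypothetical, and only the electricitycost entry comes from the paper's example.

```python
# (account name, sign, keywords); only the first entry is from the paper's example.
DICTIONARY = [
    ("electricitycost", "+", ["electricity", "cost", "pln"]),
    ("cash", "+", ["tunai", "lunas"]),                    # hypothetical keywords
    ("accountspayable", "-", ["kredit", "jatuh tempo"]),  # hypothetical keywords
]


def create_journal(text, amount, dictionary=DICTIONARY):
    """Build a two-line journal from the two best-matching accounts."""
    lowered = text.lower()
    # Steps 1-3: rate every account by the number of its keywords in the text.
    scored = [
        (sum(keyword in lowered for keyword in keywords), name, sign)
        for name, sign, keywords in dictionary
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    if len(scored) < 2 or scored[1][0] == 0:
        return None  # no main/companion pair could be identified
    # Steps 4-5: the highest value is the main account, the second the companion.
    (_, main, main_sign), (_, companion, _) = scored[0], scored[1]
    # Steps 6-8: the sign decides the main column; the companion goes opposite.
    main_column = "debit" if main_sign == "+" else "credit"
    companion_column = "credit" if main_column == "debit" else "debit"
    return [(main, main_column, amount), (companion, companion_column, amount)]


print(create_journal("PLN electricity cost, lunas tunai", 152500))
```

Ties between accounts are resolved here by dictionary order, a choice the paper does not specify.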
Figure 5 shows the result interface of the system. The transaction detail box presents the invoice as processed by OCR. After obtaining the invoice in the form of text, the system processes it to obtain the desired entities, as shown in the result box. Based on the results of this analysis, accounting journal entries are created automatically. The accounting journal is created based on the ontology built before. XML was created based on the accounting ontology to help create the accounting journal. Figure 6 shows the journal generated automatically by the system.

Journal 1:
Purchase - Debit
Accounts Payable - Credit
This journal records purchases made without a down payment.

Journal 2:
Purchase - Debit
Cash - Credit
Accounts Payable - Credit
This journal records purchases made with a down payment.

Journal 3:
Purchase - Debit
Accounts Payable - Credit
Purchase Discounts - Credit
This journal records purchases made with a purchase discount.
The accounting journal is generated by the system based on the analysis of the string. If a string resembling a discount (disc, discount, discounts) is found with a value greater than zero, the purchase journal includes the purchase discount account.
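The choice between the three purchase journals can be sketched in Python. The amounts on each line are an assumption derived from double-entry bookkeeping (total debits equal total credits), since the paper lists only the accounts; the regular expression, the Indonesian number format and the function names are likewise illustrative.

```python
import re

# Matches 'disc', 'discount' or 'discounts' followed by a number (assumed layout).
DISCOUNT = re.compile(r"\bdisc(?:ounts?)?\b\.?\s*:?\s*(\d+(?:[.,]\d+)*)", re.IGNORECASE)


def find_discount(text):
    """Return the discount value found in the text, or 0.0 if there is none."""
    match = DISCOUNT.search(text)
    if not match:
        return 0.0
    # Assumes '.' groups thousands and ',' marks decimals.
    return float(match.group(1).replace(".", "").replace(",", "."))


def purchase_journal(total, down_payment=0, discount=0):
    """Build the purchase journal lines; the payable absorbs the remainder."""
    payable = total - down_payment - discount
    if payable < 0:
        raise ValueError("down payment and discount exceed the purchase total")
    lines = [("Purchase", "debit", total)]
    if down_payment > 0:
        lines.append(("Cash", "credit", down_payment))
    lines.append(("Accounts Payable", "credit", payable))
    if discount > 0:
        lines.append(("Purchase Discounts", "credit", discount))
    return lines


def is_balanced(lines):
    """Check that total debits equal total credits."""
    debit = sum(amount for _, column, amount in lines if column == "debit")
    credit = sum(amount for _, column, amount in lines if column == "credit")
    return debit == credit


print(purchase_journal(60000, discount=find_discount("Disc 5.000")))
```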

Calculation Model Accuracy with Precision, Recall and F-Measure
Classification in this paper uses a supervised learning method. The classification takes a training set as learning data. Each sample of the training set has attributes and a class label. The training set should represent the state of the test data in order to obtain good accuracy when classifying the test data. If this is not ensured, the accuracy is usually poor [8].
The confusion matrix in Table 3 is based on a test of 160 samples of transaction evidence. Precision and recall were calculated from this confusion matrix. Precision (p) is the number of samples correctly classified as positive divided by the total number of samples classified as positive; its value is 86.67%. Recall (r) is the number of samples correctly classified as positive divided by the total number of positive samples in the testing set; its value is 92.86%. The F-measure (F1) is the harmonic mean of precision and recall, F1 = 2pr / (p + r); its value is 89.66%.
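The relationship between the three reported figures can be checked with a short Python calculation. The function names are illustrative, and the check uses only the reported precision and recall, because the counts of the confusion matrix are not reproduced in the text.

```python
def precision(true_positive, false_positive):
    """Correctly classified positives divided by all samples classified positive."""
    return true_positive / (true_positive + false_positive)


def recall(true_positive, false_negative):
    """Correctly classified positives divided by all actual positives."""
    return true_positive / (true_positive + false_negative)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Reported precision and recall from the evaluation of 160 samples.
reported_f1 = f_measure(0.8667, 0.9286)
print(f"F-measure: {reported_f1:.2%}")
```

Given counts from a confusion matrix, `precision` and `recall` return the two inputs that `f_measure` combines.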

Conclusion
In the accounting domain, every financial transaction is recorded in the accounting journal based on the evidence of the transaction. There are several kinds of transaction evidence, such as invoices, receipts, notes, memos and others. In a large company with high business complexity, data input activities require considerable time.
Invoices, as one kind of transaction evidence, have very diverse formats. They contain certain regions holding information that must be extracted from the document. In this paper, we have identified several entities in the invoice, namely invoice date, supplier name, invoice number, product ID, product name, quantity and total price.
An accounting ontology and an Indonesian accounting dictionary have been built. They can be used in intelligent accounting systems. The accounting ontology provides an overview of account mapping within an organization. The accounting dictionary helps in determining the account names used in accounting journals.
Based on the results of the analysis, accounting journal entries are created automatically. In a simulation with 160 pieces of Indonesian transaction evidence, we obtained a precision of 86.67%, a recall of 92.86% and an F-measure of 89.66%.