Analysis of S2 (Spherical) Geometry Library Algorithm for GIS Geocoding Engineering

Geocoding is a common technique for transforming address information into digital latitude/longitude coordinates. One service that performs this conversion is Google Maps, whose geocoding is based on the S2 (Spherical) Geometry Library algorithm. This paper analyzes the quality of the algorithm by testing the geocoding quality matrix on hundreds of sample addresses from three cities in Indonesia: Jakarta, Bandung, and Balikpapan. The results indicate that the completeness of the address information affects the overall quality across the four elements of the matrix and the linkage among them: transform success rate, landmark exactness, score of accuracy, and range of radius in meters.


Introduction
Geographic Information Systems (GIS) have recently become an inseparable part of digital life. Many GIS development methods have been implemented to produce map services that support business operations or decision-making in organizations. For example, combining GIS with GPS (Global Positioning System) can substantially increase the added value of a business organization [1]. Furthermore, combining it with calibrated satellite sensing provides an efficient way to monitor economic modernization [4]. According to economic research, 80% of the data owned by business organizations contains geographic location information such as addresses, sales area boundaries, or distribution routes [7]; at the very least, a customer, distributor, or supplier database will contain a single address column recording a residence or sales location.
Such address data is a valuable organizational asset and can become a powerful resource for a reliable GIS once the addresses are converted into digital locations. This process is called geotagging or geocoding. As illustrated in Figure 1, geocoding is the conversion of textual location information (an address or place name) into a digital geographic representation [5]. Geocoding is generally built on the API (Application Programming Interface) of an online map service provider such as Google Maps, Bing Maps, OpenStreetMap, or MapQuest. These providers also publish how-to guides that make the services easier for GIS developers to use. Figure 2 shows that Google Maps is the online map service most widely used by GIS developers.

Figure 2. Online Maps Service Utilization Comparison Trend
The Google Maps APIs provide several online map services, including map visualization, distance tracking, geocoding, and cluster maps. The S2 (Spherical) Geometry Library algorithm powers the extraction process behind the Google Maps geocoding API [11]. The quality of the geocoding result is the main concern for GIS developers [5], because the most critical issue in geocoding is the accuracy of the resulting latitude/longitude coordinates relative to the physical address. This matters because a business organization's customer mapping system, for example, cannot otherwise determine how accurate a geocoded location is compared with the actual location.
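A request to such a geocoding service can be sketched as follows. The endpoint and the JSON fields (status, results, geometry.location, location_type) follow the public Google Maps Geocoding API; the function names, the API key, the address, and the sample response body are hypothetical placeholders for illustration and are not data from this study.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the Google Maps Geocoding API (JSON output).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one textual address."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode_response(body):
    """Return (lat, lng, location_type) of the first result, or None on failure."""
    data = json.loads(body)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    geometry = data["results"][0]["geometry"]
    location = geometry["location"]
    return location["lat"], location["lng"], geometry.get("location_type")


# Made-up response body in the documented layout (not a real API answer).
SAMPLE_BODY = json.dumps({
    "status": "OK",
    "results": [{
        "geometry": {
            "location": {"lat": -6.2, "lng": 106.8},
            "location_type": "ROOFTOP",
        }
    }],
})

url = build_geocode_url("Jl. Example 1, Jakarta", "YOUR_API_KEY")
parsed = parse_geocode_response(SAMPLE_BODY)
```

Keeping URL construction and response parsing separate from the network call lets the parsing be checked offline; the HTTP request itself can be issued with any client.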
For comparison, a 2014 study tested a geocoding process based on the R algorithm and measured its accuracy, with the results shown in Table 1 [16]. The present research analyzes the Google Maps geocoding algorithm based on the S2 (Spherical) Geometry Library by testing and measuring its accuracy against the geocoding quality matrix for a given sample of address data. It also draws conclusions from the test results and proposes ways to increase the accuracy of geocoding results.

Literature Review
According to Mishra et al., GIS is a system used to gather, integrate, analyze, and process geographic data [8]. Patil describes GIS as a combination of maps containing digital location data, statistical data analysis, and database technology [7]. In earlier research, Risma Ekawati concluded that GIS visualizes digital data as a map representation [6]. In business, GIS has recently been integrated with marketing programs for data observation in sales intelligence systems, sales decision support systems, and sales area coverage analysis [2].
Modern GIS technology helps manage, display, and explore business location information, and has thus evolved from a powerful location marker into a critical business support tool [15]. Besides business, GIS is also used in other sectors such as telecommunications, agriculture, health, crime analysis, traffic monitoring, government, research and development, and defense. For example, the Indonesian government's effort to reduce air pollution and improve air quality in Jakarta is supported by a GIS [8]. The present research follows earlier work by Williams, L. and Wilkins in 2014, who used the R algorithm for the same geocoding purpose; the quality of their results is shown in Table 1.
Table 1. Accuracy Comparison of Geocoding Results Using the R Algorithm [16]
According to Venkatesh, GIS consists of five elements: software, hardware, geographic data, public data, and organization [15]. Mishra et al., in contrast, list the five elements as software, hardware, data, people, and method [8]. In recent times, the combination of these elements has brought significant changes to GIS technology development [6]. Over the last decades, Google, OpenStreetMap, and Bing have improved their free online map services. GIS platforms such as ArcGIS and QGIS have also joined the competition, building modern customized GIS features to attract GIS developers as customers. Google, for its part, offers hundreds of powerful APIs to its map customers [6]. Between 2005 and 2012, at least 800,000 people used the Google Maps API for GIS purposes [3]. As another example, the total length of roads mapped in Germany by the OpenStreetMap community grew significantly during 2007-2011, as shown in Figure 3 [9].
In conclusion, GIS utilization is expected to keep increasing year by year and to support the future transformation of scientific geographic data, including for business needs.
Geotagging (or geocoding) can also be described as the process of identifying and extracting data entities such as people, organizations, and locations and transforming them into geographic content [10]. Geocoding conversion is based on three reference layers of geographic object data: single location points, road segment data (address, city, postcode), and areal unit groups (geographic polygon objects) [5]. As output, geocoding a single address produces a pair of numbers known as the latitude and longitude. Google Maps, the most widely used free online map service, provides geocoding based on the S2 (Spherical) Geometry Library algorithm [11].
Map visualization can show the basic quality level of a geocoding process. The two map visualization models most often used for analysis and quality control of multi-address geocoding are heat maps and cluster maps. A heat map, as shown in Figure 4, represents individual data values graphically in different colors [12]. Heat maps and cluster maps are often used to visualize the set of latitude/longitude results of a geocoding process [3]. Both models (Figure 5) use statistical pattern recognition as the visual identification method, but with different approaches: (a) unsupervised classification (cluster), used when no further data is available to process (little or no relationship between points); and (b) supervised classification (heat), used when more data is available to process, using discriminant functions g(x), where x is a point in n-dimensional Euclidean space that is assigned to class i when g_i(x) > g_j(x) for every other class j [4]. Statistical pattern recognition by the cluster approach requires only a minimal system yet still produces a relevant visualization [4].
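The discriminant rule in (b) can be illustrated with a minimal sketch. The source does not specify the form of g(x); the version below assumes a nearest-class-mean discriminant, g_i(x) = -||x - mu_i||^2, and the class means are rough illustrative coordinates rather than values from this study.

```python
def make_discriminant(mean):
    """Assumed discriminant: negative squared Euclidean distance to a class mean."""
    def g(x):
        return -sum((a - b) ** 2 for a, b in zip(x, mean))
    return g


def classify(x, discriminants):
    """Assign x to the class i whose g_i(x) exceeds g_j(x) for every other class j."""
    return max(discriminants, key=lambda label: discriminants[label](x))


# Illustrative class means as (latitude, longitude); approximate, not study data.
discriminants = {
    "Jakarta": make_discriminant((-6.2, 106.8)),
    "Bandung": make_discriminant((-6.9, 107.6)),
}

label = classify((-6.3, 106.9), discriminants)
```

Any other family of discriminant functions can be substituted in make_discriminant without changing the decision rule in classify.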
Table 2, however, indicates that the heat map model is more intuitive and accurate for visualizing data [14]. The Google Maps heat map API is based on a geocode data clustering algorithm that is similar to K-means clustering but faster than the Fuzzy C-Means clustering algorithm [13]. Figure 6 shows a scatter plot of three K-means clusters, which supports the claim that the geocode data clustering algorithm is faster at producing heat map visualizations. The basic steps of K-means clustering are point grouping, data initialization, data classification, centroid calculation, and checking a convergence criterion [13].
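The K-means steps listed above can be sketched in plain Python. This is a generic textbook K-means, not the clustering code used by Google Maps; the function names, the sample points, and the choice of passing the initial centroids explicitly are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100, tol=1e-9):
    """Run K-means from explicit starting centroids; return (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]  # data initialization
    labels = []
    for _ in range(max_iter):
        # Data classification: assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Centroid calculation: mean of the members of each cluster.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        # Convergence criterion: stop once no centroid moves more than tol.
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


# Six made-up points forming three obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
centroids, labels = kmeans(points, [(0, 0), (10, 10), (20, 0)])
```

Passing the starting centroids explicitly keeps the run deterministic; in practice they are often drawn at random from the data.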

Methodology
The research methodology begins with a custom-built software prototype of an address geocoding converter, based on the Google Maps S2 (Spherical) Geometry Library algorithm, that produces digital latitude/longitude coordinates and displays the results as a heat map. The geocoding results are also tested with GPS (Global Positioning System) hardware against the four elements of the geocoding quality matrix [5]:
1) Transform success rate: the proportion of addresses successfully converted by geocoding;
2) Landmark exactness: the percentage of geocoded results that match the physical landmark (e.g., street, postcode, building);
3) Score of accuracy: the level of similarity between the location and the geographic reference; and
4) Range of radius: the distance between the geocoded location and the true location.
As shown in Figure 7, a MySQL database stores both the input address data and the output geocoding data (latitude/longitude). The address data are stored in a MySQL table with a defined set of fields and data types. Figure 8 shows the heat map visualization of the geocoding results, which supports the final test objective of this research.
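Two of the matrix elements lend themselves to a short sketch: transform success rate and range of radius. The source does not state how the distance between the geocoded point and the GPS reading was computed; the haversine great-circle formula and the mean Earth radius of 6,371,000 m used below are assumptions, and all function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def transform_success_rate(results):
    """Share of addresses (in percent) for which geocoding returned a coordinate."""
    return 100.0 * sum(r is not None for r in results) / len(results)


def mean_radius_m(pairs):
    """Average distance between (geocoded, GPS-measured) coordinate pairs."""
    return sum(haversine_m(*geo, *gps) for geo, gps in pairs) / len(pairs)


# Made-up data: three geocoding outcomes (one failed) and two coordinate pairs.
rate = transform_success_rate([(-6.2, 106.8), None, (-6.9, 107.6)])
radius = mean_radius_m([((0.0, 0.0), (0.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))])
```

For the short distances reported in this kind of test, a planar approximation would give nearly the same values; haversine is used here only because it needs no map projection.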

Result and Discussion
The population comprises 6,188 addresses from three major cities in Indonesia (Balikpapan: 310 addresses, Bandung: 1,646 addresses, and Jakarta: 4,232 addresses). Applying the Slovin formula, where N is the population of each city and the error rate is 5%, yields 862 sample addresses to test out of the total population of 6,188. The GPS test results are reported against the four elements of the geocoding quality matrix [5]. Table 4 shows that 100% of the 862 sample addresses were successfully transformed (geocoded) using Google Maps based on the S2 (Spherical) Geometry Library algorithm.
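The sample size can be reproduced from the Slovin formula, n = N / (1 + N e^2), applied to each city with e = 0.05. The per-city sample sizes are not listed in the text; rounding each result to the nearest integer is an assumption, adopted here because it reproduces the reported total of 862.

```python
def slovin(population, error=0.05):
    """Slovin sample size n = N / (1 + N * e^2), before rounding."""
    return population / (1 + population * error ** 2)


# City populations as reported in the text.
populations = {"Balikpapan": 310, "Bandung": 1646, "Jakarta": 4232}

# Assumption: each city's sample size is rounded to the nearest integer.
samples = {city: round(slovin(n)) for city, n in populations.items()}
total_sample = sum(samples.values())

print(samples)       # {'Balikpapan': 175, 'Bandung': 322, 'Jakarta': 365}
print(total_sample)  # 862
```

Applying the formula once to the pooled population of 6,188 would give a much smaller sample (about 376), which suggests the per-city calculation was the one used.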
The algorithm identifies landmarks, by street name, with an average landmark exactness of 43.795%: Jakarta 52.600%, Balikpapan 44.000%, and Bandung 34.783%. In the GPS test, the similarity between the location and the geographic reference reaches an average accuracy score of 25.557%: Jakarta 35.068%, Balikpapan 25.143%, and Bandung 16.460%. The range of radius averages 79.907 m: Jakarta 71.759 m, Balikpapan 82.114 m, and Bandung 85.848 m. The test results show that the last three matrix elements (landmark exactness, score of accuracy, and range of radius) are related to one another: a lower landmark exactness rate is accompanied by a lower accuracy score and a larger radius between the geocoded latitude/longitude and the real position.
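The reported averages and the relationship among the three elements can be checked with a few lines of arithmetic. The per-city values are those given in the text; treating the averages as unweighted means over the three cities is an assumption, which the figures support to within rounding.

```python
from statistics import mean

# Per-city results as reported in the text.
landmark = {"Jakarta": 52.600, "Balikpapan": 44.000, "Bandung": 34.783}  # percent
accuracy = {"Jakarta": 35.068, "Balikpapan": 25.143, "Bandung": 16.460}  # percent
radius = {"Jakarta": 71.759, "Balikpapan": 82.114, "Bandung": 85.848}    # meters

# Unweighted means over the three cities (assumed averaging method).
mean_landmark = mean(landmark.values())
mean_accuracy = mean(accuracy.values())
mean_radius = mean(radius.values())

# Rank cities on each element: exactness and accuracy from high to low,
# radius from low to high. Identical rankings mean the elements move together.
by_landmark = sorted(landmark, key=landmark.get, reverse=True)
by_accuracy = sorted(accuracy, key=accuracy.get, reverse=True)
by_radius = sorted(radius, key=radius.get)
same_ranking = by_landmark == by_accuracy == by_radius
```

With only three cities the matching rankings are suggestive rather than statistically conclusive.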
During testing, several factors were found to affect the results: the completeness of the address data, inconsistent use of the letter I and the digit 1 in address numbering, and the apparent inability of Google Maps to identify some apartment, housing estate, and building names in the three cities studied, including the block and number information in the address. To improve the quality of geocoding results, it is recommended to supply addresses that are as complete as possible, since Google Maps appears to lack house number data.

Conclusion and Suggestion
This research has analyzed the S2 (Spherical) Geometry Library algorithm using 862 sample addresses drawn from a total population of 6,188 with the Slovin formula. It concludes that the population and sample sizes do not significantly affect the four elements of the geocoding quality matrix. In addition, the last three matrix elements (landmark exactness, score of accuracy, and range of radius) are directly related to one another and depend on the completeness of the address information. The heat map visualization made it easier to analyze the clustered distribution of addresses produced by geocoding. Future research should test addresses from outside Indonesia to examine the consistency of these results for the S2 (Spherical) Geometry Library algorithm.