Visualizing Linked Open Statistical Data to Support Public Administration

Open data have tremendous potential which however remains unexploited. A large part of open data is numerical and highly structured. We concentrate on those data and capitalize on linked open data (LOD) as the underlying technology. In this paper, we present a number of tools to facilitate publishing and reusing of linked open statistical data. We propose an architecture and implementation that allows developing custom visualization and analysis tools without knowledge of LOD technologies. We further present work towards deploying relevant tools in six different countries to improve decision-making and transparency and thus support public administration.


INTRODUCTION
Opening up governmental data is a political priority in many countries, intended to stimulate, among other things, innovation and economic growth (e.g. [1]). As a result, a large number of public authorities have launched and maintain relevant portals [4]. The expected benefits of opening data are multifaceted and range from transparency to economic growth. As an example, the global annual economic potential value of Open Data has been estimated at $3 trillion [7]. However, the potential of Open Government Data (OGD) remains unrealized to a large extent [9].
dg.o '17, Staten Island, NY, USA. © 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-5317-5/17/06. DOI: http://dx.doi.org/10.1145/3085228.3085304
The difficulty in exploiting open data seems surprising if we consider the huge importance data have in modern societies. Indeed, in recent years, businesses, academia and government have employed various data analytics methods on their own data with great success. For example, business intelligence methods are employed by enterprises to help them survive in the global economy [13]. In addition, evidence-based policy-making relies on data analytics to assist policy makers in producing better policies [10].
This difficulty could be explained by a number of barriers (legal, political, institutional, social, and technical) that hamper the interaction between public administration and society (citizens and enterprises). Public authorities publish open government data in an ad-hoc manner based on existing processes, according to their mandate, and often under unclear licenses. They also design and deliver services in a top-down manner. On the other hand, society has needs, and data-driven public services, not raw data, can address these needs. As a result, society should be involved in service co-production to ensure that public services address its needs [12]. Society's data (e.g. citizen-produced data, business data, etc.) should also be re-used and combined with OGD in order to enable innovative public services.
In this paper, we present how Linked Open Statistical Data (LOSD) can be practically employed to overcome some of the abovementioned barriers. We concentrate here on the technological barriers in open data use. The rest of this paper is organized as follows. In section 2, we outline LOSD. In section 3, we present a proposed software architecture as well as a number of tools we have developed to exploit LOSD. In section 4, we present six real-life scenarios where our tools have been deployed to support policy-making, together with the expected results. Finally, section 5 provides a discussion along with the conclusions and future work.

LINKED OPEN STATISTICAL DATA (LOSD)
A large part of OGD is of a statistical nature, meaning that it consists of numeric values that are highly structured [5]. Moreover, Linked Data has been introduced as a promising technological paradigm for opening up data because it facilitates data integration on the Web. In the case of statistical data, Linked Data has the potential to realize the vision of performing data analytics on top of integrated but previously isolated statistical data across the Web [6] [8]. A fundamental step towards this vision is the RDF Data Cube vocabulary [11].
This emerging field (aka Open Statistics [3]) studies aspects of open data, data warehouses and business intelligence as well as linked open data. It provides pre-existing business intelligence capabilities on the Web but also proposes new analysis capabilities that were not possible before. An example area is policy-making, where integration of distributed data sources is needed to inform decision-making.
Currently, a number of pioneering organizations provide their open data as LOSD. These include national statistics offices in many countries such as the UK, Scotland, Italy, Japan, and France.
Eurostat also aims at addressing Open Statistics challenges. For example, the DIGICOM project aims, among other things, to facilitate automated access to European aggregate data for heavy users or redisseminators and to improve access to micro data through linked open data.

LOSD TOOLS
We have developed a number of ICT tools to support publishing and reusing of LOSD. Tools for reuse are based on a proposed architecture to facilitate software development without LOSD programming skills.

LOSD publishing
The developed tools for LOSD publishing are based on existing open source tools, mainly Grafter. This is a command-line application that enables creating LOSD from various file formats, such as CSV. We have developed two such publishing tools.
3.1.1 Table2qb and Grafter. The Table2qb tool, implemented with Grafter, takes data in a specific tabular structure, either as a CSV or Excel file, and converts it into an RDF Data Cube. Its functionality includes representing the data as a series of observations with dimensions, attributes and measures, and generating the associated Data Structure Definition.
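To illustrate the kind of conversion described above, the following is a minimal sketch, not the Table2qb implementation. It turns each CSV row into one qb:Observation serialized as Turtle. The `ex:` namespace, the column roles and the function name are assumptions made for this example; dimension values are written as plain literals (a real cube would normally use URIs from code lists), and no Data Structure Definition is generated.

```python
import csv
import io

# Hypothetical namespace and column roles, chosen only for this illustration.
BASE = "http://example.org/"
DIMENSIONS = ["area", "year"]   # columns treated as dimensions
MEASURE = "population"          # column treated as the measure

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    f"@prefix ex: <{BASE}> .\n"
)


def csv_to_observations(csv_text: str, dataset: str = "ex:dataset1") -> str:
    """Turn each CSV row into one qb:Observation, serialized as Turtle.

    Simplification: values are not escaped, so the input must not
    contain double quotes or backslashes.
    """
    blocks = [PREFIXES]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        lines = [f"ex:obs{i} a qb:Observation ;", f"    qb:dataSet {dataset} ;"]
        for dim in DIMENSIONS:
            lines.append(f'    ex:{dim} "{row[dim]}" ;')
        lines.append(f"    ex:{MEASURE} {int(row[MEASURE])} .")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


SAMPLE = "area,year,population\nA,2015,100\nB,2015,250\n"
turtle = csv_to_observations(SAMPLE)
print(turtle)
```

Each row becomes a self-contained observation that repeats its dimension values, which is what later allows observations from different cubes to be queried and combined uniformly.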

3.1.2 Data Cube Builder. Data Cube Builder is a tool for transforming non-RDF data sources to RDF Data Cube. It is built on top of the pre-existing tool TARQL. Data Cube Builder can be used through multiple interfaces, such as a desktop UI, the command line, a web user interface, and a web service.

LOSD reusing
LOSD tools can be used for data exploration and visualization. This includes performing classic OLAP operations, such as slicing and dicing.
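As a brief illustration of the two operations named above, here is a minimal sketch over an in-memory list of observations. The sample data, dimension names and function names are assumptions made for this example and are not taken from the tools described in this paper.

```python
from typing import Dict, List, Set

Observation = Dict[str, object]

# Hypothetical cube with three dimensions (area, year, sex) and one measure.
CUBE: List[Observation] = [
    {"area": "A", "year": 2015, "sex": "F", "value": 10},
    {"area": "A", "year": 2016, "sex": "F", "value": 12},
    {"area": "B", "year": 2015, "sex": "M", "value": 7},
    {"area": "B", "year": 2016, "sex": "F", "value": 9},
]


def slice_cube(cube: List[Observation], dimension: str, member: object) -> List[Observation]:
    """Slice: fix one dimension to a single member and drop that dimension."""
    return [
        {k: v for k, v in obs.items() if k != dimension}
        for obs in cube
        if obs[dimension] == member
    ]


def dice_cube(cube: List[Observation], selection: Dict[str, Set[object]]) -> List[Observation]:
    """Dice: keep observations whose members fall within the chosen sets."""
    return [
        obs
        for obs in cube
        if all(obs[dim] in members for dim, members in selection.items())
    ]


year_2015 = slice_cube(CUBE, "year", 2015)
women_in_a_or_b = dice_cube(CUBE, {"area": {"A", "B"}, "sex": {"F"}})
```

A slice reduces the dimensionality of the cube by one, whereas a dice keeps all dimensions and only restricts the members considered.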
In order to compare our approach to previous developments, we adopt a simple software architecture that consists of three layers. The human interface layer handles human-computer interaction, the business logic layer contains the main logic of the software, and the data access layer handles the communication with the data store [4]. In our case, the data store is an RDF management system, such as Virtuoso or Sesame, where LOSD reside. The communication with the data store is performed through a SPARQL endpoint using SPARQL, a query language for RDF data.
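To make the data access step concrete, the sketch below builds a SPARQL query for the observations of one data cube and the corresponding request URL. The endpoint and dataset URIs are hypothetical, and no request is actually sent. The only protocol detail relied upon is that the SPARQL 1.1 Protocol allows a query to be sent via HTTP GET in the `query` parameter.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and dataset URI, used only for illustration.
ENDPOINT = "http://example.org/sparql"
DATASET = "http://example.org/dataset1"


def observations_query(dataset_uri: str, limit: int = 10) -> str:
    """Select every property/value pair of the observations in one cube."""
    return (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        "SELECT ?obs ?property ?value WHERE {\n"
        f"  ?obs a qb:Observation ; qb:dataSet <{dataset_uri}> ;\n"
        "       ?property ?value .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


URL = request_url(ENDPOINT, observations_query(DATASET))
print(URL)
```

Even this small example requires knowledge of the RDF Data Cube vocabulary and of SPARQL syntax, which is the expertise a data access layer has to encapsulate.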
In Figure 1, the traditional architecture of two tools, namely OLAP Browser and Cube Visualizer, is presented. These are monolithic, vertical applications where software is developed for all three layers. The human interface and business logic layers differ between these tools. However, the data access layers are similar in both tools, providing common data management functionality. Nevertheless, these had to be coded separately, leading to additional costs. More importantly, the development of the data access layers requires significant programming expertise in LOSD, a skill that is not widely available among programmers. This is a significant barrier to the development of LOSD reusing tools and the exploitation of LOSD in general.
In Figure 2, we present the same tools using an alternative layered architecture we have developed. Here, we have developed a JSON API for accessing RDF stores in an easy and uniform way. This is based on a relevant specification that we devised. Thus, there is no longer a need to implement a different data access layer for each tool, as the same functionality is available by reusing the API.
As a result, it is now much easier to build software applications for reusing LOSD. In fact, all LOSD-related programming is now hidden, and thus tools can be developed using only common Web programming skills, e.g. CSS and JavaScript. More technical details are outside the scope of this paper. The interested reader is advised to consult the relevant technical project reports and publications.
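The layering idea can be sketched as follows, under stated assumptions: a single data access contract is defined once and reused by two different tools. All class and function names are hypothetical, and an in-memory stand-in replaces the actual API client, which would issue HTTP requests instead.

```python
from typing import Dict, List, Protocol

Observation = Dict[str, object]


class CubeAccess(Protocol):
    """Data access contract shared by all tools (hypothetical names)."""

    def dimensions(self) -> List[str]: ...

    def observations(self, fixed: Dict[str, object]) -> List[Observation]: ...


class InMemoryCubeAccess:
    """Stand-in for an API client; a real one would call the JSON API over HTTP."""

    def __init__(self, rows: List[Observation], measure: str = "value") -> None:
        self._rows = list(rows)
        self._measure = measure

    def dimensions(self) -> List[str]:
        return sorted({k for r in self._rows for k in r if k != self._measure})

    def observations(self, fixed: Dict[str, object]) -> List[Observation]:
        return [r for r in self._rows if all(r[d] == m for d, m in fixed.items())]


def browser_rows(access: CubeAccess, fixed: Dict[str, object]) -> List[List[object]]:
    """Tool 1 (table view): one row per matching observation."""
    dims = access.dimensions()
    return [[obs[d] for d in dims] + [obs["value"]] for obs in access.observations(fixed)]


def chart_series(access: CubeAccess, axis: str, fixed: Dict[str, object]) -> Dict[object, object]:
    """Tool 2 (chart view): map members of one dimension to measure values."""
    return {obs[axis]: obs["value"] for obs in access.observations(fixed)}


access = InMemoryCubeAccess([
    {"area": "A", "year": 2015, "value": 10},
    {"area": "B", "year": 2015, "value": 7},
    {"area": "A", "year": 2016, "value": 12},
])
```

Neither tool contains any RDF or SPARQL code; both depend only on the shared contract, so replacing the stand-in with a real client would not require changes to them.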
A short description of an indicative list of tools that we developed follows. Before that, we present some details on the JSON API Data Cube Access Specification and Implementation.
3.2.1 JSON API for Data Cube. The JSON-QB API enables accessing data stored as RDF Data Cubes in a way that can be easily used by typical Web developers, i.e. programmers with JavaScript skills but without knowledge of Linked Data.
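From a client's perspective, working with such an API could look like the sketch below. The base URL, the `slice` path, the parameter names and the response shape are all assumptions made for illustration; they are not the published JSON-QB API specification. The point is only that the client deals with URLs and JSON rather than SPARQL.

```python
import json
from urllib.parse import urlencode

# Assumptions for illustration only: the base URL, the "slice" path, the
# parameter names and the response shape are hypothetical, not the
# published JSON-QB API specification.
BASE_URL = "http://example.org/json-qb-api"


def slice_url(dataset: str, fixed: dict) -> str:
    """Build a request URL that fixes some dimensions of a cube."""
    params = {"dataset": dataset, **fixed}
    return f"{BASE_URL}/slice?{urlencode(params)}"


# Hypothetical response body, as such a service might return it.
SAMPLE_RESPONSE = """
{"observations": [
  {"area": "A", "year": "2015", "value": 10},
  {"area": "B", "year": "2015", "value": 7}
]}
"""


def values_by(response_text: str, dimension: str) -> dict:
    """Index the measure values of a JSON response by one dimension."""
    payload = json.loads(response_text)
    return {obs[dimension]: obs["value"] for obs in payload["observations"]}


url = slice_url("http://example.org/dataset1", {"year": "2015"})
by_area = values_by(SAMPLE_RESPONSE, "area")
```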

3.2.2 Data Cube Explorer. Data Cube Explorer is a web-based tool that catalogues and presents details of available data cubes to the users. It also enables users to preview cube data using a pivot table, a cube browser and other visualization widgets.

3.2.3 OLAP Browser. The OLAP Browser enables the exploration of RDF data cubes by presenting, each time, a two-dimensional slice of the cube as a table. OLAP Browser is based on the JSON-QB API for Data Cube implementation.
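Presenting a two-dimensional slice as a table amounts to a pivot operation. The following is a minimal sketch of that idea, not the OLAP Browser code: all dimensions except the row and column dimensions are fixed, and the remaining observations are arranged in a grid. The data and names are assumptions for this example; combinations without an observation are left as `None`.

```python
from typing import Dict, List

Observation = Dict[str, object]

# Hypothetical cube with three dimensions and one measure.
CUBE: List[Observation] = [
    {"area": "A", "year": 2015, "sex": "F", "value": 10},
    {"area": "A", "year": 2016, "sex": "F", "value": 12},
    {"area": "B", "year": 2015, "sex": "F", "value": 7},
    {"area": "B", "year": 2016, "sex": "M", "value": 9},
]


def pivot(observations: List[Observation], row_dim: str, col_dim: str,
          fixed: Dict[str, object], measure: str = "value") -> List[List[object]]:
    """Arrange a two-dimensional slice as a table with a header row."""
    selected = [o for o in observations if all(o[d] == m for d, m in fixed.items())]
    row_keys = sorted({o[row_dim] for o in selected})
    col_keys = sorted({o[col_dim] for o in selected})
    cells = {(o[row_dim], o[col_dim]): o[measure] for o in selected}
    header = [row_dim] + col_keys
    body = [[r] + [cells.get((r, c)) for c in col_keys] for r in row_keys]
    return [header] + body


table = pivot(CUBE, "area", "year", {"sex": "F"})
for row in table:
    print(row)
```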

3.2.4 Cube Visualizer. The Cube Visualizer is a web application that creates and presents to the user graphical representations of an RDF data cube's one-dimensional slices. It is built as a client of the JSON-QB API implementation.

LOSD SUPPORT OF PUBLIC ADMINISTRATION
We have started deploying LOSD tools in six different settings across Europe, as shown in Table 1. In this section, we describe the results of the activities performed so far. These include problem definition, expected project/service and progress so far.

The Flemish Government
Citizens and companies want to compare reported and permitted emissions or emission ratios between geographical regions or companies. They also want to link emission data with population [...] Finally, government is also a target user of the final service, as it can be assisted in revising permitted pollution levels in a certain area based on existing levels and reported emissions. Additional LOSD, R-statistics and visualization libraries are needed to develop the final service. Figure 3 presents emission points across Flanders on a map. The user can explore locations, organizations, and detailed pollution data. These data come from five disparate datasets that are connected following the linked data principles. The Flemish service is available online at https://ontwikkel.milieuinfo.be/emissiepunten/.
9 https://github.com/OpenGovIntelligence/json-qb

The Estonian Ministry of Economic Affairs
If someone wants to buy or rent a flat, a house, office premises or land in Estonia, they usually need to go to real estate websites to get the general information (total area, year built, building material, ownership, etc.). If they want to know specific details or learn about restrictions concerning the property, as well as the area in which it is located, they need to visit numerous different databases owned by different public authorities. The foreseen service in this pilot will present a large amount of useful information about the flat, house, office premises or land that different types of users are interested in. This information would normally have to be searched for in different registers, so the main benefit would be substantial time savings for users. The final service will show different information on a map, providing visualizations and information on one screen. The target users of this service will be all citizens who need to rent or purchase new real estate or land, real estate brokers, real estate developers, investors, notaries, and government officials responsible for urban planning and "long range developments/plans" who need information on trends to improve the area. The platform should foremost be able to visualize statistical data in one map solution. The development of this service requires opening up and linking data from disparate sources.
So far, data from the business registry and on car crashes have been transformed to LOSD. Visualization tools have been used to better understand these data. Fig. 4 presents a map of Tallinn where users can explore car accident incidents.

The Greek Ministry of Interior
The Ministry of Interior and Administrative Reconstruction is in charge of monitoring and managing approximately 11,500 government vehicles, which are used by all Greek Public Agencies. The datasets it possesses originate from different sources and have not yet been properly structured and combined in order to be converted into meaningful information that will facilitate internal decision-making and increase transparency towards the public. The target users of the final product will be Greek Public Agencies that use government vehicles. These will use the service to obtain measures and reports on their use of government vehicles. In addition, the ministry will be able to take decisions relevant to the management of vehicles more quickly and accurately. Towards this end, organizational, cultural, institutional and legal challenges should also be identified and addressed [2].
So far, a spreadsheet with vehicles has been cleaned and transformed into LOSD, while tools have been used to obtain some initial visualizations and insights. Fig. 5 presents a pie chart visualization where users can explore the different types of vehicles in relation to registration date, vehicle brand, and fuel type.

The Irish Marine Institute
In Ireland, a search and rescue operation problem is perceived as a cross-country problem of public administration, business and citizens. The rescue team wants to know the current conditions in the waters around the coastline. A member of the team wants to return information, such as geo-located photographs, to the team's coordinator so that the coordinator can be kept up to date on the search team's location and conditions. In addition to the public authorities, the public is involved in searching the coastline. The volunteers want to have access to the same apps and much of the same data as the authorities, but some information may not be available to them. The team's coordinator reviews the information collected by the app after each rescue to build up a dataset that supports the development of local search and rescue policies. The final service will provide a tool in which search and rescue personnel can identify the key areas to search for a casualty in the water for rescue or recovery. This may also include onshore search parties who can provide their location and coastal imagery. The target users of this service are search and rescue services. Statistical analysis of data on entry and rescue/recovery locations, visualization of forecast model outputs, visualization of the traffic situation in the bay, and predictive analysis of particle tracking are the tools needed to develop the final service. The data needed in this scenario were already in LOSD format, hence this pilot concentrated on reusing tools (see Fig. 6).

The Lithuanian Ministry of Economy
Market research is a national business problem in Lithuania. Entrepreneurs in Vilnius city have no information about the opportunities and competition in the areas where they want to start their businesses. They need to invest a lot of resources in order to find out whether their idea has any potential. Linked Open Statistical Data (LOSD) can be used to simplify market research and the decision-making process during the business planning stage. The final service will let users navigate the Vilnius city map and see all active businesses from up to the five most popular business areas in the city. The target users of this service will be [...] So far, data from various databases have been transformed to LOSD and different visualizations are possible (see Fig. 7).

UK Trafford Council
The Department for Work and Pensions (DWP) is a central government department which maintains around 800 Job Centre Plus offices in England, where people can claim out-of-work benefits, receive advice on CV-writing and interviews, and apply for jobs. The DWP is reviewing the distribution of these Job Centres and wants open statistical data to help get this job done. The final service will allow exploration of data, especially from a spatial point of view, allowing people to see where needs are greater and where there are available assets or groups who could support an alternative model of delivering Job Centre Plus services. The tool will also provide a dashboard for decision-makers to get the most up-to-date information about worklessness that they can explore, drill through, etc. The target users of this service will be local DWP teams responsible for reconfiguring the Job Centres, in conjunction with Local Authority leads for worklessness. Ultimately, the public are the users of the Job Centre Plus.

DISCUSSION AND CONCLUSION
Linked Open Statistical Data (LOSD) is emerging to overcome some of the Open Data barriers. LOSD enforces data structure and quality and enables data integration across the Web. Therefore, it can significantly improve adoption and support decision-making. LOSD, however, is not without its challenges. One of these is the lack of LOSD tools, along with the significant programming expertise required for their development. This is mainly related to the need to access data stored in RDF using SPARQL.
In this paper, we propose a new architecture based on standardizing the LOSD Data Access layer of these tools. We propose and implement a specification enabling uniform and easy access to LOSD using standard APIs and JSON. Using this API, it is now easy for any Web programmer to develop custom visualization and analysis tools for LOSD. We believe this is a significant step towards wide adoption of LOSD. Future work in this area is to standardize this specification.
In this paper, we further present a number of tools that we have used for LOSD publishing and reuse. Tools for publishing enable developing LOSD from CSV and Excel files. Tools for reuse provide browsing capabilities and OLAP operations and capitalize on the API mentioned above. Future work in this area includes improvement of existing tools (e.g. in terms of user experience) and development of more tools to cover additional analysis needs.
Finally, we present six scenarios where LOSD tools have been deployed under real-life situations. The aim is to improve internal decision-making for Public Authorities, leading to better policymaking, as well as to increase transparency and information provision to the public and businesses. Future work in the area includes the use of tools in actual settings as well as the evaluation of the employment of LOSD in terms of tangible and intangible benefits and problems.