Published September 2, 2017 | Version v1
Conference paper Open

How Linked Data can Aid Machine Learning-Based Tasks

  • 1. Institute of Computer Science, FORTH-ICS, Greece, and Computer Science Department, University of Crete, Greece

Description

Discovering useful data for a given problem is of primary importance, since data scientists usually spend a lot of time discovering, collecting and preparing data before using them, e.g., for applying or testing machine learning algorithms. In this paper, we propose a general method for easily discovering, creating and selecting valuable features that describe a set of entities, so that they can be leveraged in a machine learning context. We demonstrate the feasibility of this approach by introducing a tool (research prototype) called LODsyndesisML, based on Linked Data technologies, that a) automatically discovers datasets in which the entities of interest occur, b) shows the user a large number of useful features for these entities, and c) automatically creates the selected features by sending SPARQL queries. We evaluate this approach by exploiting data from several sources, including the British National Library, to create datasets for predicting whether a book or a movie is popular or non-popular. Our evaluation uses 5-fold cross-validation, and we present comparative results for a number of different features and models. The evaluation showed that the additional features improved prediction accuracy.
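
As a rough illustration of steps (c) and the 5-fold evaluation described in the abstract, the following Python sketch builds a SPARQL query that turns one property into a numeric feature for a set of entities, and splits a dataset into five train/test folds. The function names, the example.org URIs and the count-based feature are all assumptions made for illustration; they are not taken from LODsyndesisML, whose actual queries and feature types may differ.

```python
from typing import Iterable, List, Tuple


def feature_query(entities: Iterable[str], prop: str) -> str:
    """Build a SPARQL query that counts the values of `prop` per entity.

    Hypothetical helper: the count-based feature is an assumption, not
    necessarily a feature type offered by the tool.
    """
    values = " ".join(f"<{uri}>" for uri in entities)
    return (
        "SELECT ?entity (COUNT(?value) AS ?feature) WHERE {\n"
        f"  VALUES ?entity {{ {values} }}\n"
        # OPTIONAL keeps entities that lack the property (count is 0).
        f"  OPTIONAL {{ ?entity <{prop}> ?value }}\n"
        "}\n"
        "GROUP BY ?entity"
    )


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; index j goes to fold j % k."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


# Placeholder URIs, used only to show the shape of the generated query.
query = feature_query(
    ["http://example.org/book/1", "http://example.org/book/2"],
    "http://example.org/prop/author",
)
splits = k_fold_indices(10)
```

The query string would then be sent to whichever SPARQL endpoint hosts the selected dataset, and each of the five splits would be used once as the test set while the remaining four serve for training.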

Files

Tzitzikas_2017_TPDL.pdf (884.6 kB)
md5:01291e32bc9fb3e3b0d2001320da9ca4

Additional details

Funding

European Commission
BlueBRIDGE – Building Research environments for fostering Innovation, Decision making, Governance and Education to support Blue growth 675680