Published February 28, 2009 | Version v1
Journal article Open

Fingerprinting Lexical Contexts over the Web

  • 1. Polytechnic of Bari, Taranto, Italy

Description

This paper presents a novel technique for identifying lexical contexts in web resources. The basic idea is to treat the anchor texts of a web site as lexicalized descriptions of an individual ontology, organized as a graph of concept words. To search for characteristic semantic patterns, the concept of the web minutia (transposed from the forensic domain) is introduced. The proposed technique consists of searching for web minutiae in the analyzed web sites by means of a golden ontology. Web minutiae act as fingerprints for context-specific web resources; in this sense they are a powerful computational tool for identifying and categorizing web content. The WordNet database was used as the golden ontology in our experiments on English web documents. WordNet allows word senses and inter-word taxonomical relations such as hyponymy and hypernymy to be indexed and retrieved. It proved to be an efficient mediator between web ontologies and context-dependent taxonomies. Our experiments were carried out on a preliminary data set of several tens of thousands of links taken from the web sites of thirteen UK universities. Preliminary results suggest that web minutiae are able to identify lexical contexts across the Web.
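The general idea of checking anchor-text links against a golden ontology can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' algorithm: the abstract does not define a web minutia precisely, so the sketch assumes the simplest possible pattern, a link whose parent anchor word is a hypernym of the child anchor word. The names `TOY_HYPERNYMS`, `hypernym_chain`, and `find_candidate_patterns` are hypothetical, and the small hand-written taxonomy stands in for WordNet.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a golden ontology: each word maps to its direct hypernym.
# These entries are illustrative assumptions and are not taken from WordNet.
TOY_HYPERNYMS: Dict[str, str] = {
    "undergraduate": "student",
    "postgraduate": "student",
    "student": "person",
    "lecturer": "educator",
    "educator": "person",
    "seminar": "course",
    "course": "education",
}


def hypernym_chain(word: str) -> List[str]:
    """Return all ancestors of `word` in the toy taxonomy, nearest first."""
    chain: List[str] = []
    seen = {word}
    while word in TOY_HYPERNYMS:
        word = TOY_HYPERNYMS[word]
        if word in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(word)
        chain.append(word)
    return chain


def find_candidate_patterns(
    anchor_edges: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep the (parent, child) anchor-word edges that the ontology confirms.

    An edge is kept when the parent word appears among the child's hypernyms.
    This is an assumed, simplified notion of a pattern, not the paper's
    definition of a web minutia.
    """
    matches: List[Tuple[str, str]] = []
    for parent, child in anchor_edges:
        p, c = parent.lower(), child.lower()
        if p in hypernym_chain(c):
            matches.append((p, c))
    return matches


# Hypothetical anchor-text edges, as if extracted from a site's link graph.
edges = [
    ("Student", "Undergraduate"),
    ("Person", "Lecturer"),
    ("Course", "Library"),
    ("Seminar", "Course"),
]
patterns = find_candidate_patterns(edges)
print(patterns)
```

In a real setting the toy dictionary would be replaced by lookups in WordNet, where a word can have several senses and several hypernym paths, so the matching step would need to consider each sense rather than a single chain.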

Files

jucs_article_29342.pdf (320.7 kB)
md5:f30e7ad3c915ecd6e125324d21831f90