Published February 19, 2015 | Version Final
Project deliverable | Open

LoCloud D2.6: Crawler ready tagging tools

  • 1. Asplan Viak Internet AS (AVINET)

Description

The LoCloud Crawler Ready Tagging Tools (henceforth CRTT) are a set of experimental tools for automatically extracting structured metadata from the HTML mark-up of web documents. The objective is to verify whether the crawling and indexing method used by mainstream search engines could be a viable, simplified supplement to the comprehensive Europeana ingestion process. To this end, the CRTT have been validated using small institutions as a test case.

This deliverable describes the rationale, technology, validation testing and next steps for the LoCloud CRTT.

Notes

LoCloud was funded by the European Commission's ICT Policy Support Programme (Grant Agreement number 325099).

Files

LoCloud-D2.6_Crawler_ready_tagging_tools.pdf

Size: 2.9 MB
md5:2d841a9925fe60a4768eed2354650dcf