{ "access": { "embargo": { "active": false, "reason": null }, "files": "public", "record": "public", "status": "open" }, "created": "2021-02-17T17:41:09.158156+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "files": { "count": 1, "enabled": true, "entries": { "D3.4 Linguistic Pipelines for Semantic Enrichment v.3 (Submitted to EC).pdf": { "checksum": "md5:081f41d86f82f782fa4d2f0e163f4891", "ext": "pdf", "id": "53ad10b8-ffcc-4a87-91eb-58a6a102e73d", "key": "D3.4 Linguistic Pipelines for Semantic Enrichment v.3 (Submitted to EC).pdf", "metadata": null, "mimetype": "application/pdf", "size": 2965288 } }, "order": [], "total_bytes": 2965288 }, "id": "4546049", "is_draft": false, "is_published": true, "links": { "access": "https://zenodo.org/api/records/4546049/access", "access_links": "https://zenodo.org/api/records/4546049/access/links", "access_request": "https://zenodo.org/api/records/4546049/access/request", "access_users": "https://zenodo.org/api/records/4546049/access/users", "archive": "https://zenodo.org/api/records/4546049/files-archive", "archive_media": "https://zenodo.org/api/records/4546049/media-files-archive", "communities": "https://zenodo.org/api/records/4546049/communities", "communities-suggestions": "https://zenodo.org/api/records/4546049/communities-suggestions", "doi": "https://doi.org/10.5281/zenodo.4546049", "draft": "https://zenodo.org/api/records/4546049/draft", "files": "https://zenodo.org/api/records/4546049/files", "latest": "https://zenodo.org/api/records/4546049/versions/latest", "latest_html": "https://zenodo.org/records/4546049/latest", "media_files": "https://zenodo.org/api/records/4546049/media-files", "parent": "https://zenodo.org/api/records/4546048", "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.4546048", "parent_html": "https://zenodo.org/records/4546048", "requests": "https://zenodo.org/api/records/4546049/requests", "reserve_doi": 
"https://zenodo.org/api/records/4546049/draft/pids/doi", "self": "https://zenodo.org/api/records/4546049", "self_doi": "https://zenodo.org/doi/10.5281/zenodo.4546049", "self_html": "https://zenodo.org/records/4546049", "self_iiif_manifest": "https://zenodo.org/api/iiif/record:4546049/manifest", "self_iiif_sequence": "https://zenodo.org/api/iiif/record:4546049/sequence/default", "versions": "https://zenodo.org/api/records/4546049/versions" }, "media_files": { "count": 0, "enabled": false, "entries": {}, "order": [], "total_bytes": 0 }, "metadata": { "creators": [ { "affiliations": [ { "name": "Sirma AI" } ], "person_or_org": { "family_name": "Todor Primov", "name": "Todor Primov", "type": "personal" } }, { "affiliations": [ { "name": "Sirma AI" } ], "person_or_org": { "family_name": "Andrey Avramov", "name": "Andrey Avramov", "type": "personal" } }, { "affiliations": [ { "name": "Sirma AI" } ], "person_or_org": { "family_name": "Nikola Rusinov", "name": "Nikola Rusinov", "type": "personal" } }, { "affiliations": [ { "name": "Sirma AI" } ], "person_or_org": { "family_name": "Vladimir Alexiev", "name": "Vladimir Alexiev", "type": "personal" } } ], "description": "
This deliverable is the third report on the progress of T3.4 Semantic Enrichment. It describes the practical application of advanced text analytics pipelines used to extract and semantically annotate information from unstructured textual data sources in the Big Data Grapes (BDG) data pool. The report covers the practical approach to designing a source knowledge graph for wine- and wine-review-related information; semantic data fusion with basic ontologies and thesauri of relevant terminologies from the BigDataGrapes data pool; designing named entity recognition pipelines for extracting data from public wine reviews; and configuring semantic search on top of the annotated content. The demonstrated approach is generic and can be applied to any type of unstructured content (research publications, news articles, patent data, trial reports, food quality reports, etc.) using any of the terminologies available in the BDG data pool (sensor data, wine varieties, etc.) or any other data set available in the linked open data (LOD) cloud.
\n\nThe work reported in the first version of the deliverable (Version 1 of D3.4 - Linguistic Pipelines for Semantic Enrichment, reported in M12 of the BDG project) focused mostly on setting up the overall semantic enrichment workflow to be followed, covering domain modeling; building a core knowledge graph to support the semantic enrichment; development and customization of NLP pipeline components; and post-processing of the annotation schema into a corresponding RDF representation.
\n\nThe second reporting period (Version 2 of D3.4 - Linguistic Pipelines for Semantic Enrichment, reported in M24 of the BDG project) aimed to apply the generic semantic enrichment approach to a concrete use case and to demonstrate how end users can benefit from semantic enrichment when navigating and browsing a large sample linked data set (described in Version 2 of D4.3 - Models and Tools for Predictive Analytics over Extremely Large Datasets, reported in M15 of the BDG project).
\n\nThe current work describes improvements to the semantic enrichment of the data set used in Version 2 of D3.4 - Linguistic Pipelines for Semantic Enrichment, including 1) extraction and filtering of grape, wine and food concepts from the data set; 2) semantic enrichment of the textual fields of wine reviews with these concepts; and 3) improvement of the semantic search by building new search indices over the semantically enriched wine reviews.
\n\nIn addition to the work on the Wine Search demonstrator, a PubMed Central web crawler was developed that can be configured to download fresh content relevant to research on wine, antioxidants and other bioactive compounds. The content is then processed by a text analysis pipeline that identifies instances of organic compounds of interest to the project and classifies them into functional groups of compounds (e.g. flavonoids, glycosides, etc.).
", "funding": [ { "award": { "acronym": "BigDataGrapes", "id": "00k4n6c32::780751", "identifiers": [ { "identifier": "https://cordis.europa.eu/projects/780751", "scheme": "url" } ], "number": "780751", "program": "H2020", "title": { "en": "Big Data to Enable Global Disruption of the Grapevine-powered Industries" } }, "funder": { "id": "00k4n6c32", "name": "European Commission" } } ], "languages": [ { "id": "eng", "title": { "en": "English" } } ], "publication_date": "2020-11-27", "publisher": "Zenodo", "resource_type": { "id": "publication-deliverable", "title": { "de": "Projektergebnis", "en": "Project deliverable" } }, "rights": [ { "description": { "en": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited." }, "icon": "cc-by-icon", "id": "cc-by-4.0", "props": { "scheme": "spdx", "url": "https://creativecommons.org/licenses/by/4.0/legalcode" }, "title": { "en": "Creative Commons Attribution 4.0 International" } } ], "title": "BigDataGrapes D3.4 - Linguistic Pipelines for Semantic Enrichment", "version": "3.0" }, "parent": { "access": { "owned_by": { "user": 197726 } }, "communities": { "default": "ac1a29f8-93dc-4733-ae73-ab3eeb9c7f90", "entries": [ { "access": { "member_policy": "open", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2018-06-27T22:24:39.054543+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "ac1a29f8-93dc-4733-ae73-ab3eeb9c7f90", "links": {}, "metadata": { "curation_policy": "All public documentation and/or dissemination material produced by the BigDataGrapes project will be included under this community. Any material either not belonging to the above category or characterized as restricted will be declined.
\r\n", "page": "The BigDataGrapes community includes most of the public deliverables, documents and dissemination material created through the H2020 BigDataGrapes Project (http://www.bigdatagrapes.eu/), which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780751
", "title": "BigDataGrapes Project" }, "revision_id": 0, "slug": "bigdatagrapes", "updated": "2018-06-27T22:24:39.218097+00:00" }, { "access": { "member_policy": "open", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2022-11-23T15:53:29.436323+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "f0a8b890-f97a-4eb2-9eac-8b8a712d3a6c", "links": {}, "metadata": { "curation_policy": "", "description": "", "page": "", "title": "EU" }, "revision_id": 0, "slug": "eu", "updated": "2022-11-23T15:53:29.436331+00:00" } ], "ids": [ "ac1a29f8-93dc-4733-ae73-ab3eeb9c7f90", "f0a8b890-f97a-4eb2-9eac-8b8a712d3a6c" ] }, "id": "4546048", "pids": { "doi": { "client": "datacite", "identifier": "10.5281/zenodo.4546048", "provider": "datacite" } } }, "pids": { "doi": { "client": "datacite", "identifier": "10.5281/zenodo.4546049", "provider": "datacite" }, "oai": { "identifier": "oai:zenodo.org:4546049", "provider": "oai" } }, "revision_id": 3, "stats": { "all_versions": { "data_volume": 103785080.0, "downloads": 35, "unique_downloads": 33, "unique_views": 42, "views": 48 }, "this_version": { "data_volume": 103785080.0, "downloads": 35, "unique_downloads": 33, "unique_views": 42, "views": 48 } }, "status": "published", "updated": "2021-02-18T00:27:23.084945+00:00", "versions": { "index": 1, "is_latest": true } }