There is a newer version of this record available.

Dataset Open Access

A controlled vocabulary defining the semantic perimeter of Sustainable Development Goals

Duran-Silva, Nicolau; Fuster, Enric; Massucci, Francesco Alessandro; Quinquillà, Arnau

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Sustainable Development Goals</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Controlled vocabulary</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Text indexing</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Natural Language Processing</subfield>
  </datafield>
  <controlfield tag="005">20201022121755.0</controlfield>
  <controlfield tag="001">3567769</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">SIRIS Academic</subfield>
    <subfield code="a">Fuster, Enric</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">SIRIS Academic</subfield>
    <subfield code="a">Massucci, Francesco Alessandro</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">SIRIS Academic</subfield>
    <subfield code="a">Quinquillà, Arnau</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">206519</subfield>
    <subfield code="z">md5:699d2d1c436cb29f67e94a15f4260e63</subfield>
    <subfield code="u"> [zenodo].xlsx</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-12-09</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o"></subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">SIRIS Academic</subfield>
    <subfield code="a">Duran-Silva, Nicolau</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">A controlled vocabulary defining the semantic perimeter of Sustainable Development Goals</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u"></subfield>
    <subfield code="a">Creative Commons Attribution Share Alike 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;A set of controlled terms that define the scope and breadth of &lt;a href=""&gt;Sustainable Development Goals (SDGs) as defined by the United Nations&lt;/a&gt;.&amp;nbsp; These terms can be used to tag and index textual records according to the SDGs.&lt;/p&gt;

&lt;p&gt;The vocabulary is constructed by means of the following steps:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;An initial set of terms per SDG Target is built by extracting key terms from the official UN list of Goals, Targets and Indicators.&lt;/li&gt;
	&lt;li&gt;The list is manually enriched by reviewing the literature produced around the SDGs and compiling, for each Target, the pertinent terms mentioned in the reviewed documents.&lt;/li&gt;
	&lt;li&gt;A reference text corpus is downloaded by searching for the initial terms defined in steps 1 and 2. The corpus is used to train a Word2Vec word-embedding model (a machine learning model based on neural networks).&lt;/li&gt;
	&lt;li&gt;The term list is then enriched by three automatic methods, which are run in parallel:
	&lt;ol&gt;
		&lt;li&gt;The trained Word2Vec model is used to select, among the indexed keywords of the reference corpus, all terms &amp;ldquo;semantically close&amp;rdquo; to the initial set of words. This step selects terms that might not appear in the texts themselves but that were deemed pertinent for labelling the textual records.&lt;/li&gt;
		&lt;li&gt;Terms that are mentioned in the texts of the reference corpus and that the trained Word2Vec model scores as &amp;ldquo;semantically close&amp;rdquo; to the initial set of words are also retained. This step adds to the controlled vocabulary terms that are related to the focus of the SDGs and that are used by practitioners.&lt;/li&gt;
		&lt;li&gt;An automated algorithm retrieves, from the Wikipedia APIs, terms that have a categorical relationship with the initial set of words (i.e. those indexed in DBpedia as &amp;ldquo;a broader concept of&amp;rdquo; or &amp;ldquo;equivalent to&amp;rdquo;).&lt;/li&gt;
	&lt;/ol&gt;
	&lt;/li&gt;
	&lt;li&gt;The final list produced by steps 1 to 4 is manually revised.&lt;/li&gt;
&lt;/ol&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3567768</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3567769</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
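Step 4 of the abstract in field 520 runs three automatic enrichment methods in parallel. The following Python sketch illustrates that step under several assumptions. Pre-computed word vectors stand in for the trained Word2Vec model. A cosine-similarity cutoff of 0.7 stands in for "semantically close", because the record does not state the actual criterion. Relationship triples are assumed to have been fetched from DBpedia already, and their relation labels `broader` and `equivalent` are placeholders, not real DBpedia property names. All function names and toy data are hypothetical and are not the authors' implementation.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical sketch of step 4: toy vectors stand in for a trained Word2Vec
# model and toy triples for DBpedia data. Names and threshold are assumptions.
Vector = List[float]
Triple = Tuple[str, str, str]  # (subject, relation, object)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantically_close(seeds: Iterable[str], candidates: Iterable[str],
                       vectors: Dict[str, Vector],
                       threshold: float = 0.7) -> Set[str]:
    """Candidates whose similarity to at least one seed reaches the assumed
    threshold; terms missing from the vector table are skipped."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    return {
        term for term in candidates
        if term in vectors
        and any(cosine(vectors[term], sv) >= threshold for sv in seed_vecs)
    }


def related_by_category(seeds: Iterable[str], triples: Iterable[Triple],
                        relations: Set[str]) -> Set[str]:
    """Objects linked to a seed by one of the allowed relations (one hop)."""
    seed_set = set(seeds)
    return {o for s, r, o in triples if s in seed_set and r in relations}


def enrich(seeds: Iterable[str], keywords: Iterable[str],
           corpus_terms: Iterable[str], vectors: Dict[str, Vector],
           triples: Iterable[Triple], relations: Set[str]) -> Set[str]:
    """Run the three methods independently, merge them, and return only the
    new terms, which would then go to manual revision (step 5)."""
    seed_set = set(seeds)
    found = (semantically_close(seed_set, keywords, vectors)        # 4.1 indexed keywords
             | semantically_close(seed_set, corpus_terms, vectors)  # 4.2 terms in the texts
             | related_by_category(seed_set, triples, relations))   # 4.3 category links
    return found - seed_set


# Usage with toy data (not taken from the dataset).
vectors = {
    "clean water": [1.0, 0.1, 0.0],
    "drinking water": [0.9, 0.2, 0.0],
    "sanitation": [0.7, 0.6, 0.1],
    "stock market": [0.0, 0.1, 1.0],
}
triples = [("clean water", "broader", "water supply"),
           ("clean water", "seeAlso", "bottled water")]
new_terms = enrich(["clean water"], ["drinking water", "stock market"],
                   ["sanitation"], vectors, triples, {"broader", "equivalent"})
print(sorted(new_terms))  # ['drinking water', 'sanitation', 'water supply']
```

Methods 4.1 and 4.2 share one similarity function and differ only in where their candidates come from: the corpus's indexed keywords or terms occurring in its texts. This mirrors the distinction the abstract draws between them.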
                  All versions   This version
Views             1,354          1,195
Downloads         295            261
Data volume       59.7 MB        53.9 MB
Unique views      1,165          1,054
Unique downloads  274            247
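The abstract in field 520 says that the vocabulary terms may be used to tag and index textual records according to the SDGs. The following Python sketch shows one way to do this with whole-word, case-insensitive matching. It assumes the vocabulary has already been loaded into a dict mapping each SDG label to its terms. The dataset is distributed as an .xlsx file whose column layout is not described in this record, so loading is not shown. The function names and the two-goal excerpt are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory form of the vocabulary: SDG label -> terms.


def compile_vocabulary(vocabulary: Dict[str, Iterable[str]]) -> Dict[str, re.Pattern]:
    """One case-insensitive regex per SDG that matches any of its terms as a
    whole word. Assumes each term starts and ends with a word character, as
    the word-boundary anchors require. Terms are sorted only so that the
    pattern is deterministic."""
    patterns: Dict[str, re.Pattern] = {}
    for sdg, terms in vocabulary.items():
        cleaned = sorted({t.strip() for t in terms if t.strip()})
        if cleaned:
            body = "|".join(re.escape(t) for t in cleaned)
            patterns[sdg] = re.compile(rf"\b(?:{body})\b", re.IGNORECASE)
    return patterns


def tag_text(text: str, patterns: Dict[str, re.Pattern]) -> Set[str]:
    """SDG labels with at least one vocabulary term occurring in the text."""
    return {sdg for sdg, pattern in patterns.items() if pattern.search(text)}


def index_records(records: Dict[str, str],
                  patterns: Dict[str, re.Pattern]) -> Dict[str, List[str]]:
    """Inverted index: SDG label -> ids of the records it tags."""
    index: Dict[str, List[str]] = {sdg: [] for sdg in patterns}
    for record_id, text in records.items():
        for sdg in tag_text(text, patterns):
            index[sdg].append(record_id)
    return index


# Usage with a hypothetical two-goal excerpt (not taken from the dataset).
vocabulary = {
    "SDG 6": ["drinking water", "sanitation"],
    "SDG 7": ["renewable energy", "solar power"],
}
patterns = compile_vocabulary(vocabulary)
records = {
    "r1": "Access to safe drinking water in rural areas",
    "r2": "Solar power adoption and sanitation in schools",
    "r3": "Stock market volatility",
}
print(index_records(records, patterns))
# {'SDG 6': ['r1', 'r2'], 'SDG 7': ['r2']}
```

Whole-word matching prevents, for example, "sanitationally" from being tagged as SDG 6. Handling inflected forms or other languages would need additional preprocessing that this sketch does not attempt.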

