Annotation collection Open Access

Asian Directories: Foreign Residents Benchmark Dataset

Cornwell, Peter J; Herren-Oesch, Madeleine

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.2580998</identifier>
      <creatorName>Cornwell, Peter J</creatorName>
      <givenName>Peter J</givenName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0001-5178-7577</nameIdentifier>
      <affiliation>University of Westminster, Data Futures</affiliation>
      <creatorName>Herren-Oesch, Madeleine</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-6905-8775</nameIdentifier>
      <affiliation>Europa Institute Basel</affiliation>
    <title>Asian Directories: Foreign Residents Benchmark Dataset</title>
    <date dateType="Issued">2019-08-30</date>
  <resourceType resourceTypeGeneral="Collection"/>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.2580997</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf"></relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution Non Commercial Share Alike 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">&lt;p&gt;This record comprises a benchmark dataset based on the listings of foreign residents in the 1896, 1899, 1934 and 1937 volumes of the Asian Directories &amp;amp; Chronicles serial, which was published annually by &lt;em&gt;The Hong Kong Daily Press&lt;/em&gt; between 1863 and 1941. With the current exceptions of 1866, 1867, 1872, 1875 and 1884, all of the volumes of the Directories &amp;amp; Chronicles have been assembled by the Europa Institute at the University of Basel. In a collaboration with Data Futures, high-resolution digitization of the pages of the volumes and analysis of OCR data have enabled automated detection of each person record in the foreign resident listings and generation of 60,712 annotations. The annotations are represented here in OADM, the precursor of WADM; OADM is currently in widespread use, and this dataset will be upgraded to WADM as soon as more applications that support it emerge. The OCR text has subsequently been corrected and tokenized with the aid of surname and location dictionaries created from the corpus, to produce searchable person &amp;#39;instance&amp;#39; data using the schema at and attached hereto.&lt;/p&gt;

&lt;p&gt;The years selected for this benchmark serve two purposes: they support the development of dynamic dictionaries for automating correction and tokenization of the remaining volumes of the serial, and they are pivotal in relation to historic events in East Asia. The First Sino-Japanese War, waged from July 1894 to April 1895, was followed by the establishment of large numbers of small communities of foreign residents throughout East Asia. In the early 20th century, consolidation of foreign residents in larger communities in coastal cities was followed by a marked exodus during escalating conflict in the Second Sino-Japanese War between July 1937 and September 1945, which some sources date back to the Japanese invasion of Manchuria in 1931. These population shifts are visible when the benchmark dataset is rendered onto maps, as shown in the PDF file attached here. The annotations and instance data are presented at &lt;a href=""&gt;&lt;/a&gt; as an Invenio repository, enabling searching by person and location, as well as viewing person instances in the context of the page of the serial where the person was listed, via Mirador and a IIIF service.&lt;/p&gt;</description>

    <description descriptionType="Other">Swiss National Science Foundation grant 100011_184860/1 "Divisive Power of Citizenship"</description>
                   All versions   This version
Views              564            560
Downloads          459            451
Data volume        3.4 GB         3.3 GB
Unique views       465            463
Unique downloads   366            362


Cite as