Published August 30, 2019 | Version 1.1.0
Annotation collection Open

Asian Directories: Foreign Residents Benchmark Dataset

  • 1. Data Futures GmbH, University of Westminster
  • 2. Europa Institute Basel
  • 1. University of Basel, Data Futures
  • 2. Data Futures
  • 3. University of Basel


This record comprises a benchmark dataset based on the listings of foreign residents in the 1896, 1899, 1934 and 1937 volumes of the Asian Directories & Chronicles serial, which was published annually by The Hong Kong Daily Press between 1863 and 1941. With the current exceptions of 1866, 1867, 1872, 1875 and 1884, all of the volumes of the Directories & Chronicles have been assembled by the Europa Institute at the University of Basel. In a collaboration with Data Futures, high-resolution digitization of the pages of the volumes and analysis of the OCR data have enabled automated detection of each person record in the foreign resident listings and generation of 60,712 annotations. The annotations are represented here in the Open Annotation Data Model (OADM), the precursor of the Web Annotation Data Model (WADM). OADM is currently in widespread use, but this dataset will be upgraded to WADM as soon as more applications that support it emerge. The OCR text has subsequently been corrected and tokenized with the aid of surname and location dictionaries created from the corpus, to produce searchable person 'instance' data using the schema at and attached hereto.
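The dictionary-aided correction and tokenization described above might be sketched roughly as follows. This is an illustration only, not the pipeline used to build the dataset: the sample dictionary entries, the assumed 'Surname, Initials, Location' line layout, the function names and the similarity cutoff are all hypothetical, and fuzzy matching via Python's standard difflib module is a stand-in for whatever correction method was actually applied.

```python
import difflib

# Hypothetical sample dictionaries; the real ones are created from the corpus.
SURNAMES = ["Anderson", "Brown", "Jardine", "Matheson"]
LOCATIONS = ["Hongkong", "Shanghai", "Yokohama"]


def correct_token(token, dictionary, cutoff=0.8):
    """Return the closest dictionary entry, or the token unchanged if none is close enough."""
    matches = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def tokenize_record(line):
    """Split an OCR line of the assumed form 'Surname, Initials, Location' into fields."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return None  # layout not recognized; leave the line for manual review
    surname, initials, location = parts
    return {
        "surname": correct_token(surname, SURNAMES),
        "initials": initials,
        "location": correct_token(location, LOCATIONS),
    }


# Example: an OCR line with two misread characters.
record = tokenize_record("Jardlne, W. J., Shanghal")
```

Tokens with no sufficiently close dictionary entry are returned unchanged, so unfamiliar names are not silently overwritten; a stricter or looser cutoff trades missed corrections against false ones.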

The years selected for this benchmark serve two purposes: they support the development of dynamic dictionaries for automating correction and tokenization of the remaining volumes of the serial, and they are pivotal in relation to historic events in East Asia. The First Sino-Japanese War, waged from July 1894 to April 1895, was followed by the establishment of large numbers of small communities of foreign residents throughout East Asia. In the early 20th century, consolidation of foreign residents in larger communities in coastal cities was followed by a marked exodus during the escalating conflict of the Second Sino-Japanese War between July 1937 and September 1945, which some sources date back to the Japanese invasion of Manchuria in 1931. These population shifts are visible when the benchmark dataset is rendered on maps, as shown in the PDF file attached here. The annotations and instance data are presented at as an Invenio repository, enabling searching by person and location, as well as viewing person instances in the context of the page of the serial where the person was listed, via Mirador and a IIIF service.



Swiss National Science Foundation grant 100011_184860/1 "Divisive Power of Citizenship"



Files (240.5 MB)

Name Size
5.4 MB
13.6 MB
40.4 MB
15.7 MB
46.5 MB
19.1 MB
54.3 MB
11.5 MB
34.0 MB
3.0 kB