Dataset Open Access

Treatment of American National Archives Records of World War II Prisoners of War (0326).

Cornwell, Peter J; Herren-Oesch, Madeleine

Data manager(s)
Granville, Daniel

NARA PoW Data W.D. A.G.O. FORM NO. 0326.

This deposit contains a dataset relating to persons interned between December 7, 1941 and November 19, 1946, which has been enhanced to make it more accessible to scientists. It is based on information from the U.S. National Archives and Records Administration (NARA), which is unrestricted and available at https://aad.archives.gov/aad/series-description.jsp?s=644&popup=Y, and informs this summary. The NARA 'series' is part of Record Group 389: Records of the Office of the Provost Marshal General. It identifies 79 'places of capture' globally "Using copies of reports from the International Committee of the Red Cross ...". The Scope & Content Note states:

"This series has information about U.S. military officers and soldiers and U.S. and some Allied civilians who were prisoners of war and internees. The record for each prisoner provides serial number, personal name, branch of service or civilian status, grade, date reported, race, state of residence, type of organization, parent unit number and type, place of capture (theater of war), source of report, status, detaining power, and prisoner of war or civilian internee camp site. Records of prisoners of the Japanese who died also document whether the prisoner was on a Japanese ship that sank or if he or she died during transport from the Philippine Islands to Japan. There are no records for some prisoners of war whose names appear in the lists or cables transmitted to the Office of the Provost Marshal General by the International Committee of the Red Cross."

The U.S. War Department used punched cards to manage this information, although "The punch card records were transferred to NARA with virtually no agency documentation." According to the Custodial History Note:

"The U.S. Army transferred punch card records of World War II prisoners of war (POWs) to NARA as a unique series in its 1959 transfer of all of the U.S. Army's Departmental Archives. In 1978 the Veterans Administration borrowed most of the punch card records of repatriated U.S. military personnel for a study of Repatriated U.S. Military Prisoners of War, migrated the data on almost all of the borrowed cards to an electronic format and returned the punch cards and two electronic records data files to NARA. In 1995 NARA migrated the data from almost all of the remaining punch card records to an electronic format and has subsequently preserved all of the records in a single data file."

It is evident that the organization of this data file assumes access to other information, also accessible in CSV files in the series, in order to interpret detailed information, such as branch of service, grade, parent unit number and detaining power. For example, records appear in the following format:

O&745255ABDALLAH EDWARD A       2 LT    G1AC 200803413223003620O7222094171035   
32214872ABDALLAH JOSEPH T       CPL     61INF10230241231100157069802075181087   
36336867ABDAY JOSEPH C          PVT     81INF10170231611100168069516075181004 

Each record comprises a serial number, then a name, then a textual code for rank, followed by a string (starting G1AC on the first line) which encodes the remaining information. For example, the first character (G) can be looked up in cl_1279.csv to decode '2nd lieutenant', corroborating in this case the appearance of '2 LT'. 'AC' indicates 'armofservicecode: AIR CORPS'; less obviously, the rest of the string decodes to 'detainingpower: Germany', 'race: White' and 'theater: European Theater: Germany'. This single line is the entirety of the information provided per person instance by the NARA series. Users of this potentially valuable resource must develop automation in order to search and employ it effectively; neither such tools, nor a specification from which software might readily be developed, are provided.
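The layout described above can be sketched as a fixed-width parser. This is a minimal sketch, assuming column boundaries (8 characters for the serial number, 24 for the name, 8 for the rank, then the coded string) that are inferred only from the sample lines shown here and not from any NARA specification. The names RawRecord, parse_line and GRADE_CODES are hypothetical, and the one-entry GRADE_CODES table merely stands in for a lookup built from cl_1279.csv, whose column layout is not described here.

```python
from typing import NamedTuple


class RawRecord(NamedTuple):
    serial: str
    name: str
    rank: str
    coded: str


# Assumed column boundaries, inferred from the sample lines only.
SERIAL_END, NAME_END, RANK_END = 8, 32, 40

# Stand-in for a lookup built from cl_1279.csv; only the one mapping
# mentioned in the text is included.
GRADE_CODES = {"G": "2nd lieutenant"}


def parse_line(line: str) -> RawRecord:
    """Split one fixed-width line into its four top-level parts."""
    return RawRecord(
        serial=line[:SERIAL_END].strip(),
        name=line[SERIAL_END:NAME_END].strip(),
        rank=line[NAME_END:RANK_END].strip(),
        coded=line[RANK_END:].rstrip(),
    )


# Rebuild the first sample line with explicit padding so that the
# assumed column widths are unambiguous.
sample = (
    "O&745255"
    + "ABDALLAH EDWARD A".ljust(24)
    + "2 LT".ljust(8)
    + "G1AC 200803413223003620O7222094171035"
)
record = parse_line(sample)
grade = GRADE_CODES.get(record.coded[:1], "unknown")
print(record.name, "|", record.rank, "|", grade)
```

Slicing by position rather than splitting on whitespace keeps names and ranks that contain spaces (such as '2 LT') intact, but it also means that any record whose columns have shifted will parse incorrectly and should be checked separately.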

Significantly, this task is hampered by evidence of corruption of some of the information, which may be due solely to the digitization process mentioned above being applied to the paper records, but possibly with a subsequent contribution from fixity effects. NARA documentation does not refer to data integrity issues and, especially since the dataset which NARA provides is large, such issues may only be discovered during development of automation to employ the series. Examples of problems include substitution of characters, such as 'O' replacing '0' and vice versa, '}' replacing '3' and '&' replacing '8', or less obviously 'L' mis-recognized as '-' and 'II' replacing 'H', for example:

12138003 AREY GERALD J          S SG    41AC 2002064123S55}340069802055181033

Ideally, access to high-resolution scans of the paper documents, or to external documents, could be used to address these issues. However, checking each component of a person record for completeness enables detection of compromised entries and, where character substitution affects decoding of key information, other contextual information is often available to validate the decoding of such strings once these characters have been re-substituted. The larger percentage of strings, which already decode plausibly without intervention, do not contain such characters, so there is strong evidence that they are invalid in particular positions.
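The detection and re-substitution described above might be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce this deposit: the substitution table contains only the single-character numeric confusions listed earlier ('O' for '0', '}' for '3', '&' for '8'), the function names are hypothetical, and deciding which positions of a record are expected to be numeric is left to the caller. A repaired string is only a candidate and still needs validating against contextual information.

```python
# Characters reported as substituted for digits in numeric positions.
DIGIT_REPAIRS = {"O": "0", "}": "3", "&": "8"}


def suspect_positions(field: str) -> list[int]:
    """Return indexes of characters that are neither digits nor spaces."""
    return [i for i, ch in enumerate(field) if not (ch.isdigit() or ch == " ")]


def repair_candidate(field: str) -> str:
    """Re-substitute the known confusions; leave other characters unchanged."""
    return "".join(DIGIT_REPAIRS.get(ch, ch) for ch in field)


# Fragment taken from the corrupted sample record shown above.
fragment = "S55}34"
candidate = repair_candidate(fragment)
print(suspect_positions(fragment))
print(candidate, suspect_positions(candidate))
```

In this fragment the '}' is re-substituted, while the 'S' is not in the table and so remains flagged after repair, which marks the record as still needing attention.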

Unfortunately, there is a proportion of digital records with more severe corruption which cannot be addressed without access to scanned imagery of the paper records, for example:

O&557875ANDREW THOMAS A         2 LT    G1AC 2011094115         70140
 6881276AFTEWICZ EDWARD L       PVT     81INF10150231321100135069508065181004
      6 APLIN -OR-&  -            3      1INF102    1             1 0  1 1

As of the initial date of this deposit, it is anticipated that such access will be possible to support further work on this series.

The dataset in this deposit does not contain records for which decoding is compromised to the extent that information to populate a basic person schema is incomplete. Of the 143,374 person records in the NARA series, 36,791 were found to be compromised in some way; 19,624 of those have been substantially decoded and/or repaired, leaving 17,167 which are currently inaccessible. Further work is being undertaken both to improve decoding of the 126,207 records available here (143,374 less 17,167) and to retrieve others among the 17,167.

This dataset has been enhanced to present the original NARA 'single data file' as a JSON resource which is more accessible for search and analysis, since each record is document-oriented (containing labels and values for each field, together with provenance information), for example:

{
    "$schema": "https://schemata.hasdai.org/historic-persons/historic-person-entry-v0.0.2.json",
    "location": [
        {
            "association": "military service",
            "transcription": "European Theater: Germany"
        },
        {
            "association": "interred",
            "transcription": "Stalag 2D Stargard Pomerania, Prussia 53-15"
        }
    ],
    "name": {
        "familyname": "AARON",
        "givenname": "JACK",
        "rank": "SGT",
        "transcription": "AARON JACK"
    },
    "set": {
        "id": "https://persons.freizo.org/export/pow/1.0.0",
        "partof": "10.5281/zenodo.3565392",
        "title": "WDAGO-0326"
    },
    "source": {
        "type": "data file"
    }
},
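Records in this form can be searched with a few lines of code. The sketch below assumes that pow-20191219.json holds a top-level JSON array of such records (suggested by the trailing comma in the excerpt, but not confirmed here). The helper name find_by_familyname is hypothetical, and the inline sample reproduces only part of the record above so that the example runs without the full file.

```python
import json


def find_by_familyname(records, familyname):
    """Return records whose name.familyname matches, ignoring case."""
    wanted = familyname.casefold()
    return [
        r for r in records
        if r.get("name", {}).get("familyname", "").casefold() == wanted
    ]


# Abridged copy of the record shown above, wrapped in an array.
sample_json = """
[
  {
    "location": [
      {"association": "military service",
       "transcription": "European Theater: Germany"}
    ],
    "name": {"familyname": "AARON", "givenname": "JACK", "rank": "SGT"}
  }
]
"""
records = json.loads(sample_json)
matches = find_by_familyname(records, "aaron")
print([m["name"]["givenname"] for m in matches])

# For the full file (assumed to be a JSON array):
# with open("pow-20191219.json", encoding="utf-8") as fh:
#     records = json.load(fh)
```

Using .get with defaults means that records lacking a name or familyname field are skipped rather than raising an error.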

The schema employed here serves a specific purpose, in addition to ongoing work identifying and correcting errors in the NARA data: it supports work to discover persons appearing in this NARA series 0326 who also appear in external documentation. For example, in a separate collaborative project with the Europa Institute at the University of Basel, a benchmark dataset has been produced based on listings of foreign residents in the Asian Directories and Chronicles, which forms a deposit at 10.5281/zenodo.2580997 and employs a compatible schema for the purpose of efficient comparison with this and other datasets. Other schemata could be employed for different purposes, leading to alternate datasets, all derived from series 0326. The full extent of information currently decoded from NARA series 0326 is presented at https://pow.freizo.org/ which provides search facilities by person name, plus interactive filters for person rank, service and theater of conflict.

Files (94.5 MB)

Name                                 Size      Checksum
historic-person-entry-v0.0.2.json    2.8 kB    md5:3f18ce71862f223245db73a85d352338
pow-20191219.json                    94.5 MB   md5:9b65b40ef8af97ab69271322bbbbe65c