Readme.txt for the 'Historically Irish Surnames' dataset.

Documentation written on 22 July 2015, London, UK, by Adam Crymble (adam.crymble@gmail.com; http://adamcrymble.org). The data were created between November 2010 and June 2013. With thanks to Katrina Navickas for reading and providing comments on this documentation.

_License_: I release the following documents under a Creative Commons 'CC-BY 4.0' license:

* Readme.txt (this document)
* IrishSurnamesList.csv (the data)
* RootIrishSurnamesAndVariants.csv (the data in another format)

_Dataset Citation_: Anyone publishing academically or commercially on the basis of research or work conducted with this dataset, in whole or in part, is asked to credit the author with the following citation and an appropriate URL:

Adam Crymble, 'Historically Irish Surnames Dataset', _Zenodo_ (2015), DOI: 10.5281/zenodo.20985.

_Project Description_: This dataset provides a list of surnames that are reliably Irish and that can be used to identify textual references to Irish individuals in London and the surrounding countryside within striking distance of the capital. This classification of the Irish necessarily includes both the Irish-born and their descendants. The dataset has been validated for use on records up to the middle of the nineteenth century, and should only be used where a few misclassifications of individuals would not undermine the results of the work, such as in large-scale analyses. These data were created through an analysis of the 1841 Census of England and Wales, and validated against the Middlesex Criminal Registers (National Archives HO 26) and the Vagrant Lives Dataset (Crymble, Adam et al. (2014). Vagrant Lives: 14,789 Vagrants Processed by Middlesex County, 1777-1786. Zenodo. 10.5281/zenodo.13103). The sample was derived from the records of the Hundred of Ossulstone, which included much of rural and urban Middlesex, excluding the City of London and Westminster.
The analysis was based upon a study of 278,949 adult males. Full details of the methodology used to create this dataset can be found in the following article. Anyone intending to use this dataset for scholarly research is strongly encouraged to read it in order to understand the strengths and limits of this resource:

Adam Crymble, 'A Comparative Approach to Identifying the Irish in Long Eighteenth Century London', _Historical Methods: A Journal of Quantitative and Interdisciplinary History_, vol. 48, no. 3 (2015): 141-152.

The data provided here include all 283 names listed in Appendix I of the above article, plus an additional 209 spelling variations of those root surnames, for a total of 492 names.

_Abstract_: Historians seeking to identify the Irish have overwhelmingly relied upon nominal record linkage, thus limiting studies to periods and contexts in which corroborating records exist. Surname analysis provides an alternative: a subset of 283 Irish surnames was able to correctly isolate 40% of known Irish individuals across thousands of entries, sufficient for sampling the Irish in demographic studies. This conclusion was based on an analysis of 278,949 names from the London area in the 1841 census, and was tested and refined against 42,248 historical records pertaining to the poor in London between 1777 and 1820.

_Keywords_: data mining, demographic history, surname, Ireland, London, quantitative history.

_Data Files_: The complete dataset is provided in two different formats, in separate files:

##'IrishSurnamesList.csv'

This file contains all 492 names in comma-separated values (CSV) format. There is one name per row, with seven columns of information.

_Description of Data Columns_
__Surname__ The Irish surname, converted to lower case with all punctuation removed (e.g., O'Brien becomes obrien).
__1841 Census: Irish__ The number of Irish-born adult males in the 1841 Census of England and Wales sample with this surname.
__1841 Census: Total__ The total number of adult males in the 1841 Census of England and Wales sample with this surname.
__1841 Census: % Irish__ The percentage of adult males in the 1841 Census of England and Wales sample with this surname who were Irish-born.
__Middlesex Criminal Registers (HO 26): Irish__ The number of Irish-born individuals in the Middlesex Criminal Registers (1801-1805) with this surname.
__Middlesex Criminal Registers (HO 26): Total__ The total number of individuals in the Middlesex Criminal Registers (1801-1805) with this surname.
__Root Surname__ The 'root surname', that is, the most common spelling variant of this name in the 1841 Census of England and Wales sample. This is useful for identifying individuals whose name may be spelled differently in different records.

##'RootIrishSurnamesAndVariants.csv'

This CSV file is structured around the 281 root surnames and provides much of the same information as the previous file, but in a different format. There is one root surname per row, with five columns of information.

_Description of Data Columns_
__Root Surname__ The 'root surname', that is, the most common spelling variant of this name in the 1841 Census of England and Wales sample. This is useful for identifying individuals whose name may be spelled differently in different records.
__1841 Census: Irish__ The number of Irish-born adult males in the 1841 Census of England and Wales sample with this surname.
__1841 Census: Total__ The total number of adult males in the 1841 Census of England and Wales sample with this surname.
__1841 Census: % Irish__ The percentage of adult males in the 1841 Census of England and Wales sample with this surname who were Irish-born.
__Surname root and known variants__ All of the names identified as usable variants of the root surname.
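As a rough illustration of how the __Surname__ and __Root Surname__ columns might be used together, the following Python sketch normalises a name taken from a source document in the way described for the __Surname__ column (lower case, punctuation removed) and looks it up in IrishSurnamesList.csv. It assumes that the CSV header row uses the column names given above; the function names are hypothetical. A match only shows that the surname is on the list, not that the individual was Irish, so the caveats in the Project Description still apply.

```python
import csv
import unicodedata
from typing import Dict, Iterable, Optional

# Assumed header names, copied from the column descriptions in this readme.
# Check them against the first row of IrishSurnamesList.csv before use.
SURNAME_COL = "Surname"
ROOT_COL = "Root Surname"


def normalise_surname(name: str) -> str:
    """Lower-case a name and drop punctuation, e.g. "O'Brien" -> "obrien".

    Any Unicode punctuation character (category P*) is removed, so curly
    apostrophes (as in "O’Brien") are dropped too. Internal spaces are kept.
    """
    return "".join(
        ch for ch in name.strip().lower()
        if not unicodedata.category(ch).startswith("P")
    )


def load_root_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map each normalised surname in the list to its root surname."""
    return {
        normalise_surname(row[SURNAME_COL]): row[ROOT_COL].strip()
        for row in csv.DictReader(lines)
    }


def irish_root(name: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the root surname if the name is on the list, otherwise None."""
    return lookup.get(normalise_surname(name))
```

For example, `with open("IrishSurnamesList.csv", newline="", encoding="utf-8-sig") as f: lookup = load_root_lookup(f)` builds the lookup once (the file encoding is an assumption; `utf-8-sig` also accepts plain UTF-8). After that, `irish_root("O'Brien", lookup)` returns the root surname if that name is on the list, and `None` otherwise.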
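The relationship between the two files can also be sketched in Python. Grouping the rows of IrishSurnamesList.csv by their __Root Surname__ gives one entry per root together with its spelling variants, which should broadly correspond to the rows of RootIrishSurnamesAndVariants.csv. The __1841 Census: % Irish__ value can likewise be recomputed from the two census counts. This is a hypothetical sketch: it assumes the header names given above and whole-number counts, and it makes no assumption about how names are separated within the __Surname root and known variants__ column or about how the root file's counts were calculated.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed header names, copied from the column descriptions in this readme.
SURNAME_COL = "Surname"
ROOT_COL = "Root Surname"
IRISH_COL = "1841 Census: Irish"
TOTAL_COL = "1841 Census: Total"


def percent_irish(irish: int, total: int) -> float:
    """Percentage of adult males with a surname who were Irish-born."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return 100.0 * irish / total


def row_percent_irish(row: Dict[str, str]) -> float:
    """Recompute the '1841 Census: % Irish' value from a row's two counts."""
    return percent_irish(int(row[IRISH_COL]), int(row[TOTAL_COL]))


def group_variants(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group every surname in IrishSurnamesList.csv under its root surname."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[ROOT_COL].strip()].append(row[SURNAME_COL].strip())
    return dict(groups)
```

The stored percentages may be rounded, so when checking them against `row_percent_irish` it is safer to compare with a small tolerance (for example with `math.isclose`) than to test for exact equality.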