Presentation Open Access
Both relational databases (RDBs) and XML have strengths and weaknesses as data storage and modelling systems. Most researchers working with historical and literary data in the Humanities would argue for the superiority of XML, since it allows unlimited nesting, linking, and complexity. RDB proponents claim superior querying and processing speed, although recent advances in XML languages and tools have eroded that advantage.
Nevertheless, RDBs remain popular, and many researchers seem instinctively to prefer them. Most DH programmers have encountered researchers who know little about databases or data modelling, but are nevertheless convinced that what they need and must have for their project is a database. Databases are somehow compelling and attractive in a way that XML is not. Perhaps the familiarity of tabular data representations is comforting; maybe forcing data into constrained representations seems to constitute mastering it somehow.
So, sometimes against our better judgement or advice, a project may end up with both an RDB and an XML document collection, and programmers must then integrate these distinct forms of data when building project outputs. This presentation discusses the Digital Victorian Periodical Poetry (DVPP) project, where metadata about 15,000 poems from nineteenth-century periodicals is captured in a MySQL database and periodically exported to create a TEI file for each poem. Many of the poems are then transcribed and encoded. The canonical source of metadata is the RDB, while the canonical source of textual data is the TEI file. Metadata in the TEI files must therefore be periodically updated from the RDB without disturbing the textual encoding. Changes to the RDB data may also change the id and filename of the related TEI file; when that happens, any existing TEI data must be migrated to a new file, and the SVN repository must be updated accordingly. All of this is done with XSLT and Ant.
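The metadata refresh described above can be illustrated with a minimal sketch. The project itself uses XSLT and Ant; the Python below is a substitute chosen purely for illustration, not the project's code. The function name `refresh_metadata`, the row keys `id` and `title`, the sample document, and the `<id>.xml` filename convention are all assumptions made for this example. The idea is that only the root `xml:id` and the header fields are rewritten from the database row, while the `<text>` element holding the transcription is left untouched.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialize TEI elements without a prefix, as in hand-edited TEI files.
ET.register_namespace("", TEI_NS)

# Hypothetical TEI file for one poem (not taken from the project).
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="poem_old">'
    "<teiHeader><fileDesc><titleStmt><title>Old Title</title></titleStmt>"
    "</fileDesc></teiHeader>"
    "<text><body><lg><l>A line of <hi>encoded</hi> verse</l></lg></body></text>"
    "</TEI>"
)


def refresh_metadata(tei_xml, row):
    """Update header metadata from a database row (a dict with the
    assumed keys 'id' and 'title') and return (new_xml, new_filename).
    The <text> element is never modified."""
    root = ET.fromstring(tei_xml)

    # The id may change when the database record changes.
    root.set(XML_ID, row["id"])

    title = root.find("tei:teiHeader//tei:title", NS)
    if title is None:
        raise ValueError("TEI header has no <title> element")
    title.text = row["title"]

    # Assumed convention: the filename is derived from the id.
    new_filename = row["id"] + ".xml"
    return ET.tostring(root, encoding="unicode"), new_filename


new_xml, new_filename = refresh_metadata(
    SAMPLE, {"id": "poem_new", "title": "New Title"}
)
```

When the returned filename differs from the existing one, a real pipeline would also have to rename the file under version control (for example with `svn move`) rather than simply writing a new file, so that the history of the transcription is preserved.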