Presentation Open Access

How we tripled our encoding speed in the Digital Victorian Periodical Poetry project

Holmes, Martin; Fralick, Kaitlyn; Fukushima, Kailey; Karlson, Sarah


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <controlfield tag="005">20200120172017.0</controlfield>
  <controlfield tag="001">3449241</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria</subfield>
    <subfield code="a">Fralick, Kaitlyn</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria</subfield>
    <subfield code="a">Fukushima, Kailey</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria</subfield>
    <subfield code="a">Karlson, Sarah</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2236806</subfield>
    <subfield code="z">md5:444e60f0b72bf7ffcdb58bf0d1d7adc5</subfield>
    <subfield code="u">https://zenodo.org/record/3449241/files/encoding_speed.pdf</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-09-19</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="p">user-tei2019</subfield>
    <subfield code="o">oai:zenodo.org:3449241</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria HCMC</subfield>
    <subfield code="0">(orcid)0000-0002-3944-1116</subfield>
    <subfield code="a">Holmes, Martin</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">How we tripled our encoding speed in the Digital Victorian Periodical Poetry project</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-tei2019</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;The Digital Victorian Periodical Poetry (DVPP) project is a SSHRC-funded digital humanities&lt;br&gt;
project based at the University of Victoria. With the guidance of principal investigator Dr. Alison&lt;br&gt;
Chapman, the DVPP team is creating a digital index of British periodical poetry from the long&lt;br&gt;
nineteenth century. In addition to uncovering periodical poems, writing descriptive metadata, and&lt;br&gt;
compiling prosopographical research, we are currently using TEI and CSS to encode a statistically-&lt;br&gt;
representative sample of indexed poems, looking for quantitative evidence of literary change over&lt;br&gt;
time. Such an endeavour requires a large, robust dataset covering a range of periodicals throughout&lt;br&gt;
the period.&lt;br&gt;
At the time of writing, there are more than 13,000 poems in the database, and we expect that total&lt;br&gt;
to reach 20,000. Of these, around 2,000 will be encoded, focusing on the decade years (1820, 1830,&lt;br&gt;
1840, and so on).&lt;br&gt;
In this presentation, we will showcase the various strategies and tools we have used to speed up&lt;br&gt;
our encoding process. We combine simple tricks like keyboard shortcuts with more sophisticated&lt;br&gt;
processes to minimize drudgery and increase accuracy. Among the more interesting techniques&lt;br&gt;
are:&lt;br&gt;
&amp;bull; Auto-tagging of a complete poem in lines and linegroups using a Schematron QuickFix;&lt;br&gt;
&amp;bull; Use of advanced CSS selectors in the rendition/@selector attribute to reduce encoding&lt;br&gt;
clutter in the poem itself;&lt;br&gt;
&amp;bull; A keyboard shortcut to tag rhymes which detects whether the tagged text is a masculine&lt;br&gt;
or feminine rhyme and provides the appropriate attribute value;&lt;br&gt;
&amp;bull; Auto-detection of cases where a new line-end rhymes with a previously-encoded rhyme,&lt;br&gt;
and should, therefore, be labelled to match it, leveraging our growing dataset of nearly&lt;br&gt;
30,000 rhymes;&lt;br&gt;
&amp;bull; Instant access to a rendering of the poem which provides a visualization of the rhyme&lt;br&gt;
structure, auto-detection of anaphora, epistrophe and other refrain-like forms, and other&lt;br&gt;
diagnostic feedback.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3449240</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3449241</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">presentation</subfield>
  </datafield>
</record>
                   All versions   This version
Views              539            539
Downloads          66             66
Data volume        147.6 MB       147.6 MB
Unique views       524            524
Unique downloads   64             64
