The Alliance of Digital Humanities Organizations
The Association for Literary and Linguistic Computing
The Association for Computers and the Humanities
The Australasian Association for Digital Humanities
centerNet
The Society for Digital Humanities – Société pour l’étude des médias interactifs

Digital Humanities 2012
Conference Abstracts
University of Hamburg, Germany
July 16–22, 2012

Hamburg University Press
Publishing House of the Hamburg State and University Library Carl von Ossietzky
Printed with the Support of the German Research Foundation

Editor: Jan Christoph Meister
Editorial Assistant: Katrin Schönert
Editorial Staff: Bastian Lomsché, Wilhelm Schernus, Lena Schüch, Meike Stegkemper
Technical Support: Benjamin W. Bohl, Daniel Röwenstrunk

Bibliographic information published by the Deutsche Nationalbibliothek (German National Library). The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at https://portal.dnb.de/
The Deutsche Nationalbibliothek stores this online publication on its Archive Server, which is part of the deposit system for the long-term availability of digital publications.
Also available open access on the Internet at:
Hamburg University Press – http://hup.sub.uni-hamburg.de
PURL: http://hup.sub.uni-hamburg.de/HamburgUP/DH2012_Book_of_Abstracts
For an elaborated version of this Book of Abstracts with color photos and zoom function, please visit: www.dh2012.uni-hamburg.de

ISBN 978-3-937816-99-9 (printed version)
© 2012 Hamburg University Press, publishing house of the Hamburg State and University Library Carl von Ossietzky, Germany
Cover design: Turan Usuk
Cover illustration: Dagmar Schwelle/laif
Printing house: Elbepartner, BuK! Breitschuh & Kock GmbH, Hamburg, Germany

The 24th Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities, and the 5th Joint International Conference of the Association for Literary and Linguistic Computing, the Association for Computers and the Humanities, and the Society for Digital Humanities – Société pour l’étude des médias interactifs, for the first time organized together with the Australasian Association for Digital Humanities and centerNet

International Programme Committee
• Susan Brown (SDH/SEMI – Vice Chair)
• Arianna Ciula (ALLC)
• Tanya Clement (ACH)
• Michael Eberle-Sinatra (SDH/SEMI)
• Dot Porter (ACH)
• Jan Rybicki (ALLC)
• Jon Saklofske (SDH/SEMI)
• Paul Spence (ALLC – Chair)
• Tomoji Tabata (ALLC)
• Katherine Walter (ACH)

Local Organizing Committee
• Peer Bolten
• Imke Borchers
• Evelyn Gius
• Mareike Höckendorff
• Bastian Lomsché
• Jan Christoph Meister
• Marco Petris
• Wilhelm Schernus
• Lena Schüch
• Katrin Schönert
• Meike Stegkemper

Welcome to DH2012 from the Vice President Research, University of Hamburg
Hans Siegfried Stiehl
University of Hamburg, Germany

With about 40,000 students, 680 professors and 4,200 research staff, the University of Hamburg is one of Germany’s largest universities. It comprises six schools: Law; Business, Economics and Social Sciences; Medicine; Education, Psychology and Human Movement; Humanities; and Mathematics, Informatics and Natural Sciences. As of this spring, these schools are home to about 300 collaborative research projects funded by the German Research Foundation (DFG), the Federal Ministry of Education and Research and the Seventh Framework Programme of the European Commission.
These include 28 DFG Collaborative Research Centres, DFG Research Training Groups and DFG Research Groups, as well as the Excellence Cluster ‘Integrated Climate System Analysis and Prediction’ (CLISAP). From the mid-1990s on, researchers in the School of Humanities and the Department of Informatics at the University of Hamburg began to explore the potential for cooperation in an emerging field then still referred to as ‘Computational Philology.’ Eventually the ‘Arbeitsstelle Computerphilologie’, one of the first institutions of its kind in Germany, was established in the School of Humanities. Today the use of eScience and eHumanities approaches and technologies has become part of the daily routine of an ever-rising number of scholars and students. ‘Digital Diversity: Cultures, Languages and Methods’, the motto of this year’s Digital Humanities conference, relates methodological and technical innovation to the traditional research agenda of the Humanities – a relation that fosters the novel research paradigm of DH and is of particular interest to the University of Hamburg. Indeed, Digital Humanities methods play a vital role in some of our most advanced and prestigious research initiatives, such as the DFG-funded Collaborative Research Centre ‘Manuscript Cultures in Asia, Africa and Europe.’ In this interdisciplinary research project, the traditional focus on cultural diversity, which has been characteristic of our university from its very beginning in 1919, goes hand in hand with methodological and technical innovation. Projects like this demonstrate the relevance of spurring further exchange among the research paradigms of the humanities, informatics and computational science. I am certain that this conference is bound to make a significant contribution to further building such bridges.
The University of Hamburg is therefore delighted to host the Digital Humanities 2012 conference, and it gives me great pleasure to welcome you to our university as well as to the Free and Hanseatic City of Hamburg. For us, DH 2012 is one of the most important academic events in this year’s calendar, and I wish the conference every success!

Chair of the International Programme Committee
Paul Spence
Department of Digital Humanities, King’s College London, UK

A recurring theme at Digital Humanities conferences in recent years has been the high number of submissions, and this year has continued the upward trend, with close to 400 submissions across the different categories: pre-conference workshops/tutorials, long papers, short papers, posters and multipaper sessions (including panels). I take this as a sign that the field continues to grow and develop, and the quality of the submissions this year certainly made the job of the International Programme Committee challenging (in a good way). Thanks to the excellent facilities provided by our Hamburg hosts, we were able to expand the conference to five strands this year, meaning that this year’s conference has more contributions, and by more participants, than most DH conferences in the past. The theme for this year’s conference was ‘Digital Diversity: Cultures, languages and methods’, and the conference schedule includes contributions on a wide range of topics, reflecting the increasing breadth of the field on all levels. The conference offers opportunities to explore new themes, acknowledges the increasing linguistic diversity of the field and reflects the growth of digital humanities centres and networks in new regions of the world.
Both of our keynote speakers reflect on this diversity: Claudine Moulin will explore the challenges in developing interdisciplinary and transnational research structures, with particular consideration for the role of digital humanities; Masahiro Shimoda will contemplate the relationship of the field to the wider humanities from a historical and cultural perspective. I would like to thank all those who submitted proposals this year and all those who agreed to act as reviewers – your contributions on both fronts ensured that the conference continues to reflect the very best of digital scholarship in the humanities at this moment in time. We enlarged our group of reviewers this year, both in anticipation of increased submissions and in a concerted effort to build on the good work of previous PC chairs in broadening the geographic coverage of our reviewer pool. I would like to give my thanks to the members of the International Programme Committee, who this year included: Arianna Ciula (ALLC), Tanya Clement (ACH), Michael Eberle-Sinatra (SDH-SEMI), Dot Porter (ACH), Jan Rybicki (ALLC), Jon Saklofske (SDH-SEMI), Tomoji Tabata (ALLC) and Katherine Walter (ACH). I would particularly like to thank the Vice Chair, Susan Brown (SDH-SEMI), whose advice and good judgement were a great help throughout. Finally, I wish to thank the local organizers, in particular Jan Christoph Meister and Katrin Schönert, for their hard work and support in finding rooms, making practical suggestions and showing the energy and creativity which promise to make this an outstanding conference.

Welcome ashore!
Jan Christoph Meister and Katrin Schönert
University of Hamburg, Germany

Maritime metaphor is tempting in Hamburg. Our harbor turns 824 years old this year; it is the third largest in Europe and ranks among the top 15 in the world. From the days of the Hanseatic League to the present, it has been the main driver of the local economy and has become Germany’s ‘Gateway to the World’.
However, this year the Freie und Hansestadt Hamburg, the Free and Hanseatic City of Hamburg, is also the port of call for digital humanists. A hearty ‘Welcome!’ from your local organizers – drop anchor and come ashore: this is where DH 2012 happens! Present-day activity in Hamburg’s port is all about cargo, but until the mid-20th century its piers were also lined with ocean steamers that carried emigrants to the New World. Does the exchange of goods come before or after the exchange of people and ideas? In our globalized world, where cultures meet and mingle across all domains – commerce, education, politics, knowledge – the philosophical question of primacy seems a bit old-fashioned. As 21st-century humanists we will of course not deny the importance of the material realm, and as digital humanists the relevance of technological advancement is part of our credo anyhow. On the other hand, we are by definition traditionalists. For our ultimate interest rests with people and cultures, their past, their present and their future, and with the various symbolic languages – natural, textual, visual, musical, material, abstract, concrete or performative – through which we communicate and reflect our human concerns. These are still the essential questions that motivate our endeavor, whether digital or traditional. At DH 2011 we gathered under the colorful California ‘Big Tent’ erected by our Stanford hosts – it was a marvelous experience to see how many we have become, how the field has grown into one of the most vibrant scientific communities, and how we collaborate in building bridges between the digital and the humanities. We speak one language! And at the same time, we speak many. We are part of a unique intellectual culture that is only possible because of the multitude of human cultures that we come from and from which we contribute to our common cause.
We carry intellectual cargo to a communal agora of ideas – ideas and concepts shipped from the many-faceted cultural, philosophical, epistemological and methodological contexts and domains of the humanities. The annual DH conference is the biggest intellectual marketplace of our scientific community, and it is hard to imagine a venue where more could be on offer. Our conference motto ‘Digital Diversity – Cultures, languages and methods’ underlines that which motivates and, at the same time, makes this intellectual exchange worthwhile: diversity. It took us some time to discover its worth: for about two decades, when DH was still called Humanities Computing, we discussed whether what we practiced would not perhaps justify the formation of a discipline. For some of us this seemed a desirable status to attain; for others not. In today’s perspective, however, one thing is obvious: the reality of Digital Humanities cuts across traditional conceptions of carefully delineated disciplines and their traditional definition in terms of object domain, methodology, institution, degree course, journal, etc. Conceptually as well as institutionally, DH thrives on diversity, and what we do cannot be reduced to a single purpose, object domain or method. Conformity is absolutely not what DH is about, and it is puzzling why scholars outside our community still feel tempted to reduce DH to what the field itself has long transcended in theory as well as in practice: the mere application of computer technology. One of the aims of DH conferences is of course to showcase the many facets of contemporary digital humanities. Still, not every traditional humanist will be easily convinced. On that score, the history of Hamburg and its university may offer a good example of how single-mindedness, attributed or professed, can nevertheless be subverted in the end and brought to fruition.
Hamburg politics at the beginning of the 20th century was dominated by the interests of merchants, bankers and owners of shipping companies (mind you, it still is). Intellectual capital, it was held, was cheaper bought in than locally produced; so why invest in a university? It was only in 1919 that Werner von Melle, First Mayor and an ardent educationalist, convinced his colleagues in the Senate that their business – the exchange of goods and money between nations – would thrive if the Humanities’ business was looked after: the exchange of knowledge and ideas between cultures. The intellectual encounter with other cultures and languages through academic education, von Melle successfully argued, was necessary to sustain and further develop commerce with other nations. This eventually led to the founding of the very university which, almost a century later, is today proud to host this DH conference. Our wish as local organizers is that DH 2012 may present an equally persuasive example to those scholars who question the need to explore what at first glance might appear to be a foreign methodological paradigm. Today Hamburg University is Germany’s fifth largest in terms of student intake, and at the same time one which, true to the meaning of universitas, engages in teaching and research across the entire spectrum of the human, the social, the natural and the medical sciences. In the Faculty of the Humanities over one hundred degree courses are on offer, many of which focus on foreign languages and cultures. This diversity of cultures, languages and methods makes for an intellectual environment attracting over ten thousand students to the Humanities, that is, close to one third of the university’s total student population. What if Digital Humanities became a topic of interest to each and every one of these students, if only in passing? And what if the same were achieved elsewhere too, at universities across the world?
Ten years ago this would have sounded like a lunatic vision, but today DH certainly enjoys a strong increase in attention that has put its methodology on many people’s agenda. This development has been strongly supported by research funding agencies and by institution building, such as the formation of ADHO and of new DH associations across the world. Incidentally, by the time you read this the inaugural meeting of the DHD as a German chapter should be under way: it is scheduled to take place immediately prior to DH 2012, and also at Hamburg University. Perhaps even more impressive than the rise in external support is the tremendous internal dynamic of the DH community. The volume which you currently hold in hand (or read on screen) contains the abstracts of some 200 papers, panel sessions, posters and workshops. The Call for Papers attracted a record number of proposals, and even though the conference program was eventually extended to five parallel tracks, only half of the close to four hundred submissions could in the end be accommodated. The International Program Committee chaired by Paul Spence worked extremely hard in order to announce the results earlier than usual, but in the end our ambitious deadline could not be met: not only because of the numbers involved, but more importantly because of the high quality of submissions, which confronted reviewers with many hard choices. It is sad that so many excellent proposals could not be accepted, but this painstaking process also testifies to the very high standard that DH research has now attained. As local organizers our deepest gratitude goes to everyone who submitted a proposal, whether successful or not, to the army of reviewers who dedicated their time to careful reviewing, and to Paul and the program committee for drawing up what is certainly one of the richest conference agendas in our field. Thanks are also due to ADHO’s conference coordinating committee and its chair, John Unsworth.
The guidance that John and his team provide comes in two forms: that of carefully thought-out and well-documented protocols, and that of John’s concise and immediate responses to a frantic organizer’s pleas for help, mercy and redemption. The former is available to everyone via the ADHO website. The latter is reserved for those who have qualified through an act of folly committed about two years prior, an act usually referred to as a ‘conference bid’. However, once you find yourself on the spot you may rest assured that advice and support will be granted magnanimously. The conference coordinating committee as well as the program committee and their chairs will be with you all the way, as they were with us. Both Paul and John have served as local organizers themselves, and through ADHO invaluable ‘institutional knowledge’ such as theirs can be passed on to others – not only the type of knowledge that is formalized in protocols, but also the personal experience that one needs to draw on in moments of crisis. This is where organization building provides tangible benefits to a scientific community such as ours. Can one also say ‘Thank you!’ to a piece of software? It feels a bit like talking to your hammer, but then who of us knows the inventor of the hammer him- or herself? We do, however, know the person who invented ConfTool, the conference management software provided by ADHO which many previous DH organizers have used and which we have found to be absolutely indispensable. Incidentally, Harald Weinreich, its creator, lives and works in – Hamburg. (Honestly, we didn’t know that when we bid for DH 2012!) While ConfTool came to our aid as an administrative backend, the design of the public image, the DH2012 logo and the website was the work of Leon and his developers. Indeed, making a DH conference happen is like producing a movie, and at the end you wish you could run a long list of credits.
Our list would include Basti, Benjamin, Daniel, Lena, Meike and Willi, who took on the technical production of the Book of Abstracts; Marco, who contributed sysadmin wizardry and firefighting; and Evelyn and Imke, who specialized in what is perhaps best and with intentional opacity referred to as ‘administrative diplomacy’. Many others deserve mentioning, but cannot be named here – so please just imagine those movie credits, accompanied by a suitably dramatic soundtrack (Crowded House’s 1986 ‘Don’t Dream It’s Over’ will do just fine). To all of you whose names appear on that long, long list and who helped us make DH 2012 happen we extend a big, a VERY BIG thank you! An important goal of this year’s DH was to make sure that the conference motto would become meaningful in as many facets of conference reality as possible – through the composition of the academic and social program, through the choice of keynote speakers and their topics, and through regional representation in our team of international student conference assistants. Applications for the bursaries on offer were scrutinized by Elisabeth Burr, Chair of ADHO’s Multi-Cultural and Multi-Lingual Committee, and her team of reviewers, and awards were then made to twelve young DH scholars from all over the world. Each of them partners with one Hamburg student, and we trust that this experience will inspire them to engage in joint international DH student projects in the future. To the award recipients as well as to their local partners we say: welcome! It’s great to have you on our team! This ambitious project in particular could not have been realized without the very generous financial support for DH 2012 granted by the Deutsche Forschungsgemeinschaft (DFG), and the equally generous support granted by the University of Hamburg, whose President, Prof. Dieter Lenzen, and Vice President Research, Prof. Hans Siegfried Stiehl, were both extremely supportive.
A welcome note is not the place to talk money; suffice it to say that without this support, the conference fee would probably have trebled. Our gratitude to both institutions is of that order, and more. We also thank Google Inc., which sponsors part of the social program. In light of the brief episode from our university’s history mentioned above, we particularly thank the Senator of Science and Research, Dr. Dorothee Stapelfeldt, for inviting the DH 2012 conference attendees to a welcome reception at Hamburg’s City Hall. This is a great and exceptional honor not often bestowed on academic conferences in Hamburg, and a sign of acknowledgment of the role and importance of the humanities at large that is very much appreciated. And with that there’s only one more person we haven’t said ‘Thank you!’ to. That person is – you. It’s great you came and moored your vessel. Whether by boat or ship, the cargo you have brought to DH 2012 is what this convention is all about: the exchange of ideas. So discharge your bounty, come ashore, and help us hoist the banner of Digital Diversity at DH 2012!

Your local organizers
Chris Meister, Katrin Schönert & team

Obituary for Gerhard Brey (1954–2012)
Harold Short
Department of Digital Humanities, King’s College London, UK; Visiting Professor, University of Western Sydney, Australia

We mourn the loss of our friend and colleague Gerhard Brey, who died in February this year after a short illness. Gerhard was a remarkable man, with interests and expertise that bridged the humanities and the sciences, including computation. His humanistic background was extremely rich and varied. It included fluency in several western European languages, as well as classical Greek and Latin, Sanskrit and Arabic. Part of the bridge was his interest and wide reading in the history of science.
In terms of computation he developed considerable expertise across a range of programming languages and tools, from digital editing and typesetting to databases to text mining. Gerhard was born in a small German town near the Austrian border, close to Salzburg. His early academic and professional career was in Germany, mainly in Munich, apart from a year spent studying in France. This was personally as well as professionally important, because it was at the university in Clermont-Ferrand that he met Gill, his wife and life-long companion. In 1996 Gerhard and Gill moved to England, and in 2001 he began working with Dominik Wujastyk at the Wellcome Institute. (For a longer and more detailed obituary, see the one by Wujastyk on the ALLC website at allc.org.) Gerhard came to work at what is now the Department of Digital Humanities at King’s College London in 2004, first as a part-time consultant, then as Research Fellow and Senior Research Fellow. His wide-ranging expertise in both humanities disciplines and technical matters meant that he was not only directly involved in numerous research projects, but was also consulted by colleagues on many more. While it is right to acknowledge Gerhard’s formidable humanistic and technical range, it is the human being who is most sorely missed. He was deeply interested in people, and had a real gift for communication and engagement. We remember him especially for his calmness, patience, ready good humour and his self-deprecating sense of fun. As his head of department, I valued particularly his willingness to lend a hand in a crisis. It was also Gerhard who took the lead in nurturing links with Japanese colleagues that led to a joint research project with the University of Osaka, and in the last years of his life he developed a passionate interest in Japanese culture, drawn in part by the work of our colleagues in Buddhist Studies, which spoke to his love of Sanskrit and Arabic literatures.
As a colleague remarked, his gentle, quiet manner concealed a fine mind and wider and deeper learning than even his friends tended to expect. He will be more sorely missed than any of us can say.

Bursary Winners
DH 2012 Student Conference Assistant Bursaries
• Kollontai Cossich Diniz (University of Sao Paulo, Sao Paulo, Brazil)
• Peter Daengeli (National University of Ireland Maynooth, Dublin, Ireland)
• Elena Dergacheva (University of Alberta, Edmonton, Canada)
• Ali Kira Grotkowski (University of Alberta, Edmonton, Canada)
• Dagmara Hadyna (Jagiellonian University, Kielce, Poland)
• Biljana Djordje Lazic (University of Belgrade, Belgrade, Serbia)
• Yoshimi Iwata (Doshisha University, Kyoto, Japan)
• Jie Li Kwok (Dharma Drum Buddhist College, New Taipei City, Taiwan)
• Jose Manuel Lopez Villanueva (Universidad Nacional Autonoma de Mexico, Mexico City, Mexico)
• Alex Edward Marse (2CINEM, Atlanta, USA)
• Gurpreet Singh (Morph Academy, Fatehgarh Sahib, India)
• Rosa Rosa Souza Rosa Gomes (University of Sao Paulo, Sao Paulo, Brazil)

Table of Contents

List of Reviewers ........................................ 1

Plenary Sessions
Dynamics and Diversity: Exploring European and Transnational Perspectives on Digital Humanities Research Infrastructures
Moulin, Claudine ........................................ 9
Embracing a Distant View of the Digital Humanities
Shimoda, Masahiro ........................................ 11

Pre-conference Workshops
Digital Methods in Manuscript Studies
Brockmann, Christian; Wangchuk, Dorji ........................................ 15
Introduction to Stylometric Analysis using R
Eder, Maciej; Rybicki, Jan ........................................ 16
NeDiMAH workshop on ontology based annotation
Eide, Øyvind; Ore, Christian-Emil; Rahtz, Sebastian ........................................ 18
Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts
Hinrichs, Erhard; Neuroth, Heike; Wittenburg, Peter ........................................ 20
Here and There, Then and Now – Modelling Space and Time in the Humanities
Isaksen, Leif; Day, Shawn; Andresen, Jens; Hyvönen, Eero; Mäkelä, Eetu ........................................ 22
Crowdsourcing meaning: a hands-on introduction to CLÉA, the Collaborative Literature Éxploration and Annotation Environment
Petris, Marco; Gius, Evelyn; Schüch, Lena; Meister, Jan Christoph ........................................ 24
Learning to play like a programmer: web mash-ups and scripting for beginners
Ridge, Mia ........................................ 25
Introduction to Distant Reading Techniques with Voyant Tools, Multilingual Edition
Sinclair, Stéfan; Rockwell, Geoffrey ........................................ 26
Towards a reference curriculum for the Digital Humanities
Thaller, Manfred ........................................ 27
Free your metadata: a practical approach towards metadata cleaning and vocabulary reconciliation
van Hooland, Seth; Verborgh, Ruben; De Wilde, Max ........................................ 28

Panels
Text Analysis Meets Text Encoding
Bauman, Syd; Hoover, David; van Dalen-Oskam, Karina; Piez, Wendell ........................................ 33
Designing Interactive Reading Environments for the Online Scholarly Edition
Blandford, Ann; Brown, Susan; Dobson, Teresa; Faisal, Sarah; Fiorentino, Carlos; Frizzera, Luciano; Giacometti, Alejandro; Heller, Brooke; Ilovan, Mihaela; Michura, Piotr; Nelson, Brent; Radzikowska, Milena; Rockwell, Geoffrey; Ruecker, Stan; Sinclair, Stéfan; Sondheim, Daniel; Warwick, Claire; Windsor, Jennifer ........................................ 35
Developing the spatial humanities: Geo-spatial technologies as a platform for cross-disciplinary scholarship
Bodenhamer, David; Gregory, Ian; Ell, Paul; Hallam, Julia; Harris, Trevor; Schwartz, Robert ........................................ 41
Prosopographical Databases, Text-Mining, GIS and System Interoperability for Chinese History and Literature
Bol, Peter Kees; Hsiang, Jieh; Fong, Grace ........................................ 43
Future Developments for TEI ODD
Cummings, James; Rahtz, Sebastian; Burnard, Lou; Bauman, Syd; Gaiffe, Bertrand; Romary, Laurent; Bański, Piotr ........................................ 52
Compiling large historical reference corpora of German: Quality Assurance, Interoperability and Collaboration in the Process of Publication of Digitized Historical Prints
Geyken, Alexander; Gloning, Thomas; Stäcker, Thomas ........................................ 54
Computational models of narrative structure
Löwe, Benedikt; Físseni, Bernhard; León, Carlos; Bod, Rens ........................................ 57
Approaches to the Treatment of Primary Materials in Digital Lexicons: Examples of the New Generation of Digital Lexicons for Buddhist Studies
Nagasaki, Kiyonori; Tomabechi, Toru; Wangchuk, Dorji; Takahashi, Koichi; Wallman, Jeff; Muller, A. Charles ........................................ 61
Topic Modeling the Past
Nelson, Robert K.; Mimno, David; Brown, Travis ........................................ 64
Facilitating Research through Social-Document Networks
Pitti, Daniel; Simon, Agnès; Vitali, Stefano; Arnold, Kerstin ........................................ 70
Digital Humanities as a university degree: The status quo and beyond
Thaller, Manfred; Sahle, Patrick; Clavaud, Florence; Clement, Tanya; Fiormonte, Domenico; Pierazzo, Elena; Rehbein, Malte; Rockwell, Geoffrey; Schreibman, Susan; Sinclair, Stéfan ........................................ 72

Papers
Exploring Originality in User-Generated Content with Network and Image Analysis Tools
Akdag Salah, Alkim Almila; Salah, Albert Ali; Douglass, Jeremy; Manovich, Lev ........................................ 79
Patchworks and Field-Boundaries: Visualizing the History of English
Alexander, Marc ........................................ 82
Developing Transcultural Competence in the Study of World Literatures: Golden Age Literature Glossary Online (GALGO)
Alonso Garcia, Nuria; Caplan, Alison ........................................ 84
Trees of Texts – Models and methods for an updated theory of medieval text stemmatology
Andrews, Tara Lee; Macé, Caroline ........................................ 85
Mapping the Information Science Domain
Arazy, Ofer; Ruecker, Stan; Rodriguez, Omar; Giacometti, Alejandro; Zhang, Lu; Chun, Su ........................................ 88
Words made Image. Towards a Language-Based Segmentation of Digitized Art Collections
Armaselu, Florentina ........................................ 91
HisDoc: Historical Document Analysis, Recognition, and Retrieval
Baechler, Micheal; Fischer, Andreas; Naji, Nada; Ingold, Rolf; Bunke, Horst; Savoy, Jacques ........................................ 94
Research infrastructures for Digital Humanities: The local perspective
Bärenfänger, Maja; Binder, Frank ........................................ 97
Pelagios: An Information Superhighway for the Ancient World
Barker, Elton; Simon, Rainer; Isaksen, Leif ........................................ 99
Putting TEI Tite to use – generating a database resource from a printed dictionary or reference type publication
Barner-Rasmussen, Michael ........................................ 102
Digital Humanities in the Classroom: Introducing a New Editing Platform for Source Documents in Classics
Beaulieu, Marie-Claire; Almas, Bridget ........................................ 105
DiaView: Visualise Cultural Change in Diachronic Corpora
Beavan, David ........................................ 107
Catch + Release: Research and Creation of a Digital New Media Exhibition in the Context of a Cultural and Heritage Museum
Beer, Ruth ........................................ 109
Opportunity and accountability in the ‘eResearch push’
Bellamy, Craig ........................................ 111
Connecting European Women Writers. The Selma Lagerlöf Archive and Women Writers Database
Bergenmar, Jenny; Olsson, Leif-Jöran ........................................ 113
Stylometric Analysis of Chinese Buddhist texts: Do different Chinese translations of the ‘Gandhavyūha’ reflect stylistic features that are typical for their age?
Bingenheimer, Marcus; Hung, Jen-Jou; Hsieh, Cheng-en ........................................ 115
Information Extraction on Noisy Texts for Historical Research
Blanke, Tobias; Bryant, Michael; Speck, Reto; Kristel, Conny ........................................ 117
Modeling Gender: The ‘Rise and Rise’ of the Australian Woman Novelist
Bode, Katherine ........................................ 119
Contextual factors in literary quality judgments: A quantitative analysis of an online writing community
Boot, Peter ........................................ 121
Violence and the Digital Humanities Text as Pharmakon
Bradley, Adam James ........................................ 123
Towards a bibliographic model of illustrations in the early modern illustrated book
Bradley, John; Pigney, Stephen ........................................ 124
Automatic Mining of Valence Compounds for German: A Corpus-Based Approach
Brock, Anne; Henrich, Verena; Hinrichs, Erhard; Versley, Yannick ........................................
126
Networks of networks: a critical review of formal network methods in archaeology through citation network analysis and close reading
Brughmans, Tom ............ 129
On the dual nature of written texts and its implications for the encoding of genetic manuscripts
Brüning, Gerrit; Henzel, Katrin; Pravida, Dietmar ............ 131
Automatic recognition of speech, thought and writing representation in German narrative texts
Brunner, Annelen ............ 135
Bringing Modern Spell Checking Approaches to Ancient Texts – Automated Suggestions for Incomplete Words
Büchler, Marco; Kruse, Sebastian; Eckart, Thomas ............ 137
Designing a national ‘Virtual Laboratory’ for the humanities: the Australian HuNI project
Burrows, Toby Nicolas ............ 139
Beyond Embedded Markup
Buzzetti, Dino; Thaller, Manfred ............ 142
Myopia: A Visualization Tool in Support of Close Reading
Chaturvedi, Manish; Gannod, Gerald; Mandell, Laura; Armstrong, Helen; Hodgson, Eric ............ 148
Translation Arrays: Exploring Cultural Heritage Texts Across Languages
Cheesman, Tom; Thiel, Stephan; Flanagan, Kevin; Zhao, Geng; Ehrmann, Alison; Laramee, Robert S.; Hope, Jonathan; Berry, David M. ............ 151
Constructing a Chinese as Second Language Learner Corpus for Language Learning and Research
Chen, Howard ............ 154
Social Curation of large multimedia collections on the cloud
Chong, Dazhi; Coppage, Samuel; Gu, Xiangyi; Maly, Kurt; Wu, Harris; Zubair, Mohammad ............ 155
Sounding for Meaning: Analyzing Aural Patterns Across Large Digital Collections
Clement, Tanya; Auvil, Loretta; Tcheng, David; Capitanu, Boris; Monroe, Megan; Goel, Ankita ............ 158
The Programming Historian 2: A Participatory Textbook
Crymble, Adam H.; MacEachern, Alan; Turkel, William J. ............ 162
Multilingual and Semantic Extension of Folk Tale Catalogues
Declerck, Thierry; Lendvai, Piroska; Darányi, Sándor ............ 163
Digital Language Archives and Less-Networked Speaker Communities
Dobrin, Lise M. ............ 167
Language Documentation and Digital Humanities: The (DoBeS) Language Archive
Drude, Sebastian; Trilsbeek, Paul; Broeder, Daan ............ 169
The potential of using crowd-sourced data to re-explore the demography of Victorian Britain
Duke-Williams, Oliver William ............ 173
Sharing Ancient Wisdoms: developing structures for tracking cultural dynamics by linking moral and philosophical anthologies with their source and recipient texts
Dunn, Stuart; Hedges, Mark; Jordanous, Anna; Lawrence, Faith; Roueche, Charlotte; Tupman, Charlotte; Wakelnig, Elvira ............ 176
Recovering the Recovered Text: Diversity, Canon Building, and Digital Studies
Earhart, Amy ............ 179
Mind your corpus: systematic errors in authorship attribution
Eder, Maciej ............ 181
Underspecified, Ambiguous or Formal. Problems in Creating Maps Based on Texts
Eide, Øyvind ............ 185
A Frequency Dictionary of Modern Written and Oral Media Arabic
Elmaz, Orhan ............ 188
Texts in Motion – Rethinking Reader Annotations in Online Literary Texts
Fendt, Kurt E.; Kelley, Wyn; Zhang, Jia; Della Costa, Dave ............ 190
May Humanists Learn from Artists a New Way to Interact with Digital Technology?
Franchi, Stefano ............ 192
A flexible model for the collaborative annotation of digitized literary works
Gayoso-Cabada, Joaquin; Ruiz, Cesar; Pablo-Nuñez, Luis; Sarasa-Cabezuelo, Antonio; Goicoechea-de-Jorge, Maria; Sanz-Cabrerizo, Amelia; Sierra-Rodriguez, Jose-Luis ............ 195
HyperMachiavel: a translation comparison tool
Gedzelman, Séverine; Zancarini, Jean-Claude ............ 198
Discrimination sémantique par la traduction automatique, expériences sur le dictionnaire français de Littré
Glorieux, Frédéric; Jolivet, Vincent ............ 202
The Myth of the New: Mass Digitization, Distant Reading and the Future of the Book
Gooding, Paul Matthew; Warwick, Claire; Terras, Melissa ............ 204
Designing Navigation Tools for an Environmental Humanities Portal: Considerations and Critical Assessments
Graf von Hardenberg, Wilko; Coulter, Kimberly ............ 206
Processing Email Archives in Special Collections
Hangal, Sudheendra; Chan, Peter; Lam, Monica S.; Heer, Jeffrey ............ 208
The Stylometry of Collaborative Translation
Heydel, Magda; Rybicki, Jan ............ 212
Focus on Users in the Open Development of the National Digital Library of Finland
Hirvonen, Ville; Kautonen, Heli Johanna ............ 215
The Rarer They Are, the More There Are, the Less They Matter
Hoover, David ............ 218
Experiments in Digital Philosophy – Putting new paradigms to the test in the Agora project
Hrachovec, Herbert; Carusi, Annamaria; Huentelmann, Raphael; Pichler, Alois; Lamarra, Antonio; Marras, Cristina; Piccioli, Alessio; Burnard, Lou ............
221
Information Discovery in the Chinese Recorder Index
Hsiang, Jieh; Kong, Jung-Wei; Sung, Allan ............ 224
Complex Network Perspective on Graphic Form System of Hanzi
Hu, Jiajia; Wang, Ning ............ 228
A Computer-Based Approach for Predicting the Translation Time Period of Early Chinese Buddhism Translation
Hung, Jen-Jou; Bingenheimer, Marcus; Kwok, Jieli ............ 230
Bridging Multicultural Communities: Developing a Framework for a European Network of Museum, Libraries and Public Cultural Institutions
Innocenti, Perla; Richards, John; Wieber, Sabine ............ 232
Ptolemy’s Geography and the Birth of GIS
Isaksen, Leif ............ 236
Tracing the history of Noh texts by mathematical methods. Validating the application of phylogenetic methods to Noh texts
Iwata, Yoshimi ............ 239
Computing and Visualizing the 19th-Century Literary Genome
Jockers, Matthew ............ 242
Using the Google Ngram Corpus to Measure Cultural Complexity
Juola, Patrick ............ 245
‘All Rights Worth Recombination’: Post-Hacker Culture and ASCII Literature (1983-1993)
Katelnikoff, Joel ............ 247
Evaluating Unmasking for Cross-Genre Authorship Verification
Kestemont, Mike; Luyckx, Kim; Daelemans, Walter; Crombez, Thomas ............ 249
Literary Wikis: Crowd-sourcing the Analysis and Annotation of Pynchon, Eco and Others
Ketzan, Erik ............ 252
Social Network Analysis and Visualization in ‘The Papers of Thomas Jefferson’
Klein, Lauren Frederica ............ 254
VariaLog: how to locate words in a French Renaissance Virtual Library
Lay, Marie Hélène ............ 256
DeRiK: A German Reference Corpus of Computer-Mediated Communication
Lemnitzer, Lothar; Beißwenger, Michael; Ermakova, Maria; Geyken, Alexander; Storrer, Angelika ............ 259
Estimating the Distinctiveness of Graphemes and Allographs in Palaeographic Classification
Levy, Noga; Wolf, Lior; Dershowitz, Nachum; Stokes, Peter ............ 264
Academic Research in the Blogosphere: Adapting to New Opportunities and Risks on the Internet
Littauer, Richard; Winters, James; Roberts, Sean; Little, Hannah; Pleyer, Michael; Benzon, Bill ............ 268
Feeling the View: Reading Affective Orientation of Tagged Images
Liu, Jyi-Shane; Peng, Sheng-Yang ............ 270
Characterizing Authorship Style Using Linguistic Features
Lucic, Ana; Blake, Catherine ............ 273
Investigating the genealogical relatedness of the endangered Dogon languages
Moran, Steven; Prokic, Jelena ............ 276
Landscapes, languages and data structures: Issues in building the Placenames Database of Ireland
Měchura, Michal Boleslav ............ 278
Interoperability of Language Documentation Tools and Materials for Local Communities
Nakhimovsky, Alexander; Good, Jeff; Myers, Tom ............ 280
Content Creation by Domain Experts in a Semantic GIS System
Nakhimovsky, Alexander; Myers, Tom ............ 283
From Preserving Language Resources to Serving Language Speakers: New Prospects for Endangered Languages Archives
Nathan, David John ............ 286
Retrieving Writing Patterns From Historical Manuscripts Using Local Descriptors
Neumann, Bernd; Herzog, Rainer; Solth, Arved; Bestmann, Oliver; Scheel, Julian ............ 288
Distractorless Authorship Verification
Noecker Jr., John; Ryan, Michael ............ 292
Cataloguing linguistic diversity: Glottolog/Langdoc
Nordhoff, Sebastian; Hammarström, Harald ............ 296
Geo-Temporal Interpretation of Archival Collections Using Neatline
Nowviskie, Bethany; Graham, Wayne; McClure, David; Boggs, Jeremy; Rochester, Eric ............ 299
Enriching Digital Libraries Contents with SemLib Semantic Annotation System
Nucci, Michele; Grassi, Marco; Morbidoni, Christian; Piazza, Francesco ............ 303
The VL3: A Project at the Crossroads between Linguistics and Computer Science
Nuñez, Camelia Gianina; Mavillard, Antonio Jiménez ............ 306
‘Eric, you do not humble well’: The Image of the Modern Vampire in Text and on Screen
Opas-Hänninen, Lisa Lena; Hettel, Jacqueline; Toljamo, Tuomo; Seppänen, Tapio ............ 308
Electronic Deconstruction of an argument using corpus linguistic analysis of its on-line discussion forum supplement
O’Halloran, Kieran Anthony ............ 310
Citygram One: Visualizing Urban Acoustic Ecology
Park, Tae Hong; Miller, Ben; Shrestha, Ayush; Lee, Sangmi; Turner, Jonathan; Marse, Alex ............ 313
Towards Wittgenstein on the Semantic Web
Pichler, Alois; Zöllner-Weber, Amélie ............ 318
Uncovering lost histories through GeoStoryteller: A digital GeoHumanities project
Rabina, Debbie L.; Cocciolo, Anthony ............ 322
Workflows as Structured Surfaces
Radzikowska, Milena; Ruecker, Stan; Rockwell, Geoffrey; Brown, Susan; Frizzera, Luciano; INKE Research Group ............ 324
Code-Generation Techniques for XML Collections Interoperability
Ramsay, Stephen; Pytlik-Zillig, Brian ............ 327
Uncertain Date, Uncertain Place: Interpreting the History of Jewish Communities in the Byzantine Empire using GIS
Rees, Gethin Powell ............ 329
Code sprints and Infrastructure
Reside, Doug; Fraistat, Neil; Vershbow, Ben; van Zundert, Joris Job ............ 331
Digital Genetic Criticism of RENT
Reside, Doug ............ 333
On the Internet, nobody knows you’re a historian: exploring resistance to crowdsourced resources among historians
Ridge, Mia ............ 335
Formal Semantic Modeling for Human and Machine-based Decoding of Medieval Manuscripts
Ritsema van Eck, Marianne Petra; Schomaker, Lambert ............ 336
The Swallow Flies Swiftly Through: An Analysis of Humanist
Rockwell, Geoffrey; Sinclair, Stéfan ............ 339
The Digital Mellini Project: Exploring New Tools & Methods for Art-historical Research & Publication
Rodríguez, Nuria; Baca, Murtha; Albrezzi, Francesca; Longaker, Rachel ............
342
Intertextuality and Influence in the Age of Enlightenment: Sequence Alignment Applications for Humanities Research
Roe, Glenn H.; The ARTFL Project ............ 345
Engaging the Museum Space: Mobilising Visitor Engagement with Digital Content Creation
Ross, Claire Stephanie; Gray, Steven; Warwick, Claire; Hudson Smith, Andrew; Terras, Melissa ............ 348
Aiding the Interpretation of Ancient Documents
Roued-Cunliffe, Henriette ............ 351
The Twelve Disputed ‘Federalist’ Papers: A Case for Collaboration
Rudman, Joseph ............ 353
Writing with Sound: Composing Multimodal, Long-Form Scholarship
Sayers, Jentery ............ 357
Intra-linking the Research Corpus: Using Semantic MediaWiki as a lightweight Virtual Research Environment
Schindler, Christoph; Ell, Basil; Rittberger, Marc ............ 359
Corpus Coranicum: A digital landscape for the study of the Qur’an
Schnöpf, Markus ............ 362
The MayaArch3D Project: A 3D GIS Web System for Querying Ancient Architecture and Landscapes
Schwerin, Jennifer von; Richards-Rissetto, Heather; Agugiaro, Giorgio; Remondino, Fabio; Girardi, Gabrio ............ 365
Multi-dimensional audio-visual technology: Evidence from the endangered language documentation
Sharma, Narayan P. ............ 368
Contours of the Past: Computationally Exploring Civil Rights Histories
Shaw, Ryan Benjamin ............ 370
Notes from the Collaboratory: An Informal Study of an Academic DH Lab in Transition
Siemens, Lynne; Siemens, Raymond ............ 373
XML-Print: an Ergonomic Typesetting System for Complex Text Structures
Sievers, Martin; Burch, Thomas; Küster, Marc W.; Moulin, Claudine; Rapp, Andrea; Schwarz, Roland; Gan, Yu ............ 375
Federated Digital Archives and Disaster Recovery: The Role of the Digital Humanities in Post-earthquake Christchurch
Smithies, James Dakin ............ 380
Modeling Medieval Handwriting: A New Approach to Digital Palaeography
Stokes, Peter ............ 382
A Digital Geography of Hispanic Baroque Art
Suárez, Juan-Luis; Sancho-Caparrini, Fernando ............ 385
Approaching Dickens’ Style through Random Forests
Tabata, Tomoji ............ 388
Interfacing Diachrony: Visualizing Linguistic Change on the Basis of Digital Editions of Serbian 18th-Century Texts
Tasovac, Toma; Ermolaev, Natalia ............ 392
Promise and Practice of Enhanced Publications to Complement Conventionally Published Scholarly Monographs
Tatum, Clifford; Jankowski, Nicholas; Scharnhorst, Andrea ............ 394
Culpeper’s legacy: How title pages sold books in the 17th century
Tyrkkö, Jukka Jyrki Juhani; Suhr, Carla Maria; Marttila, Ville ............ 396
The Differentiation of Genres in Eighteenth- and Nineteenth-Century English Literature
Underwood, Ted; Sellers, Jordan; Auvil, Loretta; Capitanu, Boris ............ 397
Digital editions with eLaborate: from practice to theory
van Dalen-Oskam, Karina; van Zundert, Joris Job ............ 400
Delta in 3D: Copyists Distinction by Scaling Burrows’s Delta
van Zundert, Joris Job; van Dalen-Oskam, Karina ............ 402
Wiki Technologies for Semantic Publication of Old Russian Charters
Varfolomeyev, Aleksey; Ivanovs, Aleksandrs ............ 405
L’histoire de l’art à l’ère numérique – Pour une historiographie médiologique
Welger-Barboza, Corinne ............ 407
Benefits of tools and applications for a digitized analysis of Chinese Buddhist inscriptions
Wenzel, Claudia ............ 411
The ARTeFACT Movement Thesaurus: toward an open-source tool to mine movement-derived data
Wiesner, Susan L.; Bennett, Bradford; Stalnaker, Rommie L. ............ 413
The electronic ‘Oxford English Dictionary’, poetry, and intertextuality
Williams, David-Antoine ............ 415
Reasoning about Genesis or The Mechanical Philologist
Wissenbach, Moritz; Pravida, Dietmar; Middell, Gregor ............ 418
The Digital Daozang Jiyao – How to get the edition into the Scholar’s labs
Wittern, Christian ............ 422
Posters
A Digital Approach to Sound Symbolism in English: Evidence from the Historical Thesaurus
Alexander, Marc; Kay, Christian ............ 427
Collaborative Video and Image Annotation
Arnold, Matthias; Knab, Cornelia; Decker, Eric ............ 429
Le Système modulaire de gestion de l’information historique (SyMoGIH): une plateforme collaborative et cumulative de stockage et d’exploitation de l’information géo-historique
Beretta, Francesco; Vernus, Pierre; Hours, Bernard ............ 431
Realigning Digital Humanities Training: The Praxis Program at the Scholars’ Lab
Boggs, Jeremy; Nowviskie, Bethany; Gil, Alexander; Johnson, Eric; Lestock, Brooke; Storti, Sarah; Swafford, Joanna; Praxis Program Collaborators ............ 433
Supporting the emerging community of MEI: the current landscape of tools for note entry and digital editing
Bohl, Benjamin W.; Röwenstrunk, Daniel; Viglianti, Raffaele ............ 435
‘The Past Is Never Dead. It’s Not Even Past’: The Challenge of Data Provenance in the e-Humanities
Clark, Ashley M.; Holloway, Steven W. ............ 438
The Social Edition: Scholarly Editing Across Communities
Crompton, Constance; Siemens, Raymond; The Devonshire MS Editorial Group ............
441
Courting ‘The World’s Wife’: Original Digital Humanities Research in the Undergraduate Classroom
Croxall, Brian ............ 443
The Academy’s Digital Store of Knowledge
Czmiel, Alexander; Jürgens, Marco ............ 445
Building a TEI Archiving, Publishing, and Access Service: The TAPAS Project
Flanders, Julia; Hamlin, Scott; Alvarado, Rafael; Mylonas, Elli ............ 448
Author Consolidation across European National Bibliographies
Freire, Nuno ............ 450
Historical Events Versus Information Contents – A Preliminary Analysis of the National Geographic Magazine
Fujimoto, Yu ............ 453
‘Tejiendo la Red HD’ – A case study of building a DH network in Mexico
Galina, Isabel; Priani, Ernesto; López, José; Rivera, Eduardo; Cruz, Alejandro ............ 456
Adaptive Automatic Gesture Stroke Detection
Gebre, Binyam Gebrekidan; Wittenburg, Peter ............ 458
Towards a Transnational Multilingual Caribbean Digital Humanities Lab
Gil, Alexander ............ 462
NUScholar: Digital Methods for Educating New Humanities Scholars
Graff, Ann-Barbara; Lucas, Kristin; Blustein, James; Gibson, Robin; Woods, Sharon ............ 463
Latent Semantic Analysis Tools Available for All Digital Humanities Projects in Project Bamboo
Hooper, Wallace Edd; Cowan, Will; Jiao, David; Walsh, John A. ............ 465
Machine Learning for Automatic Annotation of References in DH scholarly papers
Kim, Young-Min; Bellot, Patrice; Faath, Elodie; Dacos, Marin ............ 467
An Ontology-Based Iterative Text Processing Strategy for Detecting and Recognizing Characters in Folktales
Koleva, Nikolina; Declerck, Thierry; Krieger, Hans-Ulrich ............ 470
Integrated multilingual access to diverse Japanese humanities digital archives by dynamically linking data
Kuyama, Takeo; Batjargal, Biligsaikhan; Kimura, Fuminori; Maeda, Akira ............ 473
Linguistic concepts described with Media Query Language for automated annotation
Lenkiewicz, Anna; Lis, Magdalena; Lenkiewicz, Przemyslaw ............ 477
Virtual Reproduction of Gion Festival Yamahoko Parade
Li, Liang; Choi, Woong; Nishiura, Takanobu; Yano, Keiji; Hachimura, Kozaburo ............ 480
Complex entity management through EATS: the case of the Gascon Rolls Project
Litta Modignani Picozzi, Eleonora; Norrish, Jamie; Monteiro Vieira, Jose Miguel ............ 483
TextGrid Repository – Supporting the Data Curation Needs of Humanities Researchers
Lohmeier, Felix; Veentjer, Ubbo; Smith, Kathleen M.; Söring, Sibylle ............ 486
RIgeo.net – A Lab for Spatial Exploration of Historical Data
Loos, Lukas; Zipf, Alexander ............ 488
Automatic Topic Hierarchy Generation Using Wordnet
Monteiro Vieira, Jose Miguel; Brey, Gerhard † ............ 491
Hypotheses.org, une infrastructure pour les Digital Humanities
Muscinesi, Frédérique ............ 494
TXSTEP – an integrated XML-based scripting language for scholarly text data processing
Ott, Wilhelm; Ott, Tobias; Gasperlin, Oliver ............ 497
Exploring Prosopographical Resources Through Novel Tools and Visualizations: a Preliminary Investigation
Pasin, Michele ............ 499
Heterogeneity and Multilingualism vs. Usability – Challenges of the Database User Interface ‘Archiv-Editor’
Plutte, Christoph ............ 502
Medievalists’ Use of Digital Resources, 2002 and 2012
Porter, Dot ............ 505
Cross-cultural Approaches to Digital Humanities – Funding and Implementation
Rhody, Jason; Kümmel, Christoph; Effinger, Maria; Freedman, Richard; Magier, David; Förtsch, Reinhard ............ 506
CWRC-Writer: An In-Browser XML Editor
Rockwell, Geoffrey; Brown, Susan; Chartrand, James; Hesemeier, Susan ............ 508
The Musici Database
Roeder, Torsten; Plutte, Christoph ............ 511
The TEICHI Framework: Bringing TEI Lite to Drupal
Schöch, Christof; Achler, Stefan ............ 514
What Has Digital Curation Got to Do With Digital Humanities?
Schreibman, Susan; McCadden, Katiet Theresa; Coyle, Barry ............ 516
Orbis Latinus Online (OLO)
Schultes, Kilian Peter; Geissler, Stefan ............ 518
Semantically connecting text fragments – Text-Text-Link-Editor
Selig, Thomas; Küster, Marc W.; Conner, Eric Sean ............ 520
The Melesina Trench Project: Markup Vocabularies, Poetics, and Undergraduate Pedagogy
Singer, Kate ............ 522
Digital Edition of Carl Maria von Weber’s Collected Works
Stadler, Peter ............ 525
Data sharing, virtual collaboration, and textual analysis: Working on ‘Women Writers In History’
van Dijk, Suzan; Hoogenboom, Hilde; Sanz, Amelia; Bergenmar, Jenny; Olsson, Leif-Jöran ............ 527
Storage Infrastructure of the Virtual Scriptorium St. Matthias
Vanscheidt, Philipp; Rapp, Andrea; Tonne, Danah ............ 529
Digital Emblematics – Enabling Humanities Research of a Popular Early Modern Genre
Wade, Mara R.; Stäcker, Thomas; Stein, Regine; Brandhorst, Hans; Graham, David ............ 532
DTAQ – Quality Assurance in a Large Corpus of Historical Texts
Wiegand, Frank ............ 535
The Digital Averroes Research Environment – Semantic Relations in the Editorial Sciences
Willems, Florian; Gärtner, Mattias ............
537
AV Processing in eHumanities – a paradigm shift
Wittenburg, Peter; Lenkiewicz, Przemyslaw; Auer, Erik; Lenkiewicz, Anna; Gebre, Binyam Gebrekidan; Drude, Sebastian .......................................... 538

List of Reviewers

- Susan Brown - Hiroyuki Akama - Marjorie Burghart - Marc Alexander - Lou Burnard - Peter Roger Alsop - Elisabeth Burr - Deborah Anderson - Toby Nicolas Burrows - Vadim Sergeevich Andreev - Dino Buzzetti - Tara Lee Andrews - Olivier Canteaut - Simon James Appleford - Paul Caton - Stewart Arneil - Hugh Cayless - Rolf Harald Baayen - Tom Cheesman - Drew Baker - Shih-Pei Chen - David Bamman - Paula Horwarth Chesley - Piotr Bański - Tatjana Chorney - Brett Barney - Neil Chue Hong - Sabine Bartsch - Arianna Ciula - Patsy Baudoin - Florence Clavaud - Syd Bauman - Frédéric Clavert - Ryan Frederick Baumann - Tanya Clement - David Beavan - Claire Clivaz - Craig Bellamy - Louisa Connors - Hans Bennis - Paul Conway - Anna Bentkowska-Kafel - Charles M. Cooney - Alejandro Bia - David Christopher Cooper - Hanno Biber - Hugh Craig - Marcus Bingenheimer - Tim Crawford - Tobias Blanke - James C. Cummings - Gabriel Bodard - Richard Cunningham - Jeremy Boggs - Alexander Czmiel - Peter Kees Bol - Marin Dacos - Geert E. Booij - Stefano David - Peter Boot - Rebecca Frost Davis - Lars Borin - John Dawson - Federico Boschetti - Marilyn Deegan - Arno Bosse - Janet Delve - Matthew Bouchard - Kate Devlin - William Bowen - Joseph DiNunzio - John Bradley - Quinn Anya Dombrowski - David Prager Branner - Jeremy Douglass - Gerhard Brey - J. Stephen Downie - Anne-Laure Brisac - David S. Dubin - Marco Büchler - Stuart Dunn - Nathalie Groß - Alastair Dunning - Gretchen Mary Gueguen - Amy Earhart - Carolyn Guertin - Michael Eberle-Sinatra - Ann Hanlon - Thomas Eckart - Eric Harbeson - Maciej Eder - Katherine D. Harris - Jennifer C.
Edmond - Kevin Scott Hawkins - Gabriel Egan - Sebastian Heath - Øyvind Eide - Mark Hedges - Paul S. Ell - Serge Heiden - Deena Engel - Gerhard Heyer - Maria Esteva - Timothy Hill - Martin Everaert - Brett Hirsch - Kurt E. Fendt - Martin Holmes - Franz Fischer - David L. Hoover - Kathleen Fitzpatrick - Xiao Hu - Julia Flanders - Lorna Hughes - Dominic Forest - Barbara Hui - Fenella Grace France - Claus Huitfeldt - Amanda French - László Hunyadi - Christiane Fritze - Leif Isaksen - Chris Funkhouser - Aleksandrs Ivanovs - Jonathan Furner - Fotis Jannidis - Richard Furuta - Matthew Jockers - Isabel Galina Russell - Lars Johnsen - Liliane Gallet-Blanchard - Ian R. Johnson - David Gants - Patrick Juola - Susan Garfinkel - Samuli Kaislaniemi - Kurt Gärtner - Sarah Whitcher Kansa - Richard Gartner - John Gerard Keating - Alexander Gil - Margaret Kelleher - Joseph Gilbert - Kimon Keramidas - Sharon K. Goetz - Katia Lida Kermanidis - Matthew K. Gold - Erik Ketzan - Joel Goldfield - Foaad Khosmood - Sean Gouglas - Douglas Kibbee - Ann Gow - Gareth Knight - Stefan Gradmann - Fabian Körner - Wayne Graham - Kimberly Kowal - Harriett Elisabeth Green - Steven Krauwer - Jan Gregory - William Kretzschmar - Michael Adam Krot - Kiyonori Nagasaki - Christoph Kuemmel - Brent Nelson - Maurizio Lana - John Nerbonne - Lewis Rosser Lancaster - Greg T. Newton - Anouk Lang - Angel David Nieves - John Lavagnino - Bethany Nowviskie - Alexei Lavrentiev - Julianne Nyhan - Séamus Lawless - Daniel Paul O’Donnell - Katharine Faith Lawrence - Kazushi Ohya - Domingo Ledezma - Mark Olsen - Caroline Leitch - Lisa Lena Opas-Hänninen - Piroska Lendvai - Christian-Emil Ore - Richard J. Lewis - Espen S.
Ore - Thomas Lippincott - John Paolillo - Eleonora Litta Modignani Picozzi - Brad Pasanek - Clare Llewellyn - Michele Pasin - Dianella Lombardini - Susan Holbrook Perdue - Elizabeth Losh - Santiago Perez Isasi - Ana Lucic - Elena Pierazzo - Harald Lüngen - Wendell Piez - Kim Luyckx - Daniel Pitti - Akira Maeda - Dorothy Carr Porter - Simon Mahony - Andrew John Prescott - Martti Makinen - Ernesto Priani - Kurt Maly - Michael John Priddy - Worthy N. Martin - Brian L. Pytlik Zillig - Javier Martín Arista - Sebastian Rahtz - Jarom Lyle McDonald - Michael John Rains - Stephanie Meece - Stephen Ramsay - Federico Meschini - Andrea Rapp - Adrian Miles - Gabriela Gray Redwine - Maki Miyake - Dean Rehberger - Jose Miguel Monteiro Vieira - Georg Rehm - Ruth Mostern - Allen H. Renear - Stuart Moulthrop - Doug Reside - Martin Mueller - Jason Rhody - A. Charles Muller - Allen Beye Riddell - Trevor Muñoz - Jim Ridolfo - Orla Murphy - David Robey - Frédérique Muscinesi - Peter Robinson - Elli Mylonas - Geoffrey Rockwell - Nuria Rodríguez - Paul Joseph Spence - Glenn H. Roe - Michael Sperberg-McQueen - Torsten Roeder - Lisa Spiro - Augusta Rohrbach - Peter Anthony Stokes - Matteo Romanello - Suzana Sukovic - Laurent Romary - Chris Alen Sula - Lisa Rosner - Takafumi Suzuki - Charlotte Roueché - Elizabeth Anne Swanstrom - Henriette Roued-Cunliffe - Tomoji Tabata - Joseph Rudman - Toma Tasovac - Stan Ruecker - Aja Teehan - Angelina Russo - Elke Teich - Jan Rybicki - Melissa Terras - Patrick Sahle - Manfred Thaller - Patrick Saint-Dizier - Ruck John Thawonmas - Jon Saklofske - Christopher Theibault - Gabriele Salciute-Civiliene - Amalia Todirascu - Manuel Sánchez-Quero - Kathryn Tomasek - Concha Sanz - Marijana Tomić - Jentery Sayers - Charles Bartlett Travis - Torsten Schaßan - Thorsten Trippel - Stephanie Schlitz - Charlotte Tupman - Desmond Schmidt - Kirsten Carol Uszkalo - Harry Schmidt - Ana Valverde Mateos - Sara A.
Schmidt - Karina van Dalen-Oskam - Christof Schöch - Ron van den Branden - Susan Schreibman - H. J. van den Herik - Charlotte Schubert - Bert van Elsacker - Paul Anthony Scifleet - Marieke van Erp - Tapio Seppänen - Seth van Hooland - Ryan Benjamin Shaw - Joris Job van Zundert - William Stewart Shaw - Tomás Várdi - Lynne Siemens - John T. Venecek - Raymond George Siemens - Silvia Verdu Ruiz - Gary F. Simons - Christina Vertan - Stéfan Sinclair - Paul Vetch - Kate Natalie Singer - Raffaele Viglianti - Natasha Smith - John Walsh - James Dakin Smithies - Katherine L. Walter - Lisa M. Snyder - Claire Warwick - Małgorzata Sokoł - Robert William Weidman - Corinne Welger-Barboza - Willeke Wendrich - Susan L. Wiesner - Matthew Wilkens - Perry Willett - William Winder - Andreas Witt - Christian Wittern - Mark Wolff - Glen Worthey - Clifford Edward Wulfman - Vika Zafrin - Douwe Zeldenrust - Matthew Zimmerman - Amélie Zöllner-Weber

Plenary Sessions

Dynamics and Diversity: Exploring European and Transnational Perspectives on Digital Humanities Research Infrastructures

Moulin, Claudine
moulin@uni-trier.de
Trier Centre for Digital Humanities, University of Trier, Germany

In my talk I would like to reflect on Digital Humanities and research infrastructures from an international and interdisciplinary perspective. Preserving and documenting cultural and linguistic variety is not only one of the most important challenges linked to the development and future impact of Digital Humanities on scholarly work and methodologies; it is also one of the key elements to be considered in the field of policy making and research funding at the European and international level.
I will explore this by reflecting on Digital Humanities and its outcomes from the perspectives of cultural history and linguistic and interdisciplinary diversity; I will also tackle key questions related to building multi-level inter- and transdisciplinary projects and transnational research infrastructures. In addition to considering how Digital Humanities can extend and transform existing scholarly practice, I will also consider how it is fostering the emergence of new cultural practices that look beyond established academic circles, for example, interactions between Digital Humanities and works of art.

Biographical Note

Claudine Moulin studied German and English philology in Brussels and Bamberg, receiving postdoctoral research grants for manuscript studies in Oxford. She was a Heisenberg fellow of the Deutsche Forschungsgemeinschaft (DFG); since 2003 she has held the chair for German Philology/Historical Linguistics at the University of Trier, Germany, and is the Scientific Director of the Trier Centre for Digital Humanities. She has published monographs and articles in the domains of text editing, digital lexicography, glossography, historical linguistics, and manuscript and annotation studies. She is a founding member of the Historisch-Kulturwissenschaftliches Forschungszentrum (HKFZ Trier). She is a member of the Standing Committee for the Humanities of the European Science Foundation (ESF) and speaker of the ESF Expert Group on Research Infrastructures in the Humanities. In 2010 she received the Academy Award of Rhineland-Palatinate from the Academy of Sciences and Literature, Mainz. C. Moulin is co-editor of the linguistic journal Sprachwissenschaft and of the series ‘Germanistische Bibliothek’; together with Julianne Nyhan and Arianna Ciula (et al.)
she published in 2011 the ESF Science Policy Briefing on Research Infrastructures in the Humanities ( http://www.esf.org/research-areas/humanities/strategic-activities/research-infrastructures-in-the-humanities.html ).

Embracing a Distant View of the Digital Humanities

Shimoda, Masahiro
shimoda@l.u-tokyo.ac.jp
Department of Indian Philosophy and Buddhist Studies / Center for Evolving Humanities, Graduate School of Humanities and Sociology, the University of Tokyo, Japan

How should cultures transmit what they believe to be of vital importance from their own culture in its period of decline to another culture on the rise? This question, taken as one of the most challenging by contemporary historians, might well be posed by DH scholars themselves, in order to recognize the magnitude of the problem they have been confronting and the significance of the endeavor they have been undertaking in the domain of the humanities. The variety of efforts implied in the aims of each individual project, when combined in a single arena such as DH2012, will in the end be found to have been dedicated to the larger project of constructing ‘another culture on the rise’ on a global scale in this age of drastic transformation of the medium of knowledge. Keeping this sort of ‘distant view’ of DH in mind, though it seems to have no immediate, tangible influence on our day-to-day ‘close views’, is nevertheless indispensable: indispensable not only for the untiring effort to improve research environments amidst bewildering changes of technologies, but also for positively inviting new, unknown, unprecedented enterprises into the domain of DH. DH in its diverse forms should therefore assimilate into its identity the ostensibly incommensurable aspects of steadfastness and flexibility.
Among the numerous approaches that might be taken to understand this peculiar identity of DH, I would like to demonstrate the significance of Digital Humanities research on Buddhist scriptures, appropriately placed in the longer view of the history of the humanities.

Biographical Note

Masahiro Shimoda is a Professor in Indian Philosophy and Buddhist Studies with a cross appointment in the Digital Humanities Section of the Center for Evolving Humanities at the University of Tokyo. He has been Visiting Professor at the School of Oriental and African Studies, University of London (2006), Visiting Professor at Stanford University (2010), and is presently Visiting Research Fellow at the University of Virginia (2012). He is the president of the Japanese Association for Digital Humanities, established in September 2011, and the chair of the trans-school program in Digital Humanities at the University of Tokyo, which started on 1 April 2012 as a collaborative program among the Graduate School of Interdisciplinary Information Studies, the Graduate School of Humanities and Sociology, and the Center for Structuring Knowledge. As his main project, Shimoda has led since 2010, on a government grant, ‘the construction of an academic Buddhist knowledge base in international alliance’. This multi-nodal project, comprising seven major projects of self-financed agencies (Indo-Tibetan Lexical Resources at the University of Hamburg, the Hobogirin project at the École française d’Extrême-Orient, the Pali Text Compilation Project at the Dhammacai Institute in Thailand, the Digital Dictionary of Buddhism in Tokyo, etc.) with SAT (the Chinese Buddhist Text Corpus Project at the University of Tokyo) as their hub, aims at providing a variety of research resources for Buddhist studies, such as primary sources, secondary resources, catalogues, dictionaries, lexicons and translations, with all databases interlinked at a deep structural level.
Pre-conference Workshops

Digital Methods in Manuscript Studies

Brockmann, Christian
christian.brockmann@uni-hamburg.de
University of Hamburg, Germany

Wangchuk, Dorji
dorji.wangchuk@uni-hamburg.de
University of Hamburg, Germany

The workshop will consist of brief introductory presentations on current developments in these areas by international experts, short hands-on and demonstration units on multispectral imaging and computer-assisted script and feature analysis, as well as discussions on expected future developments, application perspectives, challenges and possible fields of cooperation.

Joint speakers/demonstrators:
- Jost Gippert (University of Frankfurt)
- Lior Wolf (Tel-Aviv University)

Manuscript Studies constitute one of the main research areas in the Humanities at the University of Hamburg. This was underlined recently when the large multidisciplinary Centre for the Study of Manuscript Cultures (Sonderforschungsbereich 950, http://www.manuscript-cultures.uni-hamburg.de/ ) won a substantial grant from the Deutsche Forschungsgemeinschaft in May 2011. The centre can draw on experience aggregated by several other projects in Hamburg such as the ‘Forschergruppe Manuskriptkulturen in Asien und Afrika’, ‘Teuchos. Zentrum für Handschriften- und Textforschung’ and the ‘Nepalese-German Manuscript Cataloguing Project (NGMCP)’. Manuscripts, as the central and most important media for the dissemination of writing, have influenced and even shaped cultures worldwide for millennia, and only the relatively recent advent of the printed book has challenged their predominance. In the past few years, Manuscript Studies have profited greatly from the use of digital methods and technologies, ranging from the creation of better and more accessible images to electronic editions, and from a broad range of special databases supporting this research to analytical tools.
Whereas such areas as digital cataloguing and editing have received more extensive coverage, methods more specific to the study of manuscripts in particular deserve broader attention.

- Lorenzo Perilli (University of Rome, Tor Vergata)
- Domenico Fiormonte (Università di Roma Tre)
- Agnieszka Helman-Wazny (University of Hamburg) / Jeff Wallman (Tibetan Buddhist Resource Center, New York) / Orna Almogi (University of Hamburg, Centre for the Study of Manuscript Cultures)
- Boryana Pouvkova / Claire MacDonald (University of Hamburg, Centre for the Study of Manuscript Cultures)
- Daniel Deckers (University of Hamburg)
- Arved Solth / Bernd Neumann (University of Hamburg, Centre for the Study of Manuscript Cultures)
- Ira Rabin, Oliver Hahn, Emanuel Kindzorra (Centre for the Study of Manuscript Cultures, Federal Institute for Materials Research and Testing)

This workshop focuses on Manuscript Studies as a distinctive field, i.e. the study of manuscripts as a characteristic feature and expression of those cultures that are built on their use. It will examine recent developments in digital methods that can be applied across various manuscript cultures worldwide, and aims to make awareness and discussion of these accessible to a broader group of scholars. It focuses exclusively on new developments in its subject fields that rely on the digital medium or on recent advances in technology as applied to the study of manuscripts, with an emphasis on aspects beyond the scope of the individual fields concerned with just one particular manuscript culture.

Introduction to Stylometric Analysis using R

Eder, Maciej
maciejeder@gmail.com
Pedagogical University, Kraków, Poland

Rybicki, Jan
jkrybicki@gmail.com
Jagiellonian University, Kraków, Poland

1.
Brief Description

Stylometry, or the study of measurable features of (literary) style, such as sentence length, vocabulary richness and various frequencies (of words, word lengths, word forms, etc.), has been around at least since the middle of the 19th century, and has found numerous practical applications in authorship attribution research. These applications are usually based on the belief that there exist conscious or unconscious elements of personal style that can help detect the true author of an anonymous text; that there exist stylistic fingerprints that can betray the plagiarist; and that the oldest authorship disputes (St. Paul’s epistles or Shakespeare’s plays) can be settled with more or less sophisticated statistical methods. While specific issues remain largely unresolved (or, if closed once, they are sooner or later reopened), a variety of statistical approaches has been developed that makes it possible, often with spectacular precision, to identify texts written by several authors based on a single example of each author’s writing. But even more interesting research questions arise beyond bare authorship attribution: patterns of stylometric similarity and difference also provide new insights into relationships between different books by the same author; between books by different authors; between authors differing in terms of chronology or gender; and between translations of the same author or group of authors, helping, in turn, to find new ways of looking at works that seem to have been studied from all possible perspectives. Nowadays, in the era of ever-growing computing power and of ever more literary texts available in electronic form, we are able to perform stylometric experiments that our predecessors could only dream of.
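The quantitative core shared by many of these methods, comparing z-scored frequencies of the most frequent words across texts, can be sketched in a few lines. The following Python sketch of Burrows's Delta is illustrative only: it is not part of the workshop's R scripts, and its whitespace tokenization is deliberately naive.

```python
import statistics
from collections import Counter

def word_freqs(text):
    """Relative frequencies of the words in one text (naive tokenization)."""
    words = text.lower().split()
    return {w: c / len(words) for w, c in Counter(words).items()}

def burrows_delta(corpus, text_a, text_b, n_mfw=100):
    """Burrows's Delta between two texts: the mean absolute difference of
    their z-scored frequencies over the corpus's most frequent words."""
    freqs = [word_freqs(t) for t in corpus]
    # Most frequent words across the reference corpus.
    overall = Counter()
    for t in corpus:
        overall.update(t.lower().split())
    mfw = [w for w, _ in overall.most_common(n_mfw)]
    # Per-word mean and (population) standard deviation over the corpus.
    means = {w: statistics.mean(f.get(w, 0.0) for f in freqs) for w in mfw}
    stdevs = {w: statistics.pstdev(f.get(w, 0.0) for f in freqs) for w in mfw}
    fa, fb = word_freqs(text_a), word_freqs(text_b)
    def z(f, w):  # z-score of one word's frequency against the corpus
        return (f.get(w, 0.0) - means[w]) / stdevs[w] if stdevs[w] else 0.0
    return sum(abs(z(fa, w) - z(fb, w)) for w in mfw) / len(mfw)
```

In attribution studies, a disputed text is assigned to the candidate author whose known samples yield the smallest Delta; the workshop's scripts add culling, n-grams and the multivariate visualizations on top of a core measure of this kind.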
This half-day workshop is a hands-on introduction to stylometric analysis in the programming language R, using an emerging tool: a collection of Maciej Eder’s and Jan Rybicki’s scripts, which perform multivariate analyses of the frequencies of the most frequent words, the most frequent word n-grams, and the most frequent letter n-grams. One of the scripts produces Cluster Analysis, Multidimensional Scaling, Principal Component Analysis and Bootstrap Consensus Tree graphs based on Burrows’s Delta and other distance measures; it applies additional (and optional) procedures, such as Hoover’s ‘culling’ and pronoun deletion. As by-products, it can be used to generate various frequency lists; a stand-alone word-frequency maker is also available. Another script provides insight into state-of-the-art supervised techniques of classification, such as Support Vector Machines, k-Nearest Neighbor classification, or, more classically, Delta as developed by Burrows. Our scripts have already been used by other scholars to study Wittgenstein’s dictated writings or, believe it or not, DNA sequences! The workshop will be an opportunity to see this in practice in a variety of text collections, investigated for authorial attribution, translatorial attribution, genre, gender and chronology. Text collections in a variety of languages will be provided; workshop attendees are welcome to bring even more texts (in either plain text or TEI-XML format). No previous knowledge of R is necessary: our script is very user-friendly (and very fast)!

2. Tutorial Outline

During a brief introduction, (1) R will be installed on the users’ laptops from the Internet (if it has not already been installed); (2) participants will receive CDs/pendrives with the script(s), a short quick-start guide and several text collections prepared for analysis; (3) some theory behind this particular stylometric approach will be discussed, and the possible uses of the tools presented will be summarized.
After (4) a short instruction, participants will move on to (5) hands-on analysis, producing as many different results as possible to better assess the various aspects of stylometric study; (6) additional texts might be downloaded from the Internet or added by the participants themselves. The results, both numeric and visual, will be analyzed. For those more advanced in R (or S, or Matlab), details of the script (R methods, functions, and packages) will be discussed.

3. Special Requirements

Participants should come with their own laptops. We have versions of the scripts for Windows, MacOS and Linux. The workshop also requires a projector and an Internet connection in the workshop room.

References

Baayen, H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge: Cambridge UP.
Rybicki, J., and M. Eder (2011). Deeper Delta across genres and languages: do we really need the most frequent words? Literary and Linguistic Computing 26(3): 315-321.
Burrows, J. (1987). Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Oxford: Clarendon Press.
Burrows, J. F. (2002). ‘Delta’: a measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing 17(3): 267-287.
Craig, H. (1999). Authorial attribution and computational stylistics: if you can tell authors apart, have you learned anything about them? Literary and Linguistic Computing 14(1): 103-113.
Craig, H., and A. F. Kinney, eds. (2009). Shakespeare, Computers, and the Mystery of Authorship. Cambridge: Cambridge UP.
Eder, M. (2010). Does size matter? Authorship attribution, small samples, big problem. Digital Humanities 2010: Conference Abstracts. King’s College London, pp. 132-135.
Eder, M. (2011). Style-markers in authorship attribution: a cross-language study of the authorial fingerprint. Studies in Polish Linguistics 6: 101-116.
Eder, M., and J. Rybicki (2011). Stylometry with R.
Digital Humanities 2011: Conference Abstracts. Stanford University, Stanford, pp. 308-311.
Eder, M., and J. Rybicki (2012). Do birds of a feather really flock together, or how to choose test samples for authorship attribution. Literary and Linguistic Computing 27 (in press).
Hoover, D. L. (2004). Testing Burrows’s Delta. Literary and Linguistic Computing 19(4): 453-475.
Jockers, M. L., and D. M. Witten (2010). A comparative study of machine learning methods for authorship attribution. Literary and Linguistic Computing 25(2): 215-223.
Koppel, M., J. Schler, and S. Argamon (2009). Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology 60(1): 9-26.
Rybicki, J. (2012). The great mystery of the (almost) invisible translator: stylometry in translation. In M. Oakes and M. Ji (eds.), Quantitative Methods in Corpus-Based Translation Studies. Amsterdam: John Benjamins.
Oakes, M., and A. Pichler (2012). Computational Stylometry of Wittgenstein’s Diktat für Schlick. Bergen Language and Linguistic (Bells) Series (in press).

NeDiMAH workshop on ontology based annotation

Eide, Øyvind
oyvind.eide@kcl.ac.uk
King’s College London, UK

Ore, Christian-Emil
c.e.s.ore@iln.uio.no
University of Oslo, Norway

Rahtz, Sebastian
sebastian.rahtz@oucs.ox.ac.uk
University of Oxford, UK

The aim of this workshop is to present and discuss current ontology based annotation in text studies and to give participants an introduction and updated insight into the field. One of the expected outcomes of the workshop is to throw light on the consequences and experiences of a renewed database approach to computer-assisted textual work, based on the developments over the last decade in text encoding as well as in ontological systems.

1.
The NeDiMAH Network

The Network for Digital Methods in the Arts and Humanities (NeDiMAH) is a research network running from 2011 to 2015, funded by the European Science Foundation (ESF). The network will examine the practice of, and evidence for, advanced ICT methods in the arts and humanities across Europe, and articulate these findings in a series of outputs and publications. To accomplish this, NeDiMAH provides a locus of networking and interdisciplinary exchange of expertise among the trans-European community of digital arts and humanities researchers, as well as those engaged with creating and curating scholarly and cultural heritage digital collections. NeDiMAH will work closely with the EC-funded DARIAH and CLARIN e-research infrastructure projects, as well as other national and international initiatives. NeDiMAH includes the following Working Groups:

1. Spatial and temporal modelling
2. Information visualisation
3. Linked data and ontological methods
4. Developing digital data
5. Using large-scale text collections for research
6. Scholarly digital editions

The WGs will examine the use of formal computationally-based methods for the capture, investigation, analysis, study, modelling, presentation, dissemination, publication and evaluation of arts and humanities materials for research. To achieve these goals the WGs will organise annual workshops, and whenever possible the NeDiMAH workshops will be organised in connection with other activities and initiatives in the field. NeDiMAH WG3, Linked data and ontological methods, proposes to organise a pre-conference workshop, ‘Ontology based annotation’, in connection with Digital Humanities 2012 in Hamburg.

2.
Motivation and background

The use of computers as tools in the study of textual material in the humanities and cultural heritage goes back to the late 1940s, with links back to similar methods used without computer assistance, such as word counting in the late nineteenth century and concordances from the fourteenth century onwards. In the sixty years of computer-assisted text research, two traditions can be seen. One includes corpus linguistics and the creation of digital scholarly editions, while the other is related to museum and archival texts. In the former tradition, texts are commonly seen as first-class objects of study, which can be examined by the reader using aesthetic, linguistic or similar methods. In the latter tradition, texts are seen mainly as a source of information; readings concentrate on the content of the texts, not the form of their writing. Typical examples are museum catalogues and historical source documents. These two traditions will be called form oriented and content oriented, respectively. It must be stressed that these categories are not rigorous; they are points in a continuum. Tools commonly connected to museum and archive work, such as computer-based ontologies, can be used to investigate texts of any genre, be it literary texts or historical sources. Any analysis of a text is based on a close reading of it, and the same tools can be used to study texts which are read according to both the form oriented and the content oriented way (Eide 2008; Zöllner-Weber & Pichler 2007). The novelty of the approach lies in its focus on toolsets for modelling such readings in formal systems: not to make a clear, coherent representation of a text, but rather to highlight inconsistencies as well as consistencies, tensions as well as harmonies, in our readings of the texts.
The tools used for such modelling can be created to store and show contradictions and inconsistencies, as well as providing the user with means to detect and examine such contradictions. Such tools are typically used in an iterative way, in which results from one experiment may lead to adjustments in the model or in the way it is interpreted, similar to modelling as it is described by McCarty (2005). The source materials for this type of research are to be found in the results of decades of digital scholarly editing: not only does a wide variety of texts exist in digital form, but many of these texts have been encoded in ways which can be used as starting points for the model building. Any part of the encoding can be contested, in the modelling work as well as in the experiments performed on the model. The methods developed in this area, of which the TEI guidelines are an example, provide a theoretical basis for this approach. At the end of the 1980s Manfred Thaller developed Kleio, a simple ontological annotation system for historical texts. Later, in the 1990s, hypertext, not databases, became the tool of choice for textual editions (Vanhoutte 2010: 131). The annotation system Pliny by John Bradley (2008) was designed both as a practical tool for scholars and because Bradley was interested in how scholars work when studying a text. One of the expected outcomes of this workshop is to throw light on the consequences and experiences of a renewed database approach to computer-assisted textual work, based on the developments over the last decade in text encoding as well as in ontological systems. A basic assumption is that reading a text includes a process of creating a model in the mind of the reader. This modelling process of the mind works in similar ways for all texts, be it fiction or non-fiction (see Ryan 1980). Reading a novel and reading a historical source document both result in models.
These models will be different, but they can all be translated into ontologies expressed in computer formats. The external model stored in the computer system will be a different model from the one stored in the mind, but it will still be a model of the text reading. By manipulating the computer-based model, new things can be learned about the text in question. This method represents an answer to Shillingsburg’s call for editions which are open not only for reading by the reader, but also for manipulation (Shillingsburg 2010: 181), and to Pichler’s understanding of digital tools as means to document and explicate our different understandings and interpretations of a text (Zöllner-Weber & Pichler 2007). A digital edition can be part of the text model stored in the computer system. As tools and representations shape thinking not only through the conclusions they enable but also through the metaphors they deploy (Galey 2010: 100), this model will inevitably lead to other types of questions asked of the text. A hypothesis is that these new questions will lead to answers giving new insight into the texts under study; some of these insights would not have been found using other methods. There is a movement in the humanities from seeking local knowledge about specific cases (McCarty 2005), which in this respect means traditional humanities investigations into specific collections of one or a limited number of texts, towards seeking more general patterns. The general patterns sought may rather be found on a meta-research level, where one investigates new ways in which research that has a traditional scope can be performed.

3. A description of the target audience

Scholars interested in online and shared annotation of texts and media based on ontologies. Practice in the field is not a requirement. Knowledge of the concept ‘ontology’ or ‘conceptual model’ can be an advantage.
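The kind of formal model of a reading discussed above can be illustrated in a deliberately minimal way as a set of subject-predicate-object triples, the basic shape of RDF/OWL data. The sketch below is not any particular system mentioned in this abstract, and every name in it is a hypothetical example rather than a published vocabulary:

```python
# One reader's model of a text, expressed as subject-predicate-object
# triples. All subjects, predicates and objects here are hypothetical.
reading_a = {
    ("Odysseus", "is_a", "Character"),
    ("Ithaca", "is_a", "Place"),
    ("Odysseus", "rules", "Ithaca"),
    ("Penelope", "married_to", "Odysseus"),
}

def query(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# A second reading of the same text is simply another triple set; the
# symmetric difference makes interpretive disagreement explicit rather
# than forcing a single coherent representation.
reading_b = reading_a | {("Odysseus", "is_a", "UnreliableNarrator")}
disagreement = reading_a ^ reading_b
```

Because each reading is just a set, set operations make agreements and disagreements between two readings directly computable, in the spirit of tools that store and expose contradictions instead of resolving them.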
The aim of this workshop is to present and discuss current ontology-based annotation in text studies, to give participants an introduction and updated insight into the field, and to bring together researchers. One of the expected outcomes from this workshop is to throw light on the consequences and experiences of a renewed database approach in computer-assisted textual work, based on the developments over the last decade in text encoding as well as in ontological systems. References Bradley, J. (2008). Pliny: A model for digital support of scholarship. Journal of Digital Information 9(1). http://journals.tdl.org/jodi/article/view/209/198 (last checked 2011-11-01). Crane, G. (2006). What Do You Do with a Million Books? D-Lib Magazine 12(3). http://www.dlib.org/dlib/march06/crane/03crane.html (last checked 2011-11-01). Eide, Ø. (2008). The Exhibition Problem. A Real-life Example with a Suggested Solution. Literary and Linguistic Computing 23(1): 27-37. Galey, A. (2010). The Human Presence in Digital Artefacts. In W. McCarty (ed.), Text and Genre in Reconstruction: Effects of Digitalization on Ideas, Behaviours, Products and Institutions. Cambridge: Open Book Publishers, pp. 93-117. McCarty, W. (2005). Humanities Computing. Basingstoke: Palgrave Macmillan. Moretti, F. (2005). Graphs, Maps, Trees: Abstract Models for a Literary History. London: Verso. Ryan, M.-L. (1980). Fiction, non-factuals, and the principle of minimal departure. Poetics 9: 403-22. Shillingsburg, P. (2010). How Literary Works Exist: Implied, Represented, and Interpreted. In W. McCarty (ed.), Text and Genre in Reconstruction: Effects of Digitalization on Ideas, Behaviours, Products and Institutions. Cambridge: Open Book Publishers, pp. 165-82. Thaller, M. The Kleio system. http://www.hki.uni-koeln.de/kleio/old.website/welcome.html (last checked 2011-11-01). Zöllner-Weber, A., and A. Pichler (2007). Utilizing OWL for Wittgenstein's Tractatus. In H. Hrachovec, A. Pichler and J.
Wang (eds.), Philosophie der Informationsgesellschaft / Philosophy of the Information Society. Contributions of the Austrian Ludwig Wittgenstein Society. Kirchberg am Wechsel: ALWS, pp. 248-250. Vanhoutte, E. (2010). Defining Electronic Editions: A Historical and Functional Perspective. In W. McCarty (ed.), Text and Genre in Reconstruction: Effects of Digitalization on Ideas, Behaviours, Products and Institutions. Cambridge: Open Book Publishers, pp. 119-44. Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts Hinrichs, Erhard erhard.hinrichs@uni-tuebingen.de Eberhard Karls University Tübingen, Germany Neuroth, Heike neuroth@sub.uni-goettingen.de University of Göttingen, Germany Wittenburg, Peter Peter.Wittenburg@mpi.nl Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands Large research infrastructure projects in the Humanities and Social Sciences such as Bamboo ( http://www.projectbamboo.org/ ), CLARIN ( http://www.clarin.eu ), DARIAH ( http://www.dariah.eu/ ), eAqua ( http://www.eaqua.net/index.php ), Metanet ( http://www.meta-net.eu ), and Panacea ( http://www.panacea-lr.eu ) increasingly offer their resources and tools as web applications or web services via the internet. Examples of this kind include: - Bamboo Technology Project ( http://www.projectbamboo.org/infrastructure/ ) - eAqua Portal ( http://www.eaqua.net/portal/ ) - Language Technology World Portal of MetaNet ( http://www.lt-world.org ) - PANACEA platform ( http://www.panacea-lr.eu/en/project/the-platform ) - TextGrid – eScience methods in Arts and Humanities ( http://www.textgrid.de/en.html ) - VLO – Virtual Language Observatory ( http://www.clarin.eu/vlw/observatory.php ) - WebLicht – Web Based Linguistic Chaining Tool ( https://weblicht.sfs.uni-tuebingen.de/ ) Such web-based access has a number of crucial advantages over traditional means of service provision via downloadable resources or desktop applications.
Since web applications can be invoked from any browser, downloading, installation, and configuration of individual tools on the user's local computer is avoided. Moreover, users of web applications are assured of always using the latest version of the software, since it is updated on the host computer. It is exactly this ease of use that is of crucial advantage for eHumanities researchers, since configuration and updates of software often require computational skills that cannot ordinarily be expected from humanities researchers. The paradigm of service-oriented architectures (SOA) is often used as a possible architecture for bundling web applications and web services. While the use of web services and SOAs is quickly gaining in popularity, there are still a number of open technology and research questions which await more principled answers: - Currently, web services and SOAs in the Digital Humanities often concentrate on written material. Will the current technology scale up to accommodate multimodal data like speech or video data as well? - Currently, web services and SOAs typically process data in a synchronous fashion. How can very large data sets such as multimodal resources be processed in an asynchronous fashion? - Currently, web services and SOAs tend to deliver analysis or search results in a non-interactive fashion, allowing user input only to initiate processing and to react to the processing result. How can the current applications be extended so as to allow dynamic user interaction during processing? - Currently, SOAs tend to be application or domain specific, catering to the data formats and services most relevant to particular user communities. What are the possibilities for generalizing such current practice and developing generic execution models and standards? - Will web-based access over time completely replace stand-alone (downloadable) desktop or CLI applications, or will there always be a need for both local and web-based applications? - What is the impact of emerging technologies such as web sockets or cloud computing on existing web service environments? - How can knowledge be generated from data, e.g. by developing new digital methods and concepts such as new and adapted data structures, hierarchical data storage, data modeling, sorting and search algorithms, selection of data via metadata, and visualization tools? Such considerations are of crucial importance for the eHumanities in order to support, inter alia, interactive annotation of text corpora, a desideratum for all text-oriented disciplines such as literary studies, history, and linguistics. 1. Invited Speaker - Eric Nyberg (Carnegie Mellon University, Pittsburgh): A Service-Oriented Architecture for Rapid Development of Language Applications 2. Accepted Papers - Tara L. Andrews, Moritz Wissenbach, Joris J. Van Zundert and Gregor Middell – Embracing research: consolidating innovation through sustainable development of infrastructure - Dorothee Beermann, Pavel Mihaylov and Han Sloetjes – Linking annotations: Steps towards tool-chaining in Language Documentation - Andre Blessing, Jens Stegmann and Jonas Kuhn – SOA meets Relation Extraction: Less may be more in Interaction - Michael Scott Cuthbert, Beth Hadley, Lars Johnson and Christopher Reyes – Interoperable Digital Musicology Research via music21 Web Applications - Emanuel Dima, Erhard Hinrichs, Marie Hinrichs, Alexander Kislev, Thorsten Trippel and Thomas Zastrow – Integration of WebLicht into the CLARIN Infrastructure - Rüdiger Gleim, Alexander Mehler and Alexandra Ernst – SOA implementation of the eHumanities Desktop - Thomas Kisler, Florian Schiel and Han Sloetjes – Signal processing via web services: the use case WebMAUS - Chiara Latronico, Nuno Freire, Shirley Agudo and Andreas Juffinger – The European Library: A Data Service Endpoint for the Bibliographic Universe of Europe - Przemyslaw Lenkiewicz, Dieter van Uytvanck, Sebastian Drude and Peter Wittenburg – Advanced Web-services for Automated Annotation of Audio and Video Recordings - Scott Martens – TüNDRA: TIGERSearch-style treebank querying as an XQuery-based web service - Christoph Plutte – How to Turn a Desktop Application into a Web-Interface? ArchivEditor as an Example of Eclipse RCP and RAP Single Sourcing - Thomas Zastrow and Emanuel Dima – Workflow Engines in Digital Humanities Here and There, Then and Now – Modelling Space and Time in the Humanities Isaksen, Leif leifuss@googlemail.com University of Southampton, UK Day, Shawn day.shawn@gmail.com Digital Humanities Observatory, Ireland Andresen, Jens jens.andresen@hum.au.dk University of Aarhus, Denmark Hyvönen, Eero eero.hyvonen@tkk.fi Aalto University, Finland Mäkelä, Eetu eetu.makela@aalto.fi Aalto University, Finland Spatio-temporal concepts are so ubiquitous that it is easy for us to forget that they are essential to everything we do. All expressions of human culture are related to the dimensions of space and time in the manner of their production and consumption, the nature of their medium and the way in which they express these concepts themselves. This workshop seeks to identify innovative practices among the Digital Humanities community that explore, critique and re-present the spatial and temporal aspects of culture. Although space and time are closely related, there are significant differences between them which may be exploited when theorizing and researching the Humanities. Among these are the different natures of their dimensionality (three dimensions vs. one), the seemingly static nature of space but enforced ‘flow’ of time, and the different methods we use to make the communicative leap across spatial and temporal distance.
Every medium, whether textual, tactile, illustrative or audible (or some combination of them), exploits space and time differently in order to convey its message. The changes required to express the same concepts in different media (between written and performed music, for example) are often driven by different spatio-temporal requirements. Last of all, the impossibility (and perhaps undesirability) of fully representing a four-dimensional reality (whether real or fictional) means that authors and artists must decide how to collapse this reality into the spatio-temporal limitations of a chosen medium. The nature of those choices can be as interesting as the expression itself. This workshop allows those working with digital tools and techniques that manage, analyse and exploit spatial and temporal concepts in the Humanities to present a position paper for the purposes of wider discussion and debate. The position papers will discuss generalized themes related to the use of spatio-temporal methods in the Digital Humanities with specific reference to one or more concrete applications or examples. Accepted papers have been divided into three themed sessions: Tools, Methods and Theory. This workshop is part of the ESF-funded NeDiMAH Network and organised by its Working Group on Space and Time. The group will also present its findings from the First NeDiMAH Workshop on Space and Time. 1. About NeDiMAH NeDiMAH is examining the practice of, and evidence for, advanced ICT methods in the Arts and Humanities across Europe, and will articulate these findings in a series of outputs and publications. To accomplish this, NeDiMAH assists in networking initiatives and the interdisciplinary exchange of expertise among the trans-European community of Digital Arts and Humanities researchers, as well as those engaged with creating and curating scholarly and cultural heritage digital collections.
NeDiMAH maximises the value of national and international e-research infrastructure initiatives by helping Arts and Humanities researchers to develop, refine and share research methods that allow them to create and make best use of digital methods and collections. Better contextualization of ICT methods also builds human capacity, and is of particular benefit for early stage researchers. For further information see http://www.nedimah.eu . The workshop will also be aligned and coordinated with ongoing work in the DARIAH project (cf. http://www.dariah.eu ). DARIAH is a large-scale FP7 project that aims to prepare the building of a digital research infrastructure for European Arts and Humanities researchers and content/data curators. 2. Papers 2.1. Tools Shoichiro Hara & Tatsuki Skino – Spatiotemporal Tools for Humanities David McClure – The Canonical vs. The Contextual: Neatline’s Approach to Connecting Archives with Spatio-Temporal Interfaces Roxana Kath – eAQUA/Mental Maps: Exploring Concept Change in Time and Space Kate Byrne – The Geographic Annotation Platform: A New Tool for Linking and Visualizing Place References in the Humanities 2.2. Methods William A. Kretzschmar, Jr. & C. Thomas Bailey – Computer Simulation of Speech in Cultural Interaction as a Complex System Karl Grossner – Event Objects for Placial History Charles Travis – From the Ruins of Time and Space: The Psychogeographical GIS of Postcolonial Dublin in Flann O’Brien’s At Swim-Two-Birds (1939) 2.3. Theory Maria Bostenaru Dan – 3D conceptual representation of the (mythical) space and time of the past in artistic scenographical and garden installations Eduard Arriaga-Arango – Multiple temporalities at crossroads: Artistic Representations of Afro in Latin America and the Hispanic World in the current stage of Globalization (Mapping Cultural emergences through Networks) Kyriaki Papageorgiou – Time, Space, Cyberspace and Beyond: On Research Methods, Delicate Empiricism, Labyrinths and Egypt Patricia Murrieta-Flores – Finding the way without maps: Using spatial technologies to explore theories of terrestrial navigation and cognitive mapping in prehistoric societies 2.4. Discussion Objectives - Bring together the experiences of researchers developing or using spatial or temporal methods in the Digital Humanities. - Evaluate the impact of such methods in terms of addressing traditional Humanities questions and posing new ones. - Explore non-investigative benefits, such as the use of spatial and temporal tools and visualization as means for contextualization. - Identify where tools developed for spatial analysis may be applicable to temporal analysis (and vice versa). 2.5. Program Committee - Jens Andresen, University of Aarhus - Shawn Day, Digital Humanities Observatory - Leif Isaksen, University of Southampton - Eero Hyvönen, Aalto University - Eetu Mäkelä, Aalto University Crowdsourcing meaning: a hands-on introduction to CLÉA, the Collaborative Literature Éxploration and Annotation Environment Petris, Marco marco.petris@uni-hamburg.de University of Hamburg, Germany Gius, Evelyn evelyn.gius@uni-hamburg.de University of Hamburg, Germany Schüch, Lena lena.schuech@googlemail.com University of Hamburg, Germany Meister, Jan Christoph jan-c-meister@uni-hamburg.de University of Hamburg, Germany 1. Context and description Humanities researchers in the field of literary studies access and read literary texts in digital format via the web in increasing numbers – but, apart from search and find, the cognitive processing of a text still takes place outside the digital realm.
The interest essentially motivating human encounters with literature hardly seems to benefit from the new paradigm: hermeneutic, i.e. ‘meaning’-oriented high-order interpretation that transcends a mere decoding of information. The main reason for this might be that hermeneutic activity is not deterministic, but explorative: in the scholarly interpretation of literature we are not looking for the right answer, but for new, plausible and relevant answers. Thus high-order hermeneutic interpretation requires more than the automated string- or word-level pattern analysis of the source object provided by most digital text analysis applications so far, namely the ability to add semantic markup and to analyse both the object data and the metadata in combination. This requires markup that goes beyond the distinction between procedural vs. descriptive markup of Coombs et al. (1987) and even beyond the subdivision of descriptive markup into genuinely descriptive vs. performative introduced by Renear (2004). By semantic markup we rather mean a true hermeneutic markup as defined by Piez (2010: paragraph 1): By ‘hermeneutic’ markup I mean markup that is deliberately interpretive. It is not limited to describing aspects or features of a text that can be formally defined and objectively verified. Instead, it is devoted to recording a scholar’s or analyst’s observations and conjectures in an open-ended way. As markup, it is capable of automated and semi-automated processing, so that it can be processed at scale and transformed into different representations. By means of a markup regimen perhaps peculiar to itself, a text will be exposed to further processing such as text analysis, visualization or rendition. Texts subjected to consistent interpretive methodologies, or different interpretive methodologies applied to the same text, can be compared.
Rather than being devoted primarily to supporting data interchange and reuse – although these benefits would not be excluded – hermeneutic markup is focused on the presentation and explication of the interpretation it expresses. CLÉA (Collaborative Literature Éxploration and Annotation) was developed to support McGann’s (2004) open-ended, discontinuous, and non-hierarchical model of text-processing and allows the user to express many different readings directly in markup. The web-based system not only enables collaborative research, but is also based on an approach to markup that transcends the limitations of low-level text description.1 CLÉA supports high-level semantic annotation through TEI-compliant, non-deterministic stand-off markup and acknowledges the standard practice in literary studies, i.e. a constant revision of interpretation (including one’s own) that does not necessarily amount to falsification. CLÉA builds on our open source desktop application CATMA.2 In our workshop, we will address some key challenges of developing and applying CLÉA: - We will discuss both the prerequisites mentioned above and their role in the development of CLÉA, - present interdisciplinary use cases where a complex tagset that operationalizes literary theory (namely narratology) is applied, - give a practical introduction to the use of CLÉA, and - provide a hands-on session where participants can annotate their own texts. Finally, we would like to engage participants in a design critique of CLÉA and a general discussion about requirements for text analysis tools in their fields of interest. References Coombs, J. H., A. H. Renear, and S. J. DeRose (1987). Markup Systems and the Future of Scholarly Text Processing. Communications of the ACM 30(11): 933–947. Available online at http://xml.coverpages.org/coombs.html (last seen 2011-10-31). McGann, J. (2004). Marking Texts of Many Dimensions. In S. Schreibman, R. Siemens, and J.
Unsworth (eds.), A Companion to Digital Humanities. Oxford: Blackwell, pp. 218-239. Online at http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405103213/9781405103213.xml&chunk.id=ss1-3-4&toc.depth=1&toc.id=ss1-3-4&brand=default (last seen 2011-10-31). Piez, W. (2010). Towards Hermeneutic Markup: An architectural outline. King’s College, DH 2010, London. Available from: http://piez.org/wendell/papers/dh2010/index.html (last seen 2011-10-31). Renear, A. H. (2004). Text Encoding. In S. Schreibman, R. Siemens, and J. Unsworth (eds.), A Companion to Digital Humanities. Oxford: Blackwell, pp. 218–239. Online at http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405103213/9781405103213.xml&chunk.id=ss1-3-5&toc.depth=1&toc.id=ss1-3-5&brand=default (last seen 2011-10-31). Notes 1. We define this distinction as follows: description cannot tolerate ambiguity, whereas an interpretation is an interpretation if and only if at least one alternative to it exists. Note that alternative interpretations are not subject to formal restrictions of binary logic: they can affirm, complement or contradict one another. In short, interpretations are of a probabilistic nature and highly context dependent. 2. CLÉA is funded by the European Digital Humanities Award 2010, see http://www.catma.de Learning to play like a programmer: web mashups and scripting for beginners Ridge, Mia m.ridge@open.ac.uk Open University, UK Have you ever wanted to be able to express your ideas for digital humanities data-based projects more clearly, or wanted to know more about hack days and coding but been too afraid to ask? In this hands-on tutorial led by an experienced web programmer, attendees will learn how to use online tools to create visualisations to explore humanities data sets while learning how computer scripts interact with data in digital applications.
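As a purely illustrative taste of what "computer scripts interacting with data" can mean, the snippet below groups a toy data set by century. It is our invention, not the tutorial's own material (the data set and field names are made up, and Python merely stands in for whatever scripting language a beginner might use):

```python
# An invented miniature "humanities data set": the kind of records
# a beginner's first script might load and summarize.
records = [
    {"title": "Letter to a Friend", "year": 1792},
    {"title": "Travel Diary", "year": 1815},
    {"title": "Household Inventory", "year": 1799},
]

# Group the record titles by century -- a first taste of data manipulation.
by_century = {}
for record in records:
    century = record["year"] // 100 + 1  # e.g. 1792 falls in the 18th century
    by_century.setdefault(century, []).append(record["title"])

print(by_century)
```

A script this small already exercises the core ideas of the tutorial's exercises: loading structured data, looping over it, and deriving a new view of it.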
Attendees will learn the basic principles of programming by playing with small snippets of code in a fun and supportive environment. The instructor will use accessible analogies to help participants understand and remember technical concepts. Working in pairs, participants will undertake short exercises and put into practice the scripting concepts they are learning about. The tutorial structure encourages attendees to reflect on their experiences and consolidate what they have learned from the exercises, with the goal of providing deeper insight into computational thinking. The tutorial aims to help humanists without a technical background understand more about the creation and delivery of digital humanities data resources. In doing so, this tutorial is designed to support greater diversity in the ‘digital’ part of the digital humanities community. Target audience: This tutorial is aimed at people who want to learn enough to get started playing with simple code to manipulate data, or gain an insight into how programming works. No technical knowledge is assumed. Attendees are asked to bring their own laptops or netbooks. 1. Tutorial structure The tutorial will include: - what a humanities data set is and how to access one - how web scripting languages (using JavaScript as an example) work - how to sketch out your ideas in pseudo-code - the value of visualisation tools in understanding the shape of a data set - prepared exercises: ‘hello world’, using script libraries for mash-ups, creating your first mashup using a live cultural dataset (e.g. a timeline or map) - how to find further resources and keep learning Introduction to Distant Reading Techniques with Voyant Tools, Multilingual Edition Sinclair, Stéfan sgsinclair@gmail.com McGill University, Canada Rockwell, Geoffrey grockwel@ualberta.ca University of Alberta, Canada You have a collection of digital texts, now what?
This workshop provides a gentle introduction to text analysis in the digital humanities using Voyant Tools, a collection of free web-based tools that can handle larger collections of texts, be they digitized novels, online news articles, Twitter feeds, or other textual content. This workshop will be a hands-on, practical guide with lots of time to ask questions, so participants are encouraged to bring their own texts. In the workshop we will cover the following: 1. A brief introduction to text analysis in the humanities; 2. Preliminary exploration techniques using Voyant; 3. Basic issues in choosing, compiling, and preparing a text corpus; 4. Text mining to identify themes in large corpora; 5. Ludic tools and speculative representations of texts; and 6. Integrating tool results into digital scholarship. This year’s workshop will pay special attention to certain multilingual issues in text analysis, such as character encoding, word segmentation, and available linguistic functionality for different languages. The instructors will present in English, but can also present or answer questions in French and Italian. This is intended as an introduction to text analysis and visualization. We hope for an audience with a range of interests and relevant competencies. Participants are expected to bring their own laptop and are encouraged to bring their own texts. Towards a reference curriculum for the Digital Humanities Thaller, Manfred manfred.thaller@uni-koeln.de Historisch-Kulturwissenschaftliche Informationsverarbeitung, Universität zu Köln, Germany In late 2009 the Cologne Centre for eHumanities started an initiative to improve the cooperation between (mainly) German universities actively offering degree courses in Digital Humanities. Within three meetings the concepts of the participating universities have been compared and a discussion on possibilities for closer cooperation has been started. As a result, a ‘catalogue’ ( http://www.cceh.uni-koeln.de/Dokumente/BroschuereWeb.pdf , in the context of http://www.cceh.uni-koeln.de/dh-degrees2011 – German only, so far) to document the degree programs that have actively contributed to the common work has been prepared. It includes ten BA programs, twelve MA / MSc programs, two certificates of DH-based training as professional add-on qualification on top of regular degrees, plus one embedded degree within the regular teaching of a Humanities study program. The universities of Bamberg, Bielefeld, Darmstadt, Erlangen, Gießen, Göttingen, Graz, Groningen, Hamburg, Köln, Lüneburg, Saarbrücken and Würzburg have contributed to this catalogue. What started as an initiative of Cologne has in the meantime become an integral part of DARIAH-DE, a general framework of projects for the establishment of an infrastructure for the Digital Humanities within Germany. Parallel to that initiative, a discussion has been started which shall lead towards the identification of common elements of Digital Humanities curricula, making it easier for students to move between individual degrees and providing the groundwork for the recognition of Digital Humanities as a general concept by the agencies responsible for the accreditation of university degrees within Germany. The German situation is probably different from that in many other countries, as two BA and one MSc program from the list above are offered not by Arts faculties, but by Computer Science faculties or institutes. Both activities are results of an underlying process of ‘comparing the notes’ between people responsible directly for the conceptualization and implementation of the degree courses. This process has so far been implemented mainly in Germany for pragmatic reasons: to make students aware of the existence of the field of Digital Humanities as a regular field of academic studies on the level of practical PR activities, you have to address a community which finds all of the participating universities similarly logical choices for a place to study. It is also much easier to start the discussion of a core curriculum if during the first rounds all participants of the discussion operate under the same administrative rules for the establishment of degree courses. We will organize a workshop attached to the Digital Humanities 2012 conference at Hamburg in order to extend this discussion to representatives of other university landscapes. On the most fundamental level we would like to: - Present the results of the German initiative. - Invite presentations of the Digital Humanities degree courses existing within other countries. - Initialize a discussion about possible terms of reference for Digital Humanities curricula which transcend individual academic systems. On the level of practical co-operation we intend to discuss: - The creation and maintenance of a database / catalogue of European degree courses in Digital Humanities. - The possibility for improved exchange activities within existing funding schemes, within Europe e.g. ERASMUS, between different degree courses. This will require, among other things, the identification of elements in different curricula which could substitute courses required at a home institution. - In a very exploratory way, the possibilities for the facilitation of genuine ‘European Master’ degrees in Digital Humanities, in the sense used by the European Commission. On the conceptual level we hope: - To arrive at a working definition for the field covered by such curricula. We are covering, e.g., degree courses which try to combine archaeology with computer science elements as well as degree courses which are closely related to computational linguistics. As these communities are differently related within different university landscapes, a common conceptual reference framework should be used.
As this is an initiative which emerges from ongoing work and is directed mainly at institutions and persons which have experience with the implementation of Digital Humanities degrees, we do not intend to rely primarily on a call for papers. During April 2012 a set of documents will be sent to all institutions in Europe, and many beyond, which are known to organize a degree course in the Digital Humanities or a closely connected field, with an invitation to join the workshop. We hope for results leading to direct and practical cooperation within existing frameworks. So the primary target group of this workshop are the European academic institutions offering or planning degree courses in the Digital Humanities. This said, of course we also invite the wider community to join the conceptual discussions. Participation of institutions we are not aware of, particularly from those which are currently only in the planning stages of Digital Humanities degree courses, is very much hoped for. Please direct enquiries to manfred.thaller@uni-koeln.de to receive additional material from the preparatory round of discussions and supporting material before the start of the workshop. The workshop will run for a full day. The following program is tentative, to be adapted to accommodate proposals and explicit wishes from participants during the preparatory stage: 09:00 – 10:30 Setting the agenda – reports on existing degree courses. 11:00 – 12:30 What do we have in common? I: Parallels and differences in the scope of individual degree courses. 14:00 – 15:30 What do we have in common? II: Parallels and differences in the concept of ‘Digital Humanities’ underlying the individual courses. 16:00 – 17:30 Are there synergies? Creating a work program to facilitate discussions of exchange facilities and curricular coordination across national boundaries. Free your metadata: a practical approach towards metadata cleaning and vocabulary reconciliation van Hooland, Seth svhoolan@ulb.ac.be Université Libre de Bruxelles, Belgium Verborgh, Ruben ruben.verborgh@ugent.be Ghent University, Belgium De Wilde, Max madewild@ulb.ac.be Université Libre de Bruxelles, Belgium The early-to-mid 2000s economic downturn in the US and Europe forced Digital Humanities projects to adopt a more pragmatic stance towards metadata creation and to deliver short-term results to grant providers. It is precisely in this context that the concept of Linked and Open Data (LOD) has gained momentum. In this tutorial, we want to focus on metadata cleaning and reconciliation, two elementary steps to bring cultural heritage collections into the Linked Data cloud. After an initial cleaning process, involving for example the detection of duplicates and the unifying of encoding formats, metadata are reconciled by mapping a domain-specific and/or local vocabulary to another (more commonly used) vocabulary that is already part of the Semantic Web. We believe that the integration of heterogeneous collections can be managed by using subject vocabularies for cross-linking between collections, since major classifications and thesauri (e.g. LCSH, DDC, RAMEAU, etc.) have been made available following Linked Data Principles. 1. Tutorial content and its relevance to the DH community Re-using these established terms for indexing cultural heritage resources represents a big potential of Linked Data for Digital Humanities projects, but there is a common belief that the application of LOD publishing still requires expert knowledge of Semantic Web technologies. This tutorial will therefore demonstrate how Semantic Web novices can start experimenting on their own with non-expert software such as Google Refine. Participants of the tutorial can bring an export (or a subset) of metadata from their own projects or organizations.
All necessary operations to reconcile metadata with controlled vocabularies which are already a part of the Linked Data cloud will be presented in detail, after which participants will be given time to perform these actions on their own metadata, with the assistance of the tutorial organizers. Previous tutorials have mainly relied on the use of the Library of Congress Subject Headings (LCSH), but for the DH2012 conference we will test beforehand SPARQL endpoints of controlled vocabularies in German (available for example at http://wiss-ki.eu/authorities/gnd/), allowing local participants to experiment with metadata in German.

This tutorial proposal is a part of the Free your Metadata research project.1 The website offers a variety of videos, screencasts and documentation on how to use Google Refine to clean and reconcile metadata with controlled vocabularies already connected to the Linked Data cloud. The website also offers an overview of previous presentations. Google Refine currently offers one of the best solutions on the market to clean and reconcile metadata. The open-source character of the software also makes it an excellent choice for training and educational purposes. Both researchers and practitioners from the Digital Humanities are, within cultural heritage projects, inevitably confronted with issues of poor-quality metadata and with interconnecting their records with external metadata and controlled vocabularies. This tutorial will therefore provide both practical hands-on information and an opportunity to reflect on the complex theme of metadata quality.

2. Outline of the tutorial

During this half-day tutorial, the organizers will present each essential step of the metadata cleaning and reconciliation process, before focusing on a hands-on session during which each participant will be asked to work on his or her own metadata set (default metadata sets will also be provided). The overview of the different features will take approximately 60 minutes:

- Introduction: Outline of the importance of metadata quality and the concrete possibilities offered by Linked Data for cultural heritage collections.
- Metadata cleaning: Insight into the features of Google Refine and how to apply filters and facets to tackle metadata quality issues.
- Metadata reconciliation: Use of the RDF extension which can be installed to extend Google Refine’s reconciliation capabilities. Overview of SPARQL endpoints with interesting vocabularies available for Digital Humanists, in different languages.

After a break, the participants will have the opportunity to work individually or in groups on their own metadata and to experiment with the different operations showcased during the first half of the tutorial. The tutorial organizers will guide and assist the different groups during this process. Participants will be given 60 minutes for their own experimentation, and during a 45-minute wrap-up participants will be asked to share the outcomes of the experimentation process. This tutorial will also explicitly try to bring together Digital Humanists with similar interests in Linked Data and in this way stimulate future collaborations between institutions and projects.

3. Target audience

The target audience consists of both practitioners and researchers from the Digital Humanities field who focus on the management of cultural heritage resources.

4. Special requests/equipment needs

Participants should preferably bring their own laptop and, if possible, have Google Refine installed. Intermediate knowledge of metadata creation and management is required.

Notes

1. See the project’s website at http://freeyourmetadata.org.
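Conceptually, the reconciliation step in the outline above maps local terms to identifiers in a published vocabulary. The Python snippet below sketches that idea with exact matching on normalized labels; in practice Google Refine’s RDF extension queries a reconciliation service or SPARQL endpoint and supports fuzzier matching. The vocabulary entries and URIs here are invented for illustration:

```python
# Toy controlled vocabulary: normalized label -> URI.
# These labels and URIs are invented; a real project would draw them from a
# published vocabulary such as LCSH, e.g. via its SPARQL endpoint.
VOCABULARY = {
    "portrait painting": "http://example.org/vocab/portrait-painting",
    "landscape painting": "http://example.org/vocab/landscape-painting",
}

def reconcile(term):
    """Map a local term to a vocabulary URI, or None if no match is found."""
    normalized = " ".join(term.split()).lower()  # unify case and whitespace
    return VOCABULARY.get(normalized)

print(reconcile("Portrait  Painting"))  # http://example.org/vocab/portrait-painting
print(reconcile("Genre painting"))      # None
```

Once a term resolves to a URI, records indexed with it can be cross-linked with any other collection using the same vocabulary, which is the integration scenario described above.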
Panels

Text Analysis Meets Text Encoding

Bauman, Syd <Syd_Bauman@Brown.edu>, Brown University, USA
Hoover, David <david.hoover@nyu.edu>, New York University, USA
van Dalen-Oskam, Karina <karina.van.dalen@huygens.knaw.nl>, Huygens Institute, The Netherlands
Piez, Wendell <wapiez@mulberrytech.com>, Mulberry Technologies, Inc., USA

1. Aim and Organization

The main aim of this panel discussion is to bring together text encoding specialists and text analysis researchers. Recent DH conferences have comprised, in addition to other activities, two distinct subconferences – one focusing on text encoding in general and TEI in particular, and the other on text analysis, authorship attribution, and stylistics. The separation between the two is so extreme that their participants often meet only at breaks and social events. This is reflected in the way text encoding specialists and text analysis scholars do their work as well: they hardly ever work together on the same projects. Because of this lack of connection, some of the long-promised benefits of markup for analysis remain unrealized. This panel takes a step toward bridging the gap between markup and analysis. We focus on both the causes of the gap and possible solutions. What could and should markup do that it doesn’t currently do? Why do analysts rarely work with the huge number of texts already encoded? How can text encoders and those who process encoded texts make their work more useful to text analysts, and how can text analysis specialists help encoders make their texts more useful? What opportunities currently exist for collaboration and cooperation between encoders and analysts, and how can more productive opportunities be created?

2. Panel Topic

The reasons for the present gap between markup and analysis are partly technical and partly nontechnical, and arise from the disparate aims and methods of the two camps.
While markup systems have generally been designed to meet the needs of (scholarly) publishing, markup adherents have often claimed that their markup is also useful for analytic purposes. However, the very concept of ‘markup’ itself is different for the two constituencies. XML, in particular, isn’t ‘markup’ in the full sense, as used by specialists in text processing. Rather, it is a data structuring methodology that imposes a single unitary hierarchy upon the text. Consequently, it is a poor instrument for the complete interpretive loop or spiral, where we start with text (with or without markup), perform analysis, use markup to record or ‘inscribe’ our findings into the text, and then return to analysis at a higher level. This is largely because the inscription step is usually inhibited by any prior (XML) markup. Consider two flowcharts of document processing workflows at http://piez.org/wendell/papers/dh2010/743_Fig2a.jpg (what XML provides) and http://piez.org/wendell/papers/dh2010/743_Fig2b.jpg (what we need). (As noted on the page at http://piez.org/wendell/papers/dh2010, these images were presented as part of a paper delivered at Digital Humanities 2010 in London [Piez 2010].) The difference between these is essentially that in the current (XML-based) architecture, extending and amending our document schemas and processing require re-engineering the system itself; the stable system (which is designed to support publishing, not research) does not naturally sustain that activity. A system that supported markup in the sense that text analysis requires – which would support, among other possibilities, multiple concurrent overlapping hierarchies (including rhetorical, prosodic, narrative and other organizations of texts) and arbitrary overlap (including overlap between similar types of elements or ranges) – would also support incremental development of processing to take advantage of any and all markup that researchers see fit to introduce.
Part of the solution to this problem is in the emergence of standard methodologies for encoding annotations above or alongside one or more ‘base’ layers of markup, perhaps using standoff markup or alternatives to XML. The details are less important than the capabilities inherent in a data model not limited to unitary trees (see Piez 2010 for discussion; for a more wide-ranging critique of markup, see Schmidt 2010). Over the long term, given suitable utilities and interfaces, textual analysts may be able to use such systems productively; in the medium term this is more doubtful. This leads to the non-technical part of the problem: largely because XML is not very well suited to their requirements, text analysis tools typically cannot handle arbitrary encoding in XML along with the texts themselves, while at the same time there is not yet a viable alternative encoding technology, specified as a standard and suitable for supporting interchange. And so an analyst must begin by processing the encoded text into a usable form. While the markup specialist may be able (perhaps easily) to perform such a transformation on his or her XML texts using XSLT, this is usually more difficult for the text analyst, for several reasons. Moreover, we submit that many or most of these difficulties are not simply due to known limitations of current text-encoding technologies as described above, but will also persist in new, more capable environments. Markup processing technologies such as XSLT are rarely among the text analyst’s armamentarium, and the benefits of XSLT (and we imagine this would be the case with any successor as well) are often not clear enough to the text analyst to justify its significant learning curve. XSLT need not be difficult, but it can be challenging to assimilate – especially because it seems superficially similar to text analysis.
Those who do learn XSLT will find some of the tasks most useful to them relatively easy (e.g., filtering markup and splitting/aggregating texts). XSLT 2.0 also includes regular expressions, grouping, and stylesheet functions, and handles plain text input gracefully, making it a much more hospitable environment than XSLT 1.0 for text analysis. Yet more often than not, these features serve only to make XSLT tantalizing as well as frustrating to those for whom markup processing cannot be a core competency. Moreover, intimate familiarity with the encoding system is typically necessary to process it correctly, yet the text analyst is frequently working with texts that he or she was not a party to creating. Many texts are encoded according to the TEI Guidelines, but analysts are often not experts in TEI in general, let alone expert in a particular application of TEI by a particular project. But such expertise is often required, as even ‘simple’ word extraction from an arbitrary TEI text can be problematic. Certainly, plenty of TEI encodings make the task non-trivial. Consider in particular the task of ingesting an arbitrary unextended TEI P5 text and emitting a word list, assuming that we can ignore one important detail: word order. What assumptions are necessary? What information will the data provider need to provide to the transformation process? Below are some preliminary considerations toward a tool that would present the user with a form to fill out and return an XSLT stylesheet for extracting words (note that many of these considerations would benefit from collaboration between the text encoder and the text analyst):

- What metadata can be ignored? E.g., if a default rendition of display:none applies to an element, are its contents words?
- Which element boundaries always/never/sometimes imply word breaks? When ‘sometimes’, how can we tell?
- Which hierarchical content structures (if any) should be ignored (e.g., colophons, forewords, prefaces)? Which trappings?
- Which elements pointed to by an element (or a child of one) get ignored, and which get included?

The complexity of this analysis compounds the difficulty already described. And it must be performed anew for every encoded text or set of texts the analyst wishes to consider. Furthermore, the publishing focus noted above means that text-encoding projects rarely encode the elements most useful for analysis, and you can only get out of markup what the encoder puts in. An analyst interested in how names are used in literary texts, for example, will find that even projects that encode proper names often encode only the names of authors and publishers (not very interesting for text analysis), or only personal names (the analyst may also be interested in place names and names of other things). Likewise, adding the desired markup to encoded texts requires that the analyst conform to (or perhaps modify) the existing encoding scheme, imposing another learning curve above that for the XSLT needed to extract the names once they have been encoded. Consequently, scholars interested in text analysis typically find it more efficient to use plain texts without any markup, and then apply an ad hoc system of manual tagging, or named entity recognition (NER) tools in combination with manual correction of tagging. The analyst who wants to extract the speeches of characters in already-encoded plays will sometimes discover that the speeches are encoded in such a way that automatic extraction is quite problematic (e.g., at Documenting the American South). It is a rare literary encoding project that provides texts with the dialog encoded for speaker so that the speech of each character can be extracted (even apart, again, from the overlap problem) – a kind of analysis made popular by Burrows’s pioneering work (Burrows 1987), but very labor-intensive.
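Returning to the word-extraction considerations above: the sketch below hard-codes one possible set of answers to those questions (skip the TEI header and editorial notes; treat every element boundary as a word break). It is a deliberately naive Python illustration of how project-specific those decisions are, not a general solution, and the sample document is invented:

```python
import xml.etree.ElementTree as ET

# One hard-coded policy: elements whose content should not count as words.
# A real TEI project would need a much longer, project-specific list.
SKIP = {"teiHeader", "note"}

def words(element):
    """Yield words from an element, applying the SKIP policy recursively.

    Treating every element boundary as a word break is itself one of the
    policy decisions discussed above.
    """
    tag = element.tag.split("}")[-1]  # drop any namespace prefix
    if tag in SKIP:
        return
    if element.text:
        yield from element.text.split()
    for child in element:
        yield from words(child)
        if child.tail:  # text following the child still belongs to us
            yield from child.tail.split()

doc = ET.fromstring(
    "<TEI><teiHeader><title>Metadata, not words</title></teiHeader>"
    "<text><p>Call me <name>Ishmael</name> the sailor."
    "<note>editorial aside</note></p></text></TEI>"
)
print(list(words(doc)))  # ['Call', 'me', 'Ishmael', 'the', 'sailor.']
```

Even this toy version silently embeds answers to the questions listed above; changing any one answer (say, counting note contents as words) changes the resulting word list, which is exactly why encoder–analyst collaboration is needed.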
Finally, even if texts with the desired encoding are available and the analyst is willing to learn the basics of XSLT, typically the XSLT has to be rewritten or tweaked for each new collection of texts that is examined, because of differences in encoding schemes. And it is just those ambitious encoding projects that are likely to encode more elements of interest to the text analyst that are also more likely to have complex, individualized, and difficult-to-process encoding (e.g., the Brown University Women Writers Project). From the analyst’s point of view, the process of using existing encoding may be more time-consuming and frustrating than doing the work manually. Surely this state of affairs is undesirable.

Yet there is another way forward besides text analysts learning new skills or text encoders offering their own text-extraction tools. While the problems described here add up to a formidable challenge, the very fact that we can enumerate them suggests that we are not entirely helpless. There is much work that can be done both to refine our understanding and to develop tools and methodologies that will help to bridge this divide. We suggest beginning with closer collaboration between the two camps. If each supports, works with, and learns from the other, both sides will benefit, and we will have a better foundation of understanding on which to build the next generation of technologies – technologies that will be valuable for both camps.

References

Burnard, L., and S. Bauman (eds.). TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 1.7.0. 2010-07-06. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html

Burrows, J. (1987). Computation into Criticism. Oxford: Clarendon P.

Documenting the American South. http://docsouth.unc.edu/

Piez, W. (2010). Towards hermeneutic markup: an architectural outline. DH2010, King’s College London, July 9.

Schmidt, D. (2010). The inadequacy of embedded markup for cultural heritage texts. LLC 25: 337-356.

The Brown University Women Writers Project. http://www.wwp.brown.edu/

Willa Cather Archive. http://cather.unl.edu/

Designing Interactive Reading Environments for the Online Scholarly Edition

Blandford, Ann <a.blandford@ucl.ac.uk>, University College London, UK
Brown, Susan <sbrown@uoguelph.ca>, University of Guelph, Canada
Dobson, Teresa <teresa.dobson@ubc.ca>, University of British Columbia, Canada
Faisal, Sarah <s.faisal@cs.ucl.ac.uk>, University College London, UK
Fiorentino, Carlos <carlosf@ualberta.ca>, University of Alberta, Canada
Frizzera, Luciano <dosreisf@ualberta.ca>, University of Alberta, Canada
Giacometti, Alejandro <alejandro.giacometti.09@ucl.ac.uk>, University College London, UK
Heller, Brooke <brooke.heller@gmail.com>, University of British Columbia, Canada
Ilovan, Mihaela <ilovan@ualberta.ca>, University of Alberta, Canada
Michura, Piotr <zemichur@cyf-kr.edu.pl>, Academy of Fine Arts in Krakow, Poland
Nelson, Brent <brent.nelson@usask.ca>, University of Saskatchewan, Canada
Radzikowska, Milena <mradzikowska@gmail.com>, Mount Royal University, Canada
Rockwell, Geoffrey <grockwel@ualberta.ca>, University of Alberta, Canada
Ruecker, Stan <sruecker@id.iit.edu>, IIT Institute of Design, USA
Sinclair, Stéfan <stefan.sinclair@mcgill.ca>, McMaster University, Canada
Sondheim, Daniel <sondheim@ualberta.ca>, University of Alberta, Canada
Warwick, Claire <c.warwick@ucl.ac.uk>, University College London, UK
Windsor, Jennifer <jwindsor@ualberta.ca>, University of Alberta, Canada

PAPER 1

Introduction to Designing Interactive Reading Environments for the Online Scholarly Edition

In this panel, members of the Interface Design team of the Implementing New Knowledge Environments (INKE) project will present a set of ideas, designs, and prototypes related to the next generation of the online scholarly edition, by which we mean a primary text and its scholarly apparatus, intended for use by people studying a text. We begin with ‘Digital Scholarly Editions: An Evolutionary Perspective’, which proposes a taxonomy of design features and functions available in a wide range of existing projects involving scholarly editions. ‘Implementing Text Analysis E-reader Tools to Create Ad-hoc Scholarly Editions’ draws on this taxonomy and examines how the strategies of ubiquitous text analysis can be applied to digital texts as a means of providing new affordances for scholarship. ‘Visualizing Citation Patterns in Humanist Scholarship’ looks at the use of citations in scholarly writing and proposes visual models of possible benefit for both writers and readers. ‘The Dynamic Table of Contexts: User Experience and Future Directions’ reports our pilot study of a tool that combines the conventional table of contents with semantic XML encoding to produce a rich-prospect browser for books, while ‘The Usability of Bubblelines: A Comparative Evaluation of Two Prototypes’ provides our results in looking at a case where two distinct prototypes of a visualization tool for comparative search results were created from a single design concept.

PAPER 2

Digital Scholarly Editions: A Functional Perspective

Sondheim, Daniel <sondheim@ualberta.ca>, University of Alberta, Canada
Rockwell, Geoffrey <grockwel@ualberta.ca>, University of Alberta, Canada
Ruecker, Stan <sruecker@id.iit.edu>, IIT Institute of Design, USA
Ilovan, Mihaela <ilovan@ualberta.ca>, University of Alberta, Canada
Frizzera, Luciano <dosreisf@ualberta.ca>, University of Alberta, Canada
Windsor, Jennifer <jwindsor@ualberta.ca>, University of Alberta, Canada

Definitions of the scholarly edition at times seem as diverse as their content; definitions may highlight usefulness (e.g., Hjørland, n.d.), reliability (e.g.
Lyman, 2009), or editorial expertise (e.g. Price 2008). Nevertheless, it is clear that scholarly editions ‘comprise the most fundamental tools in literary studies’ (McGann 2001: 55). The move from print to screen has offered scholars an opportunity to remediate scholarly editions and to improve upon them (Werstine 2008; Shillingsburg 2006), perhaps in the hope of producing their ‘fullest realization’ (Lyman 2009: iii). Online, distinctions between traditional types of scholarly editions are blurred. Variorum editions are increasingly becoming the norm (Galey 2007), since the economy of space is no longer as important a variable. The notion of a ‘best version’ of a text is also becoming less of an issue, as what constitutes ‘the best’ is open to interpretation and is often irrelevant to the questions that scholars would like to ask (Price 2008). Rather than categorizing electronic scholarly editions, we propose to evaluate a selection of influential and/or representative examples on the basis of a series of functional areas that have been noted in the relevant literature as having undergone substantial changes in the move to the digital environment. These areas include (1) navigation, including browsing and searching; (2) knowledge-sharing, including public annotation; (3) textual analysis, including graphs and visualizations; (4) customizability of both interface and content; (5) side-by-side comparisons of multiple versions; and (6) private note-taking and markup. This study reveals that although not all of the functionalities available in the digital medium are implemented in every digital scholarly edition, even the simplest among them offer affordances that are different from those of their printed counterparts. The fact remains, however, that due to the variety of functionalities implemented in digital scholarly editions, we are still negotiating what Lyman’s ‘fullest realization’ could be.
PAPER 3 Implementing Text Analysis E-reader Tools to Create Ad-hoc Scholarly Editions Windsor, Jennifer jwindsor@ualberta.ca University of Alberta, Canada they are insufficient to support in-depth scholarly research. To alleviate this difficulty, we propose integrating Voyant tools, a user-friendly, flexible and powerful web-based text analysis environment developed by Stéfan Sinclair and Geoffrey Rockwell, with current e-reader technology, such as that used in the Internet Archive reader. In this design, Voyant functions as a sidebar to ereaders, allowing the text to remain visible during analysis. Choosing from a list of tools allows scholars to create a custom text analysis tool palette and having more than one tool open allows crossreferencing between results. Tools can be dragged into a custom order and a scroll bar allows navigation between several tools at once. A Voyant tutorial is available and each individual tool also has its own instructions for use, as well as a help feature and an option to export results. We anticipate this tool being of use to scholars in various fields who wish to use quantitative methods to analyze text. By implementing tools for textual analysis in an online reading environment, we are in effect offering scholars the ability to choose the kinds of analysis and depth of study that they wish; they will in effect be able to produce ad-hoc customized editions of digital text. 
Ilovan, Mihaela ilovan@ualberta.ca University of Alberta, Canada Sondheim, Daniel sondheim@ualberta.ca University of Alberta, Canada Frizzera, Luciano dosreisf@ualberta.ca University of Alberta, Canada Ruecker, Stan sruecker@id.iit.edu IIT Institute of Design, USA Sinclair, Stéfan stefan.sinclair@mcgill.ca McMaster University, Canada Rockwell, Geoffrey grockwel@ualberta.ca University of Alberta, Canada Figure 1: Integration of Voyant tools with the Internet Archive e-reader PAPER 4 Visualizing Citation Patterns in Humanist Monographs Ilovan, Mihaela With the proliferation of e-readers, the digitization efforts of organizations such as Google and Project Gutenberg, and the explosion of e-book sales, digital reading has become a commonplace activity. Although resources for casual reading are plentiful, ilovan@ualberta.ca University of Alberta, Canada 37 Digital Humanities 2012 Frizzera, Luciano dosreisf@ualberta.ca University of Alberta, Canada Michura, Piotr zemichur@cyf-kr.edu.pl Academy of Fine Arts in Krakow, Poland Rockwell, Geoffrey grockwel@ualberta.ca University of Alberta, Canada view of the visualization tool represents the internal structure of complex footnotes (see Figure 3) and visually highlights the relationship between different citations included in the same note, as well as their function in relation to the argument of the citing text. In this presentation we will introduce the visualization tool, demonstrate its functionalities and provide the results of initial testing performed. We will also discuss the lessons learned from this early stage of the project. Ruecker, Stan sruecker@id.iit.edu IIT Institute of Design, USA Sondheim, Daniel sondheim@ualberta.ca University of Alberta, Canada Windsor, Jennifer jwindsor@ualberta.ca University of Alberta, Canada This paper documents the design and programming of a visualization tool for the analysis of citation patterns in extended humanist works such as monographs and scholarly editions. 
Traditional citation analysis is widely acknowledged as being ineffective for examining citations in the humanities, since the accretive nature of research (Garfield 1980), the existence of multiple paradigms, the preference for monographs (Thompson 2002), and the richness of non-parenthetical citations all present problems for interpretation and generalization. To address this situation, we employ non-traditional methods of content and context analysis of citations, like functional classification (Frost 1979) and the exploration of the way in which sources are introduced in the flow of the argument (Hyland 1999). Acknowledging the richness of citations in humanist research, we employ graphic visualization for display and data inquiry, which allows us to carry out visual analytical tasks of the referencing patterns of full monographs. We opted to provide three distinct views of the analyzed monograph. The first one – a faceted browser of the references, affords the comparison of different aspects of the references included and introduces the user to the structure and content of the monograph’s critical apparatus. The second view contextualizes individual citations and highlights the fragments of text they support (see Figure 2); by representing both supported and non-supported portions of the monograph in their natural order, the view facilitates the linear reading of the way in which argument is built in the citing work. 
Finally, the third 38 Figure 2: ‘Contextualize’ view (citations in context) Figure 3: Displaying complex footnotes PAPER 5 The Dynamic Table of Contexts: User Experience and Future Directions Dobson, Teresa teresa.dobson@ubc.ca University of British Columbia, Canada Heller, Brooke brooke.heller@gmail.com University of British Columbia, Canada Ruecker, Stan sruecker@id.iit.edu IIT Institute of Design, USA Digital Humanities 2012 Radzikowska, Milena mradzikowska@gmail.com Mount Royal University, Canada Brown, Susan sbrown@uoguelph.ca University of Guelph, Canada The Dynamic Table of Contexts (DToC) is a text analysis tool that combines the traditional concepts of the table of contents and index to create a new way to manipulate digital texts. More specifically, DToC makes use of pre-existing XML tags in a document to allow users to dynamically incorporate links to additional categories of items into the table of contents. Users may load documents they have tagged themselves, further increasing the interactivity and usefulness of the tool (Ruecker et al. 2009). DToC includes four interactive panes, and one larger pane to view the text (see Figure 4). The first two panes are the table of contexts and the XML tag list: the former changes when a particular tag is selected in the latter, allowing the user to see where tokens of the tag fall in the sequence of the text. The third and fourth panes operate independently of each other but with the text itself: the ‘Excerpts’ pane highlights a selected token in the text and returns surrounding words, while the ‘Bubbleline’ maps multiple instances of a token across a predesignated section (such as a chapter, or scene). in XML; screen captures are supplementing our understanding of the participants’ experience. In this paper we will report results and discuss future directions for development of the prototype. 
PAPER 6 The Usability of Bubblelines: A Comparative Evaluation of Two Prototypes Blandford, Ann a.blandford@ucl.ac.uk University College London, UK Faisal, Sarah s.faisal@cs.ucl.ac.uk University College London, UK Fiorentino, Carlos carlosf@ualberta.ca University of Alberta, Canada Giacometti, Alejandro alejandro.giacometti.09@ucl.ac.uk University College London, UK Ruecker, Stan sruecker@id.iit.edu IIT Institute of Design, USA Sinclair, Stéfan stefan.sinclair@mcgill.ca McMaster University, Canada Warwick, Claire c.warwick@ucl.ac.uk University College London, UK Figure 4: The Dynamic Table of Contexts, showing three of four interactive panes at left along with the text viewing pane at right. With twelve pairs of participants (n = 24), we completed a user experience study in which the participants were invited: 1) to complete in pairs a number of tasks in DToC, 2) to respond individually to a computer-based survey about their experience, and 3) to provide more extensive feedback during an exit interview. Their actions and discussion while working with the prototype were recorded. Data analysis is presently underway: transcribed interview data has been encoded for features of experience In this paper, we explore the idea that programming is itself an act of interpretation, where the user experience can be profoundly influenced by seemingly minor details in the translation of a visual design into a prototype. We argue that the approach of doing two parallel, independent implementations can bring the trade-offs and design decisions into sharper relief to guide future development work. As a case study, we examine the results of the comparative user evaluation of two INKE prototypes for the bubblelines visualization tool. Bubblelines allows users to visually compare search results across multiple text files, such as novels, or pieces of text files, such as chapters. 
In this case, we had a somewhat rare opportunity in that both implementations of bubblelines began from a single 39 Digital Humanities 2012 design sketch, and the two development teams worked independently to produce online prototypes. The user experience study involved 12 participants, consisting of sophisticated computer users either working on or already having completed computer science degrees with a focus on human-computer interaction. They were given a brief introduction, exploratory period, assigned tasks, and an exit interview. The study focused on three design aspects: the visual representation, functionality and interaction. All users liked the idea of bubblelines as a tool for exploring text search results. Some users wanted to use the tools to search and make sense of their own work, e.g. research papers and computer programming codes. The study has shown that there was no general preference for one prototype over the other. There was however, general agreements of preferred visual cues and functionalities that users found useful from both prototypes. Similarly, there was an overall consensus in relation to visual representations and functionalities that users found difficult to use and understand in both tools. Having the ability to compare two similar yet somehow different prototypes has assisted us in fishing out user requirements summarized in the form of visual cues, functionalities and interactive experiences from a dimension that that would have been difficult to reach if we were testing a single prototype. to evaluate with expert users where our focus would be on the ability of the tool in assisting experts in making sense of the data. References Garfield, E. (1980). Is Information Retrieval in the Arts and Humanities Inherently Different from That in Science? The Effect That ISI®’S Citation Index for the Arts and Humanities Is Expected to Have on Future Scholarship. The Library Quarterly 50(1): 40–57. Galey, A. 
(2007) How to Do Things with Variants: Text Visualization in the Electronic New Variorum Shakespeare. Paper presented at the Modern Language Association Annual Convention, Chicago. Frost, C.O. (1979). The Use of Citations in Literary Research: A Preliminary Classification of Citation Functions. The Library Quarterly 49(4): 399–414. Hyland, K. (1999). Academic attribution: citation and the construction of disciplinary knowledge. Applied Linguistics 20(3): 341–367. Lyman, E. (2009). Assistive potencies: Reconfiguring the scholarly edition in an electronic setting. United States-Virginia: University of Virginia. Hjørland, B. (n.d.). Scholarly edition. Available at: http://www.iva.dk/bh/core%20concepts%20in% 20lis/articles%20a-z/scholarly_edition.htm (accessed 20 October, 2011). Lyman, E. (2009). Assistive potencies: Reconfiguring the scholarly edition in an electronic setting. United States-Virginia: University of Virginia. http://www.proquest.com.login.ezpro xy.library.ualberta.ca (accessed Sept, 2010). McGann, J. (2001). Radiant textuality: literature after the World Wide Web. New York: Palgrave. Figure 5: The two alternative implementations of Bubblelines (t1 & t2) In summary, most participants preferred the visual appearance and simplicity of t1, but the greater functionality offered by t2. Apparently incidental design decisions such as whether or not search was case sensitive, and whether strings or words were the objects of search (e.g. whether a search for ‘love’ would highlight ‘lover’ or ‘Love’ as well as ‘love’) often caused frustration or confusion. We argue that the approach of doing two parallel, independent implementations can bring the tradeoffs and design decisions into sharper relief to guide future development work. Our plan is to take these requirements into account in order to generate a third prototype which we envision 40 Price, K. (2008). Electronic Scholarly Editions. In S. Schreibman and R. 
Siemens (eds.), A Companion to Digital Literary Studies. Oxford: Blackwell. http://digitalhumanities.org/companion/view?docId=blackwell/9781405148641/9781405148641.xml&chunk.id=ss1-6-5 (accessed 25 October 2011).

Radzikowska, M., S. Ruecker, S. Brown, P. Organisciak, and the INKE Research Group (2011). Structured Surfaces for JiTR. Paper presented at the Digital Humanities 2011 Conference, Stanford.

Rockwell, G., S. Sinclair, S. Ruecker, and P. Organisciak (2010). Ubiquitous Text Analysis. Poetess Archive Journal 2(1): 1-18.

Ruecker, S., S. Brown, M. Radzikowska, S. Sinclair, T. Nelson, P. Clements, I. Grundy, S. Balasz, and J. Antoniuk (2009). The Table of Contexts: A Dynamic Browsing Tool for Digitally Encoded Texts. In L. Dolezalova (ed.), The Charm of a List: From the Sumerians to Computerised Data Processing. Cambridge: Cambridge Scholars Publishing, pp. 177-187.

Shillingsburg, P. L. (2006). From Gutenberg to Google: electronic representations of literary texts. Cambridge and New York: Cambridge UP.

Thompson, J. W. (2002). The Death of the Scholarly Monograph in the Humanities? Citation Patterns in Literary Scholarship. Libri 52(3): 121-136.

Werstine, P. (2008). Past is prologue: Electronic New Variorum Shakespeares. Shakespeare 4: 208-220.

Developing the spatial humanities: Geo-spatial technologies as a platform for cross-disciplinary scholarship

Bodenhamer, David
intu100@iupui.edu
The Polis Center at IUPUI, USA

Gregory, Ian
i.gregory@lancaster.ac.uk
Lancaster University, UK

Ell, Paul
Paul.Ell@qub.ac.uk
Centre for Data Digitisation and Analysis at Queens University of Belfast, Ireland

Hallam, Julia
J.Hallam@liverpool.ac.uk
University of Liverpool, UK

Harris, Trevor
tharris2@wvu.edu
West Virginia University, USA

Schwartz, Robert
rschwart@mtholyoke.edu
Mount Holyoke College, USA

1. Significance

Developments in Geographic Information Systems (GIS) over the past few decades have been nothing short of remarkable.
So revolutionary have these advances been that the impact of GIS on many facets of government administration, industrial infrastructure, commerce, and academia has been likened to the discoveries brought about by the microscope, the telescope, and the printing press. While concepts of spatial science and spatial thinking provide the bedrock on which a broad range of geospatial technologies and methods have been founded, the dialog between the humanities and geographic information science (GISci) has thus far been limited and has largely revolved around the use of ‘off-the-shelf’ GIS in historical mapping projects. This limited engagement is in stark contrast to the substantive inroads that GIS has made in the sciences and social sciences, as captured by the growing and valuable field of a social-theoretically informed Critical GIS. Not surprisingly, the humanities present additional significant challenges to GISci because of the complexities involved in meshing a positivist science with humanist traditions and predominantly literary and aspatial methods. And yet it is the potential dialogue and engagement between the humanities and GISci that promises reciprocal advances in both fields, as spatial science shapes humanist thought and is in turn reshaped by the multifaceted needs and approaches represented by humanist traditions. We use the term spatial humanities to capture this potentially rich interplay between Critical GIS, spatial science, spatial systems, and the panoply of highly nuanced humanist traditions. The use of GIS in the humanities is not new. The National Endowment for the Humanities has funded a number of projects to explore how geo-spatial technologies might enhance research in a number of humanities disciplines, including but not limited to history, literary studies, and cultural studies.
The National Science Foundation and the National Institutes of Health have also supported projects related to spatial history, such as the Holocaust Historical GIS (NSF) and Population and Environment in the U.S. Great Plains (National Institute of Child Health and Human Development). Although successful on their own terms, these projects have also revealed the limits of the technology for a wider range of humanities scholarship, which an increasing body of literature discusses in detail. Chief among the issues is a mismatch between the positivist epistemology of GIS, with its demand for precise, measurable data, and the reflexive and recursive approaches favored by humanists and some social scientists (e.g. practitioners of reflexive sociology) who wrestle continually with ambiguous, uncertain, and imprecise evidence and who seek multivalent answers to their questions. The problem, it seems, is both foundational and technological: we do not yet have a well-articulated theory for the spatial humanities, nor do we have tools sufficient for the needs of humanists. Addressing these deficits is at the heart of much current work in GIScience and in the spatial humanities. The panel, composed of scholars who have successfully blended geo-spatial technologies with cross-disciplinary methods in a variety of projects, will discuss how these digital tools can be bent toward the needs of humanities scholars and serve as a platform for innovative work in humanities disciplines. Emphasis will be on three important themes from the spatial humanities that also address the broader interests of the digital humanities:

1. Exploring the epistemological frameworks of the humanities and GISci to locate common ground on which the two can cooperate. This step has often been overlooked in the rush to develop new technology, but it is the essential point of departure for any effort to bridge them.
This venture is not to be confused with a more sweeping foundational analysis of ingrained methodological conceits within the sciences and the humanities, and certainly should not be misunderstood as a query about the qualitative approach versus the quantitative approach. Rather, what is desired here is for the technology itself to be interrogated as to its adaptability, in full understanding that the technology has, in its genesis, been epistemologically branded and yet still offers potential for the humanities. What is required is an appropriate intellectual grounding that is rooted in the humanities and draws the technology further out of its positivistic homeland. The payoff for collaboration will be a humanities scholarship that integrates insights gleaned from spatial information science, spatial theory, and the Geospatial Web into scaled narratives about human lives and culture.

2. Designing and framing narratives about individual and collective human experience that are spatially contextualized. At one level, the task is defined as the development of reciprocal transformations from text to map and map to text. More importantly, the humanities and social sciences must position themselves to exploit the Geospatial Semantic Web, which, in its extraordinary complexity and massive volume, offers a rich data bed and functional platform to researchers who are able to effectively mine it, organize the harvested data, and contextualize it within the spaces of culture. Finding ways to make the interaction among words, location, and quantitative data more dynamic and intuitive will yield rich insights into complex socio-cultural, political, and economic problems, with enormous potential for areas far outside the traditional orbits of humanities research.
In short, we should vigorously explore the means by which to advance translation from textual to visual communication, making the most of visual media and learning to create ‘fits’ between the messages of text (and numbers) and the capabilities of visual forms to express spatial relationships.

3. Building increasingly complex maps (using the term broadly) of the visible and invisible aspects of a place. The spatial considerations remain the same, which is to say that geographic location, boundary, and landscape remain crucial, whether we are investigating a continental landmass or a lecture hall. What is added by these ‘deep maps’ is a reflexivity that acknowledges how engaged human agents build spatially framed identities and aspirations out of imagination and memory, and how the multiple perspectives constitute a spatial narrative that complements the verbal narrative traditionally employed by humanists.

After an introductory framing statement by the moderator, panelists will each take no more than 10 minutes to offer reflections and experiences drawn from their own projects that will address one or more of these themes. Several questions will guide these presentations:

1. What advantages did geo-spatial technologies bring to your research? What limitations? How did you overcome or work around the limits?

2. How did your project address the mismatch between the positivist assumptions of GIS and the multivalent and reflexive nature of humanities scholarship?

3. What lessons have you learned that would be helpful for other scholars who use these technologies?

Following the presentations, the moderator will guide a discussion among the panelists and with audience members to explore these themes further in an effort to distill an agenda or, more modestly, a direction for future work.

Prosopographical Databases, Text-Mining, GIS and System Interoperability for Chinese History and Literature
References

Bodenhamer, D., J. Corrigan, and T. Harris, eds. (2010). The Spatial Humanities: GIS and the Future of Humanities Scholarship. Bloomington: Indiana UP.

Daniels, S., D. DeLyser, J. Entrikin, and D. Richardson, eds. (2011). Envisioning Landscapes, Making Worlds: Geography and the Humanities. New York: Routledge.

Dear, M., J. Ketchum, S. Luria, and D. Richardson (2011). GeoHumanities: Art, History, Text at the Edge of Place. New York: Routledge.

Gregory, I., and P. Ell (2008). Historical GIS: Technologies, Methodologies, and Scholarship. Cambridge: Cambridge UP.

Knowles, A., ed. (2008). Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship. Redlands, CA: ESRI Press.

Bol, Peter Kees
pkbol@fas.harvard.edu
Harvard University, USA

Hsiang, Jieh
jieh.hsiang@gmail.com
National Taiwan University, Taiwan

Fong, Grace
grace.fong@mcgill.ca
McGill University, Canada

PAPER 1

Introduction

1. Overview

Digital content and tools for Chinese studies have been developing very quickly. The largest digital text corpus currently has 400 million characters. There is now an historical GIS for administrative units and towns in China from 221 BCE to 1911. And there are general and specialized biographical and literary databases. This wealth of resources is constrained, however, by the histories of their development: all have been developed independently of each other, and there has been no systematic effort to create interoperability between them. This panel brings together three innovators in Chinese digital humanities and shows how, by implementing system interoperability and content sharing, their separate approaches to content and tool development can be mutually supporting and result in more powerful applications for the study of China’s history and literature. The goal of this session is both to introduce the projects the presenters lead and to demonstrate the advantages of sharing and interoperability across systems.
Moreover, because their outputs are multilingual (Chinese, English translation, and pinyin), they are making the data from China’s past accessible to a non-Chinese-reading public. The China Biographical Database (CBDB) has been accumulating biographical data on historical figures, mainly from the 7th through the early 20th century. It populates the database by applying text-mining procedures to large bodies of digital texts. Users can query the system in terms of place, time, office, social associations, and kinship, and export the results for further analysis with GIS, social network, and statistical software. The Research Center for the Digital Humanities at National Taiwan University developed the Taiwan History Digital Library, a system for the spatial and temporal visualization of the distribution of Taiwan historical land deeds, creating a temporally-enabled map interface that allows users to discover relevant documents, examine them, and track their frequency. Using CBDB code tables of person names, place names, and official titles for text mark-up, the Center is now applying this system to a compendium of 80,000 documents from the 10th through the early 13th century that cover all aspects of government, including such diverse topics as religious activity, tax revenues, and bureaucratic appointments. Users will be able to track the careers of individuals and call up their CBDB biographies through time and space, as well as see when, where, and what the government was doing. This will be a model for incorporating the still larger compendia of government documents from later periods. Data will be discovered (e.g. the location and date of official appointments) and deposited into the CBDB. The Ming Qing Women’s Writings (MQWW) project is a multilingual online database of 5,000 women writers from Chinese history, which includes scans of hitherto unavailable works and analyses of their content.
Users can query the database by both person and literary content. Using APIs to create system interoperability between MQWW and CBDB, MQWW is building into its interface the ability to call up CBDB data on kinship, associations, and careers of the persons in MQWW. For their part, CBDB users will be able to discover the writings of women through the CBDB interface.

PAPER 2

Chinese Biographical Data: Text-mining, Databases and System Interoperability

Bol, Peter Kees
pkbol@fas.harvard.edu
Harvard University, USA

Biography has been a major component of Chinese historiography since the first century BCE and takes up over half the contents of the twenty-five dynastic histories; biographical and prosopographical data also dominate the 8,500 extant local histories. The China Biographical Database (http://isites.harvard.edu/icb/icb.do?keyword=k16229) is a collaborative project of Harvard University, Peking University, and Academia Sinica to collect biographical data on men and women in China’s history (Bol and Ge 2002-10). It currently holds data on about 130,000 people, mainly from the seventh through the early twentieth century, and is freely available as a standalone database (http://isites.harvard.edu/icb/icb.do?keyword=k16229&pageid=icb.page76670), as an editing system (http://cbdb.fas.harvard.edu/cbdbc/cbdbedit), and as an online query system with both English (http://59.124.34.70/cbdbc/ttsweb?@0:0:1:cbdbkmeng@@0.10566209097417267) and Chinese (http://59.124.34.70/cbdbc/ttsweb?@0:0:1:cbdbkm@@0.10566209097417267) interfaces. Code and data tables cover the multiple kinds of names associated with individuals, the places with which they were associated at birth and death and during their careers, the offices they held, the educational and religious institutions with which they were associated, the ways in which they entered their careers (with special attention to the civil service examination), the people they were associated with through kinship and through non-kin social associations (e.g.
teacher-student), social distinction, ethnicity, and writings (Fuller 2011). The purpose of the database is to enable users, working in Chinese or English, to use large quantities of prosopographical data to explore historical questions (Stone 1972). These may be straightforward (e.g. What was the spatial distribution of books by bibliographic class during a certain period? How did age at death vary by time, space, and gender?) or complex (e.g. What percentage of civil officials from a certain place during a certain period were related to each other through blood or marriage? How did intellectual movements spread over time?). The standalone database and the online query system also include routines for generating genealogies of any extent and social networks of up to five nodes, finding incumbents in a particular office, etc. The standalone database (and, next year, the online system) can also output queries in formats that can be analyzed with geographic information system and social network analysis software (Bol 2012). Historical social network analysis is challenging but rewarding (Padgett & Ansell 1993; Wetherhall 1998). We began to populate the database largely through text-mining and text-modeling procedures. We began with Named Entity Recognition procedures (e.g. finding text-strings that matched reign-period titles, numbers, years, months, and days to capture dates, with 99% recall and precision) written in Python. We then proceeded to write more complex Regular Expressions to identify, for example, the office a person assumed at a certain date, where the office title is unknown. Currently we are implementing probabilistic procedures to identify the social relationship between the biographical subject and the names of people co-occurring in the biographical text (we have reached a 60% match with the training data).
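The date-recognition step described above can be sketched in Python. The reign-period list, numeral set, and sample sentence below are illustrative stand-ins, not the project’s actual code tables:

```python
import re

# A minimal sketch of regex-based date NER over classical Chinese text:
# a reign-period title followed by a year, and optionally a month and day.
# REIGN_PERIODS and the sample text are invented for illustration.
REIGN_PERIODS = ["紹興", "乾道", "淳熙"]   # a few Song-dynasty era names
NUMERALS = "〇一二三四五六七八九十元"      # Chinese numerals ('元' = first year)

DATE_RE = re.compile(
    "(?P<reign>" + "|".join(REIGN_PERIODS) + ")"
    "(?P<year>[" + NUMERALS + "]+)年"
    "(?:(?P<month>[" + NUMERALS + "]+)月)?"
    "(?:(?P<day>[" + NUMERALS + "]+)日)?"
)

def extract_dates(text):
    """Return (reign, year, month, day) tuples found in the text."""
    return [m.group("reign", "year", "month", "day")
            for m in DATE_RE.finditer(text)]

sample = "紹興三十一年九月十日除知建州"
print(extract_dates(sample))
```

A pattern of this shape captures the reported structure (era name + numeric date components); the real pipeline additionally matched offices and other entities with more complex expressions.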
An important outcome for the humanities generally is Elif Yamangil’s (Harvard University, Computer Science) development of the ‘Regular Expression Machine.’ This is a graphical user interface (GUI), built with the Java Swing library, that allows a user to graphically design patterns for biographical texts, match them against the data, and see results immediately via a user-friendly color-coding scheme. It consists of a workspace with three main components: (1) a view that displays the textual data currently in use, (2) a list of active regular expressions that are matched against the data via color-coding, and (3) a list of shortcut regular expression parts that can be used as building blocks in the active regular expressions of (2). Additional facilities we have built into the tool are (1) an XML export ability: users can create XML files that flatten the current workspace (the data and the regular expressions matched against it) at the click of a button, which facilitates interfacing to other programs such as Microsoft Excel and Access for database input; (2) a save/load ability: users can easily save and load the workspace state, which includes the list of regular expressions and shortcuts and their color settings; and (3) a handy list of pre-made regular expression examples: numerous date patterns can be added instantly to any regular expression using the GUI menus. The point of building this application is to allow users with no prior experience of programming or computer science concepts such as regular expression scripting to experiment with data-mining Chinese biographical texts at an intuitive, template-matching level, yet still effectively. The CBDB project also accepts volunteered data. Our goal in this is social rather than technical. Researchers in Chinese studies have long paid attention to biography and in the course of their research have created tables, and occasionally databases, with biographical data.
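The kind of XML export described here can be illustrated with a small sketch. The element names (`workspace`, `pattern`, `match`) and the sample pattern are invented for illustration, since the abstract does not specify the tool’s actual export schema:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sketch: flatten a text plus a set of named regular
# expressions into an XML document of matches with offsets, suitable
# for import into a spreadsheet or database.
def export_matches(text, patterns):
    root = ET.Element("workspace")
    ET.SubElement(root, "text").text = text
    for name, pat in patterns.items():
        rule = ET.SubElement(root, "pattern", name=name, regex=pat)
        for m in re.finditer(pat, text):
            hit = ET.SubElement(rule, "match",
                                start=str(m.start()), end=str(m.end()))
            hit.text = m.group(0)
    return ET.tostring(root, encoding="unicode")

# Toy example: an 'age at death' pattern over an invented phrase.
xml_out = export_matches("先生年七十三", {"age": "年[一二三四五六七八九十]+"})
print(xml_out)
```

Recording character offsets alongside each match is what makes the flattened file usable as database input downstream.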
By offering standard formats, look-ups, and queries for coding, we provide researchers with a permanent home for their data. Currently there are twelve collaborating research projects, among which are an extensive database of Buddhist monks, a collection of 5,000 grave biographies, and a dataset of 30,000 individuals active in the ninth through tenth centuries. The more biographical data the project accumulates, the greater its service to research and learning that explore the lives of individuals. Humanists are exploring ways in which they can use data in quantity, and looking for ways of making it accessible to others on the web. In Chinese studies – speaking now only of those online systems that include biographical data – these include the Tang Knowledge Database at the Center for Humanistic Research at Kyoto University, the Ming-Qing Archives at Academia Sinica and the National Palace Museum Taiwan, the databases for Buddhist Studies at Dharma Drum College, the Ming Qing Women’s Writings Database at McGill University, and the University of Leuven’s database of writings by Christian missionaries and converts in China. The role of the China Biographical Database project in this environment is to provide an online resource for biographical data that others can draw on to populate fields in their own online systems. To this end we are currently developing (and will demonstrate this summer) a set of open Application Programming Interfaces that will allow other systems to incorporate data from the China Biographical Database on the fly. This will allow other projects to focus their resources on the issues that are of most interest to them while at the same time being able to incorporate any or all of the data in CBDB within their own systems.

Figure 1: Online search results for Zhu Xi
Figure 2: Persons in CBDB from Jianzhou in the Song period

References

Stone, L. (1972). Prosopography. In F. Gilbert, E. J. Hobsbawm and St.
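A client of such an API might look roughly like the sketch below. The endpoint URL, parameter names, and response shape are entirely hypothetical, since the abstract announces the APIs but does not document them:

```python
import json
from urllib.parse import urlencode

# Hypothetical sketch of a consuming system (e.g. MQWW) pulling kinship
# data for one person from a CBDB-style API. Endpoint, parameters, and
# payload format are invented for illustration.
def build_query(base_url, person_id, fields):
    """Assemble a query URL for one person and a list of wanted fields."""
    return base_url + "?" + urlencode({"id": person_id,
                                       "fields": ",".join(fields)})

def parse_response(raw_json):
    """Extract (relation, related person) pairs from a hypothetical payload."""
    data = json.loads(raw_json)
    return [(k["relation"], k["name"]) for k in data.get("kinship", [])]

url = build_query("http://cbdb.example.org/api/person", 3257, ["kinship"])
sample = '{"kinship": [{"relation": "father", "name": "Zhu Song"}]}'
print(url)
print(parse_response(sample))
```

The point of the design is visible even in this toy form: the client keeps only its own interface logic and fetches biographical fields on the fly rather than mirroring the database.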
Richards Graubard (eds.), Historical studies today. New York: Norton.

Wetherhall, Ch. (1998). Historical Social Network Analysis. In L. J. Griffin and M. van der Linden (eds.), New methods for social history. Cambridge and New York: Cambridge UP, p. 165.

Bol, P. K. (2007). Creating a GIS for the History of China. In A. Kelly Knowles and A. Hillier (eds.), Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship. Redlands, CA: ESRI Press, pp. 25-57.

Bol, P. K. (2012). GIS, Prosopography, and History. Annals of GIS 18(1): 3-15.

Bol, P. K., and Ge Jianxiong 葛剑雄 (2002-10). China historical GIS [electronic resource] = Zhongguo li shi di li xin xi xi tong = 中国历史地理信息系统. Version 1.0-5.0. Harvard University and Fudan University.

Fuller, M. A. (2011). CBDB User’s Guide. 2nd edition. Harvard University: China Biographical Database.

Padgett, J. F., and Chr. K. Ansell (1993). Robust Action and the Rise of the Medici, 1400-1434. American Journal of Sociology 98: 1259-319.

Figure 3: Kin to Zhu Xi (5 generations up and down, collateral and marriage distance of 1)
Figure 4: Civil Service Examination degree holders of 1148 and 1256

PAPER 3

Context discovery in historical documents – a case study with Taiwan History Digital Library (THDL)

Hsiang, Jieh
jieh.hsiang@gmail.com
National Taiwan University, Taiwan

Research on pre-1900 Taiwanese history has suffered from the lack of primary historical documents. Most of the local government records, except for the Danxin Archives (an archive of about 20,000 official documents from the two northern Taiwan prefectures during the 19th century), are lost. Although there are quite a few records about Taiwan in the Imperial Court archives, they are kept in several different institutions, scattered among the volumes, and are hard to access. The situation with local documents such as land deeds is even worse.
Some are published in local gazetteers or books, and some are family records kept by individual researchers for their own research use. There was an urgent need to collect primary documents and put them under one roof so that they could be better utilized by scholars. In 2002 the Council of Cultural Affairs (CCA) of Taiwan commissioned the National Taiwan University Library to select imperial court documents relevant to Taiwan from the Ming and Qing court archives, and the National Taichung Library to collect local deeds (especially land deeds). More than 30,000 court documents and 19,000 local deeds were collected and transcribed into full text. Most of the court documents and a portion of the land deeds were then published, from 2003 to 2007, as a book series of 147 volumes. At the same time, the CCA authorized the Research Center for Digital Humanities (RCDH) of NTU to create a digital library so that the digital files of the found materials could be used online. In addition to creating THDL, the Taiwan History Digital Library, RCDH also added a significant number of documents that were collected later. THDL now contains three corpuses, all in searchable full text. The Ming and Qing Taiwan-related Court Document Collection, numbering 45,722 documents, is collected from about 300 different sources and includes memorials, imperial edicts, excerpts from personal journals of local officials, and local gazetteers. The Old Land Deeds Collection contains 36,535 pre-1910 deeds from over 100 sources, with new material added every month. The Danxin Archives includes about 1,151 government court cases with 19,557 documents. Together they form the largest collection of its kind, with over 100,000 documents, all with metadata and searchable full text. The three corpuses reflect 18th and 19th century Taiwan from the views of the central government, the local government, and the grassroots respectively.
THDL has been available for free over the Internet since 2009, and has already made an impact on research in Taiwanese history. When designing THDL we encountered a challenge. Unlike conventional archives, whose documents have a clear organization, the contents of THDL, being from many different sources, do not have predefined contexts. Although one can easily create a search-engine-like system that returns documents when given a query, such a system is cumbersome, since the user has to peruse the returned documents to find the relevant ones. In THDL we use a different approach. We assume that documents may be related, and treat a query return (or any subset of documents) as a sub-collection – a group of related documents (Hsiang et al. 2009). Thus, in addition to returning a list of documents as the query result, the system also provides the user with as many contexts about the returned sub-collection as possible. This is done in a number of ways. The first is to classify the query return according to attributes such as year, source, type, location, etc. Visualization tools are provided to give the user a bird’s-eye view of the distributions. Analyses of co-occurrences of names, locations, and objects provide more ways to observe and explore the relationships among the actors and the collective meanings of the documents. For example, co-occurrence analysis of terms reveals the names of the associates mentioned most often in the memorials from a certain official, or the most frequently mentioned names during a historical event. Figure 1 is a snapshot of THDL after issuing the query ‘找洗字’. The map on the left shows the geographic locations of the documents in the query return. The top is a chronological distribution chart, and the left column is the post-classification of the query result according to year. We have also developed GIS techniques both for issuing a query and for analyzing/visualizing query results.
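The post-classification and co-occurrence steps can be sketched as follows; the three toy document records (years, sources, and person names) are invented for illustration and are not THDL data:

```python
from collections import Counter
from itertools import combinations

# A minimal sketch of the 'query return as sub-collection' idea: given
# the documents matching a query, tabulate facets (year, source, ...) and
# count co-occurring person names within each document.
docs = [
    {"year": 1788, "source": "memorial", "names": ["林爽文", "福康安"]},
    {"year": 1788, "source": "edict",    "names": ["福康安"]},
    {"year": 1790, "source": "memorial", "names": ["林爽文", "柴大紀"]},
]

def facet(subcollection, attr):
    """Distribution of an attribute over the sub-collection."""
    return Counter(d[attr] for d in subcollection)

def cooccurrence(subcollection):
    """Count unordered pairs of names appearing in the same document."""
    pairs = Counter()
    for d in subcollection:
        pairs.update(combinations(sorted(d["names"]), 2))
    return pairs

print(facet(docs, "year"))
print(cooccurrence(docs).most_common(2))
```

The facet counts drive the year/source classification panels, and the pair counts are the raw material for the kind of name co-occurrence analysis described above.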
For the land deeds, for instance, one can draw a polygon to query for all the land deeds that appeared in that region (Ou 2011). (Such a query is quite impossible using keywords.) Figure 2 contains such an example. In order to go beyond syntactic queries, we further developed text-mining techniques to explore the citation relations among the memorials and imperial edicts of the Ming Qing collection (Hsiang et al. 2012), and the land transaction relations among the deeds in the land deed collection (Chen et al. 2011). Both projects have produced transitivity graphs that capture historical phenomena that would be difficult to discover manually. In order to accomplish the above we developed a number of text-mining techniques. Term extraction methods were developed to extract more than 90,000 named entities from the corpuses (Hsieh 2011), together with information such as locations, dates, the four reaches of the land deeds, prices, and patterns. Matching algorithms were designed to find the transaction, citation, and other relations mentioned above (Huang 2009; Chen 2011). GIS-related spatial-temporal techniques have also been developed (Ou 2011). The contents of the China Biographical Database (CBDB) and Ming Qing Women’s Writings (MQWW) described in other parts of this panel share a number of commonalities with the contents of THDL. In addition to being written in classical Chinese, the documents in each of the corpuses were collected from many different sources and span hundreds of years. Furthermore, there are no intrinsic relations among the documents other than the obvious. To use such loosely knitted collections effectively in research requires a system that can help the user explore contexts that may be hidden among the documents. This is exactly what THDL provides. The design philosophy of treating a query return as a sub-collection, and the mining and presentation technologies developed for THDL, can be adapted to the other two collections as well.
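The polygon query reduces, at its core, to filtering georeferenced deeds with a point-in-polygon test. The sketch below uses the standard ray-casting algorithm; the deed records and coordinates are invented for illustration:

```python
# Sketch of a region query: filter georeferenced deeds by a user-drawn
# polygon using the ray-casting point-in-polygon test.
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Invented example data: two deeds and a rectangular region near Taipei.
deeds = [("deed-001", 121.51, 25.04), ("deed-002", 120.68, 24.14)]
region = [(121.4, 24.9), (121.6, 24.9), (121.6, 25.1), (121.4, 25.1)]

hits = [d for d, x, y in deeds if point_in_polygon(x, y, region)]
print(hits)
```

A production GIS would use a spatial index rather than a linear scan, but the filtering logic of a drawn-region query is the same.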
Indeed, THDL’s design, interface, and mining technologies are flexible enough to easily incorporate other Chinese-language corpuses. The only assumption is that the documents in a collection should have well-structured metadata, which is important for the purpose of post-classification of a sub-collection. If the full text of the content is also available, then more sophisticated analytical methods such as co-occurrence analysis can be deployed. We have worked with the CBDB team at Harvard and built, within a month, a fully functional system from the THDL shell for the Song huiyao (宋會要), a compendium of government records of China between the 10th and 13th centuries. (The content was jointly developed at Harvard and the Academia Sinica of Taiwan.) The system has also incorporated 9,470 person names, 2,420 locations, and 3,366 official titles from CBDB and used them in co-occurrence analysis, classifications, and other features. It has also extracted, automatically, 11,901 additional terms from the corpus. To reduce the number of errors unavoidable in this automated process, we have designed a crowd-sourcing mechanism for users to make corrections and to add new terms they discover when using the system. The new names obtained through this process will in turn be fed back to CBDB. We have also built a prototype for MQWW with post-classification features. Significant enhancement of this system is planned once we receive enough user feedback. Incorporating the names and locations of CBDB into the system is also being studied. While the text-mining techniques that we have described here are designed for the Chinese language, the retrieval methodology of treating a query return as a sub-collection, and the mechanisms for discovering and representing its collective meanings, are universal.
It provides a way to present and analyze the query return that seems better suited to scholars exploring digital archives than the more conventional search engine, which treats a query return as a ranked list.

Figure 1: Snapshot of THDL with query term “找洗字”
Figure 2: Query resulting from issuing a polygon (region) as a query

References

Chen, S. P. (2011). Information technology for historical document analysis. Ph.D. thesis, National Taiwan University, Taipei, Taiwan.

Chen, S. P., Y. M. Huang, H. L. Ho, P. Y. Chen, and J. Hsiang (2011). Discovering land transaction relations from land deeds of Taiwan. Digital Humanities 2011 Conference, June 19-22, 2011, Stanford, CA, pp. 106-110.

Hsiang, J., S. P. Chen, and H. C. Tu (2009). On building a full-text digital library of land deeds of Taiwan. Digital Humanities 2009 Conference, Maryland, June 22-25, 2009, pp. 85-90.

Hsiang, J., S. P. Chen, H. L. Ho, and H. C. Tu (2012). Discovering relations from imperial court documents of Qing China. International Journal of Humanities and Arts Computing 6: 22-41.

Hsieh, Y. P. (2012). Appositional Term Clip: a subject-oriented appositional term extraction algorithm. In J. Hsiang (ed.), New Eyes for Discovery: Foundations and Imaginations of Digital Humanities. Taipei: National Taiwan UP, pp. 133-164.

Huang, Y. M. (2009). On reconstructing relationships among Taiwanese land deeds. MS Thesis, National Taiwan University, Taipei, Taiwan.

Ou, C.-H. (2011). Creating a GIS for the history of Taiwan – a case study of Taiwan during the Japanese occupation era. MS Thesis, National Taiwan University, Taipei, Taiwan.
PAPER 4

System Interoperability and Modeling Women Writers’ Life Histories of Late Imperial China

Fong, Grace, grace.fong@mcgill.ca, McGill University, Canada

Recent scholarship has shown that Chinese women’s literary culture began to flourish on an unprecedented level alongside the boom in the printing industry in the late sixteenth century, and continued as a cultural phenomenon in the late imperial period until the end of the Qing dynasty (1644-1911) (Ko 1994; Mann 1997; Fong 2008; Fong & Widmer 2010). Over 4,000 collections of poetry and other writings by individual women were recorded for this period (Hu and Zhang 2008). These collections, with their rich autobiographical and social contents, open up gendered perspectives on, and complicate, many aspects of Chinese culture and society – unsuspected kinship and family dynamics, startling subject positions, new topoi and genres, all delivered from the experience of literate women. Yet, within the Confucian gender regime, women were a subordinated group, ideally located within the domestic sphere, and in many instances their writings were not deemed worthy of systematic preservation. Perhaps less than a quarter of the recorded individual collections have survived the ravages of history; many more fragments and selections are preserved in family collections, local gazetteers, and various kinds of anthologies. Works by individual women have been difficult to access for research, as they have mostly ended up in rare book archives in libraries in China. Aimed at addressing this problem of accessibility, the Ming Qing Women’s Writings project (MQWW; http://digital.library.mcgill.ca/mingqing) is a unique digital archive and web-based database of historical Chinese women’s writing, enhanced by a number of research tools.
Launched in 2005, this collaborative project between McGill University and Harvard University makes freely available the digitized images of Ming-Qing women’s texts held in the Harvard-Yenching Library, accompanied by the analyzed contents of each collection, which can be viewed online, and a wealth of literary, historical, and geographical data that can be searched (Fig. 1-4). MQWW is not a full-text database (we do not have the funding resources for that), but it is searchable through an extensive set of metadata. In addition to identifying thematic categories, researchers can trace links between women based on exchanges of their work and correspondence, obtain contextual information on family and friends, and note the ethnicity and marital status of the women writers, among other data fields. Each text within a collection is analyzed and identified according to author, title, genre and subgenre, whether it is a poem, a prose piece, or a chapter of a verse novel. The MQWW digital archive contains more than 10,000 poems (the majority by women), more than 10,000 prose pieces of varying lengths and genres (some by men), ranging from prefaces to epitaphs, and over 20,000 images of original texts. It clearly enables literary research: conferences, research papers, and doctoral dissertations have drawn on the resources provided, and the database has also served as an important teaching resource in North America, China, Hong Kong, and Taiwan. While the MQWW database contains basic information on 5,394 women poets and other writers, it was not originally designed with biographical research in mind. Yet continuing research has pointed to how family and kinship, geographical location and regional culture, and social networks and literary communities are significant factors affecting the education, marriage, and general life course of women as writers (Mann 2007; Widmer 2006).
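As a minimal sketch of the linking described above (the input format and sample pairs are assumptions for illustration, not the MQWW data model), exchanges of work and correspondence recorded in the metadata can be viewed as an undirected network:

```python
from collections import defaultdict

def build_exchange_network(exchanges):
    """Build an undirected network from recorded exchanges.

    exchanges: iterable of (writer_a, writer_b) pairs, e.g. drawn from
    metadata fields recording poems or letters exchanged between writers
    (a hypothetical input format, for illustration only).
    """
    network = defaultdict(set)
    for a, b in exchanges:
        # record the link in both directions
        network[a].add(b)
        network[b].add(a)
    return network

# sample pairs, purely illustrative
net = build_exchange_network([
    ("Xi Peilan", "Sun Yuanxiang"),
    ("Xi Peilan", "Yuan Mei"),
])
print(sorted(net["Xi Peilan"]))  # ['Sun Yuanxiang', 'Yuan Mei']
```

Such an adjacency structure is the usual starting point for the social network analysis discussed in the next section.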
The current phase of the collaborative project with the China Biographical Database (CBDB), Harvard University, will develop the biographical data of MQWW for large-scale prosopographical study (Keats-Rohan 2007). It is guided by the potential for taking digital humanities research in the China field to a new stage of development by focusing on system interoperability. By this we mean that online systems are built so as to enable interaction with other online systems through an Application Programming Interface (API), which must be developed in both directions. Our methodological strength rests on two robust databases with demonstrated long-term sustainability and scholarly significance. An API is being created for CBDB to enable it to interact with MQWW, and vice versa. MQWW will retain its separate identity as a knowledge site with its unique digital archive of texts and multifunction searchable database. System interoperability will support cross-database queries on an ad hoc basis by individual users. We are developing an integrated search interface to query both databases. My paper will be an experiment based on a project in progress. With the massive biographical data available in CBDB, I will formulate and test queries for kinship and social network analysis for women writers in MQWW for a life history model, or a prosopographical study, of writing women in late imperial China. Some of the questions I will address are:

- Can we map historical changes statistically to test the ‘high tides’ of women’s writings (mid-seventeenth century and late eighteenth century), which were arrived at through non-statistical observations?
- What graphs do the data yield for women’s social associations when mapped according to geographical regions, historical periods, male kin success in the civil examination system, and other defined parameters?
- How do the official assignments of male kin affect groups of women in their life cycle roles as daughter, wife, mother, mother-in-law, and grandmother?
- What social patterns, such as local and translocal marriage and female friendship, can emerge through social network analysis with data in CBDB?

The rich data for male persons in CBDB can offer possibilities for mapping the life history of women writers: the circulation of their books, the physical ‘routes’ of movement in their lives, their temporal and geographical experiences, and their subtle role in the social and political domain. They can also disclose unsuspected lines of family and social networks, political alliances, literati associations, and women’s literary communities for further study and analysis. The experiments can show the probabilities and possibilities of the new system interoperability for identifying not only normative patterns but also ‘outlier’ cases in marginal economic, geographical, and cultural regions, for which social, political, and historical explanations can be sought through more conventional humanities research.

Figure 1: Search for poems of mourning by the keyword dao wang (悼亡) using the English interface

Figure 2: Search results displaying poem titles with the keyword dao wang (悼亡), with links to digitized text images

Figure 3: Digitized texts of the first set of dao wang (悼亡) poems in the result list

Figure 4: Search results for the woman poet Xi Peilan (1760-after 1829)

References

Fong, G. S. (2008). Herself an Author: Gender, Agency, and Writing in Late Imperial China. Honolulu: U of Hawaii P.

Widmer, E. (2006). The Beauty and the Book: Women and Fiction in Nineteenth-Century China. Cambridge, MA: Harvard University Asia Center.

Fong, G. S., and E. Widmer, eds. (2010).
The Inner Quarters and Beyond: Women Writers from Ming through Qing. Leiden: Brill.

Hu, W., and H. Zhang (2008). Lidai funü zhuzuokao (zengdingben) (Catalogue of women’s writings through the ages [with supplements]). Shanghai: Shanghai guji chubanshe.

Keats-Rohan, K. S. B., ed. (2007). Prosopography Approaches and Applications: A Handbook. Oxford: Unit for Prosopographical Research, Linacre College, University of Oxford.

Ko, D. (1994). Teachers of the Inner Chambers: Women and Culture in Seventeenth-Century China. Stanford: Stanford UP.

Mann, S. (1997). Precious Records: Women in China’s Long Eighteenth Century. Berkeley, CA: U of California P.

Mann, S. (2007). The Talented Women of the Zhang Family. Berkeley, CA: U of California P.

Future Developments for TEI ODD

Cummings, James, James.Cummings@oucs.ox.ac.uk, University of Oxford, UK
Rahtz, Sebastian, Sebastian.Rahtz@oucs.ox.ac.uk, University of Oxford, UK
Burnard, Lou, lou.burnard@tge-adonis.fr, TGE-Adonis, France
Bauman, Syd, Syd_Bauman@Brown.edu, Brown University, USA
Gaiffe, Bertrand, bertrand.gaiffe@atilf.fr, ATILF, France
Romary, Laurent, laurent.romary@inria.fr, INRIA, France
Bański, Piotr, bansp@o2.pl, HUB & IDS, Poland

The purpose of this panel is to look at the application and future development of the literate programming system known as ODD, which was developed for the Text Encoding Initiative (TEI) and underlies every single use of the TEI. Though strongly influenced by data modelling techniques characteristic of markup languages such as SGML and XML, the conceptual model of the Text Encoding Initiative is defined independently of any particular representation or implementation.
The objects in this model, their properties, and their relationships are all defined using a special TEI vocabulary called ODD (for ‘One Document Does it all’); in this way, the TEI model is used to define itself, and a TEI specification using that model is, formally, just like any other kind of resource defined using the TEI. An application selects the parts of the TEI model it wishes to use, and any modifications it wishes to make to them, by writing a TEI specification (a TEI ODD document), which can then be processed by appropriate software to generate instance documents relevant to the given application. Typically, these instance documents will consist of both user documentation, such as project manuals for human use, and system documentation, such as XML schemas or DTDs, sets of Schematron constraints, etc., for machine use. In this respect ODD is a sophisticated re-implementation of the ‘literate programming’ paradigm developed by Don Knuth in the early 1980s, reimagined as ‘literate encoding’. One of the requirements for TEI Conformance is that the TEI file ‘is documented by means of a TEI Conformant ODD file which refers to the TEI Guidelines’. In many cases users employ pre-generated schemas from exemplar customizations, but they are usually better served if they use the TEI ODD markup language, possibly through the Roma web application, to constrain what is available to them, thus customizing the TEI model to reflect their encoding needs more precisely. Some of the mechanisms supporting this extensibility are relatively new in the TEI Guidelines, and users are only now beginning to recognize their potential. We believe that there is considerable potential for take-up of the TEI system beyond its original core constituencies in language engineering, traditional philology, digital libraries, and digital humanities in general.
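As an illustrative sketch (the identifier and element choices here are hypothetical, not drawn from any particular project), a minimal TEI ODD customization that selects a few modules and deletes an unneeded element looks roughly like this:

```xml
<schemaSpec ident="myTEI" start="TEI"
            xmlns="http://www.tei-c.org/ns/1.0">
  <!-- pull in the required infrastructure and basic modules -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- remove an element this project does not need -->
  <elementSpec ident="said" module="core" mode="delete"/>
</schemaSpec>
```

Processed with the appropriate tools (for example via the Roma web application mentioned above), such a specification yields both schemas and human-readable documentation.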
Recent additions to the TEI provide encoding schemes for database-like information about persons and places to complement its existing detailed recommendations for names and linguistic structures. The TEI has always provided recommendations for software-independent means of authoring scientific documentation, but the ODD framework makes it easier for TEI documents to coexist with other specialist XML vocabularies, as well as expanding the TEI to encompass the needs of new specialised kinds of text. ODD has been successfully used to describe other XML schemas, notably the W3C ITS (Internationalization Tag Set) and ISO TC37 SC4 standards documents; more recently its facilities have greatly simplified the task of extending the TEI model to address the needs of other research communities, such as musicologists and genetic editors. We believe that the current ODD system could be further enhanced to provide robust support for new tools and services; an important step is to compare and contrast its features with those of other ‘meta-encoding’ schemes and to consider its relationship to ontological description languages such as OWL. The potential role of ODD in the development of the semantic web is an intriguing topic for investigation. This panel brings together some of the world’s most knowledgeable users and architects of the TEI ODD language, including several who have been responsible for its design and evolution over the years. We will debate its strengths, limitations, and future development. Each speaker will focus on one aspect, problem, or possible development relating to TEI ODD before responding to each other’s suggestions and answering questions from the audience. Lou Burnard will introduce the history and practical use of ODD in the TEI, and describe its relevance as a means of introducing new users to the complexity of the TEI.
Sebastian Rahtz will talk about the processing model for ODD, and the changes required to the language to model genuinely symmetric, and chainable, specifications. Bertrand Gaiffe will look at some of the core mechanisms within ODD, and suggest that model classes that gather elements (or other model classes) into content models could be better treated as underspecified bags rather than sets. Syd Bauman will discuss co-occurrence constraints, pointing out that ODD’s lack of support for this feature is a significant limitation, but also that it can often be worked around by adding Schematron to an ODD. Laurent Romary and Piotr Bański will describe issues in drafting ODD documents from scratch, in particular in the context of ISO standardisation work, introducing proposals to make ODD evolve towards a generic specification environment. We believe that the DH2012 conference offers a useful outreach opportunity for the TEI to engage more closely with the wider DH community, and to promote a greater awareness of the power and potential of the TEI ODD language. We also see this as an invaluable opportunity to obtain feedback about the best ways of developing ODD in the future, thereby contributing to the TEI Technical Council’s ongoing responsibility to maintain and enhance the language.

Organization

Organizer: James Cummings, james.cummings@oucs.ox.ac.uk, University of Oxford

References

Burnard, L., and S. Rahtz (2004). RelaxNG with Son of ODD. Extreme Markup Languages 2004. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7139&rep=rep1&type=pdf

Romary, L. (2009). ODD as a generic specification platform. Text encoding in the era of mass digitization – Conference and Members’ Meeting of the TEI Consortium. http://hal.inria.fr/inria-00433433

TEI By Example (2010). Customising TEI, ODD, Roma. TEI By Example. http://tbe.kantl.be/TBE/modules/TBED08v00.htm

TEI Consortium, eds. (2007). TEI P5 Guidelines Chapter on “Documentation Elements”.
TEI P5: Guidelines for Electronic Text Encoding and Interchange. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TD.html

TEI Consortium, eds. (2007). TEI P5 Guidelines Section on “Implementation of an ODD System”. TEI P5: Guidelines for Electronic Text Encoding and Interchange. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/USE.html#IM

TEI Consortium, eds. (2007). Getting Started with TEI P5 ODDs. TEI-C Website. http://www.tei-c.org/Guidelines/Customization/odds.xml

The five speakers will each give a 15-minute introduction to a problem or possible development that relates to TEI ODD. After this the organizer will moderate discussion between members of the panel on a number of questions before opening the discussion to questions from the audience.

Speakers
- Lou Burnard, lou.burnard@tge-adonis.fr, TGE-Adonis
- Syd Bauman, syd_bauman@brown.edu, Brown University Center for Digital Scholarship
- Bertrand Gaiffe, bertrand.gaiffe@atilf.fr, ATILF
- Sebastian Rahtz, sebastian.rahtz@oucs.ox.ac.uk, University of Oxford
- Laurent Romary & Piotr Bański (Piotr speaking), laurent.romary@inria.fr, bansp@o2.pl, Inria, HUB & IDS

Compiling large historical reference corpora of German: Quality Assurance, Interoperability and Collaboration in the Process of Publication of Digitized Historical Prints

Geyken, Alexander, geyken@bbaw.de, Berlin-Brandenburgische Akademie der Wissenschaften, Germany
Gloning, Thomas, Thomas.Gloning@germanistik.uni-giessen.de, CLARIN-D, Germany
Stäcker, Thomas, staecker@hab.de, Herzog August Bibliothek Wolfenbüttel, Germany

PAPER 1

Introduction

1. Problem Statement

It has been and still is one of the core aims in German Digital Humanities to establish large reference corpora for the historical periods of German, i.e. Old and Middle High German, Early New High German (ENHG), New High German and Contemporary German (NHG). This panel focusses on NHG and ENHG. A reference corpus of 17th- to 19th-century German is currently being compiled by the Deutsches Textarchiv project at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW). The Herzog August Bibliothek Wolfenbüttel (HAB) constructs text corpora encompassing the period between the 15th and the 18th century, with a focus on ENHG. Apart from core activities like these, which usually are carried out by institutions for long-term research such as academies, research centers and research libraries, many digital resources are created by individual scholars or small project teams. Often, however, such resources never show up in the pool of publicly available research corpora. Building an integrated corpus infrastructure for large corpora of ENHG and NHG faces three main problems:

1. Despite the growing acceptance of annotation standards such as the TEI, different ‘encoding dialects’ and baseline formats are used in different contexts (e.g. the TextGrid Baseline Encoding, the TEI encoding recommended by the HAB, the ‘base format’ of the DTA, and others). None of them has gained wider acceptance, nor have they been checked against each other. As a result, repositories of digital documents do not apply consistent, interoperable encoding, and users cannot draw on these resources in an integrated way.

2. There is no approved system of evaluation for either the quality of corpus texts or the quality of the encoding. In addition, there is no reputation system for crediting individual researchers’ accomplishments with respect to the production and annotation of corpus texts. As a consequence, there is no culture of sharing corpus resources collaboratively in the research community.

3. The vision of an integrated historical corpus of German and an integrated research platform is in conflict with the principle of local ascription of scholarly work.
While the user would like to have one place to find all the available corpus texts and one platform to run the available corpus tools, each academy, each institute, each library, each research project and even individual researchers need to produce work and resources ascribable in a local or even a personal way.

2. Panel Topics

Well-established infrastructure projects such as TextGrid, CLARIN or DARIAH can contribute enormously to the process of integration by establishing methods of interoperation, a system of quality assurance and credit, and a set of technical practices that make it possible to integrate resources of different origin, to credit the producers of resources in an appropriate way, to ensure public access in a persistent way, and thereby to involve a greater scholarly community. The proposed panel, organized by the Deutsches Textarchiv at the BBAW, the Special Interest Group ‘Deutsche Philologie’ (‘German Philology’) in CLARIN-D and the HAB Wolfenbüttel, will demonstrate how efforts at community-building, text aggregation, the technical realization of interoperability and the collaboration of two large infrastructure centers (BBAW and HAB) are set to work in the digital publication platforms of the Deutsches Textarchiv (BBAW) and AEDit (HAB). Technical requirements include tools for long-term archiving as well as the implementation and documentation of reference formats. Interfaces need to be standardized, easy to handle and well documented in order to minimize users’ effort in uploading their resources. Transparent criteria have to be developed regarding the expected quality, the encoding level and the requirements for interoperability of texts in the repository. The DTA provides a large corpus of historical texts of various genres: a heterogeneous text base encoded in one consistent format (the DTA ‘base format’). Facilities for quality assurance (DTAQ) and for the integration of external text resources (DTAE) have been developed.
PAPER 2

The DTA ‘base format’

Haaf, Susanne, haaf@bbaw.de, Berlin-Brandenburgische Akademie der Wissenschaften, Deutsches Textarchiv, Germany
Geyken, Alexander, geyken@bbaw.de, Berlin-Brandenburgische Akademie der Wissenschaften, Deutsches Textarchiv, Germany

The DTA ‘base format’ consists of about 80 TEI P5 elements which are needed for the basic formal and semantic structuring of the DTA reference corpus. The purpose of developing the ‘base format’ was to gain coherence at the annotation level, given the heterogeneity of the DTA text material over time (1650-1900) and across text types (fiction, functional and scientific texts). The restrictive selection of ‘base format’ elements, with their corresponding attributes and values, is intended to cover all annotation requirements for a comparable level of structuring of historical texts. We will illustrate this by means of characteristic annotation examples taken from the DTA reference corpus. We will compare the DTA ‘base format’ to other existing base formats, considering their different purposes, and discuss the usability of the DTA ‘base format’ in a broader context (DTAE, CLARIN-D). A special adaptation of the oXygen TEI framework which supports work with the DTA ‘base format’ will be presented.

PAPER 3

DTAE

Thomas, Christian, thomas@bbaw.de, Berlin-Brandenburgische Akademie der Wissenschaften, Deutsches Textarchiv, Germany

DTAE is a software module provided by the DTA for external projects interested in integrating their historical text collections into the DTA reference corpus. DTAE provides routines for uploading metadata, text and images, as well as semi-automatic conversion tools from different source formats into the DTA ‘base format’. The text and metadata are indexed for lemma-based full-text search and processed with presentation tools in order to offer parallel views of the source image, the XML/TEI-encoded text and a rendered HTML presentation layer.
In addition, external contributors can integrate the processed text into their own web site via