Sunday 16 October 2011

The Role of Technology in Scholarly Editing

I just came back from sunny Würzburg where, facilitated by the impeccable organisation of Malte Rehbein and the splendid hospitality of Fotis Jannidis, I spent a few days with colleagues and friends at the TEI Members' Meeting discussing the future of the TEI and Philology in the Digital Age. If this Members' Meeting is remembered for anything, it will certainly be the chocolate-flavoured keynote delivered by Edward Vanhoutte (the slides are also available).

I gave a rather controversial paper there, and I have been asked to share its slides and content. As I'm a bit lazy (a.k.a. busy), it will take a while until I'm able to write my considerations down (they will come, I promise!!), but at least I can share the slides:

You can read the abstract here (it will take a bit of scrolling, as there is no direct link to individual abstracts, sorry, but you can take advantage of the many other interesting abstracts there!), or down here:

In recent years two complementary but somewhat diverging tendencies have dominated the field of digital philology: the creation of models for analysis and encoding, such as the TEI, and the creation of tools or software to support the production of digital editions, whether for editing, publishing or both (Robinson 2005, Bozzi 2006).

These two tendencies are not necessarily mutually exclusive, as the creation of models can represent either the underlying structure or an exporting format for the development of tools. However, these two approaches have often been perceived in opposition, as a dichotomy. On the one hand we have the XML enthusiasts, the editors-as-encoders who apply XML markup to their texts and perhaps also develop publication strategies; on the other hand we have those who support out-of-the-box tools (the ‘magic’ or ‘black’ boxes), who proactively seek the development of fully comprehensive tools that present user-friendly interfaces with the explicit purpose of ‘covering the wires’, in particular hiding the much-abhorred angle brackets. But what are the implications of these positions with respect to the future development of digital (or computational) philology? How realistic is it to ask ‘traditional’ textual editors to turn into encoders? Conversely, how realistic and sustainable is the creation of ‘magic boxes’?

In the past I have studied the difficulties and theoretical implications of using a TEI-based editorial model for an editorial team that was highly geographically dispersed (Pierazzo 2010, but presented as a paper in 2008). On that occasion I argued that the development of ‘magic boxes’ is a very ambitious item to have on the digital philology agenda, because every edition and every scholar needs a very specialised, tailored set of tools. In the same article I expressed the opinion that, even if scholars do not feel comfortable using tags-on-view XML and the TEI, this was the only reasonable approach for digital scholarly editions. A couple of years later, my judgement has softened somewhat. This was brought about largely by the insightful article by Tim McLoughlin (2010, to be read in combination with Rehbein 2010), which presents the difficulties and resistance involved in turning a consolidated editorial model into a digital, TEI-based one, combined with the experience I gained on some collaborative research projects at King’s College London’s Department of Digital Humanities: together these have triggered questions about the role of technology when it comes to digital scholarly editing. As a matter of fact, the evolution of the editor into an editor-encoder has yet to be investigated in full; at the moment attention seems to have been devoted mostly to the steep learning curve required to master the techniques of encoding in XML, without reflecting on the deep and sometimes unwelcome changes in the editorial work and workload once a new editorial model is undertaken, particularly when that model is based on the TEI. This model sometimes sees the editor-as-encoder evolving also into the editor-as-programmer, the editor-as-web-designer and the editor-as-(self-)publisher (Sutherland and Pierazzo 2011). These changes in the editorial work and in the role of editors necessarily result in somewhat parallel changes in the final editorial products.

On the other hand, the demand for the magic box seems to have receded somewhat, and we have witnessed interesting experiments in creating configurable, standards-based tools with the less ambitious goal of supporting particular stages of the editorial work (collation, creation of stemmata and critical apparatus, transcription, annotation); this evolution is best represented, in my opinion, by the tools developed within the Interedition (in particular CollateX) and TextGrid projects.

This paper will briefly present the background outlined above, and then turn to fundamental issues that arise from it about the nature of editors and editing for digital editions. In particular, it will address the following questions:
  1. What competencies are necessary for digital editors?
  2. What roles are digital editors expected to fill?
  3. What do editors expect the technology to do for them?
  4. Which parts of the editors’ work should be assisted by the computer, and which must still be performed in the traditional way?
  5. In what ways, if any, is digital editing different from traditional editing?

Failing to understand how technology can really contribute to the editorial work will have serious consequences for the development, and ultimately the existence, of digital editions.

The paper will address these theoretical and methodological questions making use of concrete examples, particularly from the Jane Austen Digital Edition and from the ongoing editorial experience of the Early English Laws project.

  • Bozzi, A. (2006). ‘Electronic Publishing and Computational Philology’. In Macé, C., Baret, P., Bozzi, A. and Cignoni, L. (eds.), The Evolution of Texts: Confronting Stemmatological and Genetical Methods. Pisa-Roma: Istituti Editoriali e Poligrafici Internazionali.
  • Pierazzo, E. (2010). ‘Editorial Teamwork in a Digital Environment: The Edition of the Correspondence of Giacomo Puccini’. In Rehbein, M. and Ryder, S. (eds.), Jahrbuch für Computerphilologie, vol. 10, pp. 91–110. Also available at: 
  • McLoughlin, T. (2010). ‘Bridging the Gap’. In Rehbein, M. and Ryder, S. (eds.), Jahrbuch für Computerphilologie, vol. 10, pp. 37–54. Also available at: 
  • Rehbein, M. (2010). ‘The Transition from Classical to Digital Thinking. Reflections on Tim McLoughlin, James Barry and Collaborative Work’. In Rehbein, M. and Ryder, S. (eds.), Jahrbuch für Computerphilologie, vol. 10, pp. 55–67. Also available at: 
  • Robinson, P. M. W. (2005). ‘Current Issues in Making Digital Editions of Medieval Texts – or, Do Electronic Scholarly Editions Have a Future?’. Digital Medievalist, 1(1). Available at: 
  • Sutherland, K., and Pierazzo, E. (2011). ‘The Author’s Hand: From Page to Screen’. In Deegan, M. and McCarty, W. (eds.), Collaborative Research in the Digital Humanities. Aldershot: Ashgate (forthcoming).
  • CollateX: 
  • Early English Laws:
  • Interedition:
  • Jane Austen Digital Edition:
  • TextGrid:

Monday 10 October 2011

What Texts (of Manuscripts) are, really

Ok, I give up. It is time for me to enter the arena and give my definition of Text, as I gave it during my last paper at DH at Reading.
I argue that determining what a text IS is not that complicated (?); what is complicated is establishing how it works and how it relates to its “support”.
So, here is my definition.
A text is a linguistic architecture that conveys a meaning which is potentially understandable to at least one group of receivers who have the capability to decipher the code in which the message is encoded.
With this definition I connect the theory of text with the theory of communication, but, mind you, I am only speaking of texts contained in, and borne by, manuscripts.
Let's consider for a moment the following "classic" diagram of the theory of communication:

Source --> SENDER --> Channel --> RECEIVER --> Destination
Message               Noise                    Message'

Is this model helpful for understanding texts within manuscripts? Let's try to understand what these terms mean in our case (i.e. digitised texts contained in manuscripts). Let us start with the Code, which can include:
  • Language
  • Grammar
  • Syntax
  • Rhetoric
  • Orthography
  • Writing system, conventions…
What are the factors that can make the code hard to decipher? The main ones are time and space, which is to say diachronic and diatopic variation, which forces us to modify our diagram as follows, inserting CODE' and CODE''.

           CODE'      CODE        CODE''
Source --> SENDER --> Channel --> RECEIVER --> Destination
Message               Noise                    Message'

Of course, these factors are only the most common ones when talking of ancient texts transmitted by manuscripts, but they are not, by any means, the only ones. In fact, this new diagram will look familiar to humanities people, in particular if we substitute the CODE with the terminology introduced by de Saussure:

           PAROLE     LANGUE      PAROLE
Source --> SENDER --> Channel --> RECEIVER --> Destination
Message               Noise                    Message'

So, to differences in time and space we must also add differences in understanding and in personal usage of the language.
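To make the diagram a little more concrete, here is a deliberately simplistic Python sketch (the word lists and code tables are invented purely for illustration, not drawn from any real manuscript): when the sender encodes with one version of the code (CODE') and the receiver decodes with a slightly diverged one (CODE''), the received Message' is no longer identical to the Message.

```python
# Toy model of the diagram above: Message -> CODE' -> Channel -> CODE'' -> Message'.
# Where the two codes agree the message survives intact; where they diverge
# (diachronic or diatopic variation), Message' differs from Message.

# CODE': the sender's orthographic conventions (hypothetical examples)
code_sender = {"colour": "colour", "realise": "realise", "text": "text"}

# CODE'': the receiver's conventions, which have drifted for some items
code_receiver = {"colour": "color", "realise": "realize", "text": "text"}

def transmit(message):
    """Pass each word through the sender's code, then the receiver's."""
    received = []
    for word in message:
        sent = code_sender.get(word, word)              # encoding via CODE'
        received.append(code_receiver.get(sent, sent))  # decoding via CODE''
    return received

message = ["colour", "text"]
message_prime = transmit(message)
print(message_prime)              # ['color', 'text']
print(message == message_prime)   # False: Message' is not Message
```

The point of the sketch is only that the divergence is systematic, not random: it lives in the codes themselves, which is exactly why it cannot be dismissed as mere "noise" in the channel.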
Let's now consider the Channel, which in our case can be understood as follows:
  • Scroll
  • Codex (Manuscript)
  • Printed book
  • The screen of a computer
  • The screen of a mobile phone
  • Audio
  • The eyes/brain (perceptive network)
And finally, let's consider the Noise, which, again in our case, can be found:

  • In the writing system
  • In the writing conventions
  • In the style of writing
  • In the support
  • In the layout
  • In the screen colours
  • In the pronunciation

Is this schematisation helpful to understand what is going on when doing a transcription, when, that is, we separate the text from its support?

I think it shows, at the very least, how much interpretation and subjectivity this operation implies, pace all the supporters of the objectivity of transcription. It also shows how many things can go wrong here, and how much understanding and skill the business of transcription requires...

How can we reduce the distance between Message and Message'? Well, I think I'll keep that for another post!