Friday 11 May 2012

Genetic encoding at work

I get back to my blog after a long silence, caused by a rather busy month (March) with three papers given on three different continents on three different topics (1. Paris on Proust: see below; 2. Providence on Modelling in Teaching; 3. Canberra on the role of the TEI in DH projects), all of it in the middle of term. Nice. Then there was a rather deserved holiday (Australia seems to get better each time I go!), and MMSDA (April). Finally, catching up with loads of emails, deadlines, etc.

This post reports on the first of the three conferences, i.e. the presentation that Julie André and I gave in Paris on 1 March at Proust, l’œuvre des manuscrits. The conference was organised by the "Equipe PROUST" of ITEM-CNRS (Institut des Textes et Manuscrits modernes), with funding from the ANR programme CAHIERS-PROUST (Nathalie Mauriac Dyer, ITEM, dir.).

You can admire the prototype I have created at this address; you can also download the XML and XSLT, if you so wish.

The idea behind this prototype is that in digital editions we have so far tried to reproduce print editions without engaging with the new medium in a fruitful or interesting way. Even the most sought-after type of online edition, the transcription presented side by side with the facsimile, is not new at all, and shows quite a few limitations:
  1. It creates an alternative space which tries to mimic the original space, without ever being able to represent it in full; 
  2. It leaves to the user/reader the task of establishing the relationship between the transcribed and the inscribed text; 
  3. It is bound to present pages (and not, for instance, openings), given the width constraints of the screen; an approach that, if applied to Proust’s Cahiers, will indeed falsify the documentary evidence, which shows that Proust considered the double page as his writing space (have a look at these materials on Gallica: they are amazing!).
The usual publication format adopted for draft manuscripts is the ultra-diplomatic edition, which presents the transcribed text in a format that mimics the layout of the manuscript page as closely as possible. While this type of edition provides many advantages, it lacks one fundamental aspect: the dynamics of the writing process.

So what, then? For the transcription we have used the new TEI elements for documentary transcription (I talk about this in another post); then I have used SVG to plot the transcribed zones of text on top of the facsimile, and a bit of JavaScript to animate the output, reproducing the sequence of writing and the sequence of reading of those zones. I have also used colour to mark uncertainty: are we sure about the temporal collocation of the sequences? The yellower the background, the less sure we are.
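To give an idea of the mechanism, here is a minimal JavaScript sketch, with hypothetical names (the prototype's actual code differs): one function maps editorial certainty to a yellow background, the yellower the less sure, and another replays the hypothesised writing sequence by revealing the zones one after the other.

```javascript
// Map certainty (1 = sure, 0 = unsure) to a background colour:
// the yellower the background, the less sure we are.
function uncertaintyColour(certainty) {
  const alpha = (1 - certainty).toFixed(2);
  return `rgba(255, 255, 0, ${alpha})`;
}

// Replay the hypothesised writing sequence: reveal each zone in
// order, one step every `delayMs` milliseconds.
function playSequence(zones, revealZone, delayMs) {
  zones
    .slice()                       // don't mutate the caller's array
    .sort((a, b) => a.seq - b.seq) // order by hypothesised sequence
    .forEach((zone, i) => setTimeout(() => revealZone(zone), i * delayMs));
}
```

In the prototype itself the zones are SVG shapes plotted over the facsimile, so a `revealZone` callback of this kind would toggle a shape's visibility and set its fill from `uncertaintyColour`.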

I think this type of visualisation is definitely not perfect, but it is interesting for many reasons: first, because it tries to do something that print editions cannot do; second, because it doesn't present a coherent read-me-top-to-bottom type of text (which would be just wrong in this case); and third, because it takes the (facsimile of the) document as its structural support.
What's still missing? Quite a lot, actually, but in particular I can think of these few points now:

  • A way to represent the dynamic sequences across pages and documents: this can be easily done in the XML source, but not yet in the output
  • A way to drag the zones away in order to read what's underneath
  • Microgenesis: timing writing and rewriting at word level.  
But this is for the next project!

Thursday 26 January 2012

Digital Humanities seen from the outside: a Fish out of water

In this post I would like to reflect on how the Digital Humanities are seen from the outside, and on the consequences of the resulting misunderstandings.

Apparently, seen from the outside, we are those people counting words and detecting hidden meanings from numbers and statistics; this method is seen as being in contrast with the more traditional literary interpretation (close vs. distant reading, to put it as a slogan). Stanley Fish seems to be of this opinion. In his blog post Mind Your P’s and B’s: The Digital Humanities and Interpretation he reports on a DH-like analysis of the Areopagitica of John Milton, in which he studies the "dance of the 'b's and 'p's" in a given passage. In the end, he concludes that DH-like analysis is not his cup of tea:
But whatever vision of the digital humanities is proclaimed, it will have little place for the likes of me and for the kind of criticism I practice: a criticism that narrows meaning to the significances designed by an author, a criticism that generalizes from a text as small as half a line, a criticism that insists on the distinction between the true and the false, between what is relevant and what is noise, between what is serious and what is mere play. Nothing ludic in what I do or try to do. I have a lot to answer for.
Well, there is nothing wrong with the fact that DH is not everybody's cup of tea. I can live with that pretty easily, as it happens. The problem is that to offer effective criticism, you should actually know what you are talking about. Mark Liberman has in fact run a test on the very premise of Fish's argument and has discovered that in that passage:

  • The number of 'p's and 'b's is only 1% higher than the average number of 'p's and 'b's in the whole text
  • There are passages that contain even more 'p's and 'b's
  • There are letters that show similar patterns, such as 'x's and 'y's
  • There are letters that show even bigger peaks, such as 'l's
A.k.a.: to do DH-like research you should use DH tools, i.e. use a computer! Had Fish used a programme for his own research, he could have spared himself a bit of ridicule. To do DH-like research, you should be able to actually do it. DH are not approaches and theories only, they are practice as well (see my definition of DH in an earlier post). It turns out that to count words (or letters) you actually have to count them.

What do we learn from this? Two main lessons, I think.
First, that we have to reflect on our image and the way we present our research and ourselves to people who take more traditional approaches to scholarship. Second, that if you want to criticise something, you have to make sure you have done your homework (something I discussed in another post). The thing is, I think, that Fish *has* a point here, namely that the statistical, computational approach is not for everybody (he doesn't say that it is not useful, only that it is not for him) and that there is still a lot of value in doing things traditionally.
But if you want to make a point, make sure your argument is solid, otherwise people will make fun of you, missing something potentially interesting.

Are my students listening?

Wednesday 25 January 2012

Research without Borders and the TEI

Last Friday (20 January) I was invited by Marjorie Burghart to give a lecture in Lyon as part of a two-day DH event called L’édition électronique dans tous ses états : évolution des pratiques, évolution des besoins (details of the event here and poster here).
It was a lot of fun, in particular because I organised a role-playing game and everybody got very involved.
I also had the opportunity to investigate one of my favourite topics: why on earth do people spend time (and money) working for the TEI when all this work is not credited, i.e. the name of whoever had the idea is not recorded anywhere, and it is just presented as the collaborative effort of The TEI (a.k.a. the Technical Council + the SIGs + TEI-L + etc. etc.)? This is not what academics normally do, right? And even more so, why on earth do institutions accept this?
On a personal level, the best answer to these questions is, in my opinion, that working to improve the TEI is fun: you have the opportunity to meet exceptionally gifted researchers from all over the world and, even if you cannot immediately quantify it or point at something specific, your research is affected by it. Mine has been: I think I am a much better researcher as a result of my past ten-odd years of work with the TEI, as part of the SIG, the Council and now the Board.
At an institutional level, the reason is that the TEI is recognised as one of the foundational bases of DH, for which we are all collectively responsible.
Yes, the TEI has a lot of open issues (last summer's putsch is a luminous example of this), but, as always, I think the best way to solve the problems is to get involved. So, au travail mes amis!

Here are the slides of the presentation, in French though... apologies to all non-French speakers, and to all French speakers as well (the quality of the language is, well, you'll see!).

Monday 23 January 2012

Medievalists in the making and the digital

For the past few years I have been lucky enough to be involved in a wonderful training course, MMSDA, i.e. Medieval Manuscript Studies in the Digital Age. This course is offered for free to UK PhD students who have to work with medieval manuscripts and are interested in the digital stuff. We have now run the course for three years with exceptional success, which we measure in the number of applicants (65 in the first year, 42 in the second and 28 in the third) and in their enthusiasm and commitment. The main brain behind this initiative is Peter Stokes (yep, my Peter Stokes).
The course was initially funded by the AHRC, so we were forced to offer it only to UK-based students, but, from the very first time we ran it, we were aware of a much larger interest out there. This is the reason why we sought alternative funding, and we were finally lucky enough, thanks to the hard work of Charles Burnett from the Warburg Institute, to secure some substantial funding from a COST Action project, IS1005, 'Medieval Europe - Medieval Cultures and Technological Resources'.

So we have opened the application to European countries. The result? We had 90 applications (yes, 90!!) from 18 countries for 20 places. The quality of the applicants was outstanding; I have never had to make more difficult choices, really! We have just been through them all and sent the list of successful candidates to the COST office for approval; then we will communicate the results.

This experience is telling me a few things:

  1. There are some amazing young researchers out there; we will have some stiff competition quite soon.
  2. Many people who have to engage with manuscripts lack appropriate training. Even at PhD level, for many a manuscript is little more than a support for a text.
  3. Young researchers are desperate to acquire essential digital skills (we teach XML, TEI and imaging, so nothing very sophisticated, but very desirable, it seems).
  4. We (i.e. the organisers) have willingly left some essential topics out of the course: Greek, Hebrew, Arabic, Glagolitic, Cyrillic... all of these languages and scripts and traditions and manuscripts are part of our common European culture, but we tend, quite conveniently, to forget it... In our case it was mostly due to lack of time (there are only so many things you can fit into 5 days, you know), but there is still something to keep in mind here, I think.
Food for thought...