Friday 11 May 2012

Genetic encoding at work

I am back on my blog after a long silence caused by a rather busy March: three papers, given on three continents, on three different topics (Paris on Proust, see below; Providence on Modelling in Teaching; Canberra on the role of TEI in DH projects), all of it in the middle of term. Nice. Then there was a well-deserved holiday (Australia seems to get better each time I go!), and MMSDA in April. Finally, catching up with loads of emails, deadlines, etc.

This post reports on the first of the three: the presentation that Julie André and I gave in Paris on 1 March at the conference Proust, l’œuvre des manuscrits. The conference was organised by the "Equipe PROUST" of ITEM-CNRS (Institut des Textes et Manuscrits modernes), with funding from the ANR programme CAHIERS-PROUST (Nathalie Mauriac Dyer, ITEM, dir.).

You can admire the prototype I have created at this address: You can download the XML and XSLT, if you wish.

The idea behind this prototype is that, so far, digital editions have tried to reproduce print editions without engaging with the new medium in a fruitful or interesting way. Even the most sought-after type of online edition, the transcription presented side by side with the facsimile, is not new at all, and it has quite a few limitations:
  1. It creates an alternative space which tries to mimic the original space, without ever being able to represent it in full; 
  2. It leaves to the user/reader the task of establishing the relationship between the transcribed and the inscribed text; 
  3. It is bound to present single pages (and not, for instance, openings), given the limited width of the screen; applied to Proust’s Cahiers, this approach falsifies the documentary evidence, which shows that Proust treated the double page as his writing space (have a look at these materials on Gallica: they are amazing!).
The usual publication format for draft manuscripts is the ultra-diplomatic edition, which presents the transcribed text in a layout that mimics the manuscript page as closely as possible. While this type of edition has many advantages, it lacks one fundamental aspect: the dynamics of the writing process.

So what, then? For the transcription we used the new TEI elements for documentary transcription (I talk about this in another post). I then used SVG to plot the transcribed zones of text on top of the facsimile, and a bit of JavaScript to animate the output, reproducing both the sequence of writing and the sequence of reading of those zones. I have also used colour to mark uncertainty: how sure are we about the temporal placement of the sequences? The yellower the background, the less sure we are.
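The post does not include the prototype's XSLT or JavaScript, so what follows is only a minimal sketch of the pipeline just described, written in Python rather than XSLT and JavaScript. It reads zone coordinates from a TEI `<sourceDoc>`, draws each zone as a semi-transparent SVG rectangle over the facsimile, tints it yellower the less certain its placement is, and reveals the zones one after another. Everything specific here is an assumption for illustration: the sample fragment, the use of `@n` for the writing order and of `@cert` (`high`/`medium`/`low`) for the certainty of that order, the colour scale, the placeholder image name, and the use of SVG's declarative `<set>` animation in place of JavaScript.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
SVG_NS = "http://www.w3.org/2000/svg"
SVG = "{" + SVG_NS + "}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialise SVG elements with a default (unprefixed) namespace.
ET.register_namespace("", SVG_NS)

# Hypothetical TEI fragment: one surface (a double page) with three zones.
# @ulx/@uly/@lrx/@lry are TEI's zone coordinates; using @n for the writing
# order and @cert for the certainty of that order is an assumption.
SAMPLE_TEI = """<sourceDoc xmlns="http://www.tei-c.org/ns/1.0">
  <surface ulx="0" uly="0" lrx="2000" lry="1400">
    <zone xml:id="z1" n="1" cert="high" ulx="100" uly="120" lrx="900" lry="400"/>
    <zone xml:id="z2" n="3" cert="low" ulx="1100" uly="150" lrx="1900" lry="600"/>
    <zone xml:id="z3" n="2" cert="medium" ulx="120" uly="700" lrx="880" lry="1200"/>
  </surface>
</sourceDoc>"""

# The less certain the placement, the yellower the fill (0.0 = white, 1.0 = yellow).
YELLOWNESS = {"high": 0.0, "medium": 0.5, "low": 1.0}


def cert_colour(cert):
    """Hex fill colour: the blue channel drops as certainty drops.
    Zones with no (or an unrecognised) @cert are treated as least certain."""
    blue = round(255 * (1 - YELLOWNESS.get(cert, 1.0)))
    return f"#ffff{blue:02x}"


def zones_to_svg(tei_xml, facsimile, step=1.5):
    """Draw TEI zones over a facsimile image and reveal them in writing order."""
    surface = ET.fromstring(tei_xml).find(f"{TEI}surface")
    # Assumes the surface's upper-left corner is at 0,0.
    width, height = surface.get("lrx"), surface.get("lry")
    svg = ET.Element(f"{SVG}svg", viewBox=f"0 0 {width} {height}")
    # The facsimile is the structural support; zones are layered on top of it.
    ET.SubElement(svg, f"{SVG}image", href=facsimile, width=width, height=height)
    zones = sorted(surface.iter(f"{TEI}zone"), key=lambda z: int(z.get("n")))
    for i, zone in enumerate(zones):
        ulx, uly, lrx, lry = (int(zone.get(a)) for a in ("ulx", "uly", "lrx", "lry"))
        rect = ET.SubElement(svg, f"{SVG}rect", {
            "id": zone.get(XML_ID),
            "x": str(ulx), "y": str(uly),
            "width": str(lrx - ulx), "height": str(lry - uly),
            "fill": cert_colour(zone.get("cert")),
            "fill-opacity": "0.5",
            "visibility": "hidden",
        })
        # A SMIL <set> makes each zone visible `step` seconds after the previous one.
        ET.SubElement(rect, f"{SVG}set", {
            "attributeName": "visibility", "to": "visible",
            "begin": f"{i * step}s", "fill": "freeze",
        })
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # "opening.jpg" is a placeholder for the facsimile of the double page.
    print(zones_to_svg(SAMPLE_TEI, "opening.jpg"))
```

Keeping the animation declarative means the SVG file works on its own in a browser; a script-driven version, like the prototype's, makes it easier to add controls such as pausing, or switching between the sequence of writing and the sequence of reading.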

This type of visualisation is definitely not perfect, but I think it is interesting for three reasons: first, because it tries to do something that print editions cannot do; second, because it does not present a coherent, read-me-top-to-bottom text (which would simply be wrong in this case); and third, because it takes the (facsimile of the) document as its structural support.

What's still missing? Quite a lot, actually, but in particular these points come to mind:

  • A way to represent dynamic sequences across pages and documents: this can easily be done in the XML source, but not yet in the output
  • A way to drag the zones away in order to read what's underneath
  • Microgenesis: timing writing and rewriting at word level.  
But this is for the next project!


  1. I think this is very nice, Elena, and I'd love to try out your model, at this level of detail, on some of the manuscripts I've been working on, e.g. a couple of Henrik Ibsen's manuscripts. You write that you show certainty/uncertainty using colouring, and that's good (although for myself, I often find it extremely difficult to state more than either certain or uncertain :-). But what about alternative writing or reading sequences? Often it is difficult to decide whether the sequence is A-B-C-D or A-C-D-B and so on. Did you consider how to show this during your work with the prototype? If so, I'd be very interested in hearing about that. Thanks again for posting! Best, Hilde

  2. Hi Hilde, thanks! The TEI model for handling alternative sequences is still a bit shaky, in the sense that we have not given it enough thought, as we did not have enough case studies. Do you have one? If so, let's discuss it and work on making the encoding better!