ECOLM - An Electronic Corpus of Lute Music

ECOLM I - End-of-project report

Edited and abridged for publication here

Research Report


During the final year of ECOLM Tim Crawford was joined by Michael Gale (recruited in February 2001) and David Lewis (recruited in March 2002), two recent King’s College MMus graduates. They have proved excellent colleagues, with complementary skills and a serious attitude to the research outcomes (both technical and musical) of the project, and both are very hard-working and conscientious. Thanks to the work of these two assistants, there has been good progress on establishing a basic ECOLM test corpus.

Mr Gale’s background as a player of the natural trumpet has borne fruit in an investigation of the relationship between certain lute pieces in ‘battle’ and related genres and the early trumpet ensemble repertoire of the late 16th century, which will be published very soon (see M. Gale, ‘Remnants …’, forthcoming). His work on Dowland’s ‘Lachrimae Pavan’ with Tim Crawford was presented at the IMS conference in August 2002 and will be published in augmented form in the near future (see M. Gale and T. Crawford, ‘John Dowland’s Lachrimae Pavan in its European Context’, 2002).


Progress on test-corpus-building

The total number of pieces encoded in the ECOLM project currently stands at around 300. These vary from short 16th-century dances to quite substantial fantasias and sonata movements from the 18th century. The base format for these encodings is TabCode, which has undergone some minor but significant modifications during this development phase of ECOLM, although its basic design remains as it was at the outset.

The encoding method followed three different plans: a) manual encoding, using a simple ASCII text-editor; b) direct graphical input using Tim Crawford’s Tablature Processor software; and c) file conversion (by way of the Tablature Processor) of files generated by the music score-notation program, Nightingale, in some cases derived from MIDI files.

An important aspect of the encoding process could be described as ‘Quality Control’. The input data was validated as far as possible by constant vigilance as to correct encoding of the original appearance of the notation, with ‘comments’ in the code being used to record editorial additions and corrections. Thus the code essentially reproduces a faithful diplomatic record of the source, with all its errors.

Editorial material is included in the code as ‘comments’, but since this is effectively unstructured text, it carries no semantic machine-readable content. This makes it impossible to provide, for example, alternative ‘original’ and ‘editorial’ readings of the music in a particular source.

Although this may appear to be a structural weakness in the TabCode design, it could more fairly be described as a limitation in its scope, since TabCode was not originally intended for use in critical editing of this kind. The problem will be addressed by the use of a new XML-compatible version of TabCode (see below), wherein comments, editorial suggestions, corrections and alternative readings can easily be accommodated.

Further aspects of quality control include rigorous checking of the code by a second person. This has so far been carried out informally, partly because of the limitations of TabCode described above. In the XML version, it is intended to include ‘header’ information about the encoding process, such as the name(s) of the encoder(s), date(s) of initial encoding and subsequent modification/correction, details of the state of the exemplar from which the encoding is done (original MS, microfilm, modern tablature edition, modern transcription, etc.), and so on. A very comprehensive scheme for such ancillary data exists in the ‘kern’ format which forms the basis of the Humdrum suite of music-analysis programs, and an adapted form of this will probably be used in the XML version of TabCode.

The most helpful form of data-validation has been found to be on-screen display (or paper print-out) combined with musical audition. By importing the code into the Tablature Processor, the encoder can see the tablature recreated on screen and/or print it out; he/she can also listen to a MIDI playback of the encoded notes. While such playback falls very far short of a musical performance of the work, a very large proportion of input errors can be heard immediately. (It is an interesting, if unsurprising, psychological fact that this works best where the original is largely free of errors - it is very hard to recognise transcription errors in music diplomatically encoded from a corrupt source.)


Bibliographical control and resources

Basic bibliographical control for the various sources, source-collections and input-files (TabCode, Nightingale, MIDI, graphic images, etc.) for the project has been provided in a relational database (Microsoft Access) designed by David Lewis. However, in discussions we have felt that this is not as effective as we would have wished, and here too we intend to take advantage of the flexibility of XML representation. Indeed, we hope to adopt one of the document-collection-representation models already under development in various Humanities Computing projects, with suitable modification for our purposes. The Access database will continue to be maintained until a suitable moment for migrating the data into the new format.

A level of sophisticated bibliographical control is offered by the four existing volumes of Christian Meyer’s Catalogue des sources manuscrites en tablature, to which Tim Crawford has been a contributor. While this cumulative catalogue is not yet fully available online, its indexes are provided on the WWW and offer a useful resource (lists of titles and names are currently available, giving cross-references to the pages of the printed CDSMT volumes). We carried out a preliminary experiment in presentation, which could easily be extended, with one of the manuscript descriptions originally done by Tim Crawford in 1998. A slightly enhanced draft of the CDSMT description was converted manually and arranged in a tabular format, the column of page-numbers being used to provide direct HTML links to the complete set of TIFF images of the manuscript itself (see below) stored on the web-server. The time taken to convert this catalogue description into a presentational framework was not great (perhaps two hours in all), suggesting that this would be a sensible way to proceed in future. Although a degree of automation can be built into this process, each source has to be considered on its own merits, and some aspects of the CDSMT (e.g. piece numberings) do not always coincide precisely with the ECOLM methodology. This means a lot of manual input in any case - it is not felt that the benefits of automation justify the amount of programming effort required to achieve it.



Future possibilities in corpus-building

The decision to use an XML encoding scheme in future has other advantages. Planning has begun for a joint project with Frans Wiering (Utrecht University) and Philippe Canguilhem (Toulouse University). The idea is to develop an online version of Vincenzo Galilei’s treatise, Il Fronimo (in both its 1568 and 1584 editions), for incorporation into Dr Wiering’s Thesaurus Musicarum Italicarum.


This is of mutual benefit to all the participants, since i) the TMI has not hitherto had a scheme for encoding lute tablature; ii) the TMI is already an XML-based project; and iii) Dr Canguilhem’s recent PhD thesis on Il Fronimo is a ready-made source of editorial and critical commentary, making this an ideal test-case for an online musicological presentation. The book contains large numbers of music examples in conventional music notation (already catered for in the TMI encoding method) and lute tablature, which need to be inter-related.

During the course of the ECOLM project, contact was made with a French computer scientist working in Canada, Bernard Stépien, who in the 1970s and 80s undertook research using artificial intelligence techniques in transcribing lute tablature into conventional music notation. While this can be done quite easily for the modest requirements of an audio/MIDI playback, a transcription that analyses the music into separate ‘voices’, such as are perceived to exist by performers and listeners, is a much harder proposition. Mr Stépien has adapted his old Prolog code to handle a simple XML version of TabCode, and has been in discussion with Frans Wiering and Tim Crawford about future collaborations. (See: )

Encoding method and display; OMRAS integration

As well as minor adjustments and improvements to the TabCode definition (brought about by the ‘real-world’ experience of encoding significant quantities of data), some development of a new means of displaying tablature has been done. This uses the web scripting language JavaScript, which is supported by most web browsers, to display tablature as an array of very small graphic images whose position on the page is controlled by the script, somewhat in the manner that normal text uses font characters. A crude test version can be seen and tested on the ECOLM web site (see: ), although it must be stressed that its capabilities are very limited in this first version. The major advantage of this method is that it is platform-independent: although at present it only works satisfactorily with MS Internet Explorer, it gives reasonable results on both Macintosh and PC.

A public-domain program, Myrmidon, has been obtained which allows pages of tablature produced with Tim Crawford’s Tablature Processor to be saved as images embedded in HTML pages which can be viewed with a web browser. The results can be seen on the ECOLM web-site in the section which shows how the TabCode is used to encode tablatures (see: http://ECOLMtest/TabCodeTests/html/Lachrimaes.html ).

While at present the degree of automation is modest, it is possible to convert TabCode into the special music format used for music information retrieval within the OMRAS project. The current method involves an intermediate conversion into Nightingale Notelist format, but this can probably be eliminated eventually. An OMRAS experiment with 75 ECOLM-encoded versions of Dowland’s ‘Lachrimae Pavan’ is described briefly below.


Specialised computer techniques

In order to devise an algorithm to recognise variant versions of a piece of lute music (the same applies to keyboard music, and indeed to most polyphonic repertories), a method has to be found which can recognise similarity in what might be termed the ‘harmonic profiles’ of pieces of music. One possibility is to analyse the music and derive its functional harmony, expressed as figured bass or in Riemann-style notation (I, IV, V, etc.). The output from such an analysis can then be treated as a linear string of characters to be matched against those stored in a database. There are several problems with this approach. Firstly, harmonic analysis is not an exact science and is recognised to be a subjective process: only rarely will two human analysts produce identical analyses of anything but the very simplest music; there is in fact a range of conclusions that might be drawn about the harmony at any point in the music. Secondly, there is a serious difficulty in the ‘segmentation’ of the musical surface before a conventional analysis can proceed: it is not normal to assign an analytical ‘label’ to every event in a musical sequence, and a sophisticated understanding of larger-scale structure is necessary to derive the essential ‘middleground’ harmonic profile of a piece. In short, conventional harmonic analysis cannot yet be performed reliably by computer. Thirdly, even if a suitable harmonic analysis method were available, there is no reliable ‘distance measure’ that could be used to determine how ‘similar’ two variant linear harmonic profiles of this kind might be.
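To make the string-matching idea concrete: if each bar (or analysis window) were reduced to a single harmonic label, the resulting strings could be compared with a standard edit distance. The following Python sketch uses invented Roman-numeral labels purely for illustration; this is not the method ECOLM adopted, precisely because of the labelling and distance-measure problems just described.

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance between two strings
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# two hypothetical harmonic profiles of variant settings, one label per bar
profile_a = ["I", "IV", "V", "I", "vi", "ii", "V", "I"]
profile_b = ["I", "IV", "V", "I", "vi", "IV", "V", "I"]  # one bar reharmonised

# flatten each label to a single symbol so the edit distance works per event
alphabet = {lbl: chr(65 + k) for k, lbl in enumerate(sorted(set(profile_a + profile_b)))}
s_a = "".join(alphabet[l] for l in profile_a)
s_b = "".join(alphabet[l] for l in profile_b)

print(levenshtein(s_a, s_b))  # → 1
```

With richer label vocabularies the alphabet grows and the comparison becomes correspondingly more brittle, which is one concrete form of the ‘distance measure’ problem noted above.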

A new analytical approach has been adopted by Tim Crawford which uses the concept of a probabilistic harmonic profile. Instead of a single symbol (I, IV, V) or verbal term (‘tonic’, ‘subdominant’, ‘dominant’ or ‘C major’, ‘F major’, ‘G major’), the method derives for each event (or collection of events within a beat- or time-based window) the relative likelihood that each of a preselected set of simple chords can describe the harmonic context for that event. The set of chords we use is simply the 24 major and minor triads rather than any more complex chords, as we have a reliable measure of the human-perceived ‘distances’ between members of this set in the work of the highly-respected music psychologist Carol Krumhansl (C. Krumhansl, Cognitive Foundations of Musical Pitch, New York: OUP, 1990). These relative likelihoods are expressed as probabilities, which allows the use of some very powerful state-of-the-art analytical techniques for matching large-scale sequences of such profiles. One of these is so-called Markov modelling, in which we build multi-dimensional structures which encapsulate the relative likelihoods of certain harmonic transitions occurring within a piece. A specially-adapted form of Markov model has been developed within the OMRAS project by Jeremy Pickens, and can be used for matching pieces of polyphonic music. The models can be compared very efficiently using a measure called the Kullback-Leibler divergence. The main point of this approach is that variant pieces with similar harmonic profiles produce similar harmonic models; minor local differences in harmony (caused by absence or presence of extra non-harmonic notes) are less likely to produce wide divergences in the models.
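The comparison step can be illustrated numerically. In the Python sketch below, the probability values over the 24 triads are invented for illustration; the actual system derives its likelihoods from Krumhansl’s perceptual data and compares whole Markov models rather than single events, but the Kullback-Leibler computation takes this general form.

```python
import math

# the 24 major and minor triads used as the chord vocabulary
TRIADS = [f"{root}{q}" for q in ("maj", "min") for root in
          ("C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B")]

def kl_divergence(p, q):
    # Kullback-Leibler divergence D(P || Q); assumes q[i] > 0 wherever p[i] > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def profile(weights):
    # turn chord weights (triad -> likelihood) into a probability vector over
    # all 24 triads, with a small floor so the divergence stays finite
    floor = 1e-6
    v = [weights.get(t, 0.0) + floor for t in TRIADS]
    total = sum(v)
    return [x / total for x in v]

# invented likelihoods for one windowed event in two variant settings of the
# same piece, plus one harmonically remote event for contrast
event_a = profile({"Cmaj": 0.7, "Amin": 0.2, "Fmaj": 0.1})
event_b = profile({"Cmaj": 0.6, "Amin": 0.3, "Fmaj": 0.1})  # slightly different reading
event_c = profile({"F#maj": 0.8, "Bmin": 0.2})              # harmonically remote

# the variant reading diverges far less from event_a than the remote one does
print(f"variant: {kl_divergence(event_a, event_b):.4f}  "
      f"remote: {kl_divergence(event_a, event_c):.4f}")
```

Because minor reweightings of the same triads change the profile only slightly, the divergence between variants stays small, which is exactly the robustness property claimed for the method.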

This method of harmonic modelling has been tested in experiments reported in two recent papers which make use of data from the ECOLM project. Pickens et al. 2002 reports on experiments using polyphonic audio input (both from actual piano recordings and from synthesised piano sounds) to match encoded scores in a database. Retrieval of four different groups of musical input was rigorously tested using standard evaluation techniques and found to perform extremely well. Two of these groups were based on lute music: 75 versions of Dowland’s ‘Lachrimae Pavan’ (in various scorings for solo instruments and ensembles and in various keys and ‘performed’ via MIDI on a synthesised piano), and 50 variations on the well-known ‘La folia’ ground-bass from the 17th and 18th centuries, about half of which are for the lute. Pickens and Crawford 2002 (actually written earlier than the above paper) gives more detail about the probabilistic harmonic profile method and also introduces a transposition-invariant version of the modelling technique, so that variants with a similar harmonic profile but in different keys can be matched. While there is much more work to be done, this technique shows every sign of being a very important step forward in the processing and retrieval of lute music in particular as well as for polyphonic music information retrieval in general.


Preliminaries to real corpus-building

Discussions have continued on further additions to the ECOLM corpus. The librarian of the Royal Academy of Music has restated the Academy’s intention to cooperate with ECOLM as outlined in the project proposal, and to work with us on incorporating the extensive lute-tablature materials in the Robert Spencer collection.

On Tim Crawford’s visit (July 2002) to the Dolmetsch collection in Haslemere, Surrey, which houses three important lute manuscripts as well as one in tablature for the lyra viol, great enthusiasm was expressed for the ECOLM concept as being a way to make the materials in the collection available to scholars and performers. As soon as funding and circumstances allow, these manuscripts will be incorporated into ECOLM in digital-image and encoded formats.

Contacts with Polish scholars concerning the collections at Warsaw, Wroclaw and Krakow have continued, and were furthered by three visits made to Poland in 2001 by Tim Crawford (see: T. Crawford, ‘Silvius Leopold Weiss and the improvised prelude’, Wroclaw, March 2001; T. Crawford, ‘Matching variations: first steps towards a method for lute tablatures’, Warsaw, September 2001; T. Crawford, ‘Weiss Today and Tomorrow’, Wroclaw, November 2001).

At the IMS conference (Leuven, Belgium, July-August 2002) the participation of Michael Gale and Tim Crawford in an invited session on early instrumental music afforded the occasion for further discussion with Drs Dinko Fabris and John Griffiths about the possibility of incorporating into ECOLM a TabCode encoding of an important Neapolitan MS in lute tablature (Krakow/Berlin MS 40032) following the release of their forthcoming online/CD-ROM facsimile edition and transcription.


Lute-music images

An unexpected bonus for the project was an offer from a Japanese digital-library researcher, Rei Atarashi (Communications Research Laboratory, Tokyo), of a collection of microfilms scanned onto CD-ROMs as graphic images. In due course six CD-ROMs arrived and were found to contain images of some 76 lute sources (mostly manuscripts), as well as much other early-music material. The quality varied according to the state of the scanned films, but is generally excellent. Taking up the offer to scan up to 12,000 pages of our own material at no cost to ECOLM, we duly despatched 27 reels of microfilm (including many of those donated to Tim Crawford for this research by Dr Douglas Alton Smith) and have thus far received a further 15 CD-ROMs of lute-music images whose quality is at least as good as that of the first batch; the number of lute sources in this second batch is around 115. This makes a total of almost 200 lute sources in medium-quality black-and-white images, which form an excellent basis for corpus-building (see Atarashi and Crawford, 2002). At a very rough estimate, this might amount to as many as 12,000 pieces of music.

Although graphical images formed only a small part of the original ECOLM proposal, it has seemed essential to devote some effort to investigating the new possibilities this unexpected gift presents. Thus a good deal of work has been done by Tim Crawford during the year on techniques for manipulating, converting and displaying these large images in web-pages, using the graphics program GraphicConverter and the web-browser scripting language JavaScript.

A start has been made on integrating our graphic images with encoded material and bibliographical resources by ‘enriching’ a sample manuscript inventory from Christian Meyer’s Catalogue des sources manuscrites en tablature with links to graphic images of the manuscript’s pages. A test version of a web display of the inventory can be seen at: .

At a pre-IMS conference meeting on computer techniques organised by Dr Frans Wiering, Ichiro Fujinaga (McGill University) presented the first results of his attempt to recognise characters in lute tablatures scanned from one of our newly-acquired CD-ROMs. This was a page from the printed collection, Thesaurus Harmonicus (1603), a large source of central importance in the study of early 17th-century lute music. The results were highly impressive and suggested that a major step forward in corpus-building could be achieved by relatively slight modifications to his optical music-recognition software. There remains a fair amount of work to be done (e.g. converting the program’s output to TabCode), but it should be possible at least to encode printed sources of this kind (printed from moveable type) automatically with a very high degree of accuracy.


ECOLM I Output of Research

ECOLM-related publications

(asterisk * indicates direct outcome of ECOLM research)

T. Crawford, ‘Silvius Weiss and the Fantasia’, Dresden Lautentage, Dresden, March 2000

T. Crawford, ‘Silvius Leopold Weiss and the improvised prelude: some evidence from Silesian sources’, X Międzynarodowa Konferencja Naukowa Tradycje Śląskiej Kultury Muzycznej, Wroclaw, 16-17 March 2001 (publication in Polish forthcoming; English version = ‘Weiss and the improvised prelude’, below)

*T. Crawford, ‘Matching variations: first steps towards a method for lute tablatures’, Study Group on Computer Aided Research of the International Council for Traditional Music, Warsaw, conference: Computer Aided Solutions to Analytical Problems, September 19-21, 2001

T. Crawford, ‘Weiss today and tomorrow’, conference on the lute music of S.L. Weiss, Wroclaw, November 2001

*T. Crawford, ‘Matching variations’, Study Group on Data and Computer Applications of the International Musicological Society, pre-IMS Congress meeting, Louvain-la-Neuve, Belgium, 30 July 2002

*T. Crawford, ‘Building an Electronic Corpus of Lute Music’, 17th IMS Congress, Leuven, Belgium, 3 August 2002

*M. Gale and T. Crawford, ‘John Dowland’s Lachrimae Pavan in its European Context’, 17th IMS Congress, Leuven, Belgium, 5 August 2002

*T. Crawford, ‘The best musick in the world’ (on the ECOLM project), Digital Resources in the Humanities (DRH2002), Edinburgh, 9 September 2002

T. Crawford, ed., Silvius Leopold Weiss (1687-1750), Sämtliche Werke für Laute, vols 5 & 6 (Dresden MS, facsimile) (Kassel: Bärenreiter, forthcoming, 2002)

T. Crawford, ‘S.L. Weiss and the London and Dresden manuscripts of his music’, Journal of the Lute Society of America, special issue for 1998-2000, Silvius Leopold Weiss: Life, Works, and Instruments, vol. 2, ed. D. Smith (forthcoming 2002)

T. Crawford, ‘Weiss and the improvised prelude’, Journal of the Lute Society of America, special issue for 1998-2000, Silvius Leopold Weiss: Life, Works, and Instruments, vol. 2, ed. D. Smith (forthcoming 2002)

T. Crawford, ‘S.L. Weiss’s use of the 12th and 13th Courses’, Journal of the Lute Society of America, special issue for 1998-2000, Silvius Leopold Weiss: Life, Works, and Instruments, vol. 3, ed. D. Smith (forthcoming 2003)

*M. Gale, ‘Remnants of some late sixteenth-century trumpet ensemble music’, Historic Brass Society Journal, xiv (forthcoming, 2002) [based partly on lute arrangements of ‘battle’ music]

*R. Atarashi and T. Crawford, ‘Early Music Microfilm Archive’ (in Japanese)
