(3.1) Text Encoding

(3.1.1) Text Structure vs. Encoded Structure - Dealing with Mixed Genres and Ambiguous Texts

Karl Johan Sæth
Ingrid Falkenberg
Ellen Margrete Nessheim
Stine Brenna Taugbøl
Mette Gismeroy Ekker
University of Oslo, Norway


'Henrik Ibsen's Writings' aims at producing and publishing both an electronic version and a book version of all Ibsen's writings: dramas, poems, drafts, letters, articles, notes. All manuscripts and editions in Norwegian/Danish from the playwright's lifetime will be encoded using SGML/TEI. Our encoding is rather detailed, as we wish to reproduce every text witness with full accuracy.

In this paper we will examine some empirical problems which have surfaced when encoding text structure and textual features in Ibsen's plays. We will relate these to problems concerning how to encode the textual features in Ibsen's 'Norma' (1851). Based on genre one could define this text as a drama, but it could also be considered a covert article. An analytical perspective on the text exposes a mixture of different genre elements. The text's physical representation, i.e. its typography, shares the same ambiguity. These different approaches all yield different 'logical' structures which overlap each other, and whichever approach we choose forces us to give one structure an unfortunate priority over the others in the encoding. To solve these problems we have looked into different solutions for encoding text structure and overlapping text features. Briefly, we have asked the basic questions: What is text, what defines structure and other textual features, and how does this affect the encoding of texts?

These questions have been much discussed within the text encoding community in the past, but the approach in this paper is somewhat different from much of the earlier work. We will try to examine the questions from a more practical point of view, with a focus on the encoding of 'Norma', and we hope thereby to add a new perspective to the discussion.

Ibsen's 'Norma':

The problems of overlapping and ambiguous textual features and structures are particularly manifest when one has to deal with texts with mixed features which cannot easily be combined. Ibsen's 'Norma', which was written and published in the newspaper 'Andhrimner' in 1851, is an example of such a text. Choosing which textual features to encode in 'Norma' is not easy, as it is problematic to define the text as a drama, but even more problematic to define it as something else. We have chosen to regard it as a drama, but several textual elements in 'Norma' are difficult to incorporate in the drama structure. Particularly problematic is e.g. a 'speech' from 'the curtain' (which has no reference in the cast list) typographically rendered not as speech, but as a stage direction. Other problems concern footnotes (both in the cast list and in the speeches), speeches in brackets that do not seem to be 'asides' (asides are marked by stage directions) etc. These problems, and others like them, will be presented and further discussed at the conference.

Encoding Text Structure:

The nature of a text has been much discussed, in several different contexts. One answer to the question of what a text is, is that it is made of several interwoven features, or content objects, which together form 'a text'. The nature of a text is thus complex, and it seems difficult, or even impossible, to find a single structure that 'is' the text. Some would argue that texts even seem to be able to include things 'outside' themselves. Another view of the text, much discussed within the text encoding community, is the claim that 'text is an ordered hierarchy of content objects', the so-called OHCO thesis. This claim has been thoroughly examined by Renear et al. (1993), and further discussed e.g. in Biggs & Huitfeldt (1997). We will leave this discussion aside here, and only point out that the OHCO thesis (at least in its simplest form) may be appealing from a text encoding perspective, but that it is far from unproblematic.

When encoding texts one generally chooses a declarative markup language based either on SGML or on its subset XML. We are using SGML/TEI, but are considering moving over to XML. As Sperberg-McQueen & Huitfeldt (1998) have pointed out, SGML markup in its simplest forms uses a straightforward model for markup: elements nest within each other so that the SGML document forms a hierarchical structure. The basic model of SGML associates single occurrences of features with single SGML elements. Tagging a textual element as an SGML element of a particular type, and giving it particular attribute values, thus claims that this element exhibits the textual features associated with that element type and those attribute values. The relationship between SGML element types and text features depends, however, on the encoder's understanding and interpretation of the genre, structure and perspective of the text. The encoder defines text objects in elements matching the chosen hierarchical structure.

One of the large challenges for application of SGML to existing texts is finding suitable representations in SGML's tree-based data model for multiple hierarchies and textual features which overlap each other. Such overlapping textual features seem to be an inescapable fact of textual life, but present a problem in the simple SGML-model because while two textual features/hierarchies may overlap, two SGML-elements may not.
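
The constraint can be made concrete with a minimal sketch in Python. It treats each textual feature as a character span and reports the pairs that neither nest nor stay disjoint, which are exactly the pairs that cannot both be single elements in one tree. The `Span` type, the feature names and the offsets are hypothetical illustrations and are not taken from the encoding of 'Norma'.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A textual feature covering characters [start, end) of a text."""
    name: str
    start: int
    end: int


def nests(a: Span, b: Span) -> bool:
    """True if the two spans are disjoint or one lies wholly inside the other."""
    disjoint = a.end <= b.start or b.end <= a.start
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return disjoint or a_in_b or b_in_a


def overlapping_pairs(spans: list[Span]) -> list[tuple[str, str]]:
    """Return the pairs of features that cannot both be elements of one tree."""
    return [
        (a.name, b.name)
        for i, a in enumerate(spans)
        for b in spans[i + 1:]
        if not nests(a, b)
    ]


# Hypothetical offsets: a stage direction that starts inside a speech
# (and inside its first typographic line) but ends after both.
features = [
    Span("speech", 0, 40),
    Span("line-1", 0, 25),
    Span("stage-direction", 20, 50),
]
print(overlapping_pairs(features))
```

Any pair reported here forces the encoder to give one feature priority, or to fall back on one of the workarounds for overlap.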

Encoding Overlapping Features:

The problem of overlapping features has been discussed before. The TEI Guidelines offer several ways to overcome some of the problems: one may e.g. use 'milestone' elements, in which a feature is taken to hold for the span of text between one milestone element and the next. Other techniques rely on fragmenting one element into multiple SGML elements and then knitting the fragments into a whole; this is the method used, for example, in the part attribute of the <l>-element and in the <join>-element. There are also several possibilities permitting 'true' overlapping features. Within SGML/TEI the 'concur'-feature allows a document to be marked up concurrently using more than one DTD, with each tag labelled with the name of the DTD to which it belongs. Non-SGML systems include MECS (Multi-Element Code System), a system developed at the Wittgenstein Archives at the University of Bergen. MECS permits any two codes to overlap, but on the other hand has no specific document grammar, and therefore no SGML-like document validation is possible.
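
The milestone technique can be illustrated with a small Python sketch. It reads a well-formed fragment in which empty milestone elements mark the points where a typographic feature changes, and rebuilds the spans that the milestones imply, including one that crosses a speech boundary. The fragment, the `unit` and `n` values and the function name `milestone_spans` are assumptions made for this illustration; they are not the project's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <sp> keeps the drama hierarchy, while empty
# <milestone/> elements mark where the typeface changes.
SAMPLE = (
    "<div>"
    '<sp>First speech <milestone unit="typeface" n="italic"/>in italics </sp>'
    '<sp>still italic <milestone unit="typeface" n="roman"/>back to roman</sp>'
    "</div>"
)


def milestone_spans(root: ET.Element, unit: str, initial: str) -> list[tuple[str, str]]:
    """Walk the tree in document order, grouping text by the last milestone seen."""
    spans: list[tuple[str, str]] = []
    state = initial

    def add(text):
        # Append text to the current span, or open a new span if the state changed.
        if not text:
            return
        if spans and spans[-1][0] == state:
            spans[-1] = (state, spans[-1][1] + text)
        else:
            spans.append((state, text))

    def walk(elem):
        nonlocal state
        if elem.tag == "milestone" and elem.get("unit") == unit:
            state = elem.get("n", state)
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)

    walk(root)
    return spans


for value, text in milestone_spans(ET.fromstring(SAMPLE), "typeface", "roman"):
    print(f"{value}: {text!r}")
```

The italic span recovered here begins in the first speech and ends in the second, so it could not have been tagged as a single element nested inside the speeches.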

We have chosen not to use such systems in our project, as they do not seem suitable for our purposes, partly because they make text encoding too complex. Furthermore, encoding 'true' overlapping structures with concur, MECS, or similar systems, does not really solve the problem of deciding how to structure the text: you have the possibility to encode overlapping features, but you still have to decide which feature(s) to encode, and even which not to encode.

The boundaries of the different features are not always clear and text features seem to exist both dependent and independent of each other. Our project uses the first printed editions as base texts for the edition. When encoding texts, we try to reproduce these texts as accurately as possible. When choosing to use standard TEI, we thus have to deal with the problem of encoding the textual features into hierarchical SGML-documents. This also means choosing which text features to encode and which not to encode, when that is necessary.

Encoding 'Norma':

In the case of 'Norma' we have ended up encoding the text as an ordinary drama. The additional features of the text that do not seem to be part of a 'normal' drama structure, but still can easily be encoded into the drama structure (e.g. footnotes and speeches in brackets), are also incorporated, even though this means 'violating' the logical structure of a drama. On the other hand additional features that depend on redefining the whole conceptualization of dramas (e.g. speeches in stage directions) so far are ignored in the encoding (but documented elsewhere, e.g. in the header of the document). 'Norma' is an ambiguous text, and while we want to incorporate as much as possible of this ambiguity in the encoded version of it, not all ambiguity can be kept, and however the encoding is done, the encoded text could possibly be ambiguous in new ways.

It may seem like a paradox that the encoding of features necessary to give possibilities for electronic processing and analysis of texts, at the same time includes interpretation that may restrict the use of the texts. There is no simple way out of this problem, but if the interpretation in the encoding is restricted at a reasonable level, the encoding can open up the text more than delimit it. Renear et al. (1993) state that 'It should be a commonplace that machine-readable texts are "subjective" and "interpretive", but not especially subjective or interpretative.', and that encoding a text in this aspect is much like making a traditional edition.

Biggs, Michael and Huitfeldt, Claus (eds) (1997). "Philosophy and Electronic Publishing. Theory and Metatheory in the Development of Text Encoding" Monist 80:3. <http://hhobel.phl.univie.ac.at/mii/mii/mii.html>
Ibsen, Henrik (1851). "Norma", in "Andhrimner" 9 & 10. 1851.
Renear, Allen, Mylonas, Elli and Durand, David (1993). "Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies". <http://www.stg.brown.edu/resources/stg/monographs/ohco.html>
Sperberg-McQueen, C. M. and Huitfeldt, Claus (1998). "Concurrent Document Hierarchies in MECS and SGML", ALLC/ACH98, Debrecen. <http://www.oasis-open.org/cover/sperbergACH98.html>
Sperberg-McQueen, C. M., and Burnard, Lou (eds) (1994). "Guidelines for Text Encoding and Interchange" Text Encoding Initiative, Chicago and Oxford.

Return to ALLC/ACH Programme

(3.1.2) Perdita's Progress: Raising Standards in a TEI-based Approach to Cataloguing Early Modern Manuscripts

Jill Seal
Nottingham Trent University, UK

Claire Warwick
Sheffield University, UK

Elizabeth Clarke
Nottingham Trent University, UK

The Perdita Project was established in 1997 at the Nottingham Trent University, by Elizabeth Clarke and Victoria Burke of the English & Media Studies department, with Martyn Bennett of the History department, funded initially by the Nottingham Trent University. A substantial AHRB grant in 1999 meant that we could appoint research fellow Jonathan Gibson and researchers Jill Seal and Gillian Wright. Claire Warwick from the Department of Information Studies at Sheffield is electronic publication consultant to the project, which runs until 2002.

We regard this paper as a collaborative exercise. In it we aim to introduce some of the dilemmas which we have faced during the project. We hope that our progress may be of interest to those at the conference, and that our discussions with the humanities computing community will also be of help to us in the continuation of the project.

The Perdita Project is producing a comprehensive guide to manuscript compilations of early modern women. We are carrying out research on over 450 manuscripts written or compiled by women, which include miscellanies, commonplace books, account books, medical and cookery receipt books, religious writing, and autobiographical material. Our descriptions of the manuscripts will be encoded in SGML to allow extensive searching capacity, and will be published on the Internet in 2002.

The interdisciplinary nature of the project is vital to our work, and is something that we will discuss throughout the paper. By describing previously unpublished materials by women of the 16th & 17th centuries, we aim to be a resource for scholars both literary and historical, enabling access to manuscript sources which are often very difficult to trace in comparison with published texts. We see ourselves as part of the movement to rewrite literary history, moving the emphasis from printed text and male and/or canonical works to a fuller picture of the writing of the period (Ezell, 1993). Electronic publication also allows us to address important issues to do with the dissemination of text in an electronic medium. This seems uniquely appropriate, since as Woodmansee (1994) argues, the transmission of scholarly electronic text shares some features with the coterie traditions of manuscript dissemination which we are studying in the early modern period.

In the paper we shall explore some of the challenges we face in attempting to combine the two traditions and when working with the different media of manuscript, print and electronic text. We will discuss why we have chosen to encode manuscript descriptions rather than the texts themselves, and how far we should interpret the manuscripts in the descriptions that we provide. To what level, and in what way should these descriptions be encoded, and how will this affect their usefulness? How far should we try to consider the present user community? Standards, whether in manuscript description or in electronic text encoding, are also vital to our work.

Projects which deal with the electronic publication of manuscripts tend either to provide users with digitised images of manuscript pages, and/or to transcribe and encode the text. However, we have chosen a different strategy, since we are not presently intending to transcribe entire manuscripts. Rather, we will present descriptions, which will take the form of an extended catalogue entry, including a list of contents, a physical description, and a biographical article on the compiler(s). We believe that there are valid scholarly reasons for doing so.

There is already a certain amount of literary text available in electronic form, much of which is lacking any kind of commentary or contextual material. Given the time and financial constraints of our project, we therefore preferred to concentrate on a more novel research area. Our methodology has been designed as a response to the shift in focus in manuscript studies from the search for authoritative texts to the historical circumstances of manuscript production and circulation (Beal, 1980-; Marotti, 1995; Woudhuysen, 1996; Hobbs, 1992; Love, 1993). Rather than simply producing large amounts of transcribed text with no accompanying commentary or contextual research, we prefer to make an important scholarly contribution to this research area.

We also consider it important that our resource should lead scholars to visit archives, and consult the manuscripts themselves, when possible. Since the provision of digital surrogates tends to increase the amount of usage of the original material (Lee 1998, Chapman, Kingsley & Dempsey, 1999), we feel that our efforts should be directed to descriptive scholarly research to aid researchers in their use of original documents.

We do, however, acknowledge the problematic nature of our task and of classifications such as authorship, function and gender in looking at manuscript compilations, and pledge ourselves to giving "as much information as possible to facilitate useful readings of the manuscript compilations". But what are "useful" readings, how much information should we give, and in what form? We therefore intend to conduct a study of our potential user community in collaboration with Sheffield University DIS to try to answer these, and other, questions.

However, we are aware of the potential problems of trying to ensure that the resource remains usable and accessible by a community of future users whose needs we cannot hope to predict. This means that we need to apply, and in some cases set, standards in various areas. Most obviously, we must apply the highest standards in manuscript cataloguing and description. There are also other areas which we are particularly well-equipped to explore, for instance, a standard vocabulary for describing handwriting. Most fascinating of all is the question of what a woman's hand might look like. Electronic delivery will provide us with an ideal opportunity to contribute to this discussion, by providing visual samples of women's hands.

We are also concerned with the standards necessary for electronic publication. We must be aware of how far the standards necessary for text encoding and useful searching impose interpretations on the manuscript, since already, in our editing and cataloguing process, we are working at several removes from the original text (see fig. 1). We are conscious that the decisions we make in encoding the descriptive material must not be so prescriptive that they hinder usage, but at the same time we aim to aid searching by appropriate markup. We therefore approach this project in the spirit of the Text Encoding Initiative. However, further complications are caused by the fact that what we are encoding is essentially metadata, not simply transcribed text.

We would like to explore the ideological, conceptual and practical differences between the TEI, metadata systems, and the use of controlled vocabulary. In the world of electronic resources there appears to be a culture clash between a post-structuralist, qualitative 'search for anything you like - create your own text' ideology and the controlled, quantitative approach to classifying objects taken by museums. This may represent the difference between dealing with text and dealing with objects, or the difference between English and History, where computing projects tend to deal with statistics. However, we at Perdita are faced with the problem of trying to balance the two approaches. We are aware that the TEI header is ideal for a project which wants to shape the data in the form of the original text, even if it does not encode it. Some historians are now beginning to recognise that 'data' is deeply embedded in text, which is why we believe that TEI is the best ideological option.

Yet database pioneers such as The Getty Institute in LA are encouraging us to adopt database methodology and to use terminology and thesauri, because of their more systematic nature. This view is supported by research by DIS at Sheffield, which suggests that when constructing and using metadata, many users find that the lack of a controlled vocabulary inhibits the ease with which they can search electronic resources (Whittaker 1999). At the British Women Writers' Conference at Albuquerque in September, everyone recognised the need for a standard set of keywords. Unfortunately no such resource yet exists.

We have therefore decided to view our work in the light of future users. We will describe our attempts to combine both standards and to provide some sense of the original text, using TEI markup, with a standardised search vocabulary for ease of searching.


Beal, P. (1980-). Index of English Literary Manuscripts (2 vols, 2 pts). Mansell, London.
Chapman, A., Kingsley, N. and Dempsey, L. (1999). Full Disclosure. Releasing the Value of Library and Archive Collections. UKOLN, Bath.
Ezell, M. (1993). Writing Women's Literary History. Johns Hopkins U. P., Baltimore.
Hobbs, M. (1992). Early Seventeenth-Century Verse Miscellany Manuscripts. Scolar Press, Aldershot.
Lee, S. (1998). Scoping the Future of Oxford's Digital Collections. <http://www.bodley.ox.ac.uk/scoping>, last accessed 4/5/99.
Love, H. (1993). Scribal Publication in Seventeenth-Century England. Clarendon Press, Oxford.
Marotti, A. (1995). Manuscript, Print, and the English Renaissance Lyric. Cornell U. P., Ithaca.
Whittaker, S. (1999). The construction of Dublin Core Metadata by non-specialist users. Unpublished MA dissertation, University of Sheffield.
Woodmansee, M. (1994). "On the Author Effect: Recovering Collectivity". In Martha Woodmansee and Peter Jaszi (eds) The Construction of Authorship: Textual Appropriation in Law and Literature. Duke U. P., Durham, NC.
Woudhuysen, H. R. (1996). Sir Philip Sidney and the Circulation of Manuscripts 1558-1640. Clarendon Press, Oxford.


(3.1.3) Image Description at the William Blake Archive

Kari Kraus
University of Rochester, USA

The larger objective of this paper is to follow the lead of Matthew Kirschenbaum and the participants in his panel at last year's ALLC/ACH conference in Charlottesville, Virginia, in thinking about visual electronic resources as structured data. More specifically, I propose to turn the spotlight on a significant portion of the William Blake Archive's metadata: its SGML-encoded image descriptions. Introductory remarks will emphasize the image descriptions as a design feature intended to maximize use of and complement the Archive's other image resources, but the balance of the paper will consider more foundational issues: what exactly are image descriptions in formal terms? What is their relationship to their first-order objects? What kinds of problems do they present for the project team? What functions do they serve? My aim is to provide a broad overview of our practices and the challenges we face (both theoretical and practical) in describing images. The full version of this presentation will attend closely to the discursive features of the image descriptions and their principles of inclusion of pictorial data on the grounds that they serve as a barometer of the general reliability of the search and retrieval functions.

The image descriptions and their characteristic terms - which are best considered as a unit - are an integral part of the Archive's paradigm of image search and retrieval. General descriptions are available via a link to a separate window from all Object View pages in the Archive, and more specific descriptions are returned in the course of an image search. The descriptive commentary is also available to the user who invokes an Inote session from any of several windows and pages. (Inote, which has received much press in humanities computing circles since its public debut, is a Java-based image annotation tool developed at IATH that superimposes a four-quadrant grid on an image. Clicking an area of interest opens a separate annotation viewer containing the editorial commentary targeted to the selected region.) The image descriptions are conceptually inextricable from the Archive's characteristic terms, from a menu of which the user selects when launching an image search. Though it is tempting to devote a portion of the paper to an evaluation of how the Archive's controlled vocabulary measures up when compared to other established classification and vocabulary browsers (such as IconClass, the Art and Architecture Thesaurus, and the Library of Congress Thesaurus for Graphic Materials), that subject is sufficiently complex to warrant separate treatment at some future date.

Strictly speaking, the Archive's search engine consults only the SGML-encoded characteristic terms when returning hits to a user. The exact function of the prose that appears between <illusobjdesc> tags is more difficult to define. My provisional account of the purpose of the descriptions is threefold: first, they illustrate that the visual-to-verbal transposition isn't a simple one-step operation: more complexly, images are first captured in prose and from the prose we extract smaller searchable units. Second, the descriptions provide a system of checks and balances that allow the user to cross-reference them with the characteristic terms, making the underlying logic of the hits returned in the course of a search session explicit. Third, they offer us a space where Willard McCarty's metadata rule of disambiguation can be violated: as I will explain in more detail in my talk, a fair amount of uncertainty and doubt is built into the descriptions as a response to iconographic ambiguities in the source material; this interpretive uncertainty is, if not effaced entirely, at least diluted at the characteristic level. These functions aside, it is my sense that the descriptions aren't being mined for information as effectively as they could be, though the beta version of WBA 2.0, scheduled for public release later this summer, should make it possible to harvest their data in new ways.
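
The division of labour between prose and terms can be sketched in a few lines of Python. In this illustration, which is not the Archive's software, a search consults only the controlled terms attached to a record, while the prose description travels with each hit so that the user can check why it matched. The record layout, the identifiers and the sample descriptions are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    object_id: str
    description: str                              # free prose, may voice doubt
    terms: set[str] = field(default_factory=set)  # controlled, searchable terms


def search(records, *wanted):
    """Return the records whose characteristic terms include every wanted term."""
    need = {t.lower() for t in wanted}
    return [r for r in records if need <= {t.lower() for t in r.terms}]


# Invented placeholder records, not taken from the Archive.
records = [
    ImageRecord("obj-1", "A line rising from a letter may be a bird or a vine.", {"bird", "vine"}),
    ImageRecord("obj-2", "A seated figure holds a scroll.", {"figure", "scroll"}),
]

for hit in search(records, "bird"):
    # The prose is shown with the hit but was never consulted by the search.
    print(hit.object_id, "-", hit.description)
```

Note how the hedged "may be a bird or a vine" of the prose becomes two unqualified terms at the searchable level, which is the dilution of uncertainty described above.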

Because the Archive's image search and retrieval software, Dynaweb, must consult an SGML information base that is, of course, textual rather than pictorial in content, the descriptions and their corresponding characteristic terms serve as a verbal proxy for their visual objects. To put it otherwise, the annotations function as metatext (rather than, as in conventional print relationships, paratext) to the primary data, creating an unusual onus of responsibility for the editors and assistants to provide descriptions and terms that show as much fidelity as possible to the originals in order to guarantee optimum results for the user searching across the visual collection.

This process of faithfully, accurately, and thoroughly translating from one medium (the visual) to another (the verbal) is, as one would expect, beset by difficulties. But we are not without precedent or aid. In particular, the problems we encounter in the course of composing our image descriptions can be profitably understood by looking to the growing body of literature in Humanities Computing on encoding transcripts of source materials. My suggestion assumes that at heart the two endeavors (describing images and transcribing texts) are first and foremost acts of translation: the former from pictorial data to linguistic data, the latter (to take the example of the Beowulf Project) from chirographical marks in a codex manuscript to the descriptive codes of SGML. As such, both are prey to all the difficulties inherent in the translation process.

At last year's ALLC/ACH conference, Paul Caton and Julia Flanders raised the issue of the measurability of data as an important consideration for project teams interested in encoding renditional information: they stressed the necessity of

"having a clear sense of what the units of information are, how to identify and distinguish them consistently, and how to record them accurately. It [the criterion of measurability] represents/requires an attempt to decide . . . what threshold of perceptibility will be maintained: . . . what will be considered either too small or too costly or difficult to record."

At the Blake Archive, our efforts to extract iconographic content from the source material for the purposes of image description are routinely complicated by just the kinds of measurability and perceptibility difficulties outlined by Caton and Flanders. Should Blake's interlinear motifs be captured in as much detail as the large-scale designs? Should the hatched lines that appear meaningless in the 100 dpi image but representational in the 300 dpi be described (and thus made searchable)? Should we even avail ourselves at all of the 300 dpi images as resources to transform our unaided human eyes into Argus eyes, or is this approach contrary to the central tenets of diplomatic editing? What about the left arm of that central figure on plate 16 in copy D of The Marriage of Heaven and Hell, which was printed but colored on the impression so that it blends almost imperceptibly with the adjacent figure's gown? Is that wispy line extending from the ascender of the letter "b" a bird or a vine, or is it both? Indeed variations on the last question may reflect the most common kind of perceptual ambiguity the annotator of Blake encounters, attributable in part to Blake's almost profligate use of metaphors; his peculiar syncretic insight makes it dangerous to ever rule out the possibility that a design motif has multiple referents: flora and fauna are often indistinguishable, a human figure may be a composite of several characters, and a lock of hair easily morphs into a coiled serpent.

The final version of the paper will address our working solutions to the problems enumerated in the foregoing paragraph. I will conclude by briefly shifting the emphasis from the creation of image descriptions to their potential use, offering some parting thoughts and caveats on the kind of knowledge that the image descriptions might produce when used in conjunction with the Archive's other image tools and resources.


Caton, Paul and Flanders, Julia (1999). "Encoding Rendition in Primary Source Texts Using TEI." Unpublished essay, 1999. Presented at ACH-ALLC 99. University of Virginia, Charlottesville. 12 June 1999.
Eaves, Morris, Essick, Robert N., and Viscomi, Joseph (eds) The William Blake Archive <http://www.iath.virginia.edu/blake/>.
Inote. Software from the Institute for Advanced Technology in the Humanities. <http://www.iath.virginia.edu/inote/>.
Kirschenbaum, Matthew, et al. (1999). "Refining Our Notions of What (Digital) Images Really Are." ACH-ALLC 99. University of Virginia, Charlottesville. 10 June 1999.
McCarty, Willard. "Humanities Computing as Interdiscipline." <http://www.iath.virginia.edu/hcs/mccarty.html>.


(3.2) Computational / Corpus Linguistics

(3.2.1) Kirrkirr: Software for Browsing and Visual Exploration of a Structured Warlpiri Dictionary

Christopher D. Manning
Stanford University, USA

This paper discusses the goals, architecture, and usability of Kirrkirr, a Java-based visualization tool for XML dictionaries, currently being used with a dictionary for Warlpiri, an Australian Aboriginal language.

While dictionaries on computers are now common, there has been surprisingly little work on innovative ways of utilising the capabilities of computers for visualization, hypertext linking and multimedia in order to provide a richer experience of dictionary content. Most electronic dictionaries present the search-dominated interface of classic information retrieval (IR) systems, which are only effective when the user has a clearly specified information need and a good understanding of the content being searched. The ability to browse often makes paper dictionaries easier and more pleasant to use than such electronic dictionaries. Search interfaces are ineffective for information needs such as exploring a concept. Some work in IR has emphasised the need for new methods of information access and visualization for browsing document collections (Pirolli et al. 1996), and we wish to extend such ideas into the domain of dictionaries, in part because indications are that current interfaces are unlikely to have much direct educational benefit for students (Kegl 1995).

Our goal has been to provide a fun dictionary tool that is effective for browsing and incidental language learning. In particular we attempt to address Sharpe's (1995) "distinction between information gained and knowledge sought". The speed of information retrieval that e-dictionaries deliver, and the focused decontextualized search results they provide, can frequently lead to loss of the memory retention benefits and chances for random learning that manually searching through paper dictionaries provides.

Within the Australian context, indigenous dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met. Another major goal has been to design an interface usable by, and interesting to, young users and other language learners. From this viewpoint, the low level of literacy in the region and the inherently captivating nature of computers suggest that an e-dictionary is potentially more useful than a paper edition. Among other benefits, we can provide an interface less dependent on good knowledge of spelling and alphabetical order.

Our dictionary interface initially targeted Warlpiri, a language of Central Australia, for which there has been an extensive on-going project for the compilation of semantically-rich lexical materials (Laughren and Nash 1983, Hale and Laughren [to appear]). We converted this data from a non-standard format into a richly-structured XML version (XML 1999). The current version uses ad hoc indexing of this textual version for efficient access, but we expect to move to XQL, as this standard matures. Our system is written in Java, using the Swing API, and runs on all major platforms (Windows, Mac, Unix).

For dictionaries with plain textual content behind them, there is little that they can provide in the way of output but an on-line reflection of a printed page. In contrast, XML allows definition of the precise semantics of the dictionary content, while leaving unspecified its form of presentation to the user. We exploit this flexibility in our application, by having the program mediate between the lexical data and the user. The interface can select from and choose how to present information, in ways customised to a user's preferences and abilities.

One dimension is that, as well as the definitions of words, users frequently want to know their relationships to other words, and the patterning in these relationships. Kirrkirr provides a color-coded network display of semantic links between words, which can be explored, manipulated and customised interactively by the user (Jansz et al. 1999), using the animated graph-drawing techniques of Eades et al. (1998) and Huang et al. (1998). In their spring algorithm, the words of a network become nodes which are held apart by gravitational repulsion, but kept from drifting too far apart by springs which have a natural length. This graph algorithm differs from most others by providing iterative updating of the graph layout, which means that users can drag nodes across the screen, and the algorithm will cause other nodes to flee out of the way, while words related to the dragged word are pulled along. The detailed semantic markup of the dictionary, with many kinds of semantic links (such as synonyms, antonyms, hyponyms, and other forms of relationships), allows us to provide a rich browsing experience. For example, the ability to display different link types graphically as different colors solves one of the recurring problems of the present web, with its one type of link: users have some idea of what type of relationship there is to another word before clicking. Thinking of the lexicon as a semantic network with various kinds of links was a leading idea of the WordNet project (Miller et al. 1993), but the simple text-based computer interface they provide fails to do justice to the richness of the underlying data. Others have attempted to remedy this lack (e.g., Plumbdesign 1998), but we feel that our work is better aimed at providing the kind of simple network display suitable for our users.
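The spring layout described above can be sketched in a few lines. The following Python fragment is only an illustration of the general idea (pairwise repulsion plus springs with a natural length, updated iteratively); it is not the modified spring algorithm of Eades et al. (1998) nor the Kirrkirr implementation, and the function name, parameters and constants are assumptions chosen for the example.

```python
import math

def spring_layout_step(pos, edges, natural_length=1.0,
                       spring_k=0.1, repulsion_k=0.5, pinned=()):
    """Advance the layout by one iteration and return the new positions.

    pos:    dict mapping word -> (x, y)
    edges:  iterable of (word_a, word_b) semantic links
    pinned: words the user is currently dragging; they are not moved
    All constants are illustrative assumptions, not values from the paper.
    """
    force = {w: [0.0, 0.0] for w in pos}
    words = list(pos)

    # Every pair of nodes repels, more strongly the closer they are.
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion_k / (dist * dist)
            force[a][0] += push * dx / dist
            force[a][1] += push * dy / dist
            force[b][0] -= push * dx / dist
            force[b][1] -= push * dy / dist

    # Linked nodes are drawn towards the natural length of their spring.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        pull = spring_k * (dist - natural_length)
        force[a][0] += pull * dx / dist
        force[a][1] += pull * dy / dist
        force[b][0] -= pull * dx / dist
        force[b][1] -= pull * dy / dist

    # Dragged (pinned) nodes keep their position; all others follow the forces.
    return {
        w: pos[w] if w in pinned
        else (pos[w][0] + force[w][0], pos[w][1] + force[w][1])
        for w in pos
    }
```

Calling such a step repeatedly inside the display loop, with the dragged word listed in `pinned`, gives the behaviour described in the text: linked words follow the dragged node, while unrelated words are pushed out of its way.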

To augment traditional semantic relations in the dictionary, we also provide linkages derived automatically from collocational analysis (of the limited amount of online Warlpiri text), and present an interface derived from semantic domains. These interfaces both address the notion of "terminology sets" - words that belong together, a notion which seems particularly salient for native speakers (Goddard and Thieberger 1997). We discuss the determination of collocational bonds, using the method of Dunning (1993), and the limitations of what we can do with the data available.
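As a rough illustration of how collocational bonds can be scored, the sketch below computes the log-likelihood ratio statistic (G^2) of Dunning (1993) for adjacent word pairs. It assumes that collocations are approximated by simple bigrams in a tokenised text; the abstract does not say how the Warlpiri text was windowed or thresholded, so the function names and that choice are assumptions.

```python
import math
from collections import Counter

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: count of word A followed by word B
    k12: count of A followed by something other than B
    k21: count of something other than A followed by B
    k22: count of all remaining bigrams
    """
    def sum_k_log_k(*counts):
        return sum(k * math.log(k) for k in counts if k > 0)

    total = k11 + k12 + k21 + k22
    return 2.0 * (
        sum_k_log_k(k11, k12, k21, k22)
        - sum_k_log_k(k11 + k12, k21 + k22)   # row totals
        - sum_k_log_k(k11 + k21, k12 + k22)   # column totals
        + sum_k_log_k(total)
    )

def collocation_scores(tokens):
    """Score every adjacent word pair in a token list (bigram assumption)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(a for a, _ in bigrams.elements())
    second = Counter(b for _, b in bigrams.elements())
    n = sum(bigrams.values())
    scores = {}
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = n - k11 - k12 - k21
        scores[(a, b)] = log_likelihood_ratio(k11, k12, k21, k22)
    return scores
```

Pairs with high scores co-occur more (or less) often than chance would predict; with a small corpus many counts are very low, which is one reason the available data limits what such an analysis can deliver.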

Formatted dictionary entries, displayed using HTML, are produced from the underlying XML by the use of XSL stylesheets (XSL 1998). These provide conventional hypertext for navigating between entries, in particular providing a color-coding of different kinds of semantic relationships between words which is consistent with that in the network display. A variety of XSL stylesheets are provided, which can give different formatting to the dictionary content appropriate to different users. For instance, items such as abbreviations for parts of speech, other grammatical notes, and detailed decompositional definitions can be confusing for most Aboriginal users (Corris et al. 1999), and stylesheets can provide just the desired information in large easy-to-read type.

In addition to the above, the dictionary incorporates multimedia - the user can hear words and see appropriate pictures - and a conventional search interface. The dictionary provides a user-friendly console where search results can be sorted and manipulated. As well as standard keyword search, which can optionally be restricted to matches within a specified XML element, the system provides two features targeted towards two principal groups of users. Linguists often want to search for particular sound patterns (such as certain types of consonant clusters), and so the system allows regular expression matching for such expert users. On the other hand, the limited literacy level of many potential users means that they will have particular problems looking up words. In part this is because the phonetic orthography of Warlpiri does not correspond closely to the (rather arcane) spelling rules of English, on which their literacy skills are usually based. To alleviate this problem, we have implemented a "fuzzy spelling" algorithm which attempts to find the intended word by using rules which capture common mistakes, sound confusions and alternative spellings.
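The two search features can be sketched as follows. This is a minimal illustration, assuming that fuzzy spelling is implemented by rewriting both the query and the headwords into a common normalised key; the rewrite rules shown are invented for the example and are not the rules used in Kirrkirr, and the headword list simply reuses names that appear in this abstract as stand-ins for dictionary entries.

```python
import re

# Illustrative rewrite rules only (assumptions, not Kirrkirr's actual rules):
# each pair maps a likely English-influenced spelling to one canonical form.
RULES = [
    (r"oo", "u"),
    (r"ee", "i"),
    (r"b", "p"),
    (r"d", "t"),
    (r"g", "k"),
    (r"(.)\1+", r"\1"),   # collapse doubled letters
]

# Stand-in headwords taken from names in this abstract.
HEADWORDS = ["kirrkirr", "warlpiri", "yuendumu"]

def fuzzy_key(word):
    """Reduce a spelling to a normalised key by applying every rule in order."""
    key = word.lower()
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    return key

def fuzzy_lookup(query, headwords):
    """Return the headwords whose normalised key equals that of the query."""
    wanted = fuzzy_key(query)
    return [h for h in headwords if fuzzy_key(h) == wanted]

def pattern_search(regex, headwords):
    """Expert search: return headwords containing a match for the regex."""
    compiled = re.compile(regex)
    return [h for h in headwords if compiled.search(h)]
```

With these hypothetical rules, `fuzzy_lookup("girgir", HEADWORDS)` finds "kirrkirr", while `pattern_search(r"rlp", HEADWORDS)` shows the kind of consonant-cluster query a linguist might run.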

We have performed some preliminary trialling of the dictionary through visits by Mim Corris to Yuendumu and Willowra, and Jane Simpson to Lajamanu. This has involved completing dictionary tasks, and observational use with primary and lower secondary students and trainee Warlpiri literacy workers, and comments from teachers and other adults. In general reactions have been quite enthusiastic, and the dictionary does appear to succeed in creating and maintaining interest. We have received suggestions on how to make it a better basis for classroom activities, which we hope to incorporate in future versions.

The diversity of areas researched in this work is rare relative to past work in electronic dictionaries, which often addresses the problems of storage, processing and visualisation/teaching as unrelated. Despite some significant research into the construction of lexical databases that go beyond the confined dimensions of their paper ancestors, there has been little attempt at seeing this work through to benefiting people such as language learners, who could truly gain from a better interface to dictionary information. Additionally, the range of potential users here is considerably more diverse than encountered in typical studies of dictionary usability (e.g., Atkins and Varantola 1997). For instance, issues such as low levels of literacy are rarely touched on. Our system has attempted to reduce the importance of knowing the written form of the word before the application can be used, while still offering ample opportunities to learn written forms. Features such as an animated, clearly laid out network of words and their relationships, multimedia and hypertext aim at making the system interesting and enjoyable to use. At the same time, features such as advanced search capabilities and note-taking make the system practical as a reference tool. The system has been designed to be highly customisable by the user, and it is also highly extensible, allowing new modules to be incorporated with relative ease. We thus think that it is a good foundation for an electronic dictionary, and while the focus of this research has been on Warlpiri, this research (and the software constructed) can be easily applied to other languages.


Atkins, B.T.S., and Varantola, K. (1997). Monitoring dictionary use. International Journal of Lexicography 10(1):1-45.
Corris, M., Manning, C., Poetsch, S., and Simpson, J. (1999). Using dictionaries of Australian Aboriginal languages. Paper presented at the Applied Linguistics Association of Australia Annual Congress, Perth.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19:61-74.
Eades, P., Huang, M., and Wang, J. (1998). Online Animated Graph Drawing using a Modified Spring Algorithm. Proceedings of the 21st Australian Computer Science Conference, pp. 17-28.
Goddard, C. and Thieberger, N. (1997). Lexicographic Research on Australian Languages 1968 - 1993. In M. J. Walsh and D. Tryon (eds) Boundary Rider: Essays in Honour of Geoffrey O'Grady, pp. 175-208. Pacific Linguistics, Canberra.
Hale, K. L. and Laughren, M. (to appear). The Warlpiri Dictionary.
Huang, M. L., Eades, P., and Cohen, R. F. (1998). WebOFDAV: Navigating and visualizing the Web on-line with animated context swapping. Proceedings of the 7th International World Wide Web Conference, pp. 638-642.
Jansz, K., Manning, C. D., and Indurkhya, N. (1999). Kirrkirr: Interactive Visualisation And Multimedia From A Structured Warlpiri Dictionary. Proceedings of AusWeb99, the Fifth Australian World Wide Web Conference, pp. 302-316.
Kegl, J. (1995). Machine-Readable Dictionaries and Education. In D. Walker, A. Zampolli and N. Calzolari (eds) Automating the Lexicon: Research and Practice in a Multilingual Environment. Oxford University Press, Clarendon.
Laughren, M. and Nash, D. G. (1983). Warlpiri Dictionary Project: Aims, method, organisation and problems of definition. Papers in Australian Linguistics No. 15: Australian Aboriginal Lexicography, pp. 109-133. Pacific Linguistics, Canberra.
Miller, G., Beckwith, R., Fellbaum, C., Gross, R. and Miller, K. (1993). Introduction to WordNet: An On-line Lexical Database. In C. Fellbaum (ed) (1998). WordNet: An electronic lexical database. MIT Press.
Pirolli, P., Schank, P., Hearst, M. A., and Diehl, C. (1996). Scatter/Gather Browsing Communicates the Topic Structure of a Very Large Text Collection. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI '96).
Plumbdesign (1998). Visual Thesaurus Java applet <http://www.plumbdesign.com/thesaurus>
Sharpe, P. (1995). Electronic Dictionaries with Particular Reference to an Electronic Bilingual Dictionary for English-speaking Learners of Japanese. International Journal of Lexicography 8(1):39-54.
XML (1998). Extensible Markup Language (XML) 1.0 W3C Recommendation 10-February-1998. In T. Bray, J. Paoli, and C. M. Sperberg-McQueen (eds) <http://www.w3.org/TR/1998/REC-xml-19980210>.
XSL (1998). Extensible Stylesheet Language (XSL) Version 1.0 Working Draft. In J. Clark and S. Deach (eds) <http://www.w3.org/TR/WD-xsl>.


(3.2.2) New Paths in Middle High German Lexicography: Dictionaries Interlinked Electronically

Johannes Fournier
University of Trier, Germany

I. Topic:

Since September 1997, a small team of lexicographers and computer scientists at the University of Trier (Germany) have been developing an integrated electronic dictionary of Middle High German applying TEI Guidelines. The resulting integrated digital dictionary is expected to be finished by August 2000 and be published on CD-ROM as well as on the Internet. It is not only meant to facilitate the simultaneous use of the dictionaries concerned, but also to offer advanced query options to provide essentially new insights for those involved in vocabulary studies, metalexicography, and the composition of a new MHG dictionary.

II. Digitization as a necessity:

The most important dictionaries of the MHG language were written in the last century and need to be replaced urgently by a new major work. This necessity arises not only from the enormous increase in the number of editions of MHG texts since the end of the 19th century, but also from changed insights into the structure of vocabulary and new ways of describing word usage. Consequently five years ago, two teams of lexicographers at the Universities of Trier and Goettingen started to lay the foundations for a new MHG Dictionary by creating an electronic archive of texts and quotations. It will probably take up to 25 years, however, for the whole dictionary to be finished, thus scholars of all disciplines having to deal with MHG sources will still have to use the older dictionaries for quite a while.

The dictionaries that already exist, i. e. the "Mittelhochdeutsches Woerterbuch" by Georg Friedrich Benecke/Wilhelm Mueller/Friedrich Zarncke (1854-1866), the "Mittelhochdeutsches Handwoerterbuch" with its supplement, the "Nachtraege", by Matthias Lexer (1872-1878), and the "Findebuch zum mittelhochdeutschen Wortschatz" by Kurt Gaertner et al. (1992), are very closely interconnected and can only be used simultaneously, which is due to the fact that they must be considered, briefly speaking, as a kind of series of supplements to supplements to supplements. Therefore they were ideal candidates for the composition of an integrated digital dictionary. One of the major aims of the digitization is to make the lexicographical information of the dictionary entries accessible via a database and thus to enable sophisticated searches over all four dictionaries independently of headwords. Applying TEI Guidelines to machine readable versions of the printed dictionaries seemed the easiest and fastest way of creating the digital "compound dictionary".

III. (Semi-)Automatically generated markup according to TEI Guidelines:

The MHG dictionaries consist of eight volumes with about 1,100 printed pages, containing more than 80,000 headwords. TEI compliant markup of the dictionary entries therefore had to be generated automatically as far as possible. For the purposes of encoding we used TUSTEP, the Tuebingen System of Text Processing Programs, with its variety of parameter-controlled functions for user-defined text data processing that facilitate structured entry input.

Some parts of the TEI design scheme were especially relevant for the dictionary encoding. Some advantages and problems when applying TEI have to be discussed in detail, such as the hierarchical embedding of elements within the articles, the use of global attributes for the markup of a wide range of lexicographical information, and the recoverability of articles. It should also be mentioned that TEI Guidelines should be improved with regard to the encoding of dictionaries of older stages of a language, for the description of such languages poses some problems seldom encountered when describing modern languages.

It is apparent, however, that most problems which arose when using TEI did not stem from the application of TEI Guidelines as such, but were primarily due to the fact that the dictionary entries often appeared to lack clear structure and were rather discursive in style. This has often made automatic SGML encoding a difficult task. In many cases only manual markup led to TEI compliant documents. Nevertheless, the results achieved so far fully justify the decision in favour of TEI Guidelines.

IV. New ways of using dictionaries:

Through the electronic version, the MHG dictionaries can be used much more easily and comfortably: hyperlinks connect all the corresponding headwords, the search for cross-references only takes a mouse-click's time; pop-up menus contain the relevant information about all sources of citation; bookmarks and notes can be created easily. PostScript files of all dictionary pages are interlinked with the electronic articles so that the compound digital dictionary can be used and cited as a work of reference in exactly the same way as its printed precursors.

Far more important is the access to a database containing the relevant information for the entire contents of the four dictionaries within the composite whole. Access via that database not only offers full-text retrieval but also retrieval of selected information, e.g. of parts of speech, of word forms in MHG quotations, of definitions or of strings in the etymology sections of dictionary entries. Highly important for advanced and complex query options is the linking of a list of all dictionary sources with the electronic dictionary itself: all sources have been carefully classified according to geographical provenance, chronology and genre, categories that can be used to limit database queries to a small, self-defined corpus of texts cited within the entries. Which words were directly borrowed from Italian, but not through Latin or French? Which words are only quoted from sources concerning legal issues? Which MHG words denote the same concepts? These are some of the questions that can now be answered without great expense of time. What is more, the integrated electronic dictionary is especially important for the lexicographers involved in the creation of the new MHG dictionary, where the older dictionaries are used as pointers to words for which references rarely exist.
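A query of the kind "which words are only quoted from sources concerning legal issues?" can be sketched as a filter over classified sources. The fragment below is a minimal illustration in Python rather than the project's actual database; the field names (genre, region, century), the source identifiers and the headwords are all placeholders invented for the example.

```python
# Placeholder source classification (hypothetical identifiers and values).
SOURCES = {
    "source_A": {"genre": "legal", "region": "region_1", "century": 13},
    "source_B": {"genre": "epic", "region": "region_2", "century": 13},
    "source_C": {"genre": "legal", "region": "region_2", "century": 14},
}

# Placeholder citation index: headword -> sources quoted in its entries.
CITATIONS = {
    "word_1": ["source_A", "source_C"],
    "word_2": ["source_A", "source_B"],
    "word_3": ["source_B"],
}

def words_quoted_only_from(citations, sources, **criteria):
    """Return headwords all of whose cited sources satisfy every criterion."""
    def matches(source_id):
        meta = sources[source_id]
        return all(meta.get(field) == value for field, value in criteria.items())

    return sorted(
        word for word, cited in citations.items()
        if cited and all(matches(source_id) for source_id in cited)
    )
```

Here `words_quoted_only_from(CITATIONS, SOURCES, genre="legal")` returns only "word_1"; adding further keyword criteria narrows the self-defined corpus in the way the text describes.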

V. Institutional frame:

Some years ago, the Deutsche Forschungsgemeinschaft (DFG = German Research Council) initiated a program for the so-called "Retrospective Digitization of Library Materials". The main goal of the program is to facilitate the access to library holdings that may be rare or highly important for scholarly interests by providing electronic versions of these holdings. From the beginning, the program encouraged the use of SGML for full-text encoding.

Since September 1997, the DFG has been funding the creation of an integrated digital dictionary of Middle High German to be published on CD-ROM as well as on the Internet. It is intended to serve as a prototype for the digitization of other historical dictionaries, including the digitization of the famous "Deutsches Woerterbuch" of Jacob and Wilhelm Grimm.



Further information on the topic proposed is available at this location <http://gaer27.uni-trier.de/MWV-online/MWV-online.html>


(3.2.3) Stephen Crane and the 'New-York Tribune': a Case Study in Traditional and Non-traditional Authorship Attribution

David I. Holmes
Michael Robertson
The College of New Jersey, USA


This paper presents seventeen previously unknown articles that we believe to be by Stephen Crane, published in the 'New-York Tribune' between 1889 and 1892. The articles, printed without byline in what was at the time New York City's most prestigious newspaper, report on activities in a string of summer resort towns on New Jersey's northern shore. Scholars had previously identified fourteen shore reports as Crane's; these newly discovered articles more than double that corpus. The seventeen articles, witty and often hilarious, confirm how remarkably early Stephen Crane set his distinctive writing style and artistic agenda; more than a century after their publication in the 'Tribune' they remain delightful reading. Stephen Crane began his career as a professional writer in the summer of 1888, when he was sixteen [1]. His assignment was to assist his brother J. Townley Crane, Jr., almost twenty years older than Stephen, who had established Crane's New Jersey Coast News Bureau in 1880 when he arranged to serve as correspondent for the Associated Press and the 'New-York Tribune'. For three-quarters of the year, Townley Crane's duties must have been light, as he ferreted out news in the sparsely populated shore towns of Monmouth County. However, during the summer months, the news bureau's duties exploded. New York City newspapers of the 1880s devoted remarkable amounts of space to chronicling the summer vacations of the city's upper and upper-middle classes. Every Sunday edition of most New York newspapers and, during July and August, most daily editions as well, carried news articles from the summer resorts popular with the more affluent citizens of Gilded Age New York.
The format of these articles was standardized: a lead proclaimed the resort's unique beauties and the unprecedented success of the current summer season, a few brief paragraphs recounted recent events, such as a fund-raising carnival or the opening of a new hotel, and the article concluded with a lengthy list of names of recent arrivals and where they were staying. Working within the boundaries of this restrictive format, Stephen Crane developed a highly original, distinctive style. His shore reports are as ruthlessly ironic as 'Maggie', the novel he was writing during the same period, but, instead of directing his irony towards the inhabitants of the Bowery, he aimed it at the hotel proprietors and summer visitors of the New Jersey shore.

Discovery And 'Traditional' Attribution

During the 1940's and 1950's, scholars familiar with Crane's style and interests were able to identify several other unsigned articles in the 'Tribune' as his. By coincidence, all of these articles originated in three adjoining towns on the Jersey shore: Asbury Park, Ocean Grove and Avon-by-the-Sea. When Fredson Bowers began editing his massive volume of Crane's works [2], he evidently decided to limit his search for additional unsigned 'Tribune' articles by Crane to reports with datelines from those three resorts. Combing the 'Tribune' during the summer months from 1888 to 1892, Bowers identified as Crane's three articles overlooked by previous scholars, bringing the total of New Jersey shore reports to fourteen. No one questioned Bowers' decision to focus on the three adjoining shore communities. However, during research on a book concerning Stephen Crane's journalism, we came across an item in the Schoberlin collection at the Syracuse University Library that threw into doubt Bowers' procedure. A one-page prospectus for Crane's New Jersey Coast News Bureau was found which provided evidence of an attempt by Townley Crane to expand his business. In particular, the body of the prospectus lists shore towns ranging from Atlantic Highlands in the north to Seaside Park in the south. With this evidence of the Crane news bureau's wide geographical range, we began to question why all of the shore articles attributed to Stephen originated from Asbury Park and the two towns just south of it. Would it not make sense for Townley to send his teenaged brother to cover news in the resorts a few miles distant from their home base of Asbury Park and save himself the trouble? We searched the 'New-York Tribune' for the summers of 1888 to 1892, when Stephen was fired, looking for articles with a dateline from the wider base of New Jersey shore towns named in Townley Crane's prospectus. The Crane brothers' writing styles are widely divergent. 
Reading Townley's articles (written before Stephen began his journalistic career), it is evident that his style is that of brief, invariably flattering prose, while Stephen delighted in gleeful irony. This search revealed seventeen articles datelined from the shore towns of Long Branch, Belmar, Spring Lake and Sea Girt that appear to have the style of Stephen Crane. Hotel proprietors, baggage handlers and "summer maidens" are all written about with disdain. The articles are so stylistically distinctive in their irony and verbal inventiveness that they clearly look to be from Stephen's hand rather than from Townley's.

'Non-Traditional' Attribution: Stylometry

Stylometry provides an alternative and objective analysis. The stylometric task facing us was to examine the seventeen articles and attribute them to either Stephen or Townley Crane. Suitable control samples in more than one genre are required, so, within the genre of fiction, several textual samples of about 3,000 words were obtained from 'The Red Badge of Courage' and Joseph Conrad's 'The Nigger of the Narcissus', the latter being chosen because we know that Crane and Conrad read and admired each other's novels. For journalistic controls, we turned to Richard Harding Davis and Jacob Riis, who were, along with Crane, the most prominent American journalists of the 1890's. Examples of Stephen Crane's New Jersey shore reports, New York City journalism, and war correspondence were taken from the University of Virginia edition of Crane's work; samples of Townley Crane's journalism were taken from the 'New-York Tribune'. The seventeen anonymous articles were first merged, the resultant text then being split into two halves of approximately 1800 words each. The "Burrows" technique [3], which works with large sets of frequently occurring function words, is a proven and powerful tool in authorship attribution. Essentially it picks the N most common words in the corpus under investigation and computes the occurrence rate of these N words in each text or text-unit. Multivariate statistical techniques are then applied to the resultant data to look for patterns. The first phase in the investigation was designed to establish the validity of the Burrows technique on the known textual samples detailed above. Using principal components analysis, the Crane and Conrad fiction samples are clearly distinguishable from each other. 
Turning to the three genres within Crane's journalism, the principal components analysis on the occurrence rates of non-contextual function words shows, quite remarkably, how their rates of usage differ between his New York City, shore and war journalism, yet remain internally consistent within these genres. A final analysis incorporating the samples of journalistic writing from Townley Crane, Richard Harding Davis and Jacob Riis provides further validation of the Burrows method with a clear distinction visible between the shore journalism of Townley Crane and Stephen Crane. Discarding the control samples, which have served their purpose, we then focus on the main task, namely the attribution of the seventeen anonymous articles to either Stephen or Townley. Both cluster analysis and principal components analysis provide mutually supportive results in attributing the anonymous articles to the youthful ironist Stephen Crane. The "non-traditional" analysis has supplied objective, stylometric evidence which supports the "traditional" scholarship on the problem of authorship. We believe that this joint interdisciplinary approach should be the way in which attributional research is conducted.
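The first step of the Burrows technique described above, computing the occurrence rates of the N most common words for each text, can be sketched as follows. This is an illustrative Python fragment, not the software used in the study; the tokenisation rule, the per-1000-words scaling and the function names are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens (simplified rule)."""
    return re.findall(r"[a-z']+", text.lower())

def burrows_rates(texts, n_words=50):
    """Return (vocab, rates) for a dict mapping sample label -> text.

    vocab: the n_words most frequent words in the pooled corpus
    rates: label -> occurrence rate of each vocab word per 1000 tokens
    """
    tokens = {label: tokenize(text) for label, text in texts.items()}

    # Pool all samples to find the most common words overall.
    corpus_counts = Counter()
    for sample in tokens.values():
        corpus_counts.update(sample)
    vocab = [word for word, _ in corpus_counts.most_common(n_words)]

    # One row of rates per sample, in the same column order as vocab.
    rates = {}
    for label, sample in tokens.items():
        counts = Counter(sample)
        total = len(sample) or 1
        rates[label] = [1000.0 * counts[word] / total for word in vocab]
    return vocab, rates
```

The resulting table of rates (one row per text sample, one column per word) is the input to the multivariate step; principal components analysis or cluster analysis would then be run on it with a statistics package, which is not shown here.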


1. Wertheim, S. and Sorrentino, P., eds. (1988). The Correspondence of Stephen Crane. 2 Vols. Columbia UP, New York.
2. Bowers, F. ed. (1973). Tales, Sketches and Reports. Vol. 8 of The University of Virginia Edition of the Works of Stephen Crane. UP of Virginia, Charlottesville.
3. Burrows, J. F. (1992). "Not Unless You Ask Nicely: The Interpretive Nexus Between Analysis and Information". Literary and Linguistic Computing, 7, pp. 91-109.


(3.3) Digital Resources (Panel Session)

Making MITH a Reality: The Maryland Institute for Technology in the Humanities, Year Two

Martha Nell Smith
Maryland Institute for Technology in the Humanities, USA

Charles Lowry
Lori Goetsch
University of Maryland, USA

Jo Paoletti
University of Maryland & Maryland Institute for Technology in the Humanities, USA

Lisa Antonille
Jason Rhody
Maryland Institute for Technology in the Humanities, USA

http://www.mith.umd.edu
Key words for session: digital libraries, feminism and writing technologies, film editing, intercultural learning center
Participants: Dr. Martha Nell Smith, MITH Director; Dr. Charles Lowry, Dean of Libraries, University of Maryland; Dr. Jo Paoletti, MITH Fellow; Jason Rhody, MITH Programs Coordinator & MITH Networked Associate; Lisa Antonille, MITH Networked Associate; Dr. Neil Fraistat, MITH Internal Advisory Board Chair & General Editor, Romantic Circles.

In December 1998, the University of Maryland (UM)'s College of Arts and Humanities, Libraries, and Office of Information Technology were awarded a $410,000 grant from the United States' National Endowment for the Humanities (NEH) to develop MITH (the Maryland Institute for Technology in the Humanities) in order to foster faculty development and coordination of advanced technological resources and humanities applications of technology beyond early adopters into the university mainstream and out to the wider educational community and the community at large. 

This roundtable will begin with a brief introduction by MITH's Director, Martha Nell Smith, followed by presentations by five of MITH's key participants. 
- "READY, FIRE, AIM: Responding to the Paradigm Shift, the University of Maryland Libraries and MITH," Charles Lowry (Dean of the Libraries); 
- "Intercultural Learning Center on MITH's Virtual Plaza," Jo Paoletti (MITH Fellow Spring 2000, Associate Professor of American Studies, UM); 
- "Reconstellating Relationships: Practical & Effective Teamwork," Jason Rhody (MITH Programs Coordinator);
- "Symbiosis, Virtually and Face-to-Face: MNAP," Lisa Antonille, (MITH Networked Associate Fellow); 
- Neil Fraistat (General Editor of Romantic Circles [ http://www.rc.umd.edu ] and Chair of MITH's Internal Advisory Board). 
These members of the MITH team will discuss how their individual and institutional collaborations have worked during MITH's first year of operations, what unforeseen challenges they faced, and what unanticipated opportunities presented themselves so that MITH became operational four months ahead of time, and now offers support for faculty fellows at UM and networked associates within and without UM; faculty/student colloquia; a graduate student web-authoring collective; undergraduate Honors Humanities; and graduate English and Comparative literature courses.

UM, a designated national supercomputing and Internet2 center, is one of the best networked and computer-supported universities in the world. UM boasts more wired classrooms than most other institutions of higher learning. These include multi-platform teaching theaters and classrooms, foreign language teaching facilities, and a variety of technology-enhanced departmental classrooms and laboratories. Computing resources for scholarship and research in the humanities are comparably strong. The establishment of MITH gives critical mass to the UM's substantial and significant pioneering efforts. MITH creates an essential unifying physical presence as well as, through its web-based projects and programs, a virtual campus for this dispersed College, thereby creating an intellectual identity based on shared discourse and interests. Additionally, as a central (literal and virtual) coordinating facility dedicated to producing projects in the College's various departments and to providing access to materials, MITH provides an umbrella organization for the conception, production, maintenance, and enhancement of electronic resources indispensable for realizing UM's twenty-first century teaching, research, and outreach missions. As a laboratory for the humanities, MITH offers a center for sharing information, tools (hardware and software), and opportunities for synergistic development, creating a dynamic field for diffusion of innovation in humanities technology available to the world-wide community as ideas and projects of individual scholars influence one another in the production of new knowledge. The goals, then, of MITH are threefold:

  1. to generate and foster the development of innovative projects that respond to the traditional interests of the humanities while nurturing emerging modes of scholarship and learning; 
  2. to guarantee aggressive outreach of these new technological approaches not only to the faculty members and students of UM, but also to the state educational community in grades K-12 and community groups committed to educational reform;
  3. and, in support of goals one and two, to provide advanced technological resources for the creation, deployment, and dissemination of technology-based scholarship and instruction.

To begin the roundtable presentation, Dean Lowry will discuss the pragmatic institutional responses that the University of Maryland Libraries faculty are making with colleagues in the College of Arts & Humanities to the many challenges in the higher education environment, including teaching, information technology, and the transformation of scholarly communication. Lowry's presentation highlights the human, technological, and physical resources assembled in the effort and in initial project activities. Lowry's probative examples focus on the "price revolution" in the cost of scholarly information; diversification in scholarly information (such as GIS, online government information, e-journals, and the "electronic ephemera" of our time, internet resources); digitizing the Libraries' own collections; and the "informating process" and its transformational effects inside libraries (on organization and on staff). Via Digital Libraries Operations (DLO), the Electronic Text & Imaging Center (ETIC), and the support of an Instructional Development Coordinator, UM's Libraries are collaborating with MITH to present a range of projects. These include video- and audio-streaming of the English department's Writers Here & Now Program, a decades-old series presenting major (Pulitzer and Nobel prize-winning) writers from around the U.S. and the world, and digital archives of Beckett Directs Beckett, a UM Visual Press series featuring Beckett directing his own Waiting for Godot, Krapp's Last Tape, and Endgame, together with an extended televisual interview with the Nobel laureate. Another digital presentation involving film will be a critical treatment of Lovelace & Babbage, iconic figures in the early history of computing.
Part of the award-winning Women & Power Series (Flare Productions), Lovelace & Babbage (under production in 2000) will be key to the theoretical interrogations of a graduate seminar offered by one of the film's producers in MITH's computer studio in Spring 2000, as well as to those of UM's digital libraries project. MITH's collaborations with Flare contribute both to the making of the film and to enhancing UM's digital libraries project. Lowry outlines future plans for UM's Digital Libraries projects and explains how collaborations with MITH will augment both the faculty fellows program and the outreach (to K-12 teachers) program in ways not anticipated in the original MITH proposal.

MITH Fellow Jo Paoletti focuses on her development of an Intercultural Learning Center (ICLC), which provides a crossroads where college and 7-12 students from diverse backgrounds interact through discussion groups, shared projects, and carefully designed exercises. The ICLC enables teachers and students to discuss controversial issues of race, ethnicity, and identity in a "safe" space, learn about each other through shared papers and essays, and collaborate on meaningful community-oriented projects. Built during her semester at MITH, it elaborates and extends Paoletti's leadership of and participation in UM's Web Initiative in Teaching (WIT), which developed four distinct but related distance-learning courses: the American Studies Department's Diversity in American Culture; the English Department's Cross-cultural Communication course; and two courses in English for speakers of other languages offered by the Maryland English Institute. The ICLC pilot spaces are situated on two different platforms, WebCT (a course management tool) and Active Worlds (a 3-D web browser with built-in communication features). The web-based ICLC under development includes many of the features available in WebCT (e.g., chat, threaded conferences, space for individual and collaborative projects, readings, and other resources), plus others that WebCT cannot offer because of its design limitations, such as a queuing mechanism for partnering students as keypals or speaking partners.
Having designed and tested a beyond-WebCT ICLC during her semester at MITH, Paoletti will critique the means and contents of knowledge production generated by college students serving as editors and mentors in their reading and responding to materials written by high school students; by high school students and college students sharing family histories and using bulletin boards to discuss the process of Americanization (the high school involved in this research project has a large immigrant population); and by groups of students listening to or reading interviews with former slaves, discussing them online, and publishing an online collection of their own poems and other writings in critical response to these personal histories.

MITH Programs Coordinator Jason Rhody describes the constantly reconstellating relationships involved in MITH's Fellows, colloquia, seminar, and distinguished speaker programs. Among the many functions MITH has served at UM is that of a forum where teachers and researchers from diverse disciplines come together in a collaborative setting to brainstorm, present, assess, share, and reconfigure work plans for the most effective knowledge transfers taking advantage of innovations in technology. As MITH Distinguished Speaker this past spring, Irvin Kershner, director of The Empire Strikes Back, remarked, "when one of my movies goes exactly according to my plan, turns out exactly as I envisioned it, then I know that something has gone terribly, terribly wrong," thus making explicit the dynamism and change necessary for the successful (practical) implementation of virtual dreams. Illustrating what we have learned and what we know we must unlearn, Rhody draws on our first year's case studies: Katie King's Feminism and Writing Technologies project, which analytically plumbs and extends pedagogical imaging and imagination by involving graduate and undergraduate students in research that places "new" technologies within a broad synthetic historical and cultural framework (specifically by building a virtual 17th-century women's print shop, one designed so that students working in the print shop can make their own broadsides, which, electronically translated into 17th-century styles of handwriting, instruct students in the paleography of manuscript study); the Dickinson Electronic Archives projects, in which four UM computer science undergraduates have embarked on a scholarly production of a 3-D Dickinson Homestead online, replicating her house to build a virtual study center open to anyone with a web browser; John Fuegi's video biography of Ada Lovelace, the daughter of Lord Byron, who is often credited with writing the first computer program, and MITH's critical edition of that biography; Mitch Lifton's Beckett Directs Beckett Digital Narrative project, which brings research into the undergraduate classroom and uses MITH's collated outtakes to produce a critical edition of the video project itself, involving students in the production of digital resources and thereby honing their critical assessment skills as they learn to evaluate both content and software; Classics 170, a web-based introductory humanities course housed at MITH; and our work on XML mark-up of selected documents from UM's Prange Collection of Japanese publications from the US occupation following World War II.

MITH Networked Associate Fellow Lisa Antonille analytically reports on MNAP, the MITH Networked Associate Fellowship Program, and how it makes a wide variety of resources available to graduate students, adjunct faculty, independent researchers, and researchers at other institutions. Currently, there are four Networked Associate projects, including two online journals (Schulkyll and ethos: Hypertexts in the Humanities; http://www.mith.umd.edu/ethos); a Resource Center for Cyberculture Studies (RCCS; http://otal.umd.edu/~rccs); an Intercultural Learning Center (ICLC) with ties to the University of North London; and a web activism site mounted by the musical group Sweet Honey in the Rock. About five more projects are beginning the application process. This unique program facilitates the growth of innovative projects that might otherwise suffer from lack of access to necessary hardware, software, technical support, and like-minded individuals and projects. Furthermore, MITH provides a physical space and a community of scholars committed to visionary and practical applications of technology to advance research and teaching in the humanities. Networked Associate Fellows become active participants in this community and can bring their projects to fruition through it. For more developed associates, MITH also encourages a wider dissemination of projects and assists in the iterative process of project development and improvement. Conversely, MITH benefits from MNAP through the reciprocal, symbiotic exchange among fellows, associates, and staff.

Neil Fraistat, Chair of MITH's Internal Advisory Board and General Editor of Romantic Circles, will discuss ways in which his experimental peer-reviewed and peer-built scholarly website, focused on the literature and culture of the British Romantic Period, collaborates with MITH. Besides featuring a virtual Dickinson Homestead, a virtual Intercultural Learning Center, a virtual early twentieth-century artistic salon hosted by composer Arthur Schelling, a virtual Empire Strikes Back set, and a virtual platform for website activism, the MITH Plaza will feature a portal to Romantic Circles High School. By reviewing various intersections between RC and MITH, Fraistat shows that there are enormous opportunities for each, and for both together, to become important agents for change in the way the Humanities, in the university, across grades K-16, and across the community at large, reconceive their most fundamental goals as well as their scholarly and instructional practices.

Smith concludes with brief remarks that reflect on the extensive collaborations necessary to make MITH succeed and on unforeseen problems and possibilities, and asks the audience for analytical feedback to improve MITH's future. MITH is especially interested in fostering and developing cross-institutional collaborations (and has already initiated connections with IATH at the University of Virginia and the Women Writers Project at Brown University), and this session is designed to provide an opportunity for analytical dialogue. Critical dialogue with the audience and with one another is a crucial component of our evaluation process: it lets us share and analyze key failures and frustrations as well as key successes, and thereby pose questions about institutional collaborations and their necessity for producing digital resources in the humanities, as is being realized at UM in MITH.
