(1.1) Text Encoding

(1.1.1) Writing about It: Documentation and Humanities Computing

Julia Flanders
Brown University, USA

Documentation is arguably the most important part of a humanities computing project's long-term existence, in two senses: without it a project cannot maintain continuity and consistency, and without it a project cannot communicate its methods to other members of the larger community, offering them for critique if in need of improvement, and making them known if worthy of emulation.

Without documentation a project is effectively without "self-knowledge", by which I mean the information which the project itself, as an entity, needs in order to survive and to perpetuate itself. It is crucial to distinguish here between the knowledge belonging to individual participants in the project's work and the knowledge belonging to the project itself. What individuals know is not necessarily accessible to other project members, and this knowledge leaves with them when they leave. What the project knows, on the other hand, is an explicit part of its internal and public existence, with several important consequences. It can be found without recourse to private knowledge; it does not depend on any individual and is not vulnerable to changes in staff. And finally, this kind of self-knowledge has a particular rhetorical status within the project and as a public expression of identity, in that it carries an understood authority: it is explicitly endorsed by the project, and its truth or applicability is not in question.

This is as much as to say that we need to take documentation seriously not only as a practical matter but also as a question of theory. Producers of documentation must negotiate between different rhetorical scenarios, from the didactic, developmental narrative of a training manual to the encyclopedic granularity of a reference guide. Treating this negotiation as a rhetorical problem seems to locate it in the writing itself. But in fact we can see, once we look more closely, that documentation is a specialized kind of data or content to be purveyed, and that these different scenarios amount to different approaches to data retrieval which must be accommodated. This complexity is compounded by the fact that we are concerned here with humanities computing documentation, to be used by humanists, with humanistic expectations about the relationship between that which is documentable - reproducible, deterministic, normative - and that which is subject to independent judgment and expertise.

Finally, the challenges documentation poses - its peculiar embodiment of the Arnoldian tension between "Hebraizing" and "Hellenizing", between doing and thinking - also resonate with issues central to humanities computing. Documentation both embodies a project's self-reflection and calls it to a close, requires that reflection conclude in order that action may commence. And yet the normative statements which documentation strives to offer about a project's practice are inevitably, in a project of any scope, the occasion for discovering further issues which have not yet been decided. The perpetually unfinished work of documentation thus holds the project in a state of dynamic suspension, always trying to resolve issues and get back to work, always trying to finish work so that it can be documented.

Some specific points are worth noting here, to be discussed at greater length in the finished paper. First of all is the issue already mentioned of the relationship between training documentation and reference documentation. The most apparent difference between these two forms is the kind of text being produced: in the first case, a developmental narrative which takes the trainee through the project's methods from basic to advanced in a way which maximizes memorability and comprehensibility; in the second case, a reference work which provides instant access to particular topics discussed as independent items with a high degree of granularity. These two modes are so different that it is often extremely difficult to convert from one to the other, increasingly so in proportion to how successfully the given mode has been realized. They also require quite different kinds of infrastructure to make them useful in the work environment: for instance, the reference model works best when accompanied by good metadata and a good retrieval system. It also requires attention to the level of granularity at which individual instructions are conceptualized, and to how related instructions will be identified and aggregated.

Producing documentation in either mode requires that one conceptualize the consumer's needs and habits in detail, and this raises a second point which has already been mentioned above. What role do humanists allot to documentation, broadly considered? This question points to a larger issue for humanities computing, namely the role of judgment and interpretation in the creation of humanities data, and in what areas the exercise of these things is appropriate. If the documentation is framed for a workplace in which comparatively unskilled workers require explicit instructions on making absolutely consistent choices, then the documentation itself needs to be equally determinate, authoritative, and exhaustive in the way it communicates. It must anticipate every alternative and leave no opening for variation; from the retrieval standpoint, it must ensure that the correct information is always discovered no matter how inept or tangential the search strategy. In short, it must make the work resemble as little as possible the kind of intellectual environment envisioned by a liberal humanities viewpoint. On the other hand, if the intention is to guide the worker in exercising judgment - that is, to indicate the principles to be applied rather than the act to be performed - then the documentation will necessarily imagine its readers as part of an ongoing investigation into the project's methods and standards.

To give these reflections some concreteness, the finished paper will also consider an actual documentation system currently in use in a major text encoding project, which has evolved over a period of seven years and is used both for training and reference. Although developed for a particular set of needs and by no means perfect, this system and the process of its development offer an example which may be of use to people currently designing or redesigning documentation systems of their own.


(1.1.2) Integration of Markup Languages and Object-Oriented Techniques in a Hypermedia Methodology

Antonio Navarro
Alfredo Fernandez-Valmayor
Baltasar Fernandez-Manjon
Jose Luis Sierra
Universidad Complutense de Madrid, Spain

In this paper we present a hypermedia design and production methodology integrating markup languages and object-oriented techniques. This methodology addresses the main problems that arise during hypermedia development, improving communication between customers (content providers and interaction designers) and developers (software designers and programmers).

1. Introduction

Hypermedia production is a complex and costly task that requires the involvement of experts from very different fields during all phases of software development. In traditional software production, the customer gives the functional and operational requirements to the developers, who then implement the program largely independently. In hypermedia production, by contrast, the customer must be much more closely involved, providing all the information needed to build the application. We learned these lessons during the construction of Galatea, an educational hypermedia application for French text comprehension, developed in collaboration with a team of linguists (further information about Galatea can be found in [Fernandez-Manjon et al. 98]).

This customer presence, providing content and interaction, creates a serious problem in the design phase. In this phase we need a systematic and well-defined formalism to represent the application in an abstract way that facilitates the relationship between customers and software designers.

Two main approaches have been used to solve this problem. One is to use hypermedia models in the design and development of these applications; the other is to use object-oriented diagrams to cover the design phase. In this paper we analyze both approaches [Navarro 98] and propose a hypermedia production methodology that integrates hypermedia models and object-oriented techniques. Our methodology [Navarro&Fernandez-Manjon 00, Navarro&Sierra 00] tries to solve some of the problems identified in previous approaches by facilitating the interaction between customers and developers, easing code generation from the design phase, and improving application maintenance.

2. Hypermedia models and object-oriented design and development techniques

Hypermedia models such as the Dexter Hypermedia Model [Halasz and Schwartz 94], the Amsterdam Hypermedia Model [Hardman et al. 94], the Hypertext Abstract Machine (HAM) [Campbell and Goodman 88], the Hypergraph Model [Tompa 89], the Trellis Model [Stotts and Furuta 89], and the Hypertext Design Model (HDM) [Garzotto et al. 93] offer important advantages [Garzotto et al. 93], but also some drawbacks. They are closed systems, making it impossible (or very difficult) to include complex computational activities in a hypermedia application if the model does not support them; some of them are too difficult for people outside computer science to manage; and none of them enables the design of a hypermedia application centred on the information structure, truly independent of the presentation structure.

Object-oriented software development methodologies, such as Booch [Booch 94] or UML [Rumbaugh et al. 98], are extensively and successfully applied in software projects because they improve software quality and maintainability, but these methodologies also have drawbacks. They are not primarily intended for the development of hypermedia applications, and they use diagrams that are valuable for software designers but very difficult for a customer team to understand.

Our approach combines ideas from both domains and integrates them through the use of XML. XML, the Extensible Markup Language [W3C XML], is the evolution of the first attempt to represent markup languages in a standardized way, SGML, the Standard Generalized Markup Language [ISO/IEC SGML]. XML is based on descriptive markup (tag semantics are not specified in the tag definition); on the separation between the structure, content, and treatment of a document; and on platform independence. To achieve these goals, XML defines the set of tags that make up a markup language (that is, the document structure) through a construction called a DTD (Document Type Definition). The DTD is the grammar that formally describes the structure of a class of documents, and a document that uses the DTD's tags to structure its content is called an instance of the DTD.
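For illustration, a minimal DTD and a fragment of a conforming instance might look like the following; the element names are invented for this sketch and are not taken from the Galatea project:

```xml
<!-- A hypothetical content DTD: a lesson holds a title and
     paragraphs, plus hyperlinks to other lessons. -->
<!ELEMENT lesson (title, paragraph+, link*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT paragraph (#PCDATA)>
<!ELEMENT link EMPTY>
<!ATTLIST lesson id ID #REQUIRED>
<!ATTLIST link target IDREF #REQUIRED>
```

```xml
<!-- An instance of the DTD: a document that uses the DTD's tags
     to structure its content. -->
<lesson id="l1">
  <title>Reading French headlines</title>
  <paragraph>Skim the headline before the article body.</paragraph>
  <link target="l2"/>
</lesson>
```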

3. Our approach

As previously stated, our methodology tries to solve the problems identified in previous approaches by improving the communication between customers and developers. In our approach, developers are divided into software designers, who must provide a (code-independent) representation of the application, and programmers, who translate this representation into real code. Customers also play a double role. They are the content providers, who organize the knowledge included in the application (this knowledge has a double structure: natural structure and hyperlink structure), and they are the interaction designers, who decide the time and space of content presentation.

Interaction between the content providers and the software designers is one of the problems our methodology solves. We use an XML DTD, called the content DTD, to represent the contents of the application and the hyperlinks between them. The content providers describe the structure of the contents in natural language, and the software designers use this information to build the content DTD. The content providers then generate an instance of this content DTD that organizes the contents of the hypermedia application in a formal way. The use of meaningful tags, and the inclusion of attributes (properties) in these tags, solves the interaction problem between content providers and software designers.

Our methodology also eases the communication problems between the interaction designers and the software designers. We use another XML DTD, called the presentation DTD, to characterize the presentational structure of hypermedia applications. The elements of the presentation DTD describe the application's presentational elements (screens, windows, buttons, etc.) and the hyperlinks between them. This DTD is common to all (or most) hypermedia applications and is provided by the software designers. Moreover, we are working on assigning concrete semantics to the presentation DTD, based on an object-oriented window class hierarchy, to provide a consistent connection between the markup view and the object-oriented view of our methodology.

This separation of content and presentation provides the means to associate different presentations with the same content. The relationship between the content DTD and the presentation DTD is accomplished through overmarkup. The basic idea of overmarkup is very simple: when we build the instance of the presentation DTD to describe the presentation and interaction framework of the application, the elements of the content DTD become the content of the elements of the presentation DTD.

We apply overmarkup in two phases. In phase 1, structural overmarkup, there is no real content yet: when the interaction designers (helped by software designers) build instance 1 of the presentation DTD, the elements of the presentation DTD overmark the elements of the content DTD (only the names of the elements), enabling a better understanding of the structure of the application.

In phase 2, content overmarkup, when the interaction designers (helped by software designers) build instance 2 of the presentation DTD, the elements of the presentation DTD overmark the instances of the elements of the content DTD (the real content) to represent the final hypermedia application. If we need to represent some complex computational activity in the application (for example, an exercise that evaluates the learner's knowledge), we use object-oriented diagrams (mainly class and state-transition diagrams) attached to instance 2 of the presentation DTD. Instance 2 is what we call the application design document, and it provides a true representation of the whole application, used by both customers and programmers. Customers (content providers and interaction designers) use the design document to evaluate whether it conforms to their requirements and to make any changes (naturally, they ignore the object-oriented diagrams). Programmers use part of this document directly in the coding phase, while other parts represent the application design that they must translate into real code. This task is facilitated by the relation between the presentation DTD and real object-oriented code, and it eases the maintenance of the final application.
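The two phases can be sketched as follows; the element names (screen, frame, button for presentation; title, paragraph for content) are invented for this illustration:

```xml
<!-- Phase 1, structural overmarkup: presentation elements wrap
     only the NAMES of content elements, giving a fast structural
     prototype of the application. -->
<screen id="s1">
  <frame>title</frame>
  <frame>paragraph</frame>
  <button target="s2"/>
</screen>

<!-- Phase 2, content overmarkup: the same presentation elements
     wrap the actual instances of the content elements, yielding
     the application design document. -->
<screen id="s1">
  <frame><title>Reading French headlines</title></frame>
  <frame><paragraph>Skim the headline first.</paragraph></frame>
  <button target="s2"/>
</screen>
```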

4. Conclusions and future work

We believe that our approach provides a solution for the development of hypermedia applications that overcomes the problems of both hypermedia models and object-oriented construction techniques. Our solution is not closed (we have integrated all the power provided by object-oriented development techniques), and it was specifically created to deal with hypermedia software. Our experience in the Galatea development has shown us that XML markup (and its supporting tools) is easy enough to be used by customers (a similar approach is used in [Nanard and Nanard 95]), and that the design phase is fully covered by overmarkup.

The content and presentation DTDs improve the communication between customers and developers and provide the means to capture the content and presentation structures at different stages. The overmarkup phases integrate these structures: structural overmarkup yields a fast application "prototype", and the design document is a complete representation of the application that solves the interaction problem between customers and software designers. Moreover, we can use the structure provided by the presentation DTD to generate part of the object-oriented code (improving the communication between software designers and programmers), and the existence of the design document facilitates application maintenance.

Present work includes the full assignment of object-oriented semantics to the elements of the presentation DTD. The next step is the development of a CASE tool that facilitates the overmarkup process and provides different views of the application (overmarkup view, window view - an application preview - and object-oriented view).

5. References

Booch, G. (1994). Object-oriented analysis and design with applications. Second Edition. Benjamin Cummings Publishing Company.
Campbell, B. and Goodman, J.M. (1988). HAM: A general purpose hypertext abstract machine, Communications of the ACM 31(7) 856-861.
Fernandez-Manjon, B., Navarro, A., Cigarran, J. and Fernandez-Valmayor, A. (1998). Using Standard Markup in the design and development of Web educational software. Proceedings of TeleTeaching 98.
Garzotto, F., Paolini, P. and Schwabe, D. (1993). HDM: A model-based approach to hypertext application design. ACM Transactions on Information Systems 11(1) 1-26.
Halasz, F. and Schwartz, M. (1994). The Dexter Hypertext Reference Model, Communications of the ACM 37(2) 30-39.
Hardman, L., Bulterman, D.C.A. and van Rossum, G. (1994). The Amsterdam Hypermedia Model: Adding Time and Context to the Dexter Model. Communications of the ACM 37(2) 50-62.
International Standards Organization (1986). Standard Generalized Markup Language (SGML), ISO/IEC IS 8879.
Nanard, J. and Nanard, M. (1995). Hypertext design environments and the hypertext design process. Communications of the ACM, 38 (8) 49-56.
Navarro, A. (1998). Aplicaciones de los lenguajes de marcado en la abstraccion del diseño de un sistema hipermedia. Trabajo de Investigacion Departamento de Sistemas Informaticos y Programacion Universidad Complutense de Madrid.
Navarro, A., Fernandez-Manjon, B., Fernandez-Valmayor, A. and Sierra J.L. (in press 2000). A Practical Methodology for the Development of Educational Hypermedias. Proceedings of ICEUT 2000, 16th IFIP World Computer Congress 2000, Information Processing Beyond Year 2000.
Navarro, A., Sierra, J.L., Fernandez-Manjon, B. and Fernandez-Valmayor, A. (in press 2000). XML-based Integration of Hypermedia Design and Component-Based Techniques in the Production of Educational Applications. In M. Ortega and J. Bravo (eds), Computers and Education in the 21st Century: Invited papers from the Spanish Congress on Computers in Education (Conied'99), Kluwer Academic Publishers.
Rumbaugh, J., Booch, G. and Jacobson, I. (1998). Unified Modeling Language Reference Manual, Addison-Wesley Object-Oriented Series.
Stotts, P.D. and Furuta, R. (1989). Petri-Net-Based Hypertext: Document Structure with Browsing Semantics. ACM Transactions on Office Information Systems 7(1) 3-29.
Tompa, F. (1989). A Data Model for Flexible Hypertext Database Systems. ACM Transactions on Information Systems 7(1) 85-100.
World Wide Web Consortium (1998). W3C Extensible Markup Language XML. <http://www.w3.org/TR/REC-xml>


(1.1.3) A Workbook Application for Digital Text Analysis

Worthy N. Martin
Olga Gurevich
University of Virginia, USA

Thomas B. Horton
Florida Atlantic University, USA

Robert Bingler
University of Virginia, USA


The workbook facility aims to help scholars in the humanities organize the results of their research in a convenient and easily accessible manner. The recent proliferation of marked-up electronic corpora has produced a need for tools that allow structured search and extraction from texts. However, not all potentially interesting features of a text can be described in terms of the markup hierarchy: some features involve overlapping elements of markup, and others are too fine-grained to be marked up. We therefore need a markup-independent method of searching, and tools that combine both approaches. The text-region-based approach to processing digital text resources is our proposed method for searching and extracting both marked-up and non-marked-up information from a collection of texts. The workbook facility is a prototype application of this method that allows extraction and linking of portions of XML-formatted texts; regions can be selected either by their structural characteristics or by other features.


A text region (a.k.a. span) is a contiguous portion of a document identified by its start and end offsets. It can be a complete XML element or just a string of characters. A text region can be created through a variety of operations described below. In our use of the concept, most XML elements are assigned a unique identifier within the document, and the offsets for text regions are therefore relative to the nearest preceding ID within the document.

A text occurrence object (TOO) consists of one or more text regions, together with notes and a user-defined name. The text regions can come from one or more documents and do not have to be contiguous.

SGREP (structured grep) is a command-line search tool. It allows structured searches on XML- and SGML-formatted texts and collections of texts, as well as simple searches. The search results are returned as a set of text regions that can be organized into a text occurrence object. SGREP allows nested searches (for example, an XML element labeled "verse" containing the word "Hamlet" within a DIV1 element) as well as unions and intersections of search expressions.
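The region and TOO concepts can be sketched in a few lines of Python (our own illustration; the class and field names are invented here and are not the workbook's actual Java classes):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TextRegion:
    """A contiguous span of a document, located by character offsets.

    Offsets are taken relative to the nearest preceding element ID,
    as in the region model described above."""
    document: str    # source document
    anchor_id: str   # ID of the nearest preceding XML element
    start: int       # offset from the anchor, inclusive
    end: int         # offset from the anchor, exclusive

@dataclass
class TextOccurrenceObject:
    """A TOO: a named, annotated set of regions, possibly drawn from
    several documents and not necessarily contiguous."""
    name: str
    regions: list = field(default_factory=list)
    notes: str = ""

    def add(self, region: TextRegion) -> None:
        self.regions.append(region)

# Example: link two passages from different documents into one TOO.
too = TextOccurrenceObject(name="ghost scenes")
too.add(TextRegion("hamlet.xml", "act1.scene1", 120, 180))
too.add(TextRegion("hamlet-q1.xml", "sc1", 95, 150))
print(len(too.regions))  # 2
```

Because each region records its document and anchor, a TOO built this way remains traceable back to the context it came from, without modifying the source texts.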

Workbook Facility

The workbook facility aims to help humanities scholars organize the results of their research in a convenient and easily accessible manner. It provides a way to bookmark and annotate documents without changing the original texts, and to store and link annotated extractions from texts. The workbook consists of a set of TOOs, each of which contains one or more text regions that can originate from different documents. A TOO can thus link portions of texts from different places in a document or from different documents. The workbook facility has a built-in XML parser that creates a DOM structure. We are using IBM's Java-based parser, and the rest of the workbook is also written in Java. The following operations are available to the user:


The proposed workbook facility will be useful for several research goals. Extracting, ordering and naming textual fragments is a convenient way for an instructor to prepare a lecture about a particular text. Scholars who study different versions of the same text (e.g. versions in different languages, or different editions) can use the workbook to link parallel passages and annotate the resulting TOO. Since the creation of text regions can be markup-independent, this can be done even if the parallel passages in two documents are not contained within a single XML element. Moreover, extracting regions that share particular features can be automated with the help of SGREP. The word-distribution feature of the program is intended to demonstrate that operations found in TACT and similar tools can be easily integrated with our workbook approach. Once a workbook is created, it still contains links to the original documents and the history of how the extractions were made. That is, the process is completely retraceable, and the user can view the context from which any text region came.


DOM (Document Object Model) standard <http://www.w3.org/DOM/>
Horton, Thomas B. (1999). A region-based approach for Processing Digital Text Resources, Digital Resources for the Humanities, King's College, London, Sept. 12-15, 1999, pp. 47-49.
Jaakkola, Jani and Kilpeläinen, Pekka. SGREP (structured grep), at the University of Helsinki, Finland.
XML standard <http://www.w3.org/XML/>
XML4J, the XML parser for Java produced by IBM: <http://www.alphaworks.ibm.com/tech/XML4>


(1.2) Computational/Corpus Linguistics (Panel Session)

Electronic Resources for Historical Linguists. Part 1: Medieval Studies.

Chair: Christian Kay
University of Glasgow

This session and session 8.1 on Monday 24 July will introduce a range of resources of particular interest to historical linguists and to those concerned with the development of text corpora and databases. Each of today's papers will be followed by discussion, but major issues may also be raised at the group discussion on Monday.

(1.2.1) The Middle English Grammar Project

Jeremy Smith
Simon Horobin
University of Glasgow, UK

The study of linguistic variation in Middle English has undergone a revolution in recent years, with the publication of the "Linguistic Atlas of Late Mediaeval English" (McIntosh, Samuels and Benskin 1986) and other important surveys. However, no thorough-going attempt has as yet been made to harness this new information to a wider programme of linguistic description oriented from both structural and variationist perspectives. The Middle English Grammar Project, a British Academy-funded venture now underway at the University of Glasgow and at Stavanger College, Norway, is designed to address this gap, with the production of surveys covering the whole field of ME linguistic studies: spelling, phonology, grammar and lexicology. The Project is currently focused in the UK within the Institute for the Historical Study of Language, a research centre within Glasgow's Faculty of Arts.

The first research area being addressed by the Project is the creation of a new history of ME spelling and phonology. To carry out this analysis, a corpus of machine-readable texts is currently being assembled. The spellings in these texts are classified according to both their Present-Day and their etymological reflexes, and the results of this process are entered in a database. The database includes a variety of extralinguistic information in addition to the classified spelling data, such as genre and script, which allows the corpus to be interrogated according to a number of different factors. This presentation will demonstrate the database and discuss its uses for the study of linguistic variation in Middle English, and for historical linguistics more generally.


(1.2.2) Two Historical Linguistic Atlases

Margaret Laing
Keith Williamson
University of Edinburgh, UK

The principal aim of the "Linguistic Atlas of Early Middle English" (LAEME) and the "Linguistic Atlas of Older Scots" (LAOS) is to produce historical linguistic atlases complementary to "A Linguistic Atlas of Late Mediaeval English" (LALME)1. Computer-based data-processing and analysis have been employed in the projects from their inception in 1987. Their methodology differs radically from that used in LALME: instead of recording data via a questionnaire of prescribed items, entire texts are diplomatically transcribed and keyed onto disk, where they can be analysed linguistically using programs written in-house. Each word or morpheme in a text is tagged according to its lexical meaning and grammatical function, and each newly tagged text is added to the corpus. The tagging creates a taxonomy of the linguistic material in the texts and permits systematic comparison of their dialects.

Information on particular items (defined by one or more tags) may be abstracted from the corpus to identify spatial and/or temporal distributions of the forms associated with an item. The programs generate dictionaries, concordances, chronological charts and input files for mapping software. Maps are produced to show the distribution of features or of full text forms.

This method of analysis has considerable advantages over the questionnaire. Items for study can be selected from a complete inventory of linguistic forms rather than from some predetermined sample. Tagged texts are immediately and constantly available to be processed and compared in whatever ways are desired. While not all the material in the corpus will be useful for dialectal analysis, it remains available for a wide range of future studies: historical phonology, morphology, syntax or semantics. We will demonstrate our method of lexico-grammatical tagging and illustrate how it may be exploited not only for linguistic geography, but also for phonological and syntactic investigations.
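The effect of tagging by lexical item and grammatical function can be sketched in a few lines of Python; the tag format below is invented for illustration and is not LAEME's actual tagging scheme:

```python
from collections import defaultdict

# A toy tagged text: (form, tag) pairs, where the tag names the
# lexical item and its grammatical function.
tagged_text = [
    ("thurh", "THROUGH/preposition"),
    ("thorw", "THROUGH/preposition"),
    ("hire",  "HER/pronoun-oblique"),
    ("thurh", "THROUGH/preposition"),
]

# Abstract all forms associated with an item, with counts -- the
# kind of complete form inventory from which atlas maps are drawn.
inventory = defaultdict(lambda: defaultdict(int))
for form, tag in tagged_text:
    inventory[tag][form] += 1

print(dict(inventory["THROUGH/preposition"]))  # {'thurh': 2, 'thorw': 1}
```

Because every word is tagged, an item for study can be defined after the fact by its tag(s), rather than being fixed in advance by a questionnaire.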


1 McIntosh, Angus, Samuels, M.L. and Benskin, Michael (1986). A Linguistic Atlas of Late Mediaeval English, 4 vols. AUP, Aberdeen.


(1.3) The Electronic Classroom

(1.3.1) WebCAPE - Language Placement Testing over the Web

Charles D. Bush
Brigham Young University, USA

WebCAPE is a web-based implementation of the BYU Computer-Adaptive Placement Exam (CAPE) series. These exams use adaptive procedures to assess language ability, drawing from a large bank of calibrated test items. Tests are administered from a web server computer through the internet to a browser application on students' computers. Test security in WebCAPE is maintained through a combination of application design and standard web methods.


Students entering a university language program come with a wide range of previous language training and experience. Determining which class each student should enroll in thus becomes an enormous task for language departments. A placement exam can be used, but paper-based standardized placement exams bring their own headaches: students have to be brought in at a fixed time and place; the test takes a long time, since everyone has to wade through all the questions; and then everyone waits while the tests are sent away for scoring and the results come back.

The Humanities Research Center at Brigham Young University has developed a set of language placement exams that overcome these problems. The exams are delivered by computer and thus do not require a lock-step controlled environment. The exams are adaptive, effectively eliminating questions far above or below the students' ability. And since questions are drawn from a large bank of test items, each student gets what amounts to a unique test, thus avoiding problems with test security. The computer-adaptive approach also means the test need only take long enough to determine a particular student's ability level and produces a placement score on the spot.

Underlying the BYU CAPE (Computer-Adaptive Placement Exam) application is a large bank of calibrated test items. Initially, several hundred questions were written, testing a variety of language skills: vocabulary, grammar, reading comprehension, etc. These were then statistically calibrated for difficulty and discrimination among ability levels. Test item banks have been developed for Spanish, German, French, Russian, and most recently, English as a Second Language.

Originally CAPE tests were implemented as individual application programs. But more recently, a new implementation has been developed for an internet/web environment. This version, called WebCAPE, uses a web server for the core functionality and test item banks, but administers the actual tests over the internet through a standard browser application. Thus WebCAPE tests can be given on any computer with an internet connection and Netscape 4.0 or equivalent.

How It Works

Students enter WebCAPE through their school's menu page. This page is customized for the school, incorporating their seal or logo, a background campus scene, and school colors into the page design. Page content includes a brief explanation of the tests and how the school utilizes their results, along with basic instructions for taking the test.

When students select their language from the menu page, they go to a registration page. This page is also specific to the school. Students enter their identification information in the top section of the page. They may also enter their e-mail address for an e-mail copy of their test results.

Clicking the Begin Exam button takes students into the actual exam. After some initialization, a new browser window opens to the exam environment. This is served completely from the WebCAPE server and is the same for all tests. The top frame contains title information. The middle section displays the current test item. The bottom frame contains the exam control panel where students indicate their answer, then click Confirm Response to register it.

First, the exam presents six level-check questions selected from the full difficulty range. Based on these answers, the algorithm computes a preliminary ability estimate. It then probes with further questions to fine-tune that estimate, presenting harder and easier items until it converges on a statistically reliable value. On average, the entire testing process takes 20-25 minutes.
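The adaptive loop described above can be sketched in a few lines of code. This is a simplified illustration only: it assumes a Rasch-style item response model with Newton-Raphson ability estimation and a standard-error stopping rule; BYU's actual calibration and stopping criteria are not published here.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def adaptive_test(bank, answer_fn, se_target=0.4, max_items=30):
    """Administer items until the ability estimate is statistically reliable.

    bank      -- list of calibrated item difficulties (cf. the CAPE item bank)
    answer_fn -- callable(difficulty) -> True/False, the student's response
    """
    theta, asked, responses = 0.0, [], []
    # Level check: six items spread across the full difficulty range.
    for b in sorted(bank)[:: max(1, len(bank) // 6)][:6]:
        asked.append(b)
        responses.append(answer_fn(b))
    while True:
        # Re-estimate ability by Newton-Raphson on the log-likelihood,
        # clamped to a sane range to avoid divergence on extreme patterns.
        for _ in range(10):
            info = sum(rasch_p(theta, b) * (1 - rasch_p(theta, b)) for b in asked)
            grad = sum(int(r) - rasch_p(theta, b) for r, b in zip(responses, asked))
            theta = max(-4.0, min(4.0, theta + grad / max(info, 1e-6)))
        if 1.0 / math.sqrt(max(info, 1e-6)) < se_target or len(asked) >= max_items:
            break  # estimate is precise enough, or item budget exhausted
        remaining = [b for b in bank if b not in asked]
        if not remaining:
            break
        # Probe with the unused item closest to the current estimate.
        nxt = min(remaining, key=lambda b: abs(b - theta))
        asked.append(nxt)
        responses.append(answer_fn(nxt))
    return theta, len(asked)
```

Because each probe is chosen near the current estimate, items far above or below the student's ability are never presented, which is what keeps the test short.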

When the exam finishes, students are returned to the registration page and their results are posted in the bottom section of the page. Here their final ability estimate is mapped to a recommended course by reference to a table of cut-off points established by the school. Beginning and ending time-stamps are also posted, for validation and timing purposes. In addition, the exam returns details of the students' session. These are not normally displayed for the student to see, but are sent to the school for analysis.

As a final step, students click Submit Results. This generates an e-mail message with all of the information to the school/department and a summary message to the student's e-mail address. A confirmation page is also generated, which then returns students to the menu page, ready for the next student.


WebCAPE is designed to be reasonably secure for its intended use. Access control is maintained through the menu page, registration pages only accept entrance from a corresponding menu page, and the exam environment can only be entered from a properly configured registration page. Both are maintained on the WebCAPE server and isolated from outside access. The host school controls access to their menu page, typically by either setting up links or bookmarks in the lab where they administer the exam, or by only giving the URL to their properly registered students. When needed, the WebCAPE server can also be configured to restrict access to a particular IP address or range (a designated lab, for example), or to require a userID and password.

Test security is maintained mostly by the design of the page set. Test items are individual html files that do not contain answer information or other clues. The answer key is read into the programming of the control frame when it loads. And should a student manage to hack into the answer table, there is no way to identify which answer goes with which question. At the server level, hackers are thwarted by the built-in resistance of the server and operating system configuration.

Foolproofing is more problematic. Because it uses a standard internet browser, WebCAPE cannot keep students from doing things the browser allows, such as closing the window, using the back button, or quitting the browser program entirely. All that can be done is to write clear instructions and incorporate warning alerts. This suggests administering WebCAPE tests in a controlled environment (such as a student lab) with a standardized browser configuration and a proctor to monitor things.

The biggest vulnerability to cheating in WebCAPE comes at the student workstations. The exam cannot prevent students from using a dictionary, getting help from friends or even having someone else take the test. This clearly demands a proctored environment.

WebCAPE test security measures mentioned thus far try to prevent cheating to get a higher score. But students taking a placement test may also try to get a lower score. The current version of WebCAPE does not directly address this problem. However the results message it sends to the department does include details of the test session, which can be analyzed when needed - an alert human would immediately suspect a test with all-wrong answers. Other suspicious patterns like all-one-letter answers or only a few seconds between answers can also be easily recognized by a human. The next version of WebCAPE will probably include processing to at least flag patterns such as these.
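A first pass at such flagging is easy to sketch. The following is only an illustration of the kind of processing a future version might include; the session format, the thresholds, and the wording of the warnings are all assumptions.

```python
def flag_suspicious(session):
    """Flag response patterns suggesting deliberate low scoring.

    session -- list of (chosen_letter, correct, seconds_taken) tuples,
               one per test item, in the order answered.
    Returns a list of human-readable warnings (empty if nothing looks odd).
    """
    flags = []
    # An all-wrong test is the pattern an alert human would spot first.
    if session and not any(correct for _, correct, _ in session):
        flags.append("all answers wrong")
    # The same letter for every item suggests the student is not reading.
    letters = {letter for letter, _, _ in session}
    if len(session) >= 5 and len(letters) == 1:
        flags.append("same letter chosen for every item")
    # Only a few seconds per answer suggests random clicking.
    rushed = [t for _, _, t in session if t < 3]
    if len(rushed) > len(session) // 2:
        flags.append("most answers given in under 3 seconds")
    return flags
```

Flags like these would be appended to the results message already sent to the department, leaving the final judgement to a human.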

Ultimately there may still be a few students that get misplaced, either by faking out the test or by slipping through the statistical margin of error. These will still have to be dealt with by human administrative procedures. But WebCAPE should keep the number small enough to manage.


WebCAPE is implemented as a service rather than as a program package. Schools pay for access for their students to take the tests rather than buy the program itself. For most schools, the best alternative is a flat fee for unlimited tests, but a lower-cost option for a fixed number of tests is also available. In each case there is also a one-time setup fee for creating the customized menu and registration pages. Planning is underway to also implement a pay-per-test entrance to WebCAPE. This would allow someone to take the test on their own initiative to see how they might place into college-level courses. A third configuration as a high school exit exam has been proposed. This would be for high school language programs, allowing students to find out where they would place into college courses.


WebCAPE is currently available for French, German and Spanish, with Russian to be added shortly. For all four languages, the test items are strictly text-based questions. While different test items assess different aspects of language ability, they are still confined to written text. But the latest CAPE exam under development, English-as-a-Second-Language, goes beyond that text-only limit. ESL-CAPE incorporates sound clips into many test items and thus adds listening comprehension to the language skills it assesses. ESL-CAPE also has an option to calculate separate ability estimate scores for grammar, reading comprehension, and listening comprehension - the other exams only give a composite estimate. A stand-alone implementation of ESL-CAPE is now being piloted. Web implementation will be ready for production testing next year.

Selected Bibliography

Madsen, Harold S. and Larson, Jerry W. (1985). Computerized Adaptive Language Testing: Moving Beyond Computer Assisted Testing. CALICO Journal 2.3 (March 1985), pp 32-36.
Larson, Jerry W. (1989). S-CAPE: A Spanish Computerized Adaptive Placement Exam. William Flint Smith, ed., Modern Technology in Foreign Language Education: Applications and Projects. ACTFL Foreign Language Education series: National Textbook Co. Skokie, IL. pp 277-289.
Larson, Jerry W. (1996). Computerized Adaptive Placement Exams in French, German, Russian, and Spanish. Foreign Language Notes, Newsletter of Foreign Language Educators of New Jersey, Spring 1996, Vol. XXXVIII, No. 2, pp. 13-15.
Larson, Jerry W. (1996). An Argument for Computer Adaptive Language Testing. Methods and Applications of CALL for Foreign Language Education in Korean Universities. Proceedings of the Conference on Applications of Computer-Assisted Language Learning (CALL), Dongduck Women's University. Seoul, Korea, November 29, 1996. pp. 51-80.

Return to ALLC/ACH Programme

(1.3.2) Technophobes, or the Nintendo Generation? A Study of the Use of ICT in Teaching and Learning in Modern Languages

Claire Warwick
University of Sheffield, UK


This paper discusses the results of applying a methodology typical of information science to humanities computing. User studies are widely performed in the library and information science community. However, although some research has been carried out into the information needs of researchers in the humanities, very little has been done on the actual use of electronic resources (Warwick, 2000). Modern Languages is an area in which the use of electronic resources in teaching is known to be widespread. Yet the recent HEFCE report (1998) found that over one third of universities felt that computer-assisted learning (CAL) and information and communication technology (ICT) resources were being under-utilised. It concluded that there is a need for more research into their use in HE and recommended a "focus on the information and knowledge needs of the real end-users". This paper seeks to address this need, and considers the way that both teachers and students of Modern Languages use electronic resources, and what their perceptions of them are. It is based on work conducted in the department by myself and a Masters student as part of her dissertation (Pine-Coffin, 1999). We argue that such research is an important contribution to humanities computing, since without an accurate idea of the way in which resources are used and perceived it is impossible to tell whether computing methodologies are useful and successful in aiding teaching and learning. Without this type of user study it is difficult to plan for possible future developments.

Methodology

Research was of a mainly qualitative nature, and was undertaken by means of structured interviews and questionnaires. Questionnaires were given to students, and both students and academics were interviewed. Three university Modern Languages departments were chosen as a sample, all of which were identified as having links with humanities computing projects: Sheffield (Humanities Research Institute), Hull (CTI Modern Languages) and Exeter (Project Pallas).

Results and discussion

Use of ICT: We found a surprisingly small amount of computer use by students. They used computers mainly for word processing, reading foreign newspapers, and accessing the internet. Despite the advice available from the excellent CTI centre at Hull, the use of CAL packages at all the universities surveyed lagged behind these generic applications in frequency of use. Despite promotion by libraries, there was also a worryingly low level of use of library web pages, subject gateways, BIDS, and other bibliographic services.

Attitudes to ICT usage: Preferences in computer use (i.e. which applications the students liked using) do not always mirror frequency. For example, students found they often needed to use word processors, but did not especially enjoy doing so. We also found that, despite what academics tend to assume, their students do not necessarily enjoy using electronic resources. The 'Nintendo generation' is, it appears, still technophobic and surprisingly conservative in its preference for paper resources. We also found that students make 'tactical' use of resources. Despite the perception amongst lecturers that students will enjoy playing with computers once introduced to them, we found that they tended to use them only to the extent that they had been convinced of the necessity of doing so. If they became convinced that they could pass an assignment through limited use of an electronic resource, they were often unwilling to explore further or practise using it, even when packages had been specifically designed for certain courses.

Departmental and academic attitudes

A constant theme of the research was that student reluctance to use electronic resources can be combated, at least to some extent, by the recommendations of their lecturers. This is not always easy to achieve, however, as the attitude of academics themselves is vital. We found some interesting and imaginative use of ICT, whether in the form of internet usage or of CAL tutorials. It is perhaps not surprising that CAL is what academics used most enthusiastically, since it could be used in a unique fashion which printed resources could not replicate. However, we also found a lack of awareness of ICT amongst academics, many of whom were also wary of computer use. Many felt there was little incentive to use ICT when traditional resources were adequate for the job in hand. With multiple demands on their time, computer sceptics were also unwilling to give up time to learn new ICT skills. Even those who were enthusiastic about the use of computers were wary of publishing their research in e-journals. They expressed anxiety about whether conservatism on RAE panels would lead to electronically published articles being dismissed as insufficiently prestigious. Some expressed a view that older, more established scholars who tended not to use computers were likely to be on RAE and promotion panels, and so computer enthusiasts might find their work was undervalued. This all led to a sense of conservatism in research, even if in teaching they tended to use ICT more widely.


Even if students felt that the department was encouraging them to use ICT, the most important factor in its successful use was support. However, awareness of what support was on offer was still low, as was take-up. Students often hesitated to ask for help, even when aware of it, and always chose to ask friends in the first instance. They tended to prefer human advice to online help, and would rather receive help from their academic tutors than from computer support professionals. They tended to assume that lecturers were more important than 'some bloke from the computing services' who came in to show them how to use a package. Thus, if a lecturer could demonstrate the use of ICT him/herself, students tended to presume that it was indeed important. The opposite assumption was also made, although lecturers were often unaware of the messages they were involuntarily delivering. We also found a large discrepancy between the level of support which students perceived they needed and that which academics thought was acceptable. Academics tended to assume that most students would cope easily with computer use, because they thought that all teenagers were computer enthusiasts who had been trained to a high level of ICT skill at school. Many students, however, felt that the amount and level of support was too low, and that they needed far more help than they received. In general, the level of confidence which students expressed in the use of ICT was surprisingly low. Some students also felt that they lacked skills in important areas such as internet searching. The librarians we interviewed were aware of this problem, but the university lecturers tended not to be. Unfortunately, the librarians were pessimistic about their role in the official teaching of such skills, since they felt that students and academic staff alike tended to undervalue the skills they had to offer, and there seemed to be few channels of communication from librarians to academics.


The paper will discuss various conclusions which may be drawn from this data. The most important, however, seems to be that despite enthusiasm about the potential of ICT in Modern Languages on the part of some academics and many humanities computing professionals, there are several problems in its practical implementation. It is only when user surveys are performed that such problems come to light and we discover the reality of the situation, as opposed to what we think ought to be happening. Another strength of the technique is that it uncovers the attitudes users hold towards the technology available, and we will argue that this is vital, since attitudes must shape the way in which computers are used. Our research has also uncovered a significant discrepancy between assumed and actual levels of usage and enthusiasm. It is clear that despite the expectations and assumptions of lecturers, students do not necessarily enjoy using ICT, nor do they always find it easy to use. This has important implications for the use and support of ICT in the field of Modern Languages, and we will end the paper by discussing how our findings might be used to improve the experience of students and academics alike.


HEFCE (1998) An evaluation of the Computers in Teaching Initiative and Teaching and Learning Technology Support Network. Available at <www.niss.ac.uk/education/hefce/pub98/98_47.html>
Pine-Coffin, H. (1999) An investigation into the use of electronic resources in the field of Modern Languages. Unpublished MA dissertation, University of Sheffield.
Warwick, C. (2000) English literature, electronic text and computer analysis: an impossible combination? Forthcoming in Computers and the Humanities.


(1.3.3) Computer-Aided Acquisition of Language Teaching Materials from Corpora

Svetlana Sheremetyeva
New Mexico State University, USA

Awareness of the domain-specific linguistic peculiarities of expository texts is a relevant concept for developing students' reading and writing competency in terms of genre literacies.

Support for this point of view comes from the analysis of academic written genres, competing demands for limited resources, the tyranny of scheduling and from graduate students' verbal protocols about their reading process (Sengupta 1997).

The genre-literacy, or sublanguage, approach to instructed SLA advocated in this paper exploits the lexical, morphological, syntactic and semantic restrictions of the specialized languages used by experts in certain fields of knowledge for communication, or in particular types of texts (technical and scientific articles, instructions, installation manuals, etc.).

Notions of sublanguage distinctiveness rely on linguistic knowledge of the different kinds of sublanguage regularities and restrictions (Kittredge and Lehrberger 1982). Sublanguages are special subsystems of a natural language with restricted vocabulary and grammar which, on the one hand, share some properties with the language as a whole and, on the other, are characterized by some deviations from the "general" language.

As far as language instruction is concerned, both defining the content of this knowledge and ways of sublanguage knowledge elicitation are problems which do not have a single answer. Despite a long-standing interest in the analysis of written genres (sublanguages), little research has focused on how to really use genre specificity in language instruction.

This presentation explores critical issues in the selection of an appropriate methodological framework for the analysis of profession-related texts. It aims to provide suggestions as to the kind of sublanguage analysis method that is supposed to form the basis for developing a system of typological parameters useful in acquisition of teaching materials and thus tuning language instruction to the needs of professional communication.

To describe a particular sublanguage it is necessary to study laws underlying natural language phenomena and laws which make a sublanguage differ from a language. Sublanguages can be described in many ways. Language instruction is influenced by such practical parameters as scope and nature of vocabulary, grammar specificity, potential for ambiguity, lexical and grammar correlation, if any, which can and should be discovered on the basis of corpus analysis (Biber et al. 1998; Wichmann et al. 1997).

This study focuses on verbs, as they are central to the structure of a sentence and consequently to text structure (Levin 1993; Aarts and Meyer 1995). The reason is that in professional reading most problems usually derive not from technical nouns and noun expressions which are relatively easy to find in specialized dictionaries but from grammar which is often characterized by extended sentences with frequently long and telescopic embedded structures.

The current study also proposes and tests a sublanguage-specific hypothesis of correlation between lexical meaning, morphological representation (tense, voice, finiteness) and syntactic realization (subject, object, predicate, attribute, etc.) of a particular verb in a sublanguage.

Material for the research includes five corpora of 50,000 words each from different technical sublanguages: aerospace engineering, automobile engineering, mechanical engineering, technology engineering and patents. The sample corpora are taken from four technical journals (Space Flight, Automobile and Tractor, Materials Engineering, and Machine Design) and a corpus of US patent claims.

The main method of analysis is a computer-aided corpus-based combination of qualitative and quantitative (statistical) techniques applied to a pre-tagged corpus, which proved to be useful for linguistic knowledge elicitation (Sheremetyeva 1998). Tagging, done manually by trained linguists, codes morphosyntactic realizations of sublanguage verbs. For example, in the sentence "Making_TIA this apparatus they used_2IA a new technology", the tag TIA means that the verb "make" is used as an adverbial modifier in the form of Present Participle, the tag 2IA shows that the verb "use" is realized as a predicate in the form of Past Simple Active.

This methodology allows for a standard automatic frequency count procedure to be applied to provide:

a) a verb inventory and its size in terms of verb occurrences;

b) a verb morphology and grammar inventory and their sizes in terms of occurrences of specific values of tense, aspect, voice, finiteness/nonfiniteness and syntactic functions as well as in terms of co-occurrence of grammatical features (for example, in the sublanguage of automobile engineering the most frequently used nonfinite realization of verbs is the Past Participle in the function of attribute, while no realization of verbs as Gerunds or Infinitives in the function of subject was found);

c) an inventory of lexical and morphosyntactic correlations (for example, in the sublanguage of automobile engineering the verb "use" is most often realized as the Past Participle in the function of attribute while the most frequent realization of the same verb in the aerospace engineering sublanguage is the Present Participle in the function of adverbial modifier).
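The frequency counts behind inventories (a)-(c) amount to simple tallying over the tagged corpus. The sketch below follows the "used_2IA" token convention described above; the decoded tag descriptions and the example sentences are invented, and surface verb forms are counted rather than lexemes.

```python
from collections import Counter

# Illustrative decoding of two of the positional tags described above;
# the real tagset covers the full range of morphosyntactic realizations.
TAGS = {
    "TIA": "adverbial modifier, Present Participle",
    "2IA": "predicate, Past Simple Active",
}

def inventories(tagged_text):
    """Tally verb, tag, and verb-tag co-occurrence counts.

    Tagged tokens look like 'used_2IA': surface verb form, underscore, tag.
    (Surface forms are counted here; mapping forms to lexemes is left out.)
    """
    verbs, tags, correlations = Counter(), Counter(), Counter()
    for token in tagged_text.split():
        if "_" not in token:
            continue  # untagged (non-verb) token
        verb, tag = token.rsplit("_", 1)
        verbs[verb.lower()] += 1
        tags[tag] += 1
        correlations[(verb.lower(), tag)] += 1
    return verbs, tags, correlations

verbs, tags, corr = inventories(
    "Making_TIA this apparatus they used_2IA a new technology "
    "Testing_TIA the engine they applied_2IA a known method"
)
# tags["TIA"] == 2; corr[("used", "2IA")] == 1
```

The `verbs` counter gives inventory (a), `tags` gives (b), and `corr` gives the lexical-morphosyntactic correlations of (c).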

Qualitative analysis of each of the above inventories included sense analysis. The number of senses for each lexeme in a sublanguage is, on average, much smaller than in the language as a whole. Thus, of the seven senses of the word engage in the Cobuild English Language Dictionary, the patent sublanguage uses only one, which includes this word in the following synonym set: engage, hold, attach, lock, join, clamp, fasten. Clearly, paradigmatic and syntagmatic relations are different in a sublanguage.


The paper presents a computer-aided methodology and the results of selecting teaching materials for optimizing students' reading and writing competencies in terms of genre literacies on the material of four technical sublanguages.

The results of the study show "deviations" of every sublanguage from the general language and from each other. They also confirm that there exists a correlation between lexical meanings of many sublanguage verbs and their morphosyntactic realizations. These deviations can be used for selecting professionally oriented language teaching materials to most effectively foster language proficiency development.

The approach was tested and proved to be very useful at the Department of Foreign Languages of South Ural State University (Russia). It is expected to be portable to other sublanguages and can be used to address both theoretical and practical issues in applied linguistics.


Aarts, B. and Meyer, Ch.F. (eds). (1995) The Verb in Contemporary English: Theory and Description. Cambridge University Press, Cambridge.
Biber, D., Conrad, S. and Reppen, R. (1998). Corpus Linguistics. Investigating Language Structure and Use. Cambridge University Press, Cambridge.
Kittredge, R. and Lehrberger, J. (1982) Sublanguage: studies of language in restricted domains. Berlin.
Levin, B. (1993) English Verb Classes and Alternations. University of Chicago Press, Chicago.
Sengupta, S. (1997) Academic reading skills for L2 learners: Does teaching selective reading help? Proceedings of the Annual Conference of American Association for Applied Linguistics. Seattle, March 13-17.
Sheremetyeva, S. (1998) Acquisition of Language Resources for Special Applications. Proceedings of the workshop Adapting Lexical and Corpus resources to Sublanguages and Applications in conjunction with The First International Conference on Language Resources and Evaluation, Granada, Spain, May.
Wichmann, A., Fligelstone, S., McEnery, T., and Knowles, G. (eds) (1997). Teaching and Language Corpora. Addison Wesley Longman Inc., New York.


(1.4) Posters & Demonstrations

(1.4.1) Records to Go: Building the Humbul Humanities Hub

Michael Fraser
Oxford University, UK

The Humbul Humanities Hub is a service of the Resource Discovery Network (RDN), receiving funding from the Joint Information Systems Committee (JISC). The RDN comprises a number of hubs each of which offers a range of services to a broad subject-based community within the UK. A number of the hubs have evolved from centrally-funded subject gateways (e.g. SOSIG) whilst at least two of the hubs, including Humbul, are building their services from the foundations upwards.

Humbul has a comparatively long history, commencing life in the mid-1980s as a bulletin board for humanities scholars interested in information technology based within the University of Bath's Office for Humanities Communication. Since 1991 it has been based at and supported by Oxford University. Humbul evolved into a gateway to Web resources in 1994 and from static Web pages to a database in 1997 (Stephens, 1997). Humbul became part of the RDN in August 1999 and has three years' funding to develop both the service itself and also a business plan for financial viability beyond the period of central funding. Developing the Humanities Hub more than five years into the existence of the Web, four years after the launch of comparable UK subject gateways, surrounded by multi-million pound portals, is proving to be something of a challenge.

The poster session will outline two sides of developing the Hub: the vision and the diary.

The vision

The Humbul Humanities Hub seeks to combine structured metadata with the scholarly review process. The first milestone for the Hub is the construction of a database holding rich descriptions of Web resources relating to the study of selected humanities disciplines (English, history, archaeology, classics, European literature and language, religion, philosophy). Each record consists of fields following qualified Dublin Core metadata, which enable us to record information about both the Web site and its intellectual content, including location, creator, language, coverage (both temporal and spatial), date created and modified, and relationship to other resources. The qualifiers to the Dublin Core elements follow those proposed by the Dublin Core working groups. Each record itself has a set of metadata describing such things as the creator of the record, date created and edited, and rights associated with it.
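Such a record might look as follows. This is an illustrative sketch only: the element names follow the qualified Dublin Core conventions described above, but the values, URLs, and exact qualifier spellings are invented and do not reflect Humbul's actual schema.

```python
# A qualified Dublin Core description of a Web resource, together with
# administrative metadata about the record itself ("metadata about the
# metadata"). All values are invented for illustration.
record = {
    "dc.title": "Medieval Manuscripts Online (example)",
    "dc.identifier": "http://www.example.ac.uk/manuscripts/",
    "dc.creator": "Example University Library",
    "dc.language": "en",
    "dc.coverage.temporal": "1100-1500",
    "dc.coverage.spatial": "England",
    "dc.date.created": "1999-05-01",
    "dc.date.modified": "2000-02-15",
    "dc.relation.isPartOf": "Example Digital Library",
    "dc.subject": ["history", "manuscripts"],
}

# Record-level (administrative) metadata: who made the record and when.
record_metadata = {
    "creator": "data-provider@example.ac.uk",
    "date.created": "2000-03-01",
    "rights": "Record copyright Humbul Humanities Hub",
}
```

Keeping descriptive and administrative metadata separate is what allows the record itself to be reviewed, credited, and exported independently of the resource it describes.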

It is essential for Humbul to build up a critical mass of data in the shortest time possible whilst remaining true to its claim to describe only sites which comply with the Hub's published selection criteria (in summary, resources fit for higher education teaching and research). Populating a subject gateway requires substantially more human effort than populating a Web search engine (which requires substantial computing effort). The key lies in striking a balance between automation and manual work so that labour is distributed appropriately.

The process of getting data into the Hub should work something like this: an automated harvester gathers basic metadata from web sites linked from 'trusted' subject gateways (which in another context were referred to as 'amateur gateways' - see Fraser, 1999). Basic metadata includes title, URL, date modified, subject area and, where present, further metadata in the page header. Records built from the harvested metadata are then presented via a Web interface to the Hub's data providers for review, editing, completion of the metadata, and submission to the Hub's main database. Of course, data providers may also create records from scratch, but they can first check that a basic or full record does not already exist for the site. To enable the delivery of relevant records to data providers it is necessary to provide a means of authentication. Data providers register with the Hub and, through the use of a username and password, can access their own customised editing environment. Within this environment, records relevant to the provider's subject expertise may be created, edited, or rejected. Records may also be reviewed and links automatically checked. The combination of authentication (which automates the creation of the metadata about the metadata) and a customised environment allows data providers to export the records they have created for use, for example, within their own site, or for import into a library catalogue (exporting will be in the form of HTML, XML/RDF, and MARC in the first instance).

The reuse of records by data-providers also places them within the sphere of the Hub's users. Indeed, it is an aim of the Hub to market itself as a 'gateway provider' as well as a gateway. The development of systems which permit the exporting of data and its reuse locally will hopefully assist individuals and organisations who have the subject expertise to evaluate and describe resources but have neither the resources nor the desire to maintain links, develop databases, or code HTML. Eventually, we envisage a user, whether a researcher, librarian or student, visiting the Hub, searching and browsing for records relevant to their interests, selecting and saving records as they proceed through Humbul, and then, on finishing, being presented with some piece of code for insertion within their own local Web page. In this manner, a lecturer, for example, can continue to create course pages but, on pasting a piece of code within the local web page, records served direct from Humbul can be dynamically inserted within the page each time the page is accessed. The lecturer need only select the records, create a basic web page, insert the magic code, and leave the maintenance of the records (like link checking) to Humbul.

The Diary

By the end of June 2000 the new Humbul will hopefully have been launched complete with new input and retrieval systems together with enough data to make it worth visiting. The poster session will show via visual images and oral discussion a comparison between the partial vision outlined above and the actual process by which the modules were built and tied together; the combining of human and automated effort (and the risks inherent with undertaking either); an outline of how the Hub intends to cut costs and make money; and a summary of how collaborating with Humbul as an advisor, an evaluator, a describer, or a cataloguer might result in mutual and tangible benefits for all parties.


Fraser, M. (1999) "Selecting Resources for a Subject Gateway: Who Decides?" ACH-ALLC Conference 1999. University of Virginia.
Stephens, C. (1997) "HUMBUL Updated: Gateway to Humanities Resources on the Web." Computers & Texts 15: 23.


(1.4.2) LOOKSEE: Software Tools for Image-Based Humanities Computing

Matthew G. Kirschenbaum
University of Kentucky, USA

This poster will document LOOKSEE, a Web site and discussion list for image-based humanities computing <http://www.rch.uky.edu/~mgk/looksee/> (first announced at the 1999 ACH/ALLC in Charlottesville). Its primary focus is not imaging and image acquisition - areas with well-established technical literatures - but rather the creation and manipulation of images as structured data in digital editions, archives, and libraries.

LOOKSEE is therefore intended as a community focal point for discussion and development of next-generation image-based humanities computing projects. Although the term "image-based humanities computing" has been in circulation for some time, we are now approaching a watershed: a number of pioneering projects (many of them begun in the early nineties) whose promise could heretofore be discussed only in speculative terms are now coming to fruition, while new software tools and data standards (notably JPEG 2000) are poised to redefine the way we create, access, and work with digital images. All of this activity, moreover, is transpiring at a moment when there is an unprecedented level of interest in visual culture and representation in the academic humanities at large.

At present, LOOKSEE consists of:

1. The Web materials at the URL above, collecting resources ranging from computer science to medical informatics to art history, as well as demos and proofs of concept to create a kind of sketchbook for image-based humanities computing.

2. A listserv discussion forum. Though the list is (technically) unmoderated, it is run as a structured discussion in which topics are brought forward at set intervals for the participants' consideration. The first discussion, held in November and December of 1999, revolved around humanities applications of techniques in medical image display and image processing; a second discussion (February/March 2000) will feature artist Johanna Drucker's Wittgenstein's Gallery, a series of over one hundred conceptual drawings constituting a working model of vision, perception, and (re)cognition. (We will use Drucker's work as the basis for a discussion of computer-assisted image analysis.) Later discussions will be given over to producing specs for a suite of open source image analysis tools and will attempt to identify the range of activities that can be supported by computational tools.

3. The LOOKSEE Web site will then expand to include source code, demos, and documentation.

A poster at ALLC/ACH will provide an opportunity to present preliminary results - both technical experiments and conceptual formulations - to the international humanities computing community while (just as importantly) enlisting participants and establishing contacts with others working with images in humanities settings.

LOOKSEE is edited and directed by Matthew G. Kirschenbaum and hosted by the Collaboratory for Research in Computing for Humanities at the University of Kentucky.


(1.4.3) A Wake Newslitter - Electronic Edition

Ian Gunn
Napier University, UK

'A Wake Newslitter - Electronic Edition' reproduces in a hyperlinked and searchable format the complete print run of the seminal journal on Joyce's Finnegans Wake, which started out as a mimeographed news-sheet passed between a few scholars in the early nineteen-sixties. Large areas of the scholarship contained in these journals have never been superseded, and yet the journal itself is very hard to find as it was not published or archived in traditional forums.


The Genesis

The Wake Newslitter Project had its genesis in a chance remark made on the FWAKE-L electronic discussion list on the Internet in November 1996. The FWAKE-L list and its companion list FW-READ have filled some of the gaps left by the demise of 'A Wake Newslitter'. Although the expected brevity of email messages precludes the posting of in-depth studies, these lists have become effective forums for the discussion of both annotation and explication of Finnegans Wake.

It was in this context that a cry went up asking why 'A Wake Newslitter' was not being reprinted and whether there were any enterprising publishers prepared to take up the task.

Email inquiries were sent to Fritz Senn in Zurich and Clive Hart in Essex to canvass their views and also to offer possible solutions. Fritz Senn was amenable to the idea and Clive Hart actively supportive of any such project.

The Problems

My own holdings of 'A Wake Newslitter' material were sparse and covered mainly the later issues. Initial tests made it clear that the journals would need to be retyped in order to digitise and edit the material. Clive Hart informed me that all the Old Series issues were produced using stencils and, even though I knew some of the issues were produced by letterpress, I decided that the whole publication run would need to be re-keyed into computers. A mammoth and error-prone task - I needed help.

So some issues were mimeographed, some were letterpress or litho, and some were typed and photocopied. But there was more: some issues included illustrations, diagrams, tables and even a few halftone photographs, so it would be necessary to scan, redraw and source original material. Finnegans Wake scholarship and explication cover a very diverse range of cultures and languages, and the typing and printing limitations at the time of the Newslitter's original publication meant that a large number of issues were retouched by hand to provide the appropriate accents or signs. The constantly changing methods of production had a strong impact on the ability to maintain any consistent editorial and typographical style. Typefaces, sizes, leading, paragraph styles and punctuation would vary from issue to issue, printer to printer, and sometimes even page to page. Italic type was not always available, making an overuse of bold face inevitable in the original publications, and some very exotic faces appear in later volumes.

To compound all this, 'A Wake Newslitter' had become a significant reference source in Finnegans Wake studies, and its material had therefore been heavily cited in the critical literature. Any new electronic edition would have to preserve the pagination of the originals, since citations are page-specific. This was even more daunting given that the Old Series and the New Series had vastly different page sizes. The Old Series was mimeographed on crown quarto paper (10 x 7.5 in) with a text density of about 400 words per page. New Series issues started on demy octavo (approx. 8.75 x 5.5 in) at a text density of 450 words per page, but by Volume X the page size was A5 (approx. 8.25 x 5.75 in) and the text density over 600 words per page. The type size in parts of some issues was as small as 6 point.

It was decided to set all the issues in A5 format and a basic template was created with running heads, rules and page numbers. This meant that the Old Series material was condensed down to A5 - which due to the monospaced original text was fairly easy - and a cover page conforming to the New Series was added. A number of liberties had to be taken with standard typographical practice in order to fit diverse material into the new format. Some pages had to be hit quite hard with the 'digital' equivalent of a mallet.

The Collaboration

As some issues required retyping into computers and all material would need proofreading, it was obvious I needed further help. I turned to the same forum that had voiced the desire for 'A Wake Newslitter' to be made available. An email was sent to FWAKE-L requesting volunteers to type, and eventually proofread, A Wake Newslitter issues. The response was immediate, with about twenty volunteers coming forward in the first few weeks. Reflecting the membership of FWAKE-L and demographic access to the Internet, the majority of the responses came from the United States, but there were also offers of help from Australia, England, Finland, France, Italy and Thailand.

The Technology

Electronic mail and the Internet were the main communications tools of the project, enabling the speedy exchange of messages and the ability to view the progress of the project itself on the World-Wide Web. There were in excess of 700,000 words to be typed and then proofread by the team, and this would ultimately be achieved using both old and new technologies. Photocopies were posted to the members of the team and electronic copy was returned using FTP (File Transfer Protocol) to an Internet drop zone.

As photocopies of the original issues were received from Clive Hart, it became evident that a larger proportion of the New Series issues than had originally been expected were printed by letterpress or litho. This raised the possibility of avoiding the retyping of all the issues. A few trial issues were processed using scanners and OCR (Optical Character Recognition) software, and the results were encouraging. Although the scanned material was by no means error-free, the speed of input made the scanning of the letterpress and litho printed material a viable alternative to the - also error-prone - retyping option. Material processed with OCR software, while offering a faster input mode, also requires more diligent proofreading because of the unique and sometimes insidious errors such processes create.

All material whether typed or scanned would require formatting and typesetting using desktop publishing (DTP) software. Artwork and diagrams were either redrawn on computer or scanned and tables were rebuilt within the DTP software. Most accents and special characters were available on the computer systems but the Finnegans Wake sigla required the use and adaptation of a Wake typeface which is under development as part of another project.

From the outset it was determined that the electronic reproduction would respect the pagination and layout integrity of the original material. The mechanism chosen for this was Adobe Acrobat PDF (Portable Document Format). Once the material has been desktop-published, this software enables the replication of the individual pages of a publication within a single electronic document. The PDF format is designed to be platform-independent and offers a range of features such as indexing, keyword searching and hyperlinking; all text is indexed, and multiple documents can be searched for keywords using Adobe Catalog software.

Editorial policy

The initial plan was to keep editing to a minimum, with proofreading and fidelity to the original as the main criteria. Italic type was substituted where bold face had been overused, and typographical features and references (AWN instead of the earlier WNL) were standardised. Variations in punctuation and grammar were left untouched except in very obvious cases of setting errors. Proofreading threw up some errors in the original material ('Huge B. Staples' instead of 'Hugh B. Staples'), and these were generally amended only when they were factual matters such as names, page references and citations from Finnegans Wake. Citations were corrected only when they were obviously in error and the correction could have no possible impact on the interpretation of the article. An instance reminiscent of the production of Ulysses occurred in New Series Volume III No 3, pages 51-53, where the typesetter appears to have 'corrected' Joyce's Finnegans Wake passages by altering 'figuratleavely' to 'figuratively' and 'yoursell' to 'yourself'. Where corrections appeared in later issues, these were implemented in the original only if they were supplied by the original author. Danis Rose supplied an explanation of a correction to New Series Volume X No 3, page 45, which also updated the correction in Volume XI No 2, page 34. Roland McHugh confirmed what initially looked like a typo in a transcription of a FW notebook ('girsl' in Volume XVI No 6, p83), and then, on reflection and consultation, amended the original entry.


The 1968 Sydney University Press publication A Wake Digest has been added to the archive. This contained selected and updated material from the Old Series of A Wake Newslitter along with some new articles. The conversion of A Wake Newslitter to digital format both preserves and opens up this valuable archive of material to a new and wider audience of Joyce scholars.


(1.4.4) Modeling the Crystal Palace

Chris Jessee
Michael Levenson
Will Rourk
Dave Cosca
University of Virginia, USA


The Institute for Advanced Technology in the Humanities at the University of Virginia is providing historians, architects and students with new insights into the history, construction and appearance of the Crystal Palace. These insights are derived from the construction of 3-dimensional computer models and simulations. This project is a small part of the larger "Monuments and Dust" project, in which an international group of scholars will assemble a complex visual, textual, and statistical representation of Victorian London - the largest city of the nineteenth-century world and its first urban metropolis. A full description of "Monuments and Dust" can be found online; see the "WEB-LINKS" section.


Popularly known as the Crystal Palace, the 1851 London Exhibition Building was designed by Joseph Paxton for the Great Exhibition. On June 20th 1850 Paxton delivered his original design to the Industrial Exhibition's executive committee, only two weeks after the building commission had introduced a clause allowing him to submit a design. Controversy swirled around the exhibition as Col. Charles de L. Waldo Sibthorp's concern over the destruction of elm trees prompted the Times to question the sanity of constructing any permanent structure on the Hyde Park site. The commissioners responded by requesting that Paxton alter his design to include a barrel-vaulted transept that would cover and save the elms. The Industrial Exhibition's opening day was less than a year away, on May 1st 1851. The Crystal Palace design received popular approval in the press, and all other proposed designs for the exhibition building were monolithic masonry and could not be completed in time for the scheduled opening. With no other reasonable plan at hand, the building commission accepted Paxton's design with the provision that the building be removed from Hyde Park by June 1st 1852.

As the foundations were being laid, Paxton's lack of architectural or engineering credentials drew criticism regarding the stability and safety of his design. The design is in essence an elaborately scaled copy of the Duke of Devonshire's Chatsworth Conservatory, which Paxton had designed while employed as the Duke's gardener. In response to the criticism, all iron girders were tested on site prior to installation, and wooden cross-bracing was added to visually reassure visitors that the wrought iron trusses were sturdy and could withstand the load of the thousands of anticipated visitors. Paxton's reliance on his previous glass-enclosed conservatory design influenced the details of all the Crystal Palace's building systems. The primary role of the building was as exhibit space.
The design is composed of a central transept and nave, with the nave lined with a series of 24 x 24 foot structural bays; second-level galleries ring the bays below. The structural bays served to organize the 13,973 exhibitors. Red banners with white letters indicating specific exhibits hung from the 24-foot girders facing the nave, and the nave, galleries and transept served as circulation corridors for the exhibition's more than 6 million visitors. The large expanse of overhead glass that serves as the enclosure and lighting systems, and gives the building its nickname, allowed excessive amounts of light and heat into the exhibit spaces. While this arrangement served Paxton well in the Chatsworth greenhouse, the crowded exhibit spaces quickly overheated. The decision to use extensive amounts of glass without regard to the comfort of the occupants is the primary failure of the design. To remedy this flaw, the troughs of the roof system were retrofitted with canvas covers draped between the roof ridges. These tarps shaded the glass and visitors, reduced solar gain and provided a more diffuse and tolerable light in the galleries; a seam down the middle of each tarp allowed rapid water drainage into the Paxton gutters.

The secondary role of the building was as exhibit. Initially there was much skepticism, doubt and criticism surrounding the design of the Crystal Palace and the Great Exhibition, but as the building rapidly grew above the trees of Hyde Park the press and public rallied around the glistening structure. The public began to realize, and take great pride in, the fact that the building was revolutionary in every way. The building handily satisfied its most demanding design criteria: it was large enough to accommodate tens of thousands of exhibitors and visitors; it fostered the orderly display of exhibits; it could be manufactured and assembled in a timely manner; and it clearly demonstrated the nation's industrial and manufacturing competence.
The Crystal Palace was a departure from the past and a vision of the future. It was designed, manufactured and assembled in less than one year, a feat made possible by manufacturing technology that should still be considered state of the art. The building is an integrated system, or kit of parts, in which each part serves multiple functions: the columns support the girders and act as downspouts for the gutters; the Paxton gutters shed rainwater and support the roof gables; the glass roof panels are both building enclosure and lighting system. Each part was machine-manufactured by skilled labor under controlled conditions that ensured accuracy and high quality. Once a machine or technique was devised to make a specific part, the part could be made very quickly and in large quantity. Parts were delivered to the site and erected as quickly as they were manufactured. The small size and light weight of each part made erection easy, and each structural bay is self-supporting, so the 2,000 unskilled workers could assemble parts without waiting for whole systems to be in place. In contrast, other buildings of the time were made of stone and required years to construct: the stone was cut from quarries and transported at great expense to the building site, where skilled masons custom-fit one block to another.

It is important to study this building in context, as it is evidence of a fundamental cultural shift. Compared with the artifacts in the exhibition, the Crystal Palace has a clean, sparse appearance with a minimum of ornament; with the exception of the girder connection covers, there are few components that could be considered ornamental. The real difference lies in the method of manufacture and the value placed on the resulting product by society. Once Londoners saw the efficiency and grace of its construction, the Crystal Palace became a focus of pride.
The design marks a cultural shift in values from the garish handmade ornament of the past to the clean, streamlined machine-made products of the future. The Crystal Palace is more closely related to buildings of the next century, such as Ludwig Mies van der Rohe's 1929 Barcelona Pavilion and Philip Johnson's 1949 Glass House, than to its monolithic masonry contemporaries.


Three different types of 3-dimensional computer models are being created for the project. Working from original drawings, a high-detail model of each unique building component is constructed in FormZ software. The components are assembled to illustrate the relationship of parts within larger sections of the building, and rendered images of these assemblies, together with the computer model itself, are delivered over the Internet. Rendered orthographic views of the component assemblies are used as texture or image maps on a low-detail VRML (Virtual Reality Modeling Language) model of the entire building, which can be viewed through a web browser. Given the state of photography in 1851 and the limits it imposes on our ability to understand and appreciate the lighting in the Crystal Palace, our third model is a realistic lighting simulation: Radiance software, running on supercomputers, enables us to render images and animations of the building's lighting as it existed in 1851. Test images have been rendered on an SGI Power Onyx at the San Diego Supercomputer Center (SDSC) and on SUN workstations at IATH; additional images and animations will be rendered on a 64-processor SUN HPC 10000 supercomputer at SDSC.


Architects, historians and students can gain a better understanding of the building and its significance by studying the images and models we have created. Because the computer models are freely provided, variations of the building's structure, lighting and design can be tested and used as tools for teaching. Exploring the use of realistic lighting simulations (both quantitative and qualitative) on supercomputers not only provides a new set of tools and methods for the humanities researcher but also paves the way for interaction and collaboration with computer scientists.


"Modeling the Crystal Palace" <http://www.iath.virginia.edu/london/model/>
"Monuments and Dust" <http://www.iath.virginia.edu/london/>
The Institute for Advanced Technology in the Humanities <http://www.iath.virginia.edu/>
