This page briefly describes the technical background of this website. We introduce the techniques we have applied and their wider possibilities, and then give some details of the procedures we followed.
Web pages are basically built using a markup language called HTML, the Hypertext Markup Language. HTML provides the framework for web pages; other techniques or languages can be used in conjunction with it to enhance its possibilities.
However, HTML has some important limitations:
Of course, since the inception of the World Wide Web, a lot of work has been done to take away these limitations with the use of other software.
XML, or eXtensible Markup Language, can be seen as the main ingredient in trying to overcome HTML’s limitations. It was designed with the following principles in mind:
An XML vocabulary designed for a specific domain is formally described in a Document Type Definition (DTD). DTDs are files with a syntax of their own that describe the kind of markup allowed within XML files. XML files that follow these rules are said to conform to the DTD.
Recently, there has been a move away from DTDs, which are being replaced by 'schemas'. Schemas can express more constraints than DTDs can, and they have the additional advantage of being XML documents themselves.
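To make the idea of conformance concrete, here is a minimal sketch in Python. It does not read a real DTD: the vocabulary (`poem`, `title`, `line`) is hypothetical, and the rules are held in a plain dictionary rather than in DTD syntax. Unlike a real DTD, it also ignores the order and number of child elements. It only illustrates what it means for a document to use nothing but the markup its vocabulary allows.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: which child elements each element may contain.
# A real DTD would state this in its own syntax, for example
# <!ELEMENT poem (title, line+)>, and could also constrain order and count.
ALLOWED_CHILDREN = {
    "poem": {"title", "line"},
    "title": set(),
    "line": set(),
}


def conforms(elem):
    """Return True if elem and all its descendants use only allowed markup."""
    if elem.tag not in ALLOWED_CHILDREN:
        return False
    return all(
        child.tag in ALLOWED_CHILDREN[elem.tag] and conforms(child)
        for child in elem
    )


# One document that stays within the vocabulary, and one that does not.
good = ET.fromstring("<poem><title>T</title><line>one</line></poem>")
bad = ET.fromstring("<poem><title>T</title><stanza/></poem>")

print(conforms(good))
print(conforms(bad))
```

In practice a validating XML parser performs this check against the actual DTD or schema; the dictionary above is only a stand-in.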
XML has, in a short time, become very popular as a storage and interchange format for text and data. It is used on the World Wide Web, in conventional programming and in publishing environments. It has been endorsed as a recommendation by the W3C, the body that establishes standards for the World Wide Web.
XML’s capacities and popularity provide sound reasons to use its encoding techniques in building text corpora for the web.
As the name implies, XML was designed to be extensible. Such an extension has been provided by the Text Encoding Initiative (TEI), which has built an XML vocabulary for text markup in the humanities. The vocabulary focuses on encoding structural, interpretative and grammatical features of widely disparate text types. For example, it is used for dictionaries, novels and poetry, to name a few. Originally developed for SGML, the vocabulary has been used since the beginning of the 1990s in a wide range of European and American universities and libraries.
Though from its inception TEI was primarily oriented towards capturing text features, the Guidelines also fully allow for describing and indexing image material. However, at present there are no schemas available for TEI encoding.
After encoding a text using XML, something more is needed to make the results available to a reader. After all, the raison d’être of XML is the desire to separate content and form. A way to transform the XML document into a web page (or any other kind of document) is clearly needed.
Though there are many ways to accomplish this, the easiest way is to use XSLT, or eXtensible Stylesheet Language Transformations. Using XSLT (itself an XML format), rules may be specified that define transformations to be applied to the contents of an XML document. The result of a transformation is either a new XML document, a plain text file, or an HTML document suitable for viewing over the web.
The word 'stylesheet' may suggest that XSLT is something like Cascading Style Sheets (CSS), the language used to define styles for web documents. However, XSLT is much more powerful than that. It is a complete programming language which allows for counting, sorting, selecting and changing any part of the XML document.
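The rule-based character of such a transformation can be sketched in a few lines of Python. This is not XSLT, and it is not the stylesheet used on this site: Python's standard library has no XSLT processor, so the sketch imitates template rules with a lookup table. The sample document is hypothetical, with element names (`text`, `head`, `lg`, `l`) borrowed from the TEI vocabulary, and the mapping to HTML is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical TEI-like sample, invented for this illustration.
SAMPLE = """<text>
  <head>Sample poem</head>
  <lg>
    <l>First line</l>
    <l>Second line</l>
  </lg>
</text>"""

# One "template rule" per element name: (opening HTML, closing HTML).
RULES = {
    "text": ("<div>", "</div>"),
    "head": ("<h1>", "</h1>"),
    "lg": ("<p>", "</p>"),
    "l": ("", "<br/>"),
}


def transform(elem):
    """Apply the rule for this element, then recurse into its children."""
    open_tag, close_tag = RULES.get(elem.tag, ("", ""))
    parts = [open_tag, escape(elem.text or "")]
    for child in elem:
        parts.append(transform(child))
        parts.append(escape(child.tail or ""))
    parts.append(close_tag)
    return "".join(parts)


root = ET.fromstring(SAMPLE)
html = transform(root)

# Counting and selecting, comparable to what an XSLT stylesheet can do.
line_count = len(root.findall(".//l"))

print(html)
print(line_count)
```

The content (the poem) and the form (the HTML tags) stay separate: changing the table of rules changes the presentation without touching the XML.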
There are several ways to configure this transformation process. The XML document may be sent to the web page viewer; the transformation into HTML can then be done by the browser program, which applies an XSLT stylesheet. Another possible configuration is that the web server loads the XML file and transforms it into HTML when the viewer requests a certain page. In the simplest configuration the transformations are run only once, resulting in HTML files which may then be stored and handled like any other HTML file.
These configurations have their advantages and disadvantages. Transforming XML in the browser will only work if the user has a browser installed which can handle XML and XSLT stylesheets. At present there are no browsers which handle XSLT correctly out of the box. Dynamic transformation on the server, upon request, presupposes software installed on the server. Static, one-time transformation gives less flexible results than one might want. In this project, we are working with software (Cocoon) that transforms our XML data 'on the fly', meaning that our server provides dynamically generated HTML pages to any visitor of our site.
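The difference between static and dynamic transformation can be illustrated with a small Python sketch. It does not use Cocoon or XSLT; the `transform` function is a stand-in for the real transformation, and the document name, element names and mottoes are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical in-memory store; a real site would read XML files from disk.
SOURCES = {"emblem1": "<emblem><motto>Amor vincit omnia</motto></emblem>"}


def transform(xml_text):
    """Stand-in for an XSLT transformation: XML string in, HTML string out."""
    root = ET.fromstring(xml_text)
    motto = root.findtext("motto", default="")
    return "<html><body><h1>" + escape(motto) + "</h1></body></html>"


def build_static(sources):
    """Static configuration: transform every document once, in advance."""
    return {name: transform(xml) for name, xml in sources.items()}


def serve_dynamic(sources, name):
    """Dynamic configuration: transform when the page is requested."""
    return transform(sources[name])


static_pages = build_static(SOURCES)

# The XML source is edited after the static pages were built.
SOURCES["emblem1"] = "<emblem><motto>Festina lente</motto></emblem>"

print(static_pages["emblem1"])            # still the old text
print(serve_dynamic(SOURCES, "emblem1"))  # reflects the edit
```

The stored page keeps the old text until it is rebuilt, while the page generated on request always reflects the current XML, at the cost of running the transformation software on the server.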
At the Emblem Project Utrecht, we have decided to use XML and the TEI vocabulary to encode our editions of the emblem books. The HTML pages are generated using Cocoon; the search option of this site runs on Lucene.
All EPU files can be found elsewhere on this site; see the option 'Project' in the top menu.
© This work is licensed under a Creative Commons License.