Understanding the Basics of TEI-XML
What is XML?
XML stands for Extensible Markup Language.
It is "extensible" because it does not come with a fixed set of tags: you define your own elements, so it is in effect a language for defining other markup languages.
It is a markup language, like HTML, not a programming language. In other words, it is used to structure and describe data, not to do things.
It is platform-independent and human-readable: XML files are plain text files.
It is commonly used to transport or exchange data between systems, including those that may have different internal formats.
To learn more, see the classic tutorial on XML from W3Schools.
What is TEI-XML?
TEI-XML is an implementation of XML that is designed to describe written texts. It was developed and is maintained by the Text Encoding Initiative, an international consortium of scholars. You can find the full specification for the current standard (P5) on the TEI website. We will be using a subset of the elements available in that standard.
How is TEI-XML structured?
The following explanation draws its examples from the TEI standard, but everything said here is true of XML in general, since TEI-XML is an implementation of XML.
TEI-XML is made up of elements. Most elements consist of a pair of tags, an opening tag and a closing tag, with the content between them. The two tags carry the same element name, and the closing tag has a forward slash before the name. For example:
<name>Bob the Builder</name>
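Elements can also sit inside other elements, in which case the inner element must be closed before the outer one. The following is a minimal sketch of that nesting; the <p> (paragraph) element and the sentence itself are illustrative assumptions and are not taken from the text above.

```xml
<!-- <name> opens and closes entirely inside <p> -->
<p>This house was built by <name>Bob the Builder</name>.</p>
```

Closing </p> before </name> would not be well-formed XML, because the two elements would overlap rather than nest.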
Some elements do not contain data, but rather simply mark a spot in the text; we refer to these as "empty elements." They consist of a single tag with the forward slash at the end (essentially the opening and closing tags combined into one). For example:
<lb/> marks a line break, the point where a new line of text begins.
Elements can have attributes, which go after the name of the element, and are joined to their values by an equals sign (=). The value of an attribute must appear in quotation marks; double quotation marks are the usual convention, though XML also accepts single ones. For example:
<pb n="1"/> is the element with which we begin a page, and the value of the attribute n (short for number) indicates that this is page 1.
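As a rough illustration of the same syntax, attributes can appear on ordinary elements as well as on empty ones, and an element can carry more than one attribute, separated by spaces. The attribute values below, and the use of type and xml:lang on <name>, are assumptions chosen for the example and are not necessarily part of the subset used here.

```xml
<!-- an empty element with one attribute: page 2 begins here -->
<pb n="2"/>
<!-- an element with content and an attribute describing the kind of name -->
<name type="person">Bob the Builder</name>
<!-- two attributes on one element, separated by a space -->
<name type="person" xml:lang="en">Bob the Builder</name>
```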
We can add comments to an XML file in the following format:
<!-- some text here -->
Such comments are visible when we view the XML source in an XML editor, but do not appear when the file is rendered for readers in a web browser. In other words, these are internal notes not available to the casual viewer of the file.
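A comment can sit between elements or alongside their text, and it can run over several lines. The sketch below uses invented note text. One restriction worth knowing is that a comment may not contain two hyphens in a row, apart from its opening and closing markers.

```xml
<pb n="1"/>
<!-- check the page number against the manuscript:
     the numeral is smudged in our scan -->
<name>Bob the Builder</name> <!-- spelling as in the source -->
```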
The top-level (root) element in a TEI-XML file is <TEI>, which contains two main items:
- <teiHeader>, the TEI Header, which contains metadata about the file itself and the document being edited, and
- <text>, which contains the text itself.
The basic structure of the TEI-XML file, then, is as follows:
<TEI> <!-- opens the TEI file -->
<teiHeader> <!-- opens the TEI Header -->
</teiHeader> <!-- closes the TEI Header -->
<text> <!-- opens the text -->
</text> <!-- closes the text -->
</TEI> <!-- closes the TEI file -->
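In practice a complete file needs a little more than these two containers. The sketch below fills in the skeleton with what the TEI P5 Guidelines require at minimum: the TEI namespace on the root element, a file description inside the header, and a <body> inside <text>. The title, the wording, and the sample lines are placeholders invented for illustration, and a given project's own template may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"> <!-- the TEI namespace -->
  <teiHeader>
    <fileDesc> <!-- description of the file; required in the header -->
      <titleStmt>
        <title>Sample title (placeholder)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sample (placeholder).</p>
      </publicationStmt>
      <sourceDesc>
        <p>Description of the source document (placeholder).</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body> <!-- the main body of the text -->
      <pb n="1"/>
      <p>First line of text<lb/>second line of text</p>
    </body>
  </text>
</TEI>
```

The indentation is only for readability; what matters to XML is that each element is closed inside the element that contains it.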