This is, essentially, a bash wrapper around a custom pandoc writer and template, a simple regular expression (using `sed`), and an XSL script.
It converts a markdown file to an XML file conforming to the TEI Lite standard. Issues and pull requests are welcome.
In order to run, this script depends on:
- pandoc
- sed
- xsltproc
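
Before attempting a conversion, it can help to confirm that all three tools are on the `PATH`. The following is a minimal sketch; the function name `missing_deps` is hypothetical and is not part of this script.

```bash
# Hypothetical helper (not part of this script): print each command
# from the argument list that cannot be found on the PATH.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Usage: report anything missing before running the conversion.
# missing_deps pandoc sed xsltproc
```

An empty result means every dependency was found.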
For now, this script recognizes a limited subset of elements for a TEI header. These are all translated into fields in the `tei-lite.template` file using the pandoc template system. (Links below will take one to the documentation for TEI Lite.)
The fields currently implemented privilege metadata related to document transcription: they provide fields for the author and title of the electronic file, as well as fields for a bibliographic citation of its source, a list of editors, and information about sources.
Currently, it requires only:
- title: A title for the document. (For the titleStmt.)
- author: At least one author. Each author's name is stored as two variables: `forename` and `surname`. (For the titleStmt.)
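
As an illustration of the two required fields, the sketch below prints a small markdown document with a YAML header. The helper name `minimal_doc` and all values are placeholders, and the nesting of `forename` and `surname` under each `author` entry is an assumption about how the template reads them, not something confirmed here.

```bash
# Hypothetical helper: print a markdown document whose YAML header
# carries only the two required fields. Placing forename/surname under
# each author entry is an assumption about the template's layout.
minimal_doc() {
  cat <<'EOF'
---
title: A Sample Transcription
author:
  - forename: Jane
    surname: Doe
---

Body text of the document goes here.
EOF
}

# Usage: write the sample to a file for conversion.
# minimal_doc > sample.md
```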
Additionally, it recognizes the optional fields:
- editor: One or more "editors."
- publicationStmt: Some prose describing the publication/distribution, contained in the `publicationStmt`. If no `publicationStmt` is provided, the template simply inserts "Generated by pandoc."
The following (optional) fields are all stored as part of a bibliographic entry (bibl) under the source description (sourceDesc).
- citation.title: Stored as `<title level='a'>`, that is, as an analytic title.
- citation.container-title: For works (essays, articles, etc.) which originally appeared as part of a larger work, `container-title` contains the name of the larger work. It is stored in the TEI header as `<title>`.
- citation.date: A date, presumably of publication. Format is not specified.
- citation.publisher: A publisher.
- citation.publisher-place: Place of publication, stored as pubPlace.
- citation.page: A page range, stored as biblScope.
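
The citation fields above might be written in a YAML header roughly as follows. This is a sketch under assumptions: pandoc templates address nested metadata with dotted names, so the `citation.*` fields are shown as keys of a nested `citation` mapping; the helper name `citation_header` and every value are placeholders.

```bash
# Hypothetical helper: print a YAML header that fills in the optional
# citation fields. The nested "citation" mapping is an assumption based
# on the dotted field names; all values are placeholders.
citation_header() {
  cat <<'EOF'
---
title: A Sample Transcription
author:
  - forename: Jane
    surname: Doe
citation:
  title: An Example Essay
  container-title: An Example Collection
  date: 1900
  publisher: Example Press
  publisher-place: London
  page: 10-25
---
EOF
}
```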
Any sources used for a document or transcription can be described as one or more `source`s. These will be stored in a list.
Finally, one can describe the source for a document in unstructured prose in the `citation.note` field, which is converted to a `<p>` under the sourceDesc.
Additional metadata fields in the YAML header will simply be ignored. There is currently no validation done on the header, so invalid field names or other problems will simply be passed over (unless they generate a YAML error). In principle, anything possible in a TEI Lite header should be capable of being represented in YAML.
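
A quick pre-flight check can catch a header that lacks the required `title` or `author` fields before conversion. The sketch below is not part of this script: the function name `check_header` is hypothetical, and the check is a line-based approximation that only looks for top-level keys, not a YAML parser.

```bash
# Hypothetical pre-flight check (not part of this script): confirm that
# the YAML header of a markdown file defines the required top-level keys.
# This is a line-based approximation, not a real YAML parser.
check_header() {
  local file="$1"
  local key header
  local status=0
  # Collect the lines between the opening '---' on line 1 and the
  # closing '---' or '...' delimiter.
  header="$(awk 'NR==1 && $0=="---" {inside=1; next}
                 inside && ($0=="---" || $0=="...") {exit}
                 inside {print}' "$file")"
  for key in title author; do
    if ! printf '%s\n' "$header" | grep -q "^${key}:"; then
      printf 'missing required field: %s\n' "$key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_header document.md && echo "header looks complete"
```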