Presentation on TEI (particularly as it relates to the Integrating Digital Papyrology project). Given at U. of South Carolina Center for Digital Humanities, 2/4/2013
A Spot of TEI
1. A spot of TEI
Hugh Cayless, NYU
philomousos@gmail.com
follow me on Twitter: @hcayless
February 4th, 2013
2. Who am I?
Ph.D. in Classics, M.S. in Information Science
Worked as a software engineer for the last 12 years or so;
the last 4 have been at NYU, on Digital Classics and similar
cultural heritage digital access projects.
Recently elected to the TEI Technical Council.
One of the founders of EpiDoc, a TEI-based standard for encoding
ancient inscriptions (and now papyri too).
3. What am I talking about?
How we use TEI/XML in projects
Why TEI?
Current projects
4. Integrating Digital Papyrology
Unification of several long-running projects:
Duke Databank of Documentary Papyri (DDbDP)
Heidelberger Gesamtverzeichnis (HGV), a directory of Greek documentary
papyri
Advanced Papyrological Information System (APIS)
Bibliographie Papyrologique
Trismegistos
5. State of play at the beginning
DDbDP: TEI SGML files
HGV: Filemaker Pro database + web interface
APIS: idiosyncratic text-based catalog + images + web interface
BP: database only, published annually in print/on disk
TM: database + web interface
TM is a going concern, working with IDP, but with no plans to be
subsumed by it
6. What we did
DDbDP: converted TEI SGML to EpiDoc (TEI) XML
HGV: converted to EpiDoc XML
APIS: converted to EpiDoc XML
BP: converted to TEI <bibl> fragments
TM: inserted TM ids into IDP documents, generated linkages to TM
site
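As a rough illustration of the BP step, the sketch below builds a TEI <bibl> fragment from one database row. The row keys (`id`, `author`, `title`, `year`) and the sample values are hypothetical. The slides do not show the actual BP schema or the project's conversion code, so this is only one way such a conversion might look.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def row_to_bibl(row):
    """Build a TEI <bibl> element from a dict of bibliographic fields."""
    bibl = ET.Element(f"{{{TEI_NS}}}bibl", {XML_ID: row["id"]})
    for tag, key in (("author", "author"), ("title", "title"), ("date", "year")):
        value = row.get(key)
        if value:  # skip empty fields rather than emit empty elements
            child = ET.SubElement(bibl, f"{{{TEI_NS}}}{tag}")
            child.text = str(value)
    return bibl


# Hypothetical record, invented for illustration only.
sample_row = {"id": "b1", "author": "A. Author", "title": "An Example Title", "year": 1999}
bibl_xml = ET.tostring(row_to_bibl(sample_row), encoding="unicode")
print(bibl_xml)
```

Keeping each record as a small standalone fragment means it can be versioned and cited on its own, in the same way as the text files.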
7. Structure
The core of the system is just TEI files in a Git repository.
These are transformed, using XSLT, into RDF, HTML, plain text, and
add documents for our search index.
They are pulled into an editing workflow system as needed, which
allows editing the files using a web form or (for texts) a non-XML
syntax based on papyrological/epigraphic editing conventions.
An automated process syncs data from the editor's repo and a Github
repo, and publishes it to the site.
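The "one source, many outputs" idea can be sketched in a few lines. The project does this with XSLT; the example below uses Python's standard library as a stand-in. The sample document is invented and deliberately minimal (not schema-valid), and the index field names (`id`, `title`, `text`) are assumptions rather than the project's real search schema.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Minimal, invented TEI document; real files are far richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample letter</title></titleStmt>
    <publicationStmt><idno type="filename">sample.1</idno></publicationStmt>
  </fileDesc></teiHeader>
  <text><body><div type="edition"><ab>opto deos ut mihi valeas</ab></div></body></text>
</TEI>"""


def to_plain_text(root):
    """Concatenate the text of the edition <div>, with whitespace normalized."""
    edition = root.find(f".//{TEI}div[@type='edition']")
    return " ".join("".join(edition.itertext()).split())


def to_index_doc(root):
    """Build a flat dict of the kind one might send to a search index."""
    return {
        "id": root.findtext(f".//{TEI}idno[@type='filename']"),
        "title": root.findtext(f".//{TEI}titleStmt/{TEI}title"),
        "text": to_plain_text(root),
    }


root = ET.fromstring(SAMPLE)
plain = to_plain_text(root)
index_doc = to_index_doc(root)
print(plain)
print(index_doc)
```

Both outputs are derived from the same file, so a correction made once in the repository reaches every view the next time the transforms run.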
8. Or, visually
[Architecture diagram. Labels: Automated Document Sync; Canonical Git Repo; Github Repo; Editor Git Repos; Numbers Server; Search Engine; Editor Database; Papyri.info Navigator; Editor Interface; Leiden+ Conversion API; XSLT API; search API; SPARQL API; Github API]
9. So why TEI?
Lots of reasons:
Granular control over records
Attribution
Multiple outputs
Mixture of controlled and free-form data
Relatively easy to obtain / create tools
Engaged and responsive community
10. What I'm working on now
Fixing the TEI Pointer spec
Annotation of documents to mark things like personal and place
names
Linguistic annotation
Linking text and image
11. Some examples
http://papyri.info/ddbdp/cpr;8;72
fine-grained attribution / version control (click on "Editorial
History" and "Detailed" at the bottom of the text)
http://papyri.info/ddbdp/c.ep.lat;;218
What's going on underneath?
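For the attribution side, one plausible mechanism is TEI's <revisionDesc>, which records each change with a date and a responsible party. The sketch below reads such a list back out. It assumes the history lives in <change when="..." who="..."> elements; the header fragment, editor identifiers, and notes are invented placeholders, not data from papyri.info.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented header fragment; the names, dates, and notes are placeholders.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <revisionDesc>
    <change when="2012-05-01" who="#editorB">Corrected reading in line 3</change>
    <change when="2011-11-15" who="#editorA">Converted from SGML</change>
  </revisionDesc>
</teiHeader>"""


def editorial_history(header):
    """Return (date, editor, note) tuples, oldest change first."""
    changes = header.findall(f"{TEI}revisionDesc/{TEI}change")
    rows = [(c.get("when"), c.get("who"), (c.text or "").strip()) for c in changes]
    return sorted(rows)  # ISO 8601 dates sort correctly as plain strings


history = editorial_history(ET.fromstring(HEADER))
for when, who, note in history:
    print(when, who, note)
```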
12. Beginning of a letter marked up according to the Leiden Conventions
r
.[...].c̣[.. Aelio Fel]ici pluṛ[imam]
ṣaḷ[lutem]
opto deos · ut mi[hi v]ạleas · quod ṃẹ[um votum est]
ego enim · valeọ coṛpọṛe ...[ -ca.?- ]
te non videọ rog̣ọ ṭe · fac̣ịaṣ [ -ca.?- ]
f̣.........[- ca.9 -]umạ[ -ca.?- ]
.......... [ -ca.?- ]
v
Aelio Felici
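To make the relationship between these sigla and the underlying XML concrete, here is a minimal sketch that converts a small subset of Leiden notation into EpiDoc-style elements: square brackets become <supplied reason="lost">, an underdotted letter becomes <unclear>, and the bracketed "ca." forms become <gap/>. It is not the project's Leiden+ converter, and the function and pattern names are made up for this example. The sample inputs are simplified from the lines above.

```python
import re
import unicodedata

DOT_BELOW = "\u0323"  # combining dot below marks an uncertain letter

GAP = re.compile(r"\[\s*-\s*ca\.\s*(\d+|\?)\s*-\s*\]")
SUPPLIED = re.compile(r"\[([^\[\]]+)\]")
UNCLEAR = re.compile(rf"(?:\w{DOT_BELOW})+")


def gap_to_xml(match):
    """Turn a lost stretch of known or unknown length into a <gap/>."""
    size = match.group(1)
    if size == "?":
        return '<gap reason="lost" extent="unknown" unit="character"/>'
    return f'<gap reason="lost" quantity="{size}" unit="character" precision="low"/>'


def unclear_to_xml(match):
    """Strip the underdots and wrap the letters in <unclear>."""
    return f"<unclear>{match.group(0).replace(DOT_BELOW, '')}</unclear>"


def leiden_to_epidoc(line):
    """Convert a small subset of Leiden sigla to EpiDoc-style elements."""
    line = unicodedata.normalize("NFD", line)  # split precomposed underdots
    line = GAP.sub(gap_to_xml, line)           # gaps first: they also use [ ]
    line = SUPPLIED.sub(r'<supplied reason="lost">\1</supplied>', line)
    line = UNCLEAR.sub(unclear_to_xml, line)
    return unicodedata.normalize("NFC", line)


# Simplified inputs; the underdotted r is written as r + U+0323.
greeting = leiden_to_epidoc("[Aelio Fel]ici plur\u0323[imam]")
broken = leiden_to_epidoc("[- ca.9 -]uma[ -ca.?- ]")
print(greeting)
print(broken)
```

A real converter needs a proper grammar rather than regular expressions, since editorial sigla nest and interact in ways this sketch ignores.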
16. What is the text and what is the markup?
There is no text, only readings. EpiDoc allows you to
produce models of readings.
Slicing the text up into bits isn't adulterating it, it just
adds hooks for transforming the text in useful ways.
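A small sketch of what "hooks" buys you: the same encoded snippet can be rendered either with its editorial sigla or as a plain reading text, simply by deciding what to do at each element. The snippet is a simplified version of a phrase from the letter above, and the `render` function is an illustrative assumption, not the project's stylesheet logic.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
DOT_BELOW = "\u0323"  # combining dot below

SNIPPET = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">opto deos ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas</ab>'
)


def render(elem, leiden):
    """Render an element as Leiden-style text (leiden=True) or a plain reading."""
    inner = (elem.text or "") + "".join(
        render(child, leiden) + (child.tail or "") for child in elem
    )
    if not leiden:
        return inner  # plain reading: ignore the editorial markup
    if elem.tag == f"{TEI}supplied":
        return f"[{inner}]"  # restored text goes in square brackets
    if elem.tag == f"{TEI}unclear":
        return "".join(ch + DOT_BELOW for ch in inner)  # underdot each letter
    return inner


ab = ET.fromstring(SNIPPET)
reading = render(ab, leiden=False)
leiden_text = render(ab, leiden=True)
print(reading)
print(leiden_text)
```

Neither output is "the text"; each is one view generated from the same model of the reading.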
17. How to get involved
Mailing list: TEI-L@LISTSERV.BROWN.EDU
http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1
http://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l
TEI Sourceforge:
Report a bug:
http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644062
Make a feature request:
http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644065
IRC: #tei-c on http://freenode.net/