A spot of TEI
Hugh Cayless, NYU
philomousos@gmail.com
follow me on Twitter: @hcayless

February 4th, 2013
Who am I?

   Ph.D. in Classics, M.S. in Information Science

   Worked as a software engineer for the last 12 years or so

   the last 4 have been for NYU doing Digital Classics and similar
    cultural heritage digital access projects

   recently elected to the TEI Technical Council.

   One of the founders of EpiDoc, a TEI-based standard for encoding
    ancient inscriptions (and now papyri too).
What am I talking about?



   How we use TEI/XML in projects

   Why TEI?

   Current projects
Integrating Digital Papyrology

   Unification of several long-running projects:

       Duke Databank of Documentary Papyri (DDbDP)
       Heidelberg Gesamtverzeichnis (directory of Greek documentary
        papyri, HGV)
       Advanced Papyrological Information System (APIS)
       Bibliographie Papyrologique
       Trismegistos
State of play at the beginning

   DDbDP: TEI SGML files

   HGV: Filemaker Pro database + web interface

   APIS: idiosyncratic text-based catalog + images + web interface

   BP: database only, published annually in print/on disk

   TM: database + web interface

       TM is a going concern, working with IDP, but with no plans to be
        subsumed by it
What we did

   DDbDP: converted TEI SGML to EpiDoc (TEI) XML

   HGV: converted to EpiDoc XML

   APIS: converted to EpiDoc XML

   BP: converted to TEI <bibl> fragments

   TM: inserted TM ids into IDP documents, generated linkages to TM
    site
Structure

   The core of the system is just TEI files in a Git repository.

   These are transformed, using XSLT, into RDF, HTML, plain text, and
    add documents for our search index.

   They are pulled into an editing workflow system as needed, which
    allows editing the files using a web form or (for texts) a non-XML
    syntax based on papyrological/epigraphic editing conventions.

   An automated process syncs data from the editor's repo and a GitHub
    repo, and publishes it to the site.
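
As a rough illustration of "one TEI source, several outputs", here is a minimal sketch in Python. The real pipeline uses XSLT; Python's standard library stands in for it here. The sample document, the field names (`id`, `title`, `text`), and the `<add><doc>` layout of the search-index document are all assumptions made for the example, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI file; real EpiDoc files carry much more.
TEI_SOURCE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample letter</title></titleStmt></fileDesc></teiHeader>
  <text><body><div type="edition"><ab>opto deos ut mihi valeas</ab></div></body></text>
</TEI>"""

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def plain_text(root):
    """Return the edition text with markup stripped and whitespace collapsed."""
    edition = root.find(".//tei:div[@type='edition']", TEI)
    return " ".join("".join(edition.itertext()).split())


def index_document(doc_id, root):
    """Build an <add><doc> element (assumed layout) for a search index."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = {
        "id": doc_id,
        "title": root.findtext(".//tei:title", namespaces=TEI),
        "text": plain_text(root),
    }
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return add


root = ET.fromstring(TEI_SOURCE)
print(plain_text(root))
print(ET.tostring(index_document("example-1", root), encoding="unicode"))
```

Both outputs are derived from the same parsed tree, so a correction made once in the TEI file reaches every output the next time the transformations run.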
Or, visually
   [Diagram: system architecture. Labelled components: Automated
    Document Sync; Canonical Git Repo; Papyri.info Git Repo; GitHub
    Repo; GitHub; Numbers Server; Search Engine; Editor Database;
    Editor Git Repos; Editor; Leiden+ Conversion API; XSLT API;
    search API; Navigator Interface; SPARQL API]
So why TEI?

   Lots of reasons:

       Granular control over records

       Attribution

       Multiple outputs

       Mixture of controlled and free-form data

       Relatively easy to obtain / create tools

       Engaged and responsive community
What I'm working on now


   Fixing the TEI Pointer spec

   Annotation of documents to mark things like personal and place
    names

   Linguistic annotation

   Linking text and image
Some examples


   http://papyri.info/ddbdp/cpr;8;72

       fine-grained attribution / version control (click on "Editorial
        History" and "Detailed" at the bottom of the text)

   http://papyri.info/ddbdp/c.ep.lat;;218

       What's going on underneath?
Beginning of a letter marked up according to the Leiden Conventions


 r
 .[...].c.[.. Aelio Fel]ici pluṛ[imam]
 ṣạ[lutem]
 opto deos · ut mi[hi v]ạleas · quod ṃẹ[um votum est]
 ego enim · valeọ coṛpọṛe ...[ -ca.?- ]
 te non videọ rog̣ọ ṇe · fac̣ịaṣ [ -ca.?- ]
 f..........[- ca.9 -]uma.[ -ca.?- ]
 .......... [ -ca.?- ]
 v
 Aelio Felici
The same letter, diplomatic(ish) edition


 r
 .[...].C.[..ca.9 ]ICIPLUṚ[. . . .]
 ṢẠ[. . . . .]
 OPTODEOS · UTMI[. . . ]ẠLEAS · QUODṂẸ[ ca.10 ]
 EGOENIM · VALEỌCOṚPỌṚE ...[ -ca.?- ]
 TENONVIDEỌROG̣ỌṆE · FAC̣ỊAṢ [ -ca.?- ]
 F..........[- ca.9 -]UMA.[ -ca.?- ]
 .......... [ -ca.?- ]
 v
 AELIO FELICI
The same letter marked up in EpiDoc (TEI) XML

<div xml:lang="la" type="edition" xml:space="preserve"><div n="r" type="textpart"><!-- milestone unit="4"--><ab>
<lb n="1"/><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="3" unit="character"/><gap reason="illegible" quantity="1" unit="character"/>c<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/><supplied reason="lost"> Aelio Fel</supplied>ici plu<unclear>r</unclear><supplied reason="lost">imam</supplied>
<lb n="2"/><unclear>sa</unclear><supplied reason="lost">lutem</supplied>
<lb n="3"/>opto deos <g type="middot"/> ut mi<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas <g type="middot"/> quod <unclear>me</unclear><supplied reason="lost">um votum est</supplied>
<lb n="4"/>ego enim <g type="middot"/> vale<unclear>o</unclear> co<unclear>r</unclear>p<unclear>or</unclear>e <gap reason="illegible" quantity="3" unit="character"/><gap reason="lost" extent="unknown" unit="character"/>
<lb n="5"/>te non vide<unclear>o</unclear> ro<unclear>go</unclear> <unclear>n</unclear>e <g type="middot"/> fa<unclear>ci</unclear>a<unclear>s</unclear> <gap reason="lost" extent="unknown" unit="character"/>
<lb n="6"/>f<gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/><gap reason="lost" quantity="9" unit="character"/>uma<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" extent="unknown" unit="character"/>
<lb n="7"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/> <gap reason="lost" extent="unknown" unit="character"/>
</ab></div><div n="v" type="textpart"><!--milestone unit="4"--><ab>
<lb n="1"/>Aelio Felici </ab></div></div>
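
To make the link between the XML and the Leiden-style display concrete, here is a minimal sketch in Python (the production system uses XSLT). It handles only the elements that appear on the slide, and the rendering rules, such as one dot per illegible character and `-ca.?-` for a gap of unknown extent, are read off the slide rather than taken from the EpiDoc stylesheets. The `SAMPLE` fragment is hypothetical and omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

DOT_BELOW = "\u0323"  # combining dot below, marks an unclear letter

# Hypothetical one-line fragment using the same elements as the slide.
SAMPLE = (
    '<ab><lb n="3"/>opto deos <g type="middot"/> ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas'
    '<gap reason="illegible" quantity="2" unit="character"/>'
    '<gap reason="lost" extent="unknown" unit="character"/></ab>'
)


def leiden(elem):
    """Recursively render an element as Leiden-style text."""
    inner = (elem.text or "") + "".join(
        leiden(child) + (child.tail or "") for child in elem
    )
    if elem.tag == "lb":
        return "\n"  # each <lb/> starts a new line
    if elem.tag == "g" and elem.get("type") == "middot":
        return "\u00b7"
    if elem.tag == "supplied":
        return "[" + inner + "]"  # text restored by the editor
    if elem.tag == "unclear":
        return "".join(ch if ch.isspace() else ch + DOT_BELOW for ch in inner)
    if elem.tag == "gap":
        quantity = elem.get("quantity")
        dots = "." * int(quantity) if quantity else " -ca.?- "
        # Lost text goes in brackets; illegible traces are bare dots.
        return "[" + dots + "]" if elem.get("reason") == "lost" else dots
    return inner


print(leiden(ET.fromstring(SAMPLE)).strip())
```

The function never edits the text itself; it only decides how each marked-up span should be displayed.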
The same letter, visualization of the tree structure of the XML
   What is the text and what is the markup?

   There is no text, only readings. EpiDoc allows you to
    produce models of readings.

   Slicing the text up into bits isn't adulterating it; it just
    adds hooks for transforming the text in useful ways.
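
The "hooks" idea can be sketched by deriving a second reading from the same markup. The Python below is a simplified, assumed set of rules for a diplomatic-style rendering (capitals, no word division, restored letters replaced by one dot each); it is not the logic papyri.info actually uses, and the `SAMPLE` fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

DOT_BELOW = "\u0323"  # combining dot below, marks an unclear letter

# Hypothetical fragment, namespace omitted for brevity.
SAMPLE = (
    '<ab>opto deos <g type="middot"/> ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas</ab>'
)


def letters(text):
    """Upper-case the text and drop word spacing."""
    return "".join((text or "").split()).upper()


def diplomatic(elem):
    """Render what is on the papyrus rather than the editor's restoration."""
    if elem.tag == "g" and elem.get("type") == "middot":
        return " \u00b7 "
    if elem.tag == "supplied":
        # Restored letters are not on the papyrus: show one dot per letter.
        count = len(letters("".join(elem.itertext())))
        return "[" + " ".join("." * count) + "]"
    inner = letters(elem.text) + "".join(
        diplomatic(child) + letters(child.tail) for child in elem
    )
    if elem.tag == "unclear":
        return "".join(ch + DOT_BELOW for ch in inner)
    return inner


print(diplomatic(ET.fromstring(SAMPLE)))
```

The input is the same kind of tree that drives the Leiden-style display; only the rule attached to each element changes, which is the sense in which the markup models readings.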
How to get involved

   Mailing list: TEI-L@LISTSERV.BROWN.EDU
       http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1
       http://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l

   TEI Sourceforge:
       Report a bug:
           http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644062
       Make a feature request:
           http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644065

   IRC: #tei-c on http://freenode.net/
