This paper was presented at the COST Digital Humanities Conference: Reassembling the Republic of Letters. My brief was to give a general, introductory discussion of the history, limits and future of encoded text, particularly XML and particularly the Text Encoding Initiative.
Reborn Digital: coding text
1. Reborn Digital: coding text
Pip Willcox
Curator of Digital Special Collections
Bodleian Libraries, University of Oxford
@pipwillcox
Bodleian Libraries
UNIVERSITY OF OXFORD
COST Digital Humanities Conference: Reassembling the Republic of Letters
22-23 March 2015, University of Oxford
http://www.slideshare.net/PipWillcox/reborn-digital-coding-text
2. Republics of Letters
Quod feliciter vortat academici
Oxoniens bibliothecam hanc
vobis reipublicaeque
literatorum T.B.P.
Thomas Bodley has built
this library for you and for
the Republic of the Letters.
May the gift turn out well.
Photo: Pip Willcox
3. The many forms of digital text
Metadata: Early Modern Letters Online
Image: Early English Books Online (EEBO)
Optical Character Recognition (OCR): Google Books
Handwritten Character Recognition (HCR): Transcribe Bentham
Transcribed: EEBO Text Creation Partnership (EEBO-TCP)
Encoded: Shakespeare Quartos Archive
Edited: Digital Renaissance Editions
Digital print: Oxford Scholarly Editions Online
4. The many forms of digital text
Publisher-led editions
Library-led editions
Academic-led editions
Social editions
5. The many forms of digital text
Publisher-led editions
Library-led editions
Academic-led editions
Social editions
Licensed for reuse
Freely available
Subscription
Private
6. The many forms of digital text
Licensed for reuse
Freely available
Subscription
Private
Discoverable
Citable
Reusable
Sustainable
7. The many forms of digital text
Licensed for reuse
Freely available
Subscription
Private
Discoverable
Citable
Reusable
Sustainable
Provenance
Conditions of re-use
Editorial principles
8. Affordances of digital text
Read it: dissemination, preservation
Free text search
Distant reading
At scale
Automated tagging, e.g. linguistic, geographic
Photo: Pip Willcox
9. Affordances of hand-encoded text
First pick your Extensible Markup Language (XML):
Resource Description Framework (RDF)
Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
anything to separate your data from your interface
11. Affordances of XML
Machine-readable and human-readable(ish)
Interoperable open standard (W3C)
Extensible semantic markup
12. Affordances of XML
Machine-readable and human-readable(ish)
Interoperable open standard (W3C)
Extensible semantic markup
Not always the answer
Not an end in itself: a research/publication tool
Hierarchical structure
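To make the point about separating data from interface concrete, here is a minimal sketch in Python using only the standard library. The record, its element names and its values are all invented for illustration and follow no particular schema; the two rendering functions are likewise hypothetical. The same encoded record is presented two ways without the data itself changing.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record: the element names and values are invented for
# illustration and do not follow any particular schema.
RECORD = """<letter>
  <sender>A. Writer</sender>
  <recipient>B. Reader</recipient>
  <date when="1650-01-01">1 January 1650</date>
</letter>"""


def as_plain_text(letter):
    """Render the record as one line of plain text."""
    return "{} to {}, {}".format(
        letter.findtext("sender"),
        letter.findtext("recipient"),
        letter.findtext("date"),
    )


def as_html(letter):
    """Render the same record as an HTML list item, taking the
    machine-readable date from the 'when' attribute."""
    date = letter.find("date")
    return '<li><time datetime="{}">{}</time>: {} to {}</li>'.format(
        escape(date.get("when"), quote=True),
        escape(date.text),
        escape(letter.findtext("sender")),
        escape(letter.findtext("recipient")),
    )


letter = ET.fromstring(RECORD)
print(as_plain_text(letter))
print(as_html(letter))
```

The encoded record is the stable part; either interface could be replaced or discarded without touching it.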
13. Affordances of the Text Encoding Initiative (TEI)
An XML international standard
A set of Guidelines
For encoding historical text
A community of practice: conference, mailing list, journal, wiki, SourceForge, toolchain
Future-proof
14. Affordances of the Text Encoding Initiative (TEI)
An XML international standard
A set of Guidelines
For encoding historical text
A community of practice: conference, mailing list, journal, wiki, SourceForge, toolchain
Future-proof(ish)
16. A case study: EEBO-TCP
Early English Books Online Text Creation Partnership: books in the Short Title Catalogue
Scale:
c. 130,000 metadata records and image sets
TCP Phase I: c. 25,000 digital texts
TCP Phase II: c. 40,000 digital texts (and counting)
Scope: searchable, readable, marked-up, digital, full texts
17. A case study: EEBO-TCP
Early English Books Online Text Creation Partnership: books in the Short Title Catalogue
Scale:
c. 130,000 metadata records and image sets
TCP Phase I: c. 25,000 digital texts: available!
TCP Phase II: c. 40,000 digital texts (and counting)
Scope: searchable, readable, marked-up, digital, full texts
18. A case study: EEBO-TCP
http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:98209
21. A case study: EEBO-TCP
Some things we mark up:
Textual divisions, with descriptions
Opening material, e.g. arguments, salutes
Closing material, e.g. signatures, dates
Letters, lists and tables
Speakers, speeches, stage directions, quotations
Textual notes
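As a rough illustration of what this kind of markup makes possible, the sketch below counts verse lines per speaker and pulls out stage directions. The sample text is invented, not taken from any EEBO-TCP file, and it is deliberately simplified: the element names (sp, speaker, l, stage) are those TEI uses for drama, but real TEI files declare the namespace http://www.tei-c.org/ns/1.0, which is omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample, not a real transcription. Real TEI documents carry
# the TEI namespace, left out here to keep the sketch short.
SAMPLE = """<div type="scene">
  <stage>Enter two speakers.</stage>
  <sp><speaker>First.</speaker><l>A first line of verse,</l><l>and a second.</l></sp>
  <sp><speaker>Second.</speaker><l>A reply.</l></sp>
  <sp><speaker>First.</speaker><l>A closing line.</l></sp>
</div>"""


def lines_per_speaker(xml_text):
    """Count the <l> children of each <sp>, grouped by speaker label."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for sp in root.iter("sp"):
        speaker = sp.findtext("speaker", default="").strip()
        counts[speaker] += len(sp.findall("l"))
    return counts


counts = lines_per_speaker(SAMPLE)
stage_directions = [s.text for s in ET.fromstring(SAMPLE).iter("stage")]
print(dict(counts))
print(stage_directions)
```

A free-text search could find the word "Enter", but only the markup says which occurrences are stage directions and which are speech.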
22. A case study: EEBO-TCP
and some things we don't:
Non-Roman alphabets
Music
Complex mathematical material
Illegible characters
Manuscript
Damaged or missing material
23. EEBO-TCP: a buildable resource
Distant reading: Duhaime and Zimmer, DocuScope, AdornMorph
Close reading: Verse Miscellanies Online, Digital Anthology of Early English Drama, Forms Online Renaissance to Modern
24. Early English Print in the HathiTrust
Kevin Page & Pip Willcox
Anon. A true and perfect description of the strange and wonderful she-elephant, sent from the Indies, which arrived at London, August 1. 1683. With the true portraicture of that wonder in nature (London: 1683). Ashm. H 24 [42]. Image: Bodleian Libraries.
Coryate, Thomas, Thomas Coriate traueller for the English vvits: greeting From the court of the Great Mogul, resident at the towne of Asmere, in easterne India (London: 1616). Via EEBO http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:9182
Anon, A full and true relation of the elephant that is brought over into England from the Indies, and landed at London, August 3d. 1675. Giving likewise a true account of the wonderful nature, understanding, breeding, taking and taming of elephants (London, 1675). Via EEBO: http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:184581.
Terhi Nurmikko-Fuller
25. A case study: SQA
The origins: pre-1642 quartos from JISC/NEH Transatlantic Digitization Collaboration Grant
http://quartos.org/
26. A case study: SQA
Bodleian, Arch. G d.41
28. A case study: SQA
Folger, STC 22279, copy 5
29. A case study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
30. A case study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
<delSpan> surrounding the original <l>
<anchor> (for the <delSpan>)
<addSpan>
closing </sp> (speech)
opening <sp>
opening <speaker> with its associated attributes
the line, in its entirety
second closing </sp>
<anchor> (for the <addSpan>)
opening <sp> (to reopen the printed speech)
opening <speaker> (to repeat the original speaker)
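To show what even the simpler of these two encodings buys us, here is a minimal sketch that separates the printed line from the manuscript intervention. It uses the line from the slide above, with attribute values quoted as well-formed XML requires. The function name split_line and the shape of its return value are assumptions made for this illustration, not part of the Shakespeare Quartos Archive's own tooling.

```python
import xml.etree.ElementTree as ET

# The encoded line from the slide, with attribute values quoted.
LINE = (
    '<l><add place="margin-left" hand="#af" type="intervention" '
    'resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>'
)


def split_line(xml_line):
    """Return (printed_text, additions) for one encoded <l>.

    Text inside <add> is treated as a manuscript addition and is kept
    out of the printed text; everything else counts as printed.
    """
    line = ET.fromstring(xml_line)
    printed = line.text or ""
    additions = []
    for child in line:
        if child.tag == "add":
            additions.append({
                "text": child.text or "",
                "place": child.get("place"),
                "hand": child.get("hand"),
            })
        else:
            printed += "".join(child.itertext())
        # The tail is the printed text that follows the child element.
        printed += child.tail or ""
    return printed, additions


printed, additions = split_line(LINE)
print(printed)
print(additions)
```

The same file can therefore serve a reader who wants the book as printed and a reader who wants to study this copy's annotations.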
31. A case study: SQA
Nobody has ever answered "yes" to "Let me show you my XML"
...except a computer
Heather Froehlich
David De Roure
http://firstfolio.bodleian.ox.ac.uk/
32. Limitations of the Text Encoding Initiative (TEI)
Time and funding
Expert editors
A learning curve
An extended subset
An XML international standard
A set of Guidelines
For encoding historical text
A community of practice: conference, mailing list, journal, wiki, SourceForge, toolchain
Future-proof(ish)
33. The Future, or, An Invitation to Hubris
David De Roure
http://www.slideshare.net/davidderoure/future-of-scholarly-communications
34. The Future, or, An Invitation to Hubris
More connections across: texts, programs, communities
More integration between: semantic interoperability
More tools, more animation
Co-constitution
Heterogeneous actors, human and machine
Performative and social
Susan Halford et al.:
http://eprints.soton.ac.uk/271033/
35. Find out more
Teach Yourself TEI:
http://www.tei-c.org/Support/Learn/tutorials.xml
TEI Massive Open Online Course (MOOC) is coming
TEI Conference, 28-31 October 2015, Lyon, France:
Text Encoding Initiative: connect, animate, innovate