Reborn Digital: coding text
Pip Willcox
Curator of Digital Special Collections
Bodleian Libraries, University of Oxford
@pipwillcox
Bodleian Libraries
UNIVERSITY OF OXFORD
COST Digital Humanities Conference: Reassembling the Republic of Letters
22–23 March 2015, University of Oxford
http://www.slideshare.net/PipWillcox/reborn-digital-coding-text
Republics of Letters
Quod feliciter vortat academici
Oxoniens bibliothecam hanc
vobis reipublicaeque
literatorum T.B.P.
Thomas Bodley has built
this library for you and for
the Republic of the Letters.
May the gift turn out well.
Photo: Pip Willcox
The many forms of digital text
 Metadata: Early Modern Letters Online
 Image: Early English Books Online (EEBO)
 Optical Character Recognition (OCR): Google Books
 Handwritten Character Recognition (HCR): Transcribe Bentham
 Transcribed: EEBO Text Creation Partnership (EEBO-TCP)
 Encoded: Shakespeare Quartos Archive
 Edited: Digital Renaissance Editions
 Digital print: Oxford Scholarly Editions Online
The many forms of digital text
 Publisher-led editions
 Library-led editions
 Academic-led editions
 Social editions
Licensed for reuse
Freely available
Subscription
Private
Discoverable
Citable
Reusable
Sustainable
 Provenance
 Conditions of re-use
 Editorial principles
Affordances of digital text
 Read it: dissemination, preservation
 Free text search
 Distant reading
 At scale
 Automated tagging, e.g. linguistic, geographic
Photo: Pip Willcox
Affordances of hand-encoded text
 First pick your Extensible Markup Language (XML):
 Resource Description Framework (RDF)
 Encoded Archival Description (EAD)
 Text Encoding Initiative (TEI)
 anything to separate your data from your interface
Affordances of XML
 Machine-readable and human-readable(ish)
 Interoperable open standard (W3C)
 Extensible semantic markup
 Not always the answer
 Not an end in itself: a research/publication tool
 Hierarchical structure
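
One way to read the "Hierarchical structure" caveat is that XML elements must nest cleanly, so features that overlap cannot be encoded directly. The sketch below illustrates this with Python's standard `xml.etree.ElementTree` parser. Both fragments and the helper name `is_well_formed` are illustrative assumptions, not taken from any project's files.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Cleanly nested: the verse line sits wholly inside the speech.
nested = "<sp><l>Oh horrible, O horrible, most horrible,</l></sp>"

# Overlapping (hypothetical): the speech opens inside the line
# but closes outside it, so the tags cross.
overlapping = "<l>With all my <sp>imperfections</l> on my head.</sp>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```

Empty milestone-style elements that mark only the start or end of a span are a common workaround, because they do not need to enclose the text they describe.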
Affordances of the Text Encoding Initiative (TEI)
 An XML international standard
 A set of Guidelines
 For encoding historical text
 A community of practice:
 conference, mailing list, journal,
wiki, SourceForge, toolchain
 Future-proof
 Future-proof(ish)
Future-proof
http://hqdesktop.net/wallpapers/l/1440x900/44/twitter_advertisement_website_portuguese_skype_old_fashion_1440x900_43938.jpg
A case study: EEBO-TCP
 Early English Books Online Text Creation Partnership: books in the Short Title Catalogue
 Scale:
 c. 130,000 metadata records and image sets
 TCP Phase I: c. 25,000 digital texts: available!
 TCP Phase II: c. 40,000 digital texts (and counting)
 Scope: searchable, readable, marked-up, digital, full texts
A case study: EEBO-TCP
http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:98209
A case study: EEBO-TCP
 Some things we mark up:
 Textual divisions, with descriptions
 Opening material, e.g. arguments, salutes
 Closing material, e.g. signatures, dates
 Letters, lists and tables
 Speakers, speeches, stage directions, quotations
 Textual notes
A case study: EEBO-TCP
 ...and some things we don't:
 Non-Roman alphabets
 Music
 Complex mathematical material
 Illegible characters
 Manuscript
 Damaged or missing material
EEBO-TCP: a buildable resource
Distant reading: Duhaime and Zimmer, DocuScope, AdornMorph
Close reading: Verse Miscellanies Online, Digital Anthology of Early English Drama, Forms Online Renaissance to Modern
Early English Print in the HathiTrust
Kevin Page & Pip Willcox
Anon. A true and perfect description of the strange and wonderful she-elephant, sent from the Indies, which arrived at London, August 1. 1683. With the true portraicture of that wonder in nature (London: 1683). Ashm. H 24[42]. Image: Bodleian Libraries.
Coryate, Thomas, Thomas Coriate traueller for the English vvits: greeting From the court of the Great Mogul, resident at the towne of Asmere, in easterne India (London: 1616). Via EEBO: http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:9182
Anon, A full and true relation of the elephant that is brought over into England from the Indies, and landed at London, August 3d. 1675. Giving likewise a true account of the wonderful nature, understanding, breeding, taking and taming of elephants (London, 1675). Via EEBO: http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:184581.
Terhi Nurmikko-Fuller
A case study: SQA
 The origins: pre-1642 quartos from JISC/NEH Transatlantic Digitization Collaboration Grant
http://quartos.org/
A case study: SQA
Bodleian, Arch. G d.41
A case study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
<delSpan> surrounding the original <l>
<anchor> (for the <delSpan>)
<addSpan>
closing </sp> (speech)
opening <sp>
opening <speaker> with its associated attributes
the line, in its entirety
second closing </sp>
<anchor> (for the <addSpan>)
opening <sp> (to reopen the printed speech)
opening <speaker> (to repeat the original speaker)
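
As a rough sketch of what a computer can do with the short `<add>` encoding of the Folger copy shown earlier, the fragment can be parsed to separate the annotator's marginal addition from the printed line. Attribute values are quoted, as XML requires, and the wrapping `<sp>` element is an assumption added only so the fragment has a single root. The function name `split_interventions` is hypothetical, and the parser is Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# The three verse lines from the slide, wrapped in an assumed <sp> root.
FRAGMENT = """<sp>
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
</sp>"""


def split_interventions(xml_text: str):
    """Return (printed_lines, additions) for every <l> in the fragment."""
    root = ET.fromstring(xml_text)
    printed, additions = [], []
    for line in root.iter("l"):
        text = line.text or ""
        for child in line:
            if child.tag == "add":
                # Record the addition with its attributes; keep only the
                # printed text that follows it in the line.
                additions.append({"text": child.text, **child.attrib})
                text += child.tail or ""
            else:
                text += "".join(child.itertext()) + (child.tail or "")
        printed.append(text.strip())
    return printed, additions


printed, additions = split_interventions(FRAGMENT)
for line in printed:
    print(line)
print(additions)
```

The printed text and the manuscript intervention come out as separate data, so an interface can choose to show either or both.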
A case study: SQA
Nobody has ever answered "yes" to
"Let me show you my XML"
...except a computer
Heather Froehlich
David De Roure
http://firstfolio.bodleian.ox.ac.uk/
Limitations of the Text Encoding Initiative (TEI)
 Time and funding
 Expert editors
 A learning curve
 An extended subset
 An XML international standard
 A set of Guidelines
 For encoding historical text
 A community of practice:
 conference, mailing list, journal,
wiki, SourceForge, toolchain
 Future-proof(ish)
The Future, or, An Invitation to Hubris
David De Roure
http://www.slideshare.net/davidderoure/future-of-scholarly-communications
The Future, or, An Invitation to Hubris
 More connections across: texts, programs, communities
 More integration between: semantic interoperability
 More tools, more animation
 Co-constitution
 Heterogeneous actors, human and machine
 Performative and social
Susan Halford et al.:
http://eprints.soton.ac.uk/271033/
Find out more
 Teach Yourself TEI:
http://www.tei-c.org/Support/Learn/tutorials.xml
 TEI Massive Open Online Course (MOOC) is coming
 TEI Conference, 28–31 October 2015, Lyon, France:
Text Encoding Initiative: connect, animate, innovate
