Digitization Projects at the State Library of Pennsylvania: Where the Past and Future Meet. Bill Nork, Head of Systems & Preservation; William Fee, Digital Collections Librarian; Kurt Bodling, Digital Resources Cataloger. Pennsylvania Department of Education, State Library of Pennsylvania.
Digital projects page: www.statelibrary.state.pa.us/digital_projects (or visit the State Library of PA website at www.statelibrary.state.pa.us and select "Digital Projects of the State Library").
Digitization things learned the hard way, or: why do I drink so much coffee? By Bill Fee. Try to plan things out as much as you can before starting a project. No matter how much you plan, something will blow up in your face. It's often better to throw people at a problem than equipment (if they hit just right, this also counts as percussive maintenance). Loud, obnoxious, driving punk rock and techno really improve the workflow (though that could be just a personal preference).
Hardware & Software: We run a Dell Optiplex GX260 with a 2.26 GHz non-hyperthreaded processor. Alas, we're a PC shop. Scanner-wise, we have a $25,000 Minolta PS 7000 overhead engine book scanner and an HP ScanJet 7400C that's up for replacement.
Hardware & Software, again: We scan directly into Photoshop. I can save the archival TIFF, then edit it and create the access JPEG right there. As a library you should be able to get an educational license, which is a heck of a lot cheaper. The program itself may seem more full-featured than you need, but things like batch processing, when you're applying the same edits and sizing to a whole directory of images, really save time. Get them to pay for classes, though: about 200 per, but well worth it.
Still More Hardware & Software: We use OmniPage for OCR. You'll save yourself a heck of a lot of correction time by doing a dual scan: one into Photoshop, one directly into the OCR program, whichever you use. OmniPage has about 98 or 99 percent accuracy for anything but newspapers, but there are others just as good. Hit up ComputerShopper.com and read reviews. If I'm doing a web page, I use the Composer feature in Mozilla or Netscape. I've been using these programs and essentially the same hardware since the bad old pre-standards Dark Ages of 5 years ago, and they seem to work.
What criteria do you use to have an item digitized? Must be PA related. Usually in such poor shape that it cannot circulate, or from the Rare Book Room, or ordered by the Director or Commissioner. Must have fewer than 5-10 holding libraries in FirstSearch (not counting us). Usually fits a theme; the current one is the VLaT project (Violence, Labor and Transportation): riots, train wrecks, mine accidents, etc.
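The criteria above could be sketched as a small screening function. This is only an illustration under stated assumptions, not the library's actual workflow: the `Item` fields, the `screen` function, and the choice of 10 as the holdings cutoff (the slide gives a range of 5-10) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    """Hypothetical record for an item being considered for digitization."""
    title: str
    pa_related: bool
    holding_libraries: int               # FirstSearch holdings, not counting our own
    poor_condition: bool = False         # too fragile to circulate
    rare_book_room: bool = False
    ordered_by_management: bool = False  # Director or Commissioner request


# Assumed cutoff: the slide says "5-10", so the upper bound is used here.
MAX_HOLDINGS = 10


def screen(item: Item) -> Tuple[bool, List[str]]:
    """Check the two 'must' rules and list which 'usually' reasons apply."""
    eligible = item.pa_related and item.holding_libraries < MAX_HOLDINGS
    reasons = []
    if item.poor_condition:
        reasons.append("poor condition")
    if item.rare_book_room:
        reasons.append("Rare Book Room")
    if item.ordered_by_management:
        reasons.append("ordered by Director or Commissioner")
    return eligible, reasons
```

Keeping the "must" rules separate from the "usually" reasons mirrors the slide: the first two decide eligibility, while the rest only help a person prioritize.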
Other problems you will find: bureaucracy; shipment; file and folder nomenclature; poor scans and OCR; storage; personnel; high-priority projects; new software, new uses for software, and new problems with software that only come up because it's a new project.
Metadata Considerations. Kurt A.T. Bodling, Digital Resources Cataloger, State Library of Pennsylvania.
The Starting Place: What is the digital object? Something newly created? Already cataloged? A collection? A single item? A selection from an item? Who is it for?
Ben Franklin solutions. Easy call: siphon data from OPAC. Tougher: dealing with chapters and single letters.
General solution to obit challenges: sampling and testing; hunting down exceptions; creating a data dictionary; and, of course, going back later to make changes.
Data Dictionary defined MARC : AACR2 ::  Dublin Core : Data Dictionary
Creating the data dictionary. Simple issues first: steal data from the catalog; use a boilerplate rights management statement; get repeated data into a template.
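One way to picture the template idea is a record builder that starts from the repeated values and layers catalog and per-item data on top. This is a minimal sketch, not the project's actual data dictionary: the keys are standard Dublin Core element names, but every value, the `TEMPLATE` name, and the `make_record` function are assumptions made for illustration.

```python
# Repeated values shared by every record in a collection.
# All values here are illustrative placeholders, not the project's real text.
TEMPLATE = {
    "publisher": "State Library of Pennsylvania",
    "type": "Text",
    "format": "image/jpeg",
    "rights": "PLACEHOLDER: boilerplate rights management statement",
}


def make_record(catalog_data: dict, item_data: dict) -> dict:
    """Merge template, catalog, and per-item values; later sources win."""
    record = dict(TEMPLATE)      # copy so the shared template is never modified
    record.update(catalog_data)  # data taken from the existing catalog record
    record.update(item_data)     # values that differ for each digital object
    return record


# Hypothetical usage with made-up values.
catalog = {"title": "Example scrapbook volume", "language": "en"}
record = make_record(catalog, {"identifier": "example_vol01_p012"})
```

Because the template is copied for each record, a later change to the boilerplate rights statement only has to be made in one place.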
Creating the data dictionary. More difficult challenges: names of the deceased; citation to original source newspapers; omissions; enhancements; difficulties caused by original scrapbooking.
Names of the deceased: not authority controlled; variations between two obit versions; variations within one obit; lacking first name.
Name variations:
Anonymous child:
Names of the deceased. Solutions: enter only surname, but enter all spellings that appear.
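The "surname only, every spelling" rule could be sketched as a helper that keeps each distinct spelling once, in the order it first appears. This is an assumption-laden illustration: the function name `surname_entries` and the sample names are invented, and treating spellings that differ only in capitalization as the same entry is a design choice the slides do not specify.

```python
from typing import Iterable, List


def surname_entries(spellings: Iterable[str]) -> List[str]:
    """Return each distinct surname spelling once, in order of first appearance."""
    seen = set()
    entries = []
    for raw in spellings:
        name = " ".join(raw.split())  # trim and collapse stray whitespace
        key = name.casefold()         # assumption: case differences are not new spellings
        if name and key not in seen:
            seen.add(key)
            entries.append(name)
    return entries


# Invented example: three spellings found across two versions of one obituary.
names = surname_entries(["Snyder", "Schneider", "snyder", "  Snider "])
```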
Citations to original sources: visible on microfilm, but NOT in JPEG; easily recoverable.
Citations to original sources. Solution: leave this information out of metadata.
Omissions: blank pages; pages glued together; military unit information.
Military unit info:
Omissions. Solutions: record page numbers as they appear; note when pages don't appear; omit unit information.
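Noting pages that don't appear can be partly automated once the page numbers have been recorded. The sketch below assumes the recorded numbers are plain integers, which may not hold for every scrapbook; the function name `missing_pages` is hypothetical.

```python
from typing import List


def missing_pages(recorded: List[int]) -> List[int]:
    """Return page numbers absent between the lowest and highest recorded page."""
    if not recorded:
        return []
    present = set(recorded)
    return [p for p in range(min(recorded), max(recorded) + 1) if p not in present]
```

A gap reported this way still needs a person to decide whether the page was blank, glued to its neighbor, or simply skipped during scanning.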
Enhancements: geographic info; occupational info; marital status; and on and on and on.
Enhancements. Solutions: forgo most enrichment; include "former slave"; include some terms like "suicide" and "murder".
Scrapbook difficulties: running on to a second page; running on to 3rd, 4th, 5th pages.
Multiple page obit:
Scrapbook difficulties: repeated obituaries.
Scrapbook difficulties: label at bottom of page, obit on next.
Text and title split:
Scrapbook difficulties: year-end cumulative death notice; articles that were not obits at all; volumes containing two years.
Cumulative notice:
Not an obit:
My Lessons Learned: Metadata isn't (aren't?) scary. Patience and perseverance win out. Small crew = quick decisions.
What Did We Learn? More man-hours than we thought. More staffing to complete task. Decisions about how deep to go with metadata.
Questions? Call or email one of us. Bill Fee, 717-783-7014, wfee@state.pa.us; Kurt Bodling, 717-783-5996, kbodling@state.pa.us; Bill Nork, 717-787-9128, [email_address].
Digitization Projects Tech Con 2006
