Europeana Newspapers: a new meeting place for users of digital content
1. “Europeana Newspapers”
- a new meeting place for users of digital content -
Nataša Dakić, Aleksandra Trtovac
University Library “Svetozar Marković”, Belgrade
Ninth International Meeting of Slavic Librarians, Sarajevo, 14–17 April 2013
2. The “Europeana Newspapers” project
Why newspapers?
• a relevant source of information for all citizens
• a highly relevant source of information for researchers
The state of newspapers in libraries: between heaven and hell
• sound, complete originals and excellent microfilm copies
• brittle, fragile originals, incomplete issues, supplements
3. The “Europeana Newspapers” project
• The project runs from February 2012 to January 2015.
• It is funded by the European Commission under the fifth call of the Information and Communication Technologies Policy Support Programme (CIP ICT-PSP Best Practice Network, 5th Call).
• The project coordinator is the Berlin State Library (Staatsbibliothek zu Berlin).
4. Project partners
• 17 partners from 12 countries:
• National libraries
• University libraries
• LIBER and CCS
5. Project partners
• Berlin State Library
• National Library of Estonia
• University of Helsinki
• National Library of France
• National Library of Latvia
• University of Belgrade
• Dr. Friedrich Teßmann Library
• University of Salford
• CCS Content Conversion Specialists GmbH
• National Library of the Netherlands
• Austrian National Library
• University of Hamburg
• National Library of Poland
• National Library of Turkey
• University of Innsbruck
• British Library
• The European Library (TEL)
• LIBER Foundation
6. Associated project partners
• National Library of the Czech Republic
• National and University Library, Ljubljana, Slovenia
• National and University Library of Iceland
• National and University Library, Zagreb, Croatia
• National Library “St. Cyril and Methodius”, Bulgaria
• Lucian Blaga Central University Library, Romania
• National Library of Wales
• National Library of Portugal
• National Library of Spain
• National Library of Belgium
• National Library of Luxembourg
7. Goals of the “Europeana Newspapers” project
• Europeana will become the largest provider of pan-European newspaper collections
• Survey the state of newspaper collections in European libraries
• Provide high-quality copies and issue recommendations for further work on digitization, text refinement, metadata...
• Improve access to newspaper collections in Europe
8. Goals of the “Europeana Newspapers” project
• Over the course of the project, about 18 million digitized newspaper pages, many of them in full text, will be delivered to The European Library (TEL) and Europeana
9. Aggregation and text refinement
• Optical character recognition (OCR) is planned for 10 million pages
• Article segmentation (OLR, Optical Layout Recognition) is planned for 2 million pages
• Named entity recognition (NER) is provided by the National Library of the Netherlands for all material in German, English and Dutch
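The three refinement steps on this slide apply to different subsets of the material. The sketch below is only an illustration of that selection logic, not the project's actual workflow: the `Page` record, its two selection flags, the language codes, and the rule that NER runs only on pages that already have OCR text are all assumptions introduced here.

```python
from dataclasses import dataclass

# Languages covered by NER according to the slide: German, English, Dutch.
# The ISO 639-1 codes are an assumption made for this sketch.
NER_LANGUAGES = {"de", "en", "nl"}


@dataclass
class Page:
    """Hypothetical page record; field names are illustrative only."""
    page_id: str
    language: str
    selected_for_ocr: bool = False  # part of the subset planned for OCR
    selected_for_olr: bool = False  # part of the subset planned for OLR


def refinement_steps(page: Page) -> list[str]:
    """Return the refinement steps that would apply to one page."""
    steps = []
    if page.selected_for_ocr:
        steps.append("OCR")
    if page.selected_for_olr:
        steps.append("OLR")
    # Assumption: NER needs recognized text, so it is tied to OCR here.
    if "OCR" in steps and page.language in NER_LANGUAGES:
        steps.append("NER")
    return steps
```

For example, `refinement_steps(Page("p1", "de", True, True))` yields all three steps, while a Serbian page selected only for OCR yields just `["OCR"]`.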
11. Metadata
• Development and improvement of the standard for describing technical metadata for optical character recognition: METS/ALTO
• Further development of the Europeana Data Model (EDM) and its alignment with the METS/ALTO standard
• Transformation of local metadata into the EDM model
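To make the role of ALTO more concrete, here is a minimal sketch of reading OCR output from an ALTO file with the Python standard library. The sample XML is invented for illustration and is far smaller than a real page; the element and attribute names used (`TextLine`, `String`, `CONTENT`, `WC` for word confidence) come from the ALTO schema, and the code ignores the namespace so that it does not depend on one particular ALTO version.

```python
import xml.etree.ElementTree as ET

# Invented, heavily reduced ALTO sample (not taken from the project data).
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Nova" WC="0.95"/>
            <SP/>
            <String CONTENT="iskra" WC="0.80"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Beograd" WC="0.60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return the recognized text, one string per ALTO TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if _local(elem.tag) == "TextLine":
            words = [c.get("CONTENT", "") for c in elem if _local(c.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def mean_word_confidence(xml_text: str) -> float:
    """Average the WC attribute over all String elements that carry one."""
    root = ET.fromstring(xml_text)
    scores = [float(e.get("WC")) for e in root.iter()
              if _local(e.tag) == "String" and e.get("WC") is not None]
    return sum(scores) / len(scores) if scores else 0.0
```

A per-page average of `WC` is only one rough way to flag pages with weak OCR; it is shown here as an example of what the technical metadata makes possible, not as the project's quality measure.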
12. Access to full text
• Advanced full-text search by:
• Keywords
• Named entities
• Newspaper collections
• Date, year...
• Browsing of each individual page
• Links to the relevant library source
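The search options listed on this slide can be pictured as optional filters that are combined. The following sketch works over a small in-memory list; the `PageRecord` fields, the placeholder titles, texts, dates and `example.org` links are all invented for illustration, and a real portal would use a search index rather than a linear scan.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PageRecord:
    """Hypothetical record for one digitized page."""
    title: str                                  # newspaper title (collection)
    issued: date                                # issue date
    text: str                                   # OCR full text
    entities: set = field(default_factory=set)  # names found by NER
    source_url: str = ""                        # link back to the holding library


def search(pages, keyword=None, entity=None, title=None, year=None):
    """Return pages matching every filter that was supplied."""
    results = []
    for page in pages:
        if keyword and keyword.lower() not in page.text.lower():
            continue
        if entity and entity not in page.entities:
            continue
        if title and page.title != title:
            continue
        if year and page.issued.year != year:
            continue
        results.append(page)
    return results


# Invented placeholder data, used only to exercise the function.
SAMPLE_PAGES = [
    PageRecord("Title A", date(1900, 1, 5), "Vesti iz Beograda",
               {"Beograd"}, "https://example.org/a/1"),
    PageRecord("Title B", date(1912, 10, 8), "Ratne vesti",
               {"Balkan"}, "https://example.org/b/1"),
]
```

Calling `search(SAMPLE_PAGES, keyword="vesti", year=1912)` returns only the second record, and each result carries its `source_url`, which corresponds to the link to the relevant library source mentioned above.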
13. The University Library “Svetozar Marković” as a project partner
• More than 400,000 digitized pages of periodicals and newspapers in Serbian, published before 1945.
• Titles include Nova iskra, Podunavka, Srbadija, Beogradske opštinske novine, Zora, Balkanski rat u slici i reči, Ilustrovana ratna kronika, Pastir, Sion, Težak, Peštansko-budimski skoroteča, Zemunski glasnik, Srđ, Branič, Zvezda, Starmali, Starmladi, Stražilovo, Šumadinka
16. Criteria for selecting material
• Titles not digitized by the National Library of Serbia
• State of preservation of the newspapers
• Technical constraints
17. Project results to date
• workflows and technical processes have been established
• the scope and number of digitized newspapers to be processed with advanced technologies have been specified
• the quality of these materials has been assessed
• candidate metadata models for aggregation, and ways of presenting the digitized content, have been defined
• a requirements specification for optical character recognition (OCR) has been drawn up
18. Project results to date
• Survey
• A survey was conducted to identify and analyse all newspaper collections digitized by national, academic and public libraries in Europe up to 2012. 47 institutions took part.
• Results:
• Access to digitized newspapers is free of charge in 85% of cases
• 36% of libraries had not applied any form of optical character recognition (OCR) to their digitized newspaper content
19. Plans for the coming period
• In its second year, the project will continue to build on its experience so far and to publish results, in order to promote the project and make as much material as possible visible on the Europeana portal.
• To that end, several information days and workshops are planned.
20. Information days
• The National Library of Turkey will host the first information day on 3 May 2013 in Ankara. At this national-level event, project partners and invited Turkish speakers will join forces to present the latest project achievements, primarily in text refinement for digitized historical Turkish newspapers.
21. Workshops
• The first workshop will address the processing of digitized newspaper material
• The second workshop will set out the problems of aggregating and presenting objects
• The third and final workshop will deal with European newspaper collections and the Digital Agenda for Europe
22. Workshops
• The first workshop will be held in Belgrade, at the University Library “Svetozar Marković”, on 13 and 14 June 2013.
• Through presentations and live demonstrations, participants will become familiar with work on OCR and NER, tools for evaluating the techniques used, and quality assessment of the processed content, and will exchange experience and possible solutions in this field.
• Since the number of workshop participants is limited, those interested are asked to register at:
http://eurnews.eventbrite.com/
23. Conclusion
• Since libraries have not indexed newspapers analytically, except in exceptional cases, this new approach to searching and retrieving information will do much to bring the content of newspaper collections to light.
• Users with the most varied interests will be able to find relevant information, whether it concerns entertainment, sport or everyday life.
• This digital collection will, we hope, also become a basis for scholarly research and for the study of the historical and political circumstances and the cultural history of a given region.