A Robust Open-source GEDCOM Parser
Dallan Quass and Ryan Knight
What's a GEDCOM?
    0 HEAD
    1 SOUR PAF
    2 NAME Personal Ancestral File
    2 VERS 5.2.18.0
    2 CORP The Church of Jesus Christ of Latter-day Saints
    3 ADDR 50 East North Temple Street
    4 CONT Salt Lake City, UT 84150
    4 CONT USA
    1 DEST Other
    1 DATE 9 Aug 2006
    2 TIME 19:57:47
    1 FILE temp-paf.ged
    1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED
    1 CHAR UTF-8
    1 LANG English
    1 SUBM @SUB1@
    0 @SUB1@ SUBM
    1 NAME Dallan Quass
    0 @I1@ INDI
    1 NAME Dallan /Quass/
    2 SURN Quass
    2 GIVN Dallan
If this looks unfamiliar to you, you may not get a lot out of this talk. On the other hand, the purpose of this project is to handle this for you, so you can develop cool projects in genealogy and let this stay unfamiliar to you!
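Every line in the sample has the same shape: a level number, an optional @XREF@ identifier, a tag, and an optional value. A minimal Java sketch of splitting one line into those parts might look like the following. The class name `GedcomLine` and its fields are hypothetical and are not taken from this project's code; real files contain lines this pattern does not anticipate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for one GEDCOM line; not part of the project's API. */
final class GedcomLine {
    // level, optional @XREF@, tag, optional value after a single space
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\d+)\\s+(?:(@[^@]+@)\\s+)?(\\S+)(?: (.*))?$");

    final int level;
    final String xref;   // null when the line has no @XREF@
    final String tag;
    final String value;  // null when the line has no value

    private GedcomLine(int level, String xref, String tag, String value) {
        this.level = level;
        this.xref = xref;
        this.tag = tag;
        this.value = value;
    }

    /** Splits one line such as "1 NAME Dallan /Quass/" into its parts. */
    static GedcomLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a GEDCOM line: " + line);
        }
        return new GedcomLine(Integer.parseInt(m.group(1)), m.group(2), m.group(3), m.group(4));
    }
}
```

For example, `GedcomLine.parse("0 @I1@ INDI")` yields level 0, xref `@I1@`, tag `INDI`, and no value.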
Why is parsing GEDCOMs so hard?
Challenge #1: Character set detection. The HEAD record in the sample above declares its own encoding (1 CHAR UTF-8). Should be easy, except...
Challenge #1: Character set detection
    Program      Declared CHAR          Actual encoding
    GeneWeb      ASCII              ->  ANSI
    Geni.com     ANSEL              ->  UTF8
    Geni.com     UNICODE            ->  UTF8
    GENJ         UNICODE            ->  UTF8
    All others   UNICODE            ->  UTF16
                 ASCII/MacOS Roman  ->  x-MacRoman
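The overrides on this slide can be read as a lookup from the generating program and the declared CHAR value to the encoding actually used. The sketch below is one way to write that lookup in Java, not the project's implementation. The class and method names are hypothetical, the program names are assumed to appear in the header exactly as spelled on the slide, and "ANSI" is assumed to mean windows-1252. It returns encoding names rather than `Charset` objects because ANSEL is not built into the JDK.

```java
import java.util.Locale;

/** Hypothetical helper; names and matching rules are assumptions for illustration. */
final class CharsetGuess {
    private CharsetGuess() {}

    /**
     * Returns the encoding name to read the file with, given the program named
     * in the header and the value of the header's CHAR line.
     */
    static String effectiveEncoding(String generator, String declaredChar) {
        String gen = generator == null ? "" : generator.trim().toUpperCase(Locale.ROOT);
        String chr = declaredChar == null ? "" : declaredChar.trim().toUpperCase(Locale.ROOT);

        // Program-specific overrides come first.
        if (gen.equals("GENEWEB") && chr.equals("ASCII")) {
            return "windows-1252"; // assumption: "ANSI" on the slide means windows-1252
        }
        if (gen.equals("GENI.COM") && (chr.equals("ANSEL") || chr.equals("UNICODE"))) {
            return "UTF-8";
        }
        if (gen.equals("GENJ") && chr.equals("UNICODE")) {
            return "UTF-8";
        }
        // Overrides that apply regardless of program.
        if (chr.equals("UNICODE")) {
            return "UTF-16";
        }
        if (chr.equals("ASCII/MACOS ROMAN")) {
            return "x-MacRoman";
        }
        // Otherwise trust the declared value (upper-cased).
        return chr;
    }
}
```

Ordering matters here: the program-specific rules must be checked before the general UNICODE rule, or Geni.com and GENJ files would be read as UTF-16.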
Challenge #1: Character set detection (ANSEL)
Challenge #2: Custom tags. The GEDCOM specification hasn't been updated in a LONG time.
Challenge #3: Misused tags
Shout out: Tim Forsythe, VGed - GEDCOM validator, http://ancestorsnow.blogspot.com/2011/07/vged.html
ALIA
    1 SEX M
    1 ALIA /Ted/
    1 BIRT
SOUR
    0 @N6@ NOTE
    1 CONT adopted surname Termaat
    2 SOUR @S9@
DATA
    2 SOUR @S2149874917@
    3 DATA
    4 DATE 11 Sep 1924
    3 NOTE ...
    3 DATA
    4 TEXT ...
    2 SOUR @S99@
    3 DATA
    4 TEXT William Donald ...
    4 DATE 1 Sep 1997
    2 SOUR @S28@
    3 PAGE Indian Prarie...
    3 QUAY 3
    3 DATE 28 Feb 2005
Challenge #4: Unused tags: Event Phone, Event Agency, Source Citation Event Type
Challenge #5: Names
GEDCOM Standard? The code is more what you'd call "guidelines" than actual rules.
Two goals
Goal #1: Parse GEDCOMs into a de facto object model. De facto: in fact or in practice; in actual use or existence, regardless of official or legal status (Wiktionary.org). The model should be straightforward and easy to use and understand.
Goal #2: Round-trip from GEDCOM to object model and back to GEDCOM without information loss.
Nirvana
There is no Nirvana
But we can get pretty close: 94% round-trip
How is it done? ???
Object model
People
Extensions
GedML
- Originally by Michael Kay: http://users.breathe.com/mhkay/gedml/
- Enhanced by Lynn Monson: http://lmonson.com/blog/?page_id=64
- Further enhanced by Nathan Powell & Dallan Quass as part of this project
- GEDCOM -> SAX events
- ANSEL reader & writer
Parser
- Written in Java
- ~1500 LoC for parser + ~4000 LoC for POJOs
- Handles SAX events emitted by GedML
- Separate functions called to handle each tag
- Maintains a stack of model objects
- Attaches unexpected tags to model objects as extensions
- Fast
- Easily extensible
- Tree parser also available
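The stack-and-extensions idea can be illustrated with a small, self-contained Java sketch. It is not the project's parser. All class names are hypothetical, the `line` method stands in for the SAX events that GedML would emit, and a single `handle` method with branches replaces the separate per-tag functions to keep the example short. Tags the model does not know are kept as a nested `Extension` on whichever object is on top of the stack, so nothing is dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a stack-based parser that keeps unknown tags as extensions. */
final class StackParserSketch {
    /** Every model object can carry tags the model does not know about. */
    static class ModelObject {
        final List<Extension> extensions = new ArrayList<>();
    }

    static final class Extension extends ModelObject {
        final String tag;
        final String value;
        Extension(String tag, String value) { this.tag = tag; this.value = value; }
    }

    static final class Name extends ModelObject { String value; String surname; String given; }

    static final class Person extends ModelObject {
        String sex;
        final List<Name> names = new ArrayList<>();
    }

    static final class Gedcom extends ModelObject {
        final List<Person> people = new ArrayList<>();
    }

    final Gedcom root = new Gedcom();
    private final Deque<ModelObject> stack = new ArrayDeque<>();

    StackParserSketch() { stack.push(root); }

    /** Feeds one line; the level decides how many open objects are closed first. */
    void line(int level, String tag, String value) {
        while (stack.size() > level + 1) stack.pop(); // close deeper or sibling objects
        stack.push(handle(stack.peek(), tag, value));
    }

    private ModelObject handle(ModelObject parent, String tag, String value) {
        if (parent instanceof Gedcom && tag.equals("INDI")) {
            Person p = new Person();
            ((Gedcom) parent).people.add(p);
            return p;
        }
        if (parent instanceof Person && tag.equals("NAME")) {
            Name n = new Name();
            n.value = value;
            ((Person) parent).names.add(n);
            return n;
        }
        // Simple values push a throwaway placeholder; this sketch drops any children they have.
        if (parent instanceof Person && tag.equals("SEX")) { ((Person) parent).sex = value; return new ModelObject(); }
        if (parent instanceof Name && tag.equals("SURN")) { ((Name) parent).surname = value; return new ModelObject(); }
        if (parent instanceof Name && tag.equals("GIVN")) { ((Name) parent).given = value; return new ModelObject(); }
        // Unexpected tag: keep it, and its subtree, as an extension of the parent.
        Extension ext = new Extension(tag, value);
        parent.extensions.add(ext);
        return ext;
    }
}
```

Because an `Extension` is itself a model object, the children of an unknown tag nest under it automatically, which is what an exporter would need to write them back out.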
GEDCOM Export: Visitor pattern, 600 LoC
JSON: GEDCOM -> POJO -> JSON -> POJO -> GEDCOM. Simple model persistence using Google GSON.
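As a rough illustration of how a visitor can drive export, the Java sketch below walks a tiny two-class model and writes GEDCOM lines, tracking the current level as it descends. The interface, class, and method names are all hypothetical and the model is far smaller than the project's POJOs; it only shows the shape of the pattern.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of visitor-based GEDCOM export; not the project's classes. */
final class ExportSketch {
    interface Visitor {
        void visit(Person person);
        void visit(Name name);
        void leave(); // called after a node and all of its children were visited
    }

    static final class Name {
        String value;
        String surname;
        String given;
        void accept(Visitor v) { v.visit(this); v.leave(); }
    }

    static final class Person {
        String id; // cross-reference such as "@I1@"
        final List<Name> names = new ArrayList<>();
        void accept(Visitor v) {
            v.visit(this);
            for (Name n : names) n.accept(v);
            v.leave();
        }
    }

    /** Writes one GEDCOM line per visited value, using the nesting depth as the level. */
    static final class GedcomWriter implements Visitor {
        private final StringBuilder out = new StringBuilder();
        private int level = 0;

        private void write(String tag, String value) {
            if (value != null) out.append(level).append(' ').append(tag).append(' ').append(value).append('\n');
        }

        @Override public void visit(Person person) {
            out.append(level).append(' ').append(person.id).append(" INDI\n");
            level++; // children of the person are one level deeper
        }

        @Override public void visit(Name name) {
            write("NAME", name.value);
            level++; // SURN and GIVN are children of NAME
            write("SURN", name.surname);
            write("GIVN", name.given);
        }

        @Override public void leave() { level--; }

        @Override public String toString() { return out.toString(); }
    }
}
```

Keeping the output logic in the visitor rather than in the model classes means the same model could be walked by other visitors without the POJOs knowing about any output format.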
Further thoughts
Do we need a radically different data-exchange model for genealogy?
I don't know. A new proposed object model could use this project to migrate existing GEDCOMs to the de facto model, then translate the de facto model objects to the new model.
Do we need GEDCOM validation tools?
Definitely! A list of standard custom tags would also be pretty helpful.
We live in the real world
Purpose of this project
Demonstration of Gedcom Server
- Demonstrates GEDCOM -> model -> JSON -> model -> GEDCOM
- Built with Play 1.2.4, a Java web framework that allows for rapid development of web applications with a fully integrated stack
- Deployed to Heroku Cloud Application Platform; Heroku allows one-step deployment with git
Conclusion
- Parsing GEDCOMs is hard; it's like parsing HTML in the 1990s
- But getting it right is pretty important, especially if you want to retain existing information
- Open source algorithm is now freely available at http://github.com/DallanQ/Gedcom: simple object model with extensions, 94% round-trip
- Hopefully others will benefit from this effort
Images appearing on these slides are copyrighted by the contributors to http://commons.wikimedia.org and are used under license.
