This document outlines key strategic design choices to consider when developing an XML DTD or schema for scientific, technical, and medical (STM) journal content. It discusses requirements; architecture, such as the production workflow and validation rules; scope, such as content types and supplementary materials; modeling language options; and specific design considerations for structured text, references, figures, and math, including how to balance simplicity against complexity.
Developing an STM DTD/Schema: Strategic Design Choices
Alexander (Sasha) Schwarzman, AGU (sschwarzman@agu.org)
Extreme Markup Languages 2006, Montréal, Canada, August 7-11, 2006
Requirements
Does an agreed-upon requirements document exist? (Get one!)
What is your XML's role?
Archival copy-of-record (preserving scientific content)?
Means of producing a pretty PDF?
Both?
Much more?
Architecture
When during production is XML created? How is accuracy checked at each stage?
Dummy empty elements for not-yet-assigned metadata, plus a configurable, production-stage-specific business rules checker / validator / QC tool?
Multiple DTDs: a separate one for each production stage?
XML layering: Which layer should enforce editorial style and business rules?
DTD / parser?
Validator / Schematron?
Human editors?
Revisable unit (what is the elemental unit?)
Article?
Issue?
Arbitrary / cross-journal article collection?
Volume / year?
Journal?
More than one of these?
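The Architecture questions above suggest keeping dummy empty elements for metadata that has not been assigned yet, and letting a stage-specific checker (rather than the DTD) decide when they must be filled in. The following is a minimal Python sketch of that idea, assuming hypothetical element names (title, received-date, doi, pub-date) and a hypothetical rule table; a production workflow might express the same rules in Schematron instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: which metadata elements must be filled in
# (not just present as empty placeholders) by each production stage.
REQUIRED_BY_STAGE = {
    "accepted": ["title", "received-date"],
    "published": ["title", "received-date", "doi", "pub-date"],
}

def check_stage(xml_text, stage):
    """Return a list of business-rule violations for the given stage."""
    root = ET.fromstring(xml_text)
    problems = []
    for tag in REQUIRED_BY_STAGE[stage]:
        element = root.find(f".//{tag}")
        if element is None:
            # The DTD should already require the placeholder; report it anyway.
            problems.append(f"missing <{tag}>")
        elif not (element.text or "").strip():
            problems.append(f"<{tag}> is still an empty placeholder")
    return problems

# An accepted article: DOI and publication date are not yet assigned,
# so they are present only as dummy empty elements.
ACCEPTED_ARTICLE = """<article>
  <front>
    <title>Example title</title>
    <received-date>2006-03-01</received-date>
    <doi/>
    <pub-date/>
  </front>
</article>"""

print(check_stage(ACCEPTED_ARTICLE, "accepted"))   # []
print(check_stage(ACCEPTED_ARTICLE, "published"))
# ['<doi> is still an empty placeholder', '<pub-date> is still an empty placeholder']
```

One DTD can then serve every stage, because the placeholders are always valid; only the checker's configuration changes from stage to stage.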
Scope
For what material?
Current?
Future-only?
Legacy?
All of the above or some combination?
What is the extent of an article / book?
Does it include supplementary material, like datasets and computable spreadsheets?
Do you model extra material as just another structured section, or as something different?
Special links (related links) section?
Modeling Language Choices
Which constraint language is primary?
DTD?
XSD?
RELAX NG?
How many DTDs / schemas (purpose of each)?
Authoring?
Conversion / Transformation?
Production?
Archiving?
Separate or shared: If your content includes journal articles, newspaper articles, book chapters, books, case studies, lecture notes, etc., should you use:
Distinct DTD / schema for each?
A large shared structure?
A DTD / schema suite with common modules?
Off-the-shelf, Altered-to-fit, or Bespoke? (T. Usdin)
If altered, what public model?
Compatible with or informed by (subset or superset)?
If bespoke, do you use any public models at all (for tables and math, for instance)?
Modeling Design Choices
Prussian or Californian: prescriptive or descriptive? Flexible or enforcing?
Generated or explicit text? (Depends on XML's role.)
Preserve generation / rendition rules?
Different approach for text and bibliographic references?
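To make the generated-versus-explicit question concrete, here is a minimal Python sketch, assuming hypothetical fig and label elements: a figure label is either stored in the XML (explicit text) or computed from document order at rendition time (generated text). When the two approaches are mixed, the numbering a reader saw depends on code that is not part of the archived XML.

```python
import xml.etree.ElementTree as ET

def figure_labels(xml_text):
    """Return the label each <fig> would display, in document order."""
    root = ET.fromstring(xml_text)
    labels = []
    for number, fig in enumerate(root.iter("fig"), start=1):
        label = fig.find("label")
        if label is not None and (label.text or "").strip():
            # Explicit text: the label is stored in the XML and used verbatim.
            labels.append(label.text.strip())
        else:
            # Generated text: the numbering rule lives in this code, not in the XML.
            labels.append(f"Figure {number}")
    return labels

ARTICLE = """<article>
  <fig id="f1"><caption>First</caption></fig>
  <fig id="f2"><label>Figure 2a</label><caption>Second</caption></fig>
  <fig id="f3"><caption>Third</caption></fig>
</article>"""

print(figure_labels(ARTICLE))  # ['Figure 1', 'Figure 2a', 'Figure 3']
```

If the XML is the archival copy of record, storing the explicit label preserves exactly what was published; if it is only a means to a PDF, generating the label keeps the data simpler, but the generation rule then has to be preserved too.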
How to model bibliographic references?
Mixed content?
Genre-specific strict models (with an escape hatch provided)?
Tag abuse tolerance?
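One way to see the mixed-content trade-off for references is to render the same reference both ways. This minimal Python sketch assumes hypothetical ref, author, year, title, and source elements and placeholder data; the style rule in render_structured is likewise an assumption, not any particular journal's house style.

```python
import xml.etree.ElementTree as ET

# Mixed content: punctuation and spacing live in the XML between elements,
# so rendering is trivial but the house style is frozen into the data.
MIXED_REF = ("<ref><author>Author, A.</author> (<year>2006</year>), "
             "<title>Example title</title>, <source>Example Journal</source></ref>")

# Element-only content: no text between elements; punctuation is generated
# at rendition time by a (hypothetical) style rule.
STRUCTURED_REF = ("<ref><author>Author, A.</author><year>2006</year>"
                  "<title>Example title</title><source>Example Journal</source></ref>")

def render_mixed(xml_text):
    # Concatenate all text nodes, including the punctuation between elements.
    return "".join(ET.fromstring(xml_text).itertext())

def render_structured(xml_text):
    # Apply the style rule; changing house style means changing only this code.
    fields = {child.tag: child.text for child in ET.fromstring(xml_text)}
    return f"{fields['author']} ({fields['year']}), {fields['title']}, {fields['source']}"

print(render_mixed(MIXED_REF))            # Author, A. (2006), Example title, Example Journal
print(render_structured(STRUCTURED_REF))  # Author, A. (2006), Example title, Example Journal
```

The outputs match, but only the element-only version can be restyled without touching every stored reference; the mixed version tolerates odd references (an escape hatch) more gracefully.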
How to reference non-XML components, e.g., figures, in XML?
By an ID that maps to a set of multiple images in an archive?
By naming a specific file from the set? Which one is the mother of all images?
Which components to store / migrate? Is storing cheaper than thinking? (D. Lapeyre)
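Referencing a figure by ID, rather than by file name, moves the choice of file out of the XML. The following minimal Python sketch assumes a hypothetical asset registry in which the XML carries only an ID (for example, a graphic element pointing at "fig1") and one rendition is designated as the master.

```python
# Hypothetical asset registry: the XML stores only an ID such as "fig1",
# and the registry maps it to every available rendition.
ASSETS = {
    "fig1": {
        "master": "fig1.tif",  # the "mother of all images"
        "print": "fig1.eps",
        "web": "fig1.png",
        "thumbnail": "fig1_thumb.gif",
    },
}

def resolve_image(figure_id, purpose):
    """Return the file for a purpose, falling back to the master image."""
    renditions = ASSETS[figure_id]  # KeyError for an unknown figure ID
    if "master" not in renditions:
        raise KeyError(f"no master image registered for {figure_id}")
    return renditions.get(purpose, renditions["master"])

print(resolve_image("fig1", "web"))    # fig1.png
print(resolve_image("fig1", "ebook"))  # fig1.tif (no ebook rendition, so the master)
```

Adding or regenerating renditions then changes only the registry, not the archived XML; the cost is that the registry becomes one more component to store and migrate.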
How to model math?
MathML presentation versus content (computation)?
How do you ensure that the same math symbols look identical in different browsers (the same Unicode code points look different in various browsers, e.g., epsilon and varepsilon)?
LaTeX plus GIFs?
How do you ensure that special characters look identical whether they occur in a displayed formula or inline?
Just GIFs?
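Part of the epsilon problem is that Unicode has two code points for it, U+03B5 GREEK SMALL LETTER EPSILON and U+03F5 GREEK LUNATE EPSILON SYMBOL, and fonts differ in how they draw each. The following minimal Python sketch shows how an editor might list which code points an article actually uses, and apply one hypothetical house rule to both displayed and inline math so they stay consistent. Folding the two forms together is only safe if the article does not use them as distinct symbols.

```python
import unicodedata

EPSILON = "\u03b5"         # U+03B5; in many fonts resembles LaTeX \varepsilon
LUNATE_EPSILON = "\u03f5"  # U+03F5; in many fonts resembles LaTeX \epsilon

# Hypothetical house rule: treat both as one symbol and keep only U+03B5.
# Apply the same rule to displayed and inline math so they cannot diverge.
HOUSE_FORMS = {LUNATE_EPSILON: EPSILON}

def normalize_math_chars(text):
    return "".join(HOUSE_FORMS.get(ch, ch) for ch in text)

def describe_non_ascii(text):
    """List the code point and Unicode name of each non-ASCII character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
            for ch in text if not ch.isascii()]

inline = "\u03f5 < 1"
display = "\u03b5 = x + 1"
print(describe_non_ascii(inline + display))
# ['U+03F5 GREEK LUNATE EPSILON SYMBOL', 'U+03B5 GREEK SMALL LETTER EPSILON']
print(describe_non_ascii(normalize_math_chars(inline)))
# ['U+03B5 GREEK SMALL LETTER EPSILON']
```

Consistent code points do not guarantee identical glyphs across browsers, which is why the LaTeX-plus-GIFs and GIF-only options above remain on the table.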
Just because you can doesn't mean you should (D. Lapeyre)
Beware the lure of modeling for its own sake: simpler models are easier to maintain over time.