Alexander (Sasha) Schwarzman, AGU (sschwarzman@agu.org)
Extreme Markup Languages 2006, Montréal, Canada
August 7–11, 2006
Developing an STM DTD/Schema:
Strategic Design Choices
Requirements
• Does an agreed-upon Requirements document exist? (Get one!)
• What is your XML's role?
  • Archival copy-of-record (preserving scientific content)?
  • Means of producing a pretty PDF?
  • Both?
  • Much more?
Architecture
• When during production is XML created? How is accuracy checked at each stage?
  • Dummy empty elements for not-yet-assigned metadata plus use of configurable production-stage-specific Business Rules Checker / Validator / QC Tool?
  • Multiple DTDs: a separate one for each production stage?
• XML layering: What layer to use for enforcing editorial style and business rules?
  • DTD / parser?
  • Validator / Schematron?
  • Human editors?
• Revisable unit (what is the elemental unit?)
  • Article?
  • Issue?
  • Arbitrary / cross-journal article collection?
  • Volume / year?
  • Journal?
  • More than one of these?
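
One way to picture the "dummy empty elements plus a configurable, production-stage-specific checker" option is the minimal sketch below. It is only an illustration under stated assumptions: the stage names, the element names (`title`, `doi`, `volume`, `pub-date`), and the rule table are hypothetical and do not come from any particular DTD or production system.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: metadata that must be filled in by each stage.
# Stage and element names are assumptions made for this illustration.
REQUIRED_BY_STAGE = {
    "copyedit": ["title"],
    "issue-assembly": ["title", "doi", "volume"],
    "archive": ["title", "doi", "volume", "pub-date"],
}


def check_stage(xml_text, stage):
    """Return a list of problems found for the given production stage."""
    root = ET.fromstring(xml_text)
    problems = []
    for name in REQUIRED_BY_STAGE[stage]:
        elem = next(root.iter(name), None)  # first element with this tag
        if elem is None:
            problems.append(f"{name}: element missing")
        elif not (elem.text or "").strip():
            # A dummy empty element is fine early on, an error later.
            problems.append(f"{name}: still an empty placeholder")
    return problems


# One document instance, valid against a single DTD throughout production;
# not-yet-assigned metadata is present as empty placeholder elements.
ARTICLE = """<article>
  <meta>
    <title>Sample title</title>
    <doi/>
    <volume/>
    <pub-date/>
  </meta>
</article>"""

if __name__ == "__main__":
    for stage in REQUIRED_BY_STAGE:
        print(stage, check_stage(ARTICLE, stage))
```

The design trade-off shown here is that the DTD stays the same at every stage while the rule table changes, which is the alternative to maintaining a separate DTD per production stage.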
Scope
• For what material?
  • Current?
  • Future-only?
  • Legacy?
  • All of the above or some combination?
• What is the extent of an article / book?
  • Does it include supplementary material, like datasets and computable spreadsheets?
  • Do you model extra stuff as just another structured section or is it something different?
  • Special links (related links) section?
Modeling Language Choices
• Which constraint language is primary?
  • DTD?
  • XSD?
  • RELAX NG?
• How many DTDs / schemas (purpose of each)?
  • Authoring?
  • Conversion / Transformation?
  • Production?
  • Archiving?
• Separate or shared: If your content includes journal article, newspaper article, book chapter, book, case study, lecture notes, etc., should you use:
  • Distinct DTD / schema for each?
  • A large shared structure?
  • A DTD / schema suite with common modules?
• Off-the-shelf, Altered-to-fit, or Bespoke? (T. Usdin)
  • If altered, what public model?
    • Compatible with or informed by (subset or superset)?
  • If bespoke, do you use any public models at all (for tables and math, for instance)?
Modeling Design Choices
• Prussian or Californian: prescriptive or descriptive? Flexible or enforcing?
• Generated or explicit text? (depends on XML's role)
  • Preserve generation / rendition rules?
  • Different approach for text and bibliographic references?
• How to model bibliographic references?
  • Mixed content?
  • Genre-specific strict models (with an escape hatch provided)?
• Tag abuse tolerance?
• How to reference non-XML components, e.g., figures, in XML?
  • By an ID that maps to a set of multiple images in an archive?
  • By naming a specific file from the set? Which one is the mother of all images?
• Which components to store / migrate? Is storing cheaper than thinking? (D. Lapeyre)
• How to model math?
  • MathML presentation versus content (computation)?
    • How to ensure that the same math symbols look identical in different browsers (the same Unicode code points look different in various browsers, e.g., epsilon and varepsilon)?
  • LaTeX plus GIFs?
    • How to ensure that special characters occurring both in a displayed formula and inline look identical?
  • Just GIFs?
• Just because you can doesn't mean you should (D. Lapeyre)
  • The lure of modeling for its own sake. Simplicity maintains better over time.
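
The question about special characters that occur both in a displayed formula and inline can be partly checked mechanically. The sketch below is one possible approach, not a method from the poster: it flags characters that differ by code point but fold to the same character under Unicode NFKC normalization, such as U+03B5 (small epsilon) and U+03F5 (lunate epsilon symbol). The function name `variant_clashes` is hypothetical. It addresses only the encoding side of the problem; glyph differences between browsers for one and the same code point depend on fonts and cannot be detected this way.

```python
import unicodedata


def variant_clashes(display_math, inline_math):
    """List groups of NFKC-equivalent characters used with different code points."""

    def by_base(text):
        # Group each non-ASCII character under its NFKC-normalized form.
        groups = {}
        for ch in text:
            if ord(ch) > 127:
                base = unicodedata.normalize("NFKC", ch)
                groups.setdefault(base, set()).add(ch)
        return groups

    display, inline = by_base(display_math), by_base(inline_math)
    clashes = []
    for base in sorted(display.keys() & inline.keys()):
        used = display[base] | inline[base]
        if len(used) > 1:  # same symbol, more than one code point
            clashes.append(
                sorted(f"U+{ord(c):04X} {unicodedata.name(c, 'UNNAMED')}" for c in used)
            )
    return clashes


if __name__ == "__main__":
    # Displayed formula uses the lunate epsilon, inline text the plain epsilon.
    for group in variant_clashes("\u03f5 = 0.1", "where \u03b5 is small"):
        print(group)
```

A report like this could feed the business rules layer discussed under Architecture, leaving the choice of which variant is canonical to the editorial style.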
