Facilitating Standardization and Exchange of Array Design: ADF MAGE-ML Tool. Pierre Marguerite, Friday Seminar, EBI Microarray Informatics Team, 15 October 2004
ADF MAGE-ML Tool: a stand-alone, platform-independent application. Supports: simple and complex microarray layouts; different microarray applications (gene_expression, snp_detection, comparative_genomic_hybridization, binding_site_identification, others with minimal support). Respects good practices.
conversion tool
MAGE-ML (MAGE-OM) description: BioSequence, Array, ArrayDesign, DesignElement
MAGE-ML (next)
ADF (previous)
Array Design File: adh, adr, adc. Header: contacts, technical information
Array Design File: adr, adc. Reporters, features, Feature/Reporter
Array Design File: composite. Characteristics; map to reporters
ADF version differences: 3 parts (files) instead of 1; provided as a workbook or as text files; no Reporter Identifier item; no Reporter Group [role] item; new Chromosome item; new Chromosome_band item; new Species item.
Two mandatory steps: validation and conversion.
Validation: file format validation; file content validation; validation of controlled vocabulary (MGED ontology terms; approved databases: tags and accession numbers); automatic curation (when possible).
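A minimal sketch of controlled-vocabulary validation with automatic curation follows. It assumes the vocabulary is a plain set of terms (here the application types named on the tool overview slide, standing in for real MGED ontology terms) and that curation means a simple normalization (trim, lower-case, spaces and hyphens to underscores); the function name `curate_term` and the normalization rule are hypothetical, not the tool's actual behavior.

```python
from typing import Optional

# Hypothetical controlled vocabulary: the microarray application types listed
# on the tool overview slide. The real tool checks MGED ontology terms.
APPLICATION_TYPES = {
    "gene_expression",
    "snp_detection",
    "comparative_genomic_hybridization",
    "binding_site_identification",
}


def curate_term(value: str, vocabulary: set) -> Optional[str]:
    """Return the vocabulary term matching `value`, or None if unrecognized.

    Exact matches pass unchanged. Otherwise an assumed automatic curation is
    tried: trim whitespace, lower-case, map spaces and hyphens to underscores.
    """
    if value in vocabulary:
        return value
    normalized = value.strip().lower().replace(" ", "_").replace("-", "_")
    return normalized if normalized in vocabulary else None
```

For example, `curate_term(" SNP detection ", APPLICATION_TYPES)` yields `"snp_detection"`, while an unknown term such as `"proteomics"` yields `None` and would be reported in the error log.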
Validation: two levels of checking (relaxed, strict); two execution modes (a complete mode, a step-by-step mode); an error log for correction.
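The two checking levels and two execution modes combine independently. The sketch below is one possible way to model them, assuming each check returns an `Issue` flagged as "curable" when it is a recognizable usual mistake: relaxed checking logs curable issues as warnings, strict checking treats them as errors, the complete mode checks every row, and the step-by-step mode stops at the first error. All class, function, and column names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Issue:
    row: int
    message: str
    curable: bool = False  # True if the mistake is recognizable and fixable


# A check receives a 1-based row number and the row (column name -> value)
# and returns an Issue, or None when the row passes.
Check = Callable[[int, dict], Optional[Issue]]


@dataclass
class Report:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def validate(rows: list, checks: list, strict: bool = False,
             step_by_step: bool = False) -> Report:
    report = Report()
    for i, row in enumerate(rows, start=1):
        for check in checks:
            issue = check(i, row)
            if issue is None:
                continue
            if issue.curable and not strict:
                # Relaxed level: a recognizable "usual mistake" is logged, not fatal.
                report.warnings.append(issue)
                continue
            report.errors.append(issue)
            if step_by_step:
                # Step-by-step mode: stop at the first error so it can be fixed.
                return report
    # Complete mode: every row was checked, whatever errors were found.
    return report


# Two example checks using hypothetical column names.
def has_feature(i: int, row: dict) -> Optional[Issue]:
    return None if row.get("Feature") else Issue(i, "missing Feature value")


def lowercase_tag(i: int, row: dict) -> Optional[Issue]:
    tag = row.get("DbTag", "")
    if tag != tag.lower():
        return Issue(i, f"database tag {tag!r} should be lower case", curable=True)
    return None
```

Keeping the level (`strict`) and the mode (`step_by_step`) as separate flags mirrors the slide: any level can run in either mode.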
Checking lists (header). File/data structure checklist: the header file is a tab-delimited file; item names are correct or can be identified (an item that cannot be identified is skipped); all mandatory items are present in the header. Data/file content checklist: field values have the correct format; possible value types are "Integer", "Free Text", "Controlled vocabulary", "MGED ontology term", "DatabaseEntry", "Sequence" and "Species"; single versus multiple values are checked.
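The header checklist can be sketched as follows, assuming one item per line with the item name followed by one or more tab-separated values. The `HEADER_SPEC` table (item names, which are mandatory, their value types, and whether multiple values are allowed) is a hypothetical stand-in for the real ADF specification, and only the "Integer" type is actually checked here.

```python
# Hypothetical header specification: item name -> (mandatory, value type,
# multiple values allowed). The real ADF header items and rules may differ.
HEADER_SPEC = {
    "array design name": (True, "Free Text", False),
    "version": (True, "Free Text", False),
    "provider": (True, "Free Text", True),
    "number of features": (False, "Integer", False),
}


def check_header(text: str) -> tuple:
    """Check a tab-delimited header: item name, then tab-separated values.

    Returns (recognized items, error messages).
    """
    items: dict = {}
    errors: list = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split("\t")
        raw_name = fields[0].strip()
        if not raw_name:
            continue  # blank line
        name = raw_name.lower()  # identify item names case-insensitively
        if name not in HEADER_SPEC:
            continue  # unidentified item: skipped, as in the checklist
        _, value_type, multiple = HEADER_SPEC[name]
        values = [v.strip() for v in fields[1:] if v.strip()]
        if len(values) > 1 and not multiple:
            errors.append(f"line {lineno}: '{raw_name}' accepts a single value")
        if value_type == "Integer" and not all(v.isdigit() for v in values):
            # Assumed rule: an integer is a string of decimal digits.
            errors.append(f"line {lineno}: '{raw_name}' must be an integer")
        items[name] = values
    for name, (mandatory, _, _) in HEADER_SPEC.items():
        if mandatory and not items.get(name):
            errors.append(f"missing mandatory item '{name}'")
    return items, errors
```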
Checking lists (Feature/Reporter). File/data structure checklist: the header file is correct (structure and data); the FeatureReporter file is a tab-delimited file; header item names are correct (unknown items are skipped); all mandatory items are present; item cardinalities and dependencies are correct; database tags are approved and database accession numbers are correct; item order is correct (optional, does not fail the check); field dependencies are correct. Data/file content checklist: the FeatureReporter file structure is correct; mandatory fields are present; field cardinalities and field value multiplicities are correct; field values are in the required format; database tags are approved by ArrayExpress and supplied in lower case between square brackets; database IDs are correct; ontology terms are correct (MGED ontology); sequences are valid for the associated polymer type (DNA, RNA, protein); integer field values are correct; there are no duplicate features; duplicate reporters (equal names) have the same characteristics.
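Two of the content rules above lend themselves to a short sketch: database tags in lower case between square brackets, and sequences valid for their polymer type. The sketch assumes a database entry is written `[tag]accession`, uses a hypothetical list of approved tags (the real list is maintained by ArrayExpress), and accepts IUPAC residue codes including ambiguity codes; none of this is taken from the tool itself.

```python
import re

# Hypothetical list: the real approved databases are maintained by ArrayExpress.
APPROVED_TAGS = {"embl", "genbank", "refseq", "unigene"}

# Assumed entry syntax: "[tag]accession", tag in lower case between brackets.
DB_ENTRY = re.compile(r"^\[([a-z_]+)\]([A-Za-z0-9_.]+)$")

# Allowed residue letters per polymer type (IUPAC codes, ambiguity codes included).
ALPHABETS = {
    "DNA": set("ACGTRYSWKMBDHVN"),
    "RNA": set("ACGURYSWKMBDHVN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWYBZXUO*"),
}


def check_db_entry(entry: str) -> list:
    match = DB_ENTRY.match(entry)
    if match is None:
        if re.match(r"^\[[A-Za-z_]+\]", entry):
            return [f"{entry!r}: database tag must be lower case"]
        return [f"{entry!r}: expected '[tag]accession'"]
    if match.group(1) not in APPROVED_TAGS:
        return [f"{entry!r}: tag '{match.group(1)}' is not approved"]
    return []


def check_sequence(seq: str, polymer_type: str) -> list:
    alphabet = ALPHABETS.get(polymer_type)
    if alphabet is None:
        return [f"unknown polymer type {polymer_type!r}"]
    bad = sorted(set(seq.upper()) - alphabet)
    return [f"invalid {polymer_type} residue(s): {''.join(bad)}"] if bad else []
```

Note that this only checks the syntax of an accession; confirming that an accession number actually exists in the database would need a lookup, which is outside this sketch.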
Checking lists (composite). CompositeSequence file/data structure checklist: the Feature/Reporter file is correct (structure and data); the CompositeSequence file is a tab-delimited file; header item names are correct (unknown items are skipped); all mandatory items are present; header item cardinalities and dependencies are correct; column order is correct (not mandatory). Data/file content checklist: the composite file structure is correct; all mandatory fields are present; field cardinalities are correct; field values are in the expected format; field multiplicity is correct (same as Feature/Reporter); names in the map are reporter or composite sequence names; there are no duplicate CompositeSequences (same names).
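The last two composite rules (every name in a map is a reporter or composite sequence name, and composite sequence names are unique) can be sketched as below. It assumes composite rows arrive as `(name, map)` pairs and that a map is a semicolon-separated list of names; both the input shape and the separator are assumptions for illustration.

```python
from collections import Counter


def check_composites(composites: list, reporter_names: set) -> list:
    """Check CompositeSequence rows given as (name, map) pairs.

    The map is assumed to be a semicolon-separated list of names, each of
    which must be a reporter or a composite sequence.
    """
    errors = []
    counts = Counter(name for name, _ in composites)
    for name, n in counts.items():
        if n > 1:
            errors.append(f"duplicate CompositeSequence '{name}' ({n} rows)")
    known = reporter_names | set(counts)
    for name, mapping in composites:
        for target in filter(None, (t.strip() for t in mapping.split(";"))):
            if target not in known:
                errors.append(f"'{name}' maps to unknown name '{target}'")
    return errors
```

Collecting all composite names before checking the maps lets a composite refer to another composite defined later in the file.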
Checking lists: header item names are correct; all mandatory items are present; all mandatory fields are present; no duplicate features; duplicate reporters (equal names) have the same characteristics; no duplicate CompositeSequences (same names); names in the map are reporter or composite sequence names.
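The duplicate rules can be sketched as a single pass, reading the reporter rule as "reporters sharing a name must have the same characteristics". The sketch assumes a feature is identified by its grid coordinates (block column, block row, column, row) and that a reporter's characteristics can be compared as a tuple; `FeatureRow` and its fields are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical row layout: a feature is identified by its grid coordinates
# (block column, block row, column, row); reporter fields follow.
@dataclass(frozen=True)
class FeatureRow:
    coords: tuple
    reporter: str
    characteristics: tuple  # e.g. (sequence, database entries, ...)


def check_duplicates(rows: list) -> list:
    errors = []
    seen_coords = set()
    reporter_chars = {}
    for row in rows:
        # No two rows may describe the same feature position.
        if row.coords in seen_coords:
            errors.append(f"duplicate feature at {row.coords}")
        seen_coords.add(row.coords)
        # The first occurrence of a reporter name fixes its characteristics.
        first = reporter_chars.setdefault(row.reporter, row.characteristics)
        if first != row.characteristics:
            errors.append(f"reporter '{row.reporter}' has conflicting characteristics")
    return errors
```

A reporter may legitimately appear on several features (replicate spots), so only conflicting characteristics, not repeated names, are reported.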
MGED Ontology / DAML+OIL
Approved Databases
User modes
Implementation: technical choices: MAGE-stk, JAXB, configuration (default parameters). Performance: 4000 features in about 10 minutes.
Installer: IzPack (http://www.izforge.com/izpack/)
http://www.ebi.ac.uk/adf/

Editor's Notes

  • #3: Controlled vocabulary
  • #5: 16 packages in total, 6 for design
  • #8: Redundant information; clearer and more readable; easy to create (spreadsheet)
  • #9: Easy to read and understand; creating it leads to redundant information
  • #10: Easy to read and understand; creating it leads to redundant information
  • #14: Relaxed: the usual mistakes are allowed (provided they can be identified). Strict: the file must exactly match the specification. A complete mode, which checks the whole data set; in that case, the process does not stop when an error is identified. A step-by-step mode: once an error is found, the process stops, allowing errors to be corrected one by one (for small data sets or when only a few errors are expected).