1. Facilitating Standardization and Exchange of Array Design: ADF MAGE-ML Tool
  Pierre Marguerite, Microarray Informatics Team, EBI
  Friday Seminar, 15 October 2004
2. ADF MAGE-ML Tool
  • Stand-alone, platform-independent application
  • Supports:
    • Simple and complex microarray layouts
    • Different microarray applications: gene_expression, snp_detection, comparative_genomic_hybridization, binding_site_identification, others (minimal)
  • Respects good practices
3. Conversion tool
4. MAGE-ML (MAGE-OM): Description, Biosequence, Array, Array Design, DesignElement
5. MAGE-ML (continued)
6. ADF (previous)
7. Array Design File (adh, adr, adc): Header
  • Contacts
  • Technical information
8. Array Design File (adr, adc): Feature/Reporter
  • Reporters
  • Features
9. Array Design File: Composite
  • Characteristics
  • Map to reporters
10. ADF version differences
  • 3 parts (files) instead of 1
  • As workbook or text files
  • No Reporter Identifier item
  • No Reporter Group [role] item
  • New Chromosome item
  • New Chromosome_band item
  • New Species item
11. Two mandatory steps: validation, then conversion
12. Validation
  • File format validation
  • File content validation
    • Validation of controlled vocabulary
    • MGED ontology terms
    • Approved databases (tags, accession numbers)
  • Automatic curation (when possible)
13. Validation
  • Two levels of checking: relaxed and strict
  • Two execution modes: a complete mode and a step-by-step mode
  • Error log: for correction
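
The two execution modes on slide 13 can be illustrated with a minimal Python sketch. It is not the tool's actual implementation (the slides name MAGE-stk and JAXB, which suggests Java); the `validate` function, the `ValidationReport` class, and the `Name` field are hypothetical names chosen for illustration. In complete mode every row is checked and all errors are logged; in step-by-step mode the run stops at the first error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# A check takes one row and returns an error message, or None if the row passes.
Check = Callable[[Dict[str, str]], Optional[str]]


@dataclass
class ValidationReport:
    errors: List[str] = field(default_factory=list)  # the error log
    rows_checked: int = 0


def validate(rows: Iterable[Dict[str, str]], checks: List[Check],
             step_by_step: bool = False) -> ValidationReport:
    """Run every check on every row and collect errors in a log.

    Complete mode (default): never stops early, logs every error.
    Step-by-step mode: returns as soon as the first error is found.
    """
    report = ValidationReport()
    for line_no, row in enumerate(rows, start=1):
        report.rows_checked += 1
        for check in checks:
            message = check(row)
            if message is not None:
                report.errors.append(f"row {line_no}: {message}")
                if step_by_step:
                    return report
    return report


def name_present(row: Dict[str, str]) -> Optional[str]:
    # Hypothetical check: the (assumed) "Name" field must not be empty.
    return None if row.get("Name") else "missing Name"


rows = [{"Name": "f1"}, {"Name": ""}, {"Name": ""}]
complete = validate(rows, [name_present])
stepwise = validate(rows, [name_present], step_by_step=True)
```

Here `complete` logs both bad rows after reading all three, while `stepwise` stops at row 2 with a single logged error, which suits small data sets where errors are corrected one at a time.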
14. Checklists (header)
  File/data structure checklist:
    • Header file is a tab-delimited file
    • Item names are correct or can be identified; an item that cannot be identified is skipped
    • All mandatory items are present in the header
  Data/file content checklist:
    • Field value format is correct. Possible value types: "Integer", "Free Text", "Controlled vocabulary", "MGED ontology term", "DatabaseEntry", "Sequence", "Species"
    • Check single versus multiple values
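
As a rough illustration of the header structure checks, the sketch below parses a tab-delimited header, skips items it cannot identify, and reports missing mandatory items. The item names in `MANDATORY_ITEMS` are hypothetical placeholders, not the real ADF item list, and the relaxed/strict distinction (case-insensitive versus exact name matching) is only one assumed reading of "can be identified".

```python
from typing import Dict, List, Tuple

# Hypothetical mandatory header items; the real ADF specification defines its own list.
MANDATORY_ITEMS = ["Array Design Name", "Version", "Provider"]


def parse_header(text: str, strict: bool = False) -> Tuple[Dict[str, str], List[str]]:
    """Parse 'item<TAB>value' lines and return (items, error log)."""
    known = {name.lower(): name for name in MANDATORY_ITEMS}
    items: Dict[str, str] = {}
    log: List[str] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        if "\t" not in line:
            log.append(f"line {line_no}: not tab-delimited")
            continue
        raw_name, value = line.split("\t", 1)
        if strict:
            # Strict: the item name must match the specification exactly.
            name = raw_name if raw_name in MANDATORY_ITEMS else None
        else:
            # Relaxed: tolerate case and surrounding-whitespace differences.
            name = known.get(raw_name.strip().lower())
        if name is None:
            log.append(f"line {line_no}: unidentified item '{raw_name}' skipped")
            continue
        items[name] = value.strip()
    for name in MANDATORY_ITEMS:
        if name not in items:
            log.append(f"mandatory item '{name}' is missing")
    return items, log


header_text = "array design name\tMyArray\nVersion\t1\nColour\tblue\n"
relaxed_items, relaxed_log = parse_header(header_text)
strict_items, strict_log = parse_header(header_text, strict=True)
```

In relaxed mode the lower-case `array design name` is still identified; in strict mode it is skipped and then reported as a missing mandatory item.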
15. Checklists (feature reporter)
  Feature Reporter file
  File/data structure checklist:
    • Header file is correct (structure and data)
    • FeatureReporter file is a tab-delimited file
    • Header item names are correct (unknown items are skipped)
    • All mandatory items are present
    • Item cardinalities and dependencies are correct
    • Database tags are approved and database accession numbers are correct
    • Item order is correct (optional; does not fail the check)
    • Field dependencies are correct
  Data/file content checklist:
    • FeatureReporter file structure must be correct
    • Mandatory fields are present
    • Field cardinalities and field value multiplicities must be correct
    • Field values are in the required format
    • Database tags are approved by ArrayExpress and are supplied in lower case and between square brackets
    • Database IDs are correct
    • Ontology terms are correct (MGED ontology)
    • Sequences are valid for the associated polymer type (DNA, RNA, protein)
    • Integer field values are correct
    • Duplicate features must not exist
    • Duplicate reporters (equal names) must have the same characteristics
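
Three of the content checks above lend themselves to a short sketch: database tag format, sequence validity per polymer type, and duplicate features. Everything named here is an assumption made for illustration: the `APPROVED_TAGS` set stands in for the list approved by ArrayExpress, the alphabets are simplified, and identifying a feature by a (MetaColumn, MetaRow, Column, Row) tuple is an assumed convention rather than something the slides state.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical approved tags; the real list is maintained by ArrayExpress.
APPROVED_TAGS = {"embl", "genbank", "swissprot"}

# A tag is the text inside the final pair of square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]\s*$")

# Simplified alphabets (assumption): plain residues plus N or X for "unknown".
ALPHABETS = {
    "DNA": set("ACGTN"),
    "RNA": set("ACGUN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWYX"),
}

Coordinates = Tuple[int, int, int, int]


def check_database_tag(column_name: str) -> Optional[str]:
    """Return an error message, or None if the tag is well formed and approved."""
    match = TAG_PATTERN.search(column_name)
    if match is None:
        return f"'{column_name}': no tag in square brackets"
    tag = match.group(1)
    if tag != tag.lower():
        return f"'{column_name}': tag must be lower case"
    if tag not in APPROVED_TAGS:
        return f"'{column_name}': tag '{tag}' is not approved"
    return None


def check_sequence(sequence: str, polymer_type: str) -> Optional[str]:
    """Return an error message if the sequence uses characters outside its alphabet."""
    alphabet = ALPHABETS.get(polymer_type)
    if alphabet is None:
        return f"unknown polymer type '{polymer_type}'"
    bad = sorted(set(sequence.upper()) - alphabet)
    if bad:
        return f"invalid {polymer_type} characters: {''.join(bad)}"
    return None


def find_duplicate_features(coords: Iterable[Coordinates]) -> List[Coordinates]:
    """Return each coordinate tuple that repeats one already seen."""
    seen = set()
    duplicates = []
    for coord in coords:
        if coord in seen:
            duplicates.append(coord)
        else:
            seen.add(coord)
    return duplicates


tag_ok = check_database_tag("Reporter Database Entry [embl]")
tag_upper = check_database_tag("Reporter Database Entry [EMBL]")
seq_error = check_sequence("ACGU", "DNA")
duplicates = find_duplicate_features([(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 1)])
```

Each function returns a message instead of raising, so the results can be appended to an error log in either execution mode.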
16. Checklists (composite)
  CompositeSequence
  File/data structure checklist:
    • Feature Reporter file must be correct (structure and data)
    • CompositeSequence file is a tab-delimited file
    • Header item names are correct (unknown items are skipped)
    • All mandatory items are present
    • Header item cardinalities and dependencies are correct
    • Column order is correct (not mandatory)
  Data/file content checklist:
    • Composite file structure must be correct
    • All mandatory fields are present
    • Field cardinalities are correct
    • Field values are in the expected format
    • Field multiplicity is correct (same as Feature/Reporter)
    • Names in map are reporter or composite sequence names
    • No duplicate CompositeSequences (same names)
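
The last two composite checks amount to a cross-reference between files. A minimal sketch, assuming each CompositeSequence has already been read into a hypothetical `(name, mapped_names)` pair and the reporter names into a set:

```python
from typing import List, Set, Tuple

# (composite name, names it maps to); a hypothetical in-memory representation.
Composite = Tuple[str, List[str]]


def check_composite_map(reporter_names: Set[str],
                        composites: List[Composite]) -> List[str]:
    """Check that composite names are unique and that every mapped name exists."""
    errors: List[str] = []
    composite_names: Set[str] = set()
    # First pass: collect names and flag duplicates.
    for name, _ in composites:
        if name in composite_names:
            errors.append(f"duplicate CompositeSequence '{name}'")
        composite_names.add(name)
    # Second pass: a map entry may name a reporter or another composite sequence.
    allowed = reporter_names | composite_names
    for name, mapped in composites:
        for target in mapped:
            if target not in allowed:
                errors.append(f"'{name}' maps to unknown name '{target}'")
    return errors


reporters = {"r1", "r2"}
composites = [("c1", ["r1", "r2"]), ("c2", ["c1", "r9"]), ("c1", ["r1"])]
map_errors = check_composite_map(reporters, composites)
```

Two passes are used so that a composite may map to another composite defined later in the file.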
17. Checklists
  • Header item names are correct
  • All mandatory items are present
  • All mandatory fields are present
  • No duplicate features
  • Duplicate reporters (equal names) must have the same characteristics
  • No duplicate CompositeSequences (same names)
  • Names in map are reporter or composite sequence names
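
The rule on duplicate reporters can be sketched as a consistency check: rows that share a reporter name are acceptable only if all their other fields agree. The field names used here (`Reporter Name`, `Sequence`) are assumptions for illustration.

```python
from typing import Dict, List


def find_inconsistent_reporters(rows: List[Dict[str, str]],
                                name_field: str = "Reporter Name") -> List[str]:
    """Return names of reporters whose repeated rows differ in any other field."""
    first_seen: Dict[str, Dict[str, str]] = {}
    inconsistent: List[str] = []
    for row in rows:
        name = row[name_field]
        characteristics = {k: v for k, v in row.items() if k != name_field}
        if name not in first_seen:
            first_seen[name] = characteristics
        elif first_seen[name] != characteristics and name not in inconsistent:
            inconsistent.append(name)
    return inconsistent


reporter_rows = [
    {"Reporter Name": "r1", "Sequence": "ACGT"},
    {"Reporter Name": "r1", "Sequence": "ACGT"},  # consistent duplicate
    {"Reporter Name": "r2", "Sequence": "GGCC"},
    {"Reporter Name": "r2", "Sequence": "GGCA"},  # conflicting duplicate
]
inconsistent = find_inconsistent_reporters(reporter_rows)
```

Only `r2` is reported, since the two `r1` rows carry identical characteristics.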
20. MGED Ontology / DAML+OIL
21. Approved databases
22. User modes
23. Implementation: technical choices
  • MAGE-stk
  • JAXB
  • Configuration (default parameters)
  • Performance: 4000 features in ~10 minutes
24. Installer: IzPack, http://www.izforge.com/izpack/
25. http://www.ebi.ac.uk/adf/

Editor's Notes

  • #3: Controlled vocabulary
  • #5: 16 packages in total, 6 for design
  • #8: Redundant information. Clearer and more readable. Easy to create (spreadsheet)
  • #9: Easy to read, understand, and create -> redundant information
  • #10: Easy to read, understand, and create -> redundant information
  • #14: Relaxed: tolerates the usual mistakes, provided they can be identified. Strict: the file must match the specification exactly. Complete mode: checks all the data; the process does not stop when an error is identified. Step-by-step mode: the process stops as soon as an error is found, so errors can be corrected one by one (for small data sets or a known small number of errors).