BEL.bio and BioDati Studio
Why BEL?
Chemists have the Chemical Reaction Language
Biologists have the Biological Expression Language (BEL)
Open standard for communication and knowledge-storage
Whiteboard and Computer friendly
Partial chemical synthesis pathway: https://www.synarchive.com/syn/128
Overall Goals for BEL.bio
Try to simplify use of BEL and BEL Content
Stronger BEL/Nanopub validation, better error messages
Easy addition of new BEL Language features
Convert to Python and Docker
Easier community engagement
Quick easy startup/deployment
Provide API and Namespaces hosting
Easier to use/deploy/maintain search/completion service
Greatly expand organisms supported (all EntrezGene/NCBITaxonomy)
Simplify addition/maintenance of namespaces/zero downtime updates!!!
Glossary
BEL Assertion – a BEL statement, either as a single string or as a subject, relation, object
(SRO) triple (i.e. a BEL triple)
BEL Nanopub – BEL triple, Evidence, Context, Citation, Metadata
Evidence – short text extraction or supporting information for the BEL triple (Evidence
in BEL Script, Support in OpenBEL Nanopub format)
Annotations – OpenBEL Annotations are now called Annotations and were referred
to as Experimental Context in BELMgr
BEL Edge – BEL triples, both primary and computed, canonicalized to
standard namespace IDs, potentially orthologized, and stored in the
EdgeStore (a graph database)
API – BEL.bio API – BEL language, nanopub, terminology (namespace,
orthology) services
AST – Abstract Syntax Tree of BEL Statement
Function: BEL function, e.g. p() or modifier function, e.g. var()
NSArg: Namespace argument, e.g. HGNC:AKT1
StrArg: String argument, e.g. in pmod(Ph, T, 22), Ph, T and 22 are string arguments
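The three AST component types in the glossary can be pictured as small Python classes. This is a minimal sketch for illustration only: the class layout, field names and the to_string method are assumptions, not the actual bel_lang classes.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class NSArg:
    """Namespace argument such as HGNC:AKT1 (hypothetical class layout)."""
    namespace: str
    value: str

    def to_string(self) -> str:
        return f"{self.namespace}:{self.value}"


@dataclass
class StrArg:
    """Plain string argument such as Ph, T or 22."""
    value: str

    def to_string(self) -> str:
        return self.value


@dataclass
class Function:
    """BEL function (e.g. p) or modifier function (e.g. pmod, var)."""
    name: str
    args: List[Union["Function", NSArg, StrArg]] = field(default_factory=list)

    def to_string(self) -> str:
        # Render arguments recursively, separated by ", "
        rendered = ", ".join(arg.to_string() for arg in self.args)
        return f"{self.name}({rendered})"


# Build the glossary's examples by hand: a protein with a pmod() modifier
akt1_phospho = Function("p", [
    NSArg("HGNC", "AKT1"),
    Function("pmod", [StrArg("Ph"), StrArg("T"), StrArg("22")]),
])
print(akt1_phospho.to_string())  # p(HGNC:AKT1, pmod(Ph, T, 22))
```

A Function holds a mixed list of nested Functions, NSArgs and StrArgs, so a whole BEL term is one tree that can be walked for validation or rendered back to a string.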
Not supported by BEL.bio
KAMs
OpenBEL API/tooling
BELScripts (except for converting to BEL Nanopubs)
XBEL
OpenBEL namespace/equivalence files (limited conversion to
BEL.bio Terminology files)
BEL Parsing and Validation
bel_lang Python module
Depends on BEL.bio API for terminology services (namespaces, equivalents,
orthology)
Parsing, validation, canonicalization, orthologization, compute
edges (eventually completion and migration)
Uses BEL Specification and EBNF file for parsing and semantic
validation
EBNF file used by the TatSu module to create a parser library that parses a BEL triple
into a dictionary AST of Function, NSArg, StrArg components; the AST is
transformed into a Python class-based AST object (BEL Object, BO)
BEL Spec used to process BO for semantic validation
bo.parse('p(MGI:A1bg)').orthologize('TAX:9606').canonicalize().ast.to_string(fmt='medium')
p(EG:1)
Provides CLI installed with module
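The step from a BEL string to a dictionary AST can be illustrated with a toy hand-written parser. This is only a sketch: bel_lang generates its parser from the EBNF file with TatSu, whereas the functions below (tokenize, parse_term, parse) and the dictionary keys are hypothetical names chosen for illustration. It handles only nested functions with namespace and string arguments, and does no semantic validation.

```python
import re

# A token is either a run of name characters or one of ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z0-9_:.]+|[(),])")


def tokenize(text):
    """Split a BEL term string into a flat list of tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected character at position {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_term(tokens, pos=0):
    """Parse one term starting at tokens[pos]; return (ast_dict, next_pos)."""
    token = tokens[pos]
    if token in ("(", ")", ","):
        raise ValueError(f"Expected a name but found {token!r}")
    if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
        # Function call: consume arguments until the matching ")"
        args = []
        pos += 2
        while tokens[pos] != ")":
            arg, pos = parse_term(tokens, pos)
            args.append(arg)
            if tokens[pos] == ",":
                pos += 1
        return {"type": "Function", "name": token, "args": args}, pos + 1
    if ":" in token:
        namespace, value = token.split(":", 1)
        return {"type": "NSArg", "namespace": namespace, "value": value}, pos + 1
    return {"type": "StrArg", "value": token}, pos + 1


def parse(text):
    """Parse a single BEL term into a nested dictionary AST."""
    tokens = tokenize(text)
    try:
        ast, pos = parse_term(tokens)
    except IndexError:
        raise ValueError("Unbalanced parentheses or incomplete term") from None
    if pos != len(tokens):
        raise ValueError(f"Unexpected trailing token: {tokens[pos]!r}")
    return ast


print(parse("p(MGI:A1bg)"))
```

Canonicalization and orthologization would then operate on the NSArg nodes of such a tree, using the terminology services to look up equivalent or orthologous IDs.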
Supports Multiple BEL versions
Can deploy bel_lang with multiple BEL versions (only BEL 2.0.0
currently; BEL now uses semantic versioning)
One BEL Specification file per version, EBNF/parser generated from
BEL Spec
Drop in new BEL Spec, get new BEL version functionality, easy
testing of proposed BEL language features
Future: create BEL migration signatures like the computed edge
signatures for migrating BEL
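The "one specification per version" idea can be sketched as a registry keyed by semantic version. Everything here is an assumption made for illustration: the SPECS dictionary, the helper names, the tiny subset of BEL 2.0.0 functions, and especially the "2.1.0" version and its "newfn" function, which are invented placeholders and not a real or proposed BEL release.

```python
# Hypothetical registry: one spec per BEL version.
# The function set is a small illustrative subset, not the full BEL 2.0.0 spec.
SPECS = {
    "2.0.0": {"functions": {"p", "g", "r", "a", "bp", "complex", "pmod", "var"}},
}


def register_spec(version, spec):
    """Drop in a new spec; it becomes available alongside existing versions."""
    SPECS[version] = spec


def latest_version():
    """Pick the highest version by comparing numeric (major, minor, patch)."""
    return max(SPECS, key=lambda v: tuple(int(part) for part in v.split(".")))


def is_known_function(name, version=None):
    """Check a function name against the chosen (or latest) spec."""
    spec = SPECS[version or latest_version()]
    return name in spec["functions"]


# Trying out a proposed language feature: register a placeholder version
# ("2.1.0" and "newfn" are made up for this example).
proposed = {"functions": SPECS["2.0.0"]["functions"] | {"newfn"}}
register_spec("2.1.0", proposed)

print(is_known_function("newfn", "2.0.0"))  # False
print(is_known_function("newfn", "2.1.0"))  # True
```

Keeping each version's rules in data rather than code is what would let a proposed feature be tested without touching the validation logic.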
BioDati Studio
BEL.bio Overview and BioDati Studio
Terminology Services
BEL Terminology Resources
Simplify Namespaces
GOBP, GOCC, GOBPID, GOCCID -> GO
Context (Annotations) are now also Namespaces
Simplify generator scripts
Single script per resource: download and reformat into
terminology or orthology load file
Single download/cache directory (gzipped)
BEL Resource tools GitHub repo
https://github.com/belbio/bel_resources
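A per-resource generator script of the kind described above might look roughly like the following. This is a hedged sketch, not code from the bel_resources repository: the gzipped JSON-lines layout, the record fields and the function names are all assumptions. It also only rewrites the namespace prefix, whereas a real merge of the name-based and ID-based GO namespaces would need to map names to IDs as well.

```python
import gzip
import json

# Legacy OpenBEL GO namespaces collapsed into a single GO namespace
LEGACY_TO_CURRENT = {"GOBP": "GO", "GOCC": "GO", "GOBPID": "GO", "GOCCID": "GO"}


def normalize_prefix(prefix):
    """Map a legacy namespace prefix to its simplified form."""
    return LEGACY_TO_CURRENT.get(prefix, prefix)


def write_load_file(terms, path):
    """Write terms as gzipped JSON lines (assumed load-file format)."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for term in terms:
            record = dict(term, namespace=normalize_prefix(term["namespace"]))
            handle.write(json.dumps(record) + "\n")


def read_load_file(path):
    """Read a gzipped JSON-lines load file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]


print(normalize_prefix("GOBPID"))  # GO
print(normalize_prefix("HGNC"))    # HGNC
```

Writing one gzipped file per resource keeps the download/cache directory uniform, so each script only has to handle its own source format.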
Terminology Workflow
Term Completion Examples – BioDati Studio
Terminology Statistics
Namespace   Count
EG          20,750,186
TAX          1,736,298
SP             557,012
AFFX           327,392
CHEBI          106,644
MGI             57,532
RGD             44,972
GO              44,922
HGNC            41,315
ZFIN            23,388
MESH            19,223
UBERON          13,232
DO               8,699
CL               2,194
EFO                937
Summary
BioDati Services
BioDati Studio
Data Stores (Nanopubs, Networks, Edges)
Terminologies (collection, normalization, search, completions)
Consulting – BEL-related
Acknowledgements
Natalie Catlett, PatientsLikeMe
Anselmo Di Fabio, BioDati
David Chen
Tony Bargnesi
Nick Bargnesi
Additional resources
http://bel.bio
http://biodati.com
http://medium.com/biodati
JSON philosophy: https://towardsdatascience.com/my-love-affair-with-json-edaca39e8320
https://github.com/belbio
location: 1501 Main Street, Rahway, NJ 07065 | call: 732-764-8844 | online: biodati.com
Anselmo Di Fabio
adifabio@biodati.com
William Hayes
whayes@biodati.com


Editor's Notes

  • #4: History of BEL – developed over 10 years ago by Dexter Pratt at Genstruct (later renamed Selventa) and used for biomarker development as well as drug and toxicology mechanism analysis. BEL was open-sourced about 5 years ago by David de Graaf at Selventa.
  • #5: Seeking funding for API and Namespaces hosting