Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Oct 24, 2011

This document presents SPLENDID, a system for federated querying across linked data sources. It uses Vocabulary of Interlinked Datasets (VoiD) descriptions to select relevant sources and optimize query planning and execution. The system applies techniques from distributed database systems to federated SPARQL querying, including dynamic programming for join ordering and statistics-based cost estimation. An evaluation using the FedBench suite found it efficiently selects sources and executes queries, outperforming state-of-the-art federated querying systems by leveraging VoiD descriptions and statistics. Future work includes integrating it with other systems and improving its cost models.

Institute for Web Science and Technologies
University of Koblenz-Landau, Germany

SPLENDID: SPARQL Endpoint Federation
Exploiting VOID Descriptions

Olaf Görlitz, Steffen Staab

Motivation

How to access a large number of linked data sources?

WeST Institute Olaf Görlitz
People and Knowledge Networks COLD 2011, Bonn, Germany

Data Integration Approaches

Data Warehouse                 Link Traversal

+ Efficient query execution    + Live data access
+ Complete results             + Flexible / on demand
- Data copies                  - Incomplete results
- Inflexible                   - Biased by starting point


Our Approach

Data Federation

Live data access
Flexible source integration
Effective query planning
Complete results

Hypothesis:
Efficient query federation is possible using core Semantic
Web technology (i.e. SPARQL endpoints, VoiD descriptions)


VoiD: "Vocabulary of Interlinked Datasets"

General information

Basic statistics
  triples = 732744

Type statistics
  chebi:Compound = 50477

Predicate statistics
  bio:formula = 39555
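
As a rough illustration of how such statistics could feed a federator, the sketch below holds the three counts from this slide in a plain Python structure and derives a cardinality estimate for a single triple pattern. The class name `VoidStats`, the method `estimate`, and the fallback rule are assumptions made for illustration; they are not taken from SPLENDID's code.

```python
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"


@dataclass
class VoidStats:
    """Hypothetical in-memory view of one dataset's VoiD statistics."""
    triples: int
    type_counts: dict = field(default_factory=dict)       # class -> number of instances
    predicate_counts: dict = field(default_factory=dict)  # predicate -> number of triples

    def estimate(self, predicate, obj=None):
        """Estimate how many triples match the pattern (?s, predicate, obj)."""
        if predicate == RDF_TYPE and obj in self.type_counts:
            return self.type_counts[obj]
        if predicate in self.predicate_counts:
            return self.predicate_counts[predicate]
        # Assumption: without a matching statistic, fall back to the dataset size.
        return self.triples


# Counts copied from the slide.
stats = VoidStats(
    triples=732744,
    type_counts={"chebi:Compound": 50477},
    predicate_counts={"bio:formula": 39555},
)

print(stats.estimate("bio:formula"))
print(stats.estimate(RDF_TYPE, "chebi:Compound"))
```

The dataset-size fallback is deliberately pessimistic; a real cost model would likely use a finer estimate.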


Distributed Query Processing

Contribution:
Apply Best Practices of RDBMS for RDF Federation

http://code.google.com/p/rdffederator/

Query Example

Which drugs are categorized as micronutrients?

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}


Query Processing

Source Selection → Join Optimization → Query Execution

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}


Query Processing

1. Step: Index-based source mapping

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .   # → drugbank
  ?drug drugbank:casRegistryNumber ?id .                 # → drugbank
  ?keggDrug rdf:type kegg:Drug .                         # → kegg
  ?keggDrug bio2rdf:xRef ?id .                           # → kegg
  ?keggDrug purl:title ?title .                          # → kegg, dbpedia, Chebi
}

predicate-index: drugbank:drugCategory → drugbank
type-index:      kegg:Drug → kegg
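
The index lookup in this step can be sketched as follows. The contents of `VOID` are invented so that the lookups reproduce the mapping shown on the slide; the function names `build_indexes` and `select_sources` are likewise hypothetical and not SPLENDID's API.

```python
from collections import defaultdict

# Hypothetical index content per source. In practice this would be read from
# each source's VoiD description; the values here are made up for illustration.
VOID = {
    "drugbank": {"predicates": {"drugbank:drugCategory", "drugbank:casRegistryNumber"},
                 "types": set()},
    "kegg": {"predicates": {"rdf:type", "bio2rdf:xRef", "purl:title"},
             "types": {"kegg:Drug"}},
    "dbpedia": {"predicates": {"rdf:type", "purl:title"}, "types": set()},
    "Chebi": {"predicates": {"rdf:type", "purl:title"}, "types": set()},
}

PATTERNS = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]


def build_indexes(void):
    """Invert the per-source descriptions into predicate and type indexes."""
    predicate_index = defaultdict(set)
    type_index = defaultdict(set)
    for source, description in void.items():
        for predicate in description["predicates"]:
            predicate_index[predicate].add(source)
        for rdf_class in description["types"]:
            type_index[rdf_class].add(source)
    return predicate_index, type_index


def select_sources(pattern, predicate_index, type_index, all_sources):
    """Return the candidate sources for one (subject, predicate, object) pattern."""
    _, predicate, obj = pattern
    if predicate.startswith("?"):
        # Unbound predicate: the indexes cannot exclude any source.
        return set(all_sources)
    if predicate == "rdf:type" and not obj.startswith("?"):
        return set(type_index.get(obj, set()))
    return set(predicate_index.get(predicate, set()))


predicate_index, type_index = build_indexes(VOID)
for pattern in PATTERNS:
    sources = select_sources(pattern, predicate_index, type_index, VOID)
    print(" ".join(pattern), "->", sorted(sources))
```

Patterns with `rdf:type` and a bound class go to the type index first, since the predicate index alone would match every source that contains any typed resource.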


Query Processing

2. Step: Refinement with ASK Queries

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}

No index for subject / object values
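
A minimal sketch of this refinement step, assuming an ASK query is sent only for patterns whose subject or object is bound to a value the indexes do not cover. The helper names, the `ask` callback, and the candidate set in the example are hypothetical. The stub `fake_ask` stands in for a real endpoint call, and the prefix declarations a real endpoint would need are omitted.

```python
def needs_ask(pattern):
    """True if the pattern has a bound subject or object not covered by an index."""
    subject, predicate, obj = pattern
    if not subject.startswith("?"):
        return True
    # Bound classes in rdf:type patterns are assumed to be covered by the type index.
    return not obj.startswith("?") and predicate != "rdf:type"


def ask_query(pattern):
    """Build a SPARQL ASK query for a single triple pattern."""
    return "ASK { %s %s %s }" % pattern


def refine(pattern, candidates, ask):
    """Keep only the candidate sources that confirm the pattern via ASK."""
    if not needs_ask(pattern):
        return set(candidates)
    query = ask_query(pattern)
    return {source for source in candidates if ask(source, query)}


def fake_ask(source, query):
    """Stub endpoint: pretend that only drugbank answers true."""
    return source == "drugbank"


pattern = ("?drug", "drugbank:drugCategory", "category:micronutrient")
print(ask_query(pattern))
print(refine(pattern, {"drugbank", "kegg"}, fake_ask))
```

Each ASK costs one round trip per candidate source, so under this assumption the step pays off mainly when it removes sources that would otherwise receive a full subquery.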


Query Processing

3. Step: Grouping Triple Patterns

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .   # }
  ?drug drugbank:casRegistryNumber ?id .                 # } drugbank
  ?keggDrug rdf:type kegg:Drug .                         # }
  ?keggDrug bio2rdf:xRef ?id .                           # } kegg
  ?keggDrug purl:title ?title .                          # } kegg, dbpedia, Chebi
}

+ grouping sameAs patterns
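
The grouping shown on this slide can be sketched as below, assuming that patterns answered by exactly one source are merged into a single subquery for that source, while a pattern with several candidate sources stays on its own. The function name `group_patterns` is hypothetical, and the sameAs grouping mentioned above is not covered.

```python
def group_patterns(assignments):
    """Group patterns by source.

    assignments: list of (pattern, set_of_sources) pairs.
    Returns a list of (patterns, sources) groups.
    """
    exclusive = {}  # source -> patterns that only this source can answer
    separate = []   # patterns with more than one candidate source
    for pattern, sources in assignments:
        if len(sources) == 1:
            (source,) = sources
            exclusive.setdefault(source, []).append(pattern)
        else:
            separate.append(([pattern], set(sources)))
    groups = [(patterns, {source}) for source, patterns in exclusive.items()]
    return groups + separate


# Source assignment as shown on the slide.
ASSIGNMENTS = [
    (("?drug", "drugbank:drugCategory", "category:micronutrient"), {"drugbank"}),
    (("?drug", "drugbank:casRegistryNumber", "?id"), {"drugbank"}),
    (("?keggDrug", "rdf:type", "kegg:Drug"), {"kegg"}),
    (("?keggDrug", "bio2rdf:xRef", "?id"), {"kegg"}),
    (("?keggDrug", "purl:title", "?title"), {"kegg", "dbpedia", "Chebi"}),
]

for patterns, sources in group_patterns(ASSIGNMENTS):
    print(sorted(sources), len(patterns), "pattern(s)")
```

Grouping lets the endpoint evaluate the join between the grouped patterns itself, which reduces both the number of requests and the intermediate results shipped to the federator.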


Join Order Optimization

Dynamic Programming with statistics-based cost estimation

bind join / hash join
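
To make the idea concrete, here is a small left-deep dynamic programming sketch that picks a join order and, for each join, the cheaper of a bind join and a hash join. Everything about the cost model is an assumption for illustration: the constants, the uniform join selectivity, one request per binding for the bind join, and treating every pair of subqueries as joinable. It is not SPLENDID's cost model.

```python
from itertools import combinations

REQUEST_COST = 50.0   # assumed fixed cost per remote request
TUPLE_COST = 1.0      # assumed cost per transferred result tuple
SELECTIVITY = 0.001   # assumed uniform join selectivity


def join_card(left_card, right_card):
    """Estimated size of a join result under the uniform selectivity assumption."""
    return max(1.0, left_card * right_card * SELECTIVITY)


def best_plan(cards):
    """Find the cheapest left-deep plan.

    cards: dict mapping a subquery name to its estimated cardinality.
    Returns (cost, estimated cardinality, plan).
    """
    names = sorted(cards)
    best = {}
    for name in names:
        # Fetching a base subquery: one request plus its result tuples.
        scan_cost = REQUEST_COST + cards[name] * TUPLE_COST
        best[frozenset([name])] = (scan_cost, float(cards[name]), name)
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            key = frozenset(subset)
            for right in subset:
                left_cost, left_card, left_plan = best[key - {right}]
                out_card = join_card(left_card, cards[right])
                # Hash join: fetch the whole right side, join locally.
                hash_cost = REQUEST_COST + cards[right] * TUPLE_COST
                # Bind join: one request per left binding, only matches come back.
                bind_cost = left_card * REQUEST_COST + out_card * TUPLE_COST
                if hash_cost <= bind_cost:
                    operator, op_cost = "hash", hash_cost
                else:
                    operator, op_cost = "bind", bind_cost
                total = left_cost + op_cost
                if key not in best or total < best[key][0]:
                    best[key] = (total, out_card, (operator, left_plan, right))
    return best[frozenset(names)]


# Hypothetical cardinalities for the three subquery groups of the example query.
cost, card, plan = best_plan({"drugbank_group": 50, "kegg_group": 8000, "title": 200000})
print(plan, cost)
```

Under these assumptions a bind join wins when the left input is small and the right side is large, and a hash join wins when the left input is large enough that per-binding requests dominate.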


Evaluation

FedBench Evaluation Suite
• Life Science + Cross Domain Data
• different query characteristics

Measuring
• number of data sources selected
• query execution time

Orthogonal State-of-the-Art approaches:

                    DARQ                     AliBaba       FedX                          SPLENDID
Statistics          ServiceDesc              none          none                          VoiD
Source Selection    Statistics (predicates)  All sources   ASK queries                   Statistics + ASK queries
Query Optimization  DynProg                  Heuristics    Heuristics                    DynProg
Query Execution     Bind join                Bind join     Bound Join + parallelization  Bind Join + Hash Join


Evaluation: Source Selection

owl:sameAs rdf:type


Evaluation: Query Optimization


Conclusion

Publish more VoiD descriptions!

VoiD-based query federation is efficient

What next?
• Combination with FedX
• Improving estimation and cost model
• Integrating SPARQL 1.1 features

Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

1. Institute for Web Science and Technologies, University of Koblenz-Landau, Germany. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. Olaf Görlitz, Steffen Staab

2. Motivation: How to access a large number of linked data sources? (WeST Institute, People and Knowledge Networks; Olaf Görlitz; COLD 2011, Bonn, Germany)

3. Data Integration Approaches
   Data Warehouse: efficient query execution; complete results; data copies; inflexible
   Link Traversal: live data access; flexible / on demand; incomplete results; biased by starting point

4. Our Approach: Data Federation
   Live data access; flexible source integration; effective query planning; complete results
   Hypothesis: Efficient query federation is possible using core Semantic Web technology (i.e. SPARQL endpoints, VoiD descriptions)

5. VoiD: "Vocabulary of Interlinked Datasets"
   General information
   Basic statistics: triples = 732744
   Type statistics: chebi:Compound = 50477
   Predicate statistics: bio:formula = 39555
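The three kinds of statistics on this slide can be read as simple lookup tables for cardinality estimation. The following is a minimal sketch of that idea, not SPLENDID's implementation: the class name `VoidStats`, its `estimate` method, and the fallback rules are hypothetical, and only the three numbers come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class VoidStats:
    """Hypothetical container for the statistics of one VoiD description."""
    triples: int
    type_counts: dict = field(default_factory=dict)       # class -> number of instances
    predicate_counts: dict = field(default_factory=dict)  # predicate -> number of triples

    def estimate(self, predicate=None, rdf_type=None):
        """Estimate how many triples match a pattern with a bound type or predicate."""
        if rdf_type is not None:
            return self.type_counts.get(rdf_type, 0)
        if predicate is not None:
            return self.predicate_counts.get(predicate, 0)
        # Assumption: a pattern with nothing bound may match every triple.
        return self.triples


# Numbers taken from the slide; everything else is illustrative.
stats = VoidStats(
    triples=732744,
    type_counts={"chebi:Compound": 50477},
    predicate_counts={"bio:formula": 39555},
)
formula_selectivity = stats.estimate(predicate="bio:formula") / stats.triples
```

A value absent from the tables yields an estimate of 0, which in a source selection step would mean the dataset cannot contribute to that pattern.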

6. Distributed Query Processing. Contribution: Apply Best Practices of RDBMS for RDF Federation. http://code.google.com/p/rdffederator/

7. Query Example: Which drugs are categorized as micronutrients?
   SELECT ?drug ?title WHERE {
     ?drug drugbank:drugCategory category:micronutrient .
     ?drug drugbank:casRegistryNumber ?id .
     ?keggDrug rdf:type kegg:Drug .
     ?keggDrug bio2rdf:xRef ?id .
     ?keggDrug purl:title ?title .
   }

8. Query Processing: Source Selection → Join Optimization → Query Execution
   SELECT ?drug ?title WHERE {
     ?drug drugbank:drugCategory category:micronutrient .
     ?drug drugbank:casRegistryNumber ?id .
     ?keggDrug rdf:type kegg:Drug .
     ?keggDrug bio2rdf:xRef ?id .
     ?keggDrug purl:title ?title .
   }

9. Query Processing: Source Selection → Join Optimization → Query Execution
   Step 1: Index-based source mapping
   SELECT ?drug ?title WHERE {
     ?drug drugbank:drugCategory category:micronutrient .   # → drugbank
     ?drug drugbank:casRegistryNumber ?id .                 # → drugbank
     ?keggDrug rdf:type kegg:Drug .                         # → kegg
     ?keggDrug bio2rdf:xRef ?id .                           # → kegg
     ?keggDrug purl:title ?title .                          # → kegg, dbpedia, Chebi
   }
   predicate-index: drugbank:drugCategory → drugbank
   type-index: kegg:Drug → kegg
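The mapping on this slide can be sketched as two dictionary lookups. This is an illustrative sketch only: the function `select_sources`, the tuple representation of triple patterns, and the rule that an unbound predicate maps to all sources are assumptions. The index entries are filled in from the source annotations shown on the slide.

```python
RDF_TYPE = "rdf:type"
ALL_SOURCES = {"drugbank", "kegg", "dbpedia", "chebi"}

# Index contents derived from the annotations on the slide.
predicate_index = {
    "drugbank:drugCategory": {"drugbank"},
    "drugbank:casRegistryNumber": {"drugbank"},
    "bio2rdf:xRef": {"kegg"},
    "purl:title": {"kegg", "dbpedia", "chebi"},
}
type_index = {"kegg:Drug": {"kegg"}}


def is_var(term):
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")


def select_sources(pattern):
    """Map one (subject, predicate, object) pattern to candidate sources."""
    _, p, o = pattern
    if p == RDF_TYPE and not is_var(o):
        # rdf:type with a bound class: consult the type index.
        return set(type_index.get(o, set()))
    if not is_var(p):
        # Bound predicate: consult the predicate index.
        return set(predicate_index.get(p, set()))
    # Assumption: an unbound predicate could match in any source.
    return set(ALL_SOURCES)


query = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]
mapping = [(tp, select_sources(tp)) for tp in query]
```

Keeping the type index separate from the predicate index matters because rdf:type occurs in nearly every dataset, so a predicate lookup alone would select almost all sources for the third pattern.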

10. Query Processing: Source Selection → Join Optimization → Query Execution
   Step 2: Refinement with ASK queries
   SELECT ?drug ?title WHERE {
     ?drug drugbank:drugCategory category:micronutrient .
     ?drug drugbank:casRegistryNumber ?id .
     ?keggDrug rdf:type kegg:Drug .
     ?keggDrug bio2rdf:xRef ?id .
     ?keggDrug purl:title ?title .
   }
   No index for subject / object values
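Because the indexes do not cover subject and object values, a pattern with a bound subject or object can be checked directly against each candidate endpoint with a SPARQL ASK query. The sketch below shows one way this refinement could look. The helper names, the decision to skip rdf:type patterns (assumed to be covered by the type index), and the `ask` callback are all hypothetical, and prefix declarations are omitted from the generated query.

```python
def is_var(term):
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")


def needs_ask(pattern):
    """True if the pattern has a bound subject or a bound non-type object."""
    s, p, o = pattern
    return (not is_var(s)) or (not is_var(o) and p != "rdf:type")


def ask_query(pattern):
    """Build the ASK query text for a single triple pattern."""
    return "ASK { %s %s %s . }" % pattern


def refine(pattern, candidates, ask):
    """Keep only the candidate sources whose endpoint answers the ASK with true.

    `ask(source, query_text)` is a caller-supplied function returning a bool;
    in a real system it would send the query to the source's SPARQL endpoint.
    """
    if not needs_ask(pattern):
        return set(candidates)
    query_text = ask_query(pattern)
    return {src for src in candidates if ask(src, query_text)}


# Stand-in for real endpoints: only drugbank claims to know the pattern.
def fake_ask(source, query_text):
    return source == "drugbank"


pattern = ("?drug", "drugbank:drugCategory", "category:micronutrient")
refined = refine(pattern, {"drugbank", "kegg"}, fake_ask)
```

Patterns consisting only of variables and a predicate pass through unchanged, so the number of ASK requests stays proportional to the number of patterns with bound values.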

11. Query Processing: Source Selection → Join Optimization → Query Execution
   Step 3: Grouping triple patterns
   SELECT ?drug ?title WHERE {
     ?drug drugbank:drugCategory category:micronutrient .   # group: drugbank
     ?drug drugbank:casRegistryNumber ?id .                 # group: drugbank
     ?keggDrug rdf:type kegg:Drug .                         # group: kegg
     ?keggDrug bio2rdf:xRef ?id .                           # group: kegg
     ?keggDrug purl:title ?title .                          # kegg, dbpedia, Chebi
   }
   + grouping sameAs patterns
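The grouping step can be sketched as follows: patterns that map to exactly one source are collected into a single subquery for that source, while patterns with several candidate sources stay separate. The function name and data layout are hypothetical, and the additional sameAs grouping mentioned on the slide is not covered here.

```python
def group_patterns(assignments):
    """Split (pattern, sources) pairs into exclusive groups and shared patterns.

    Returns a dict source -> list of patterns answerable by that source alone,
    and a list of (pattern, sources) pairs that need more than one source.
    """
    exclusive = {}
    shared = []
    for pattern, sources in assignments:
        if len(sources) == 1:
            (source,) = sources
            exclusive.setdefault(source, []).append(pattern)
        else:
            shared.append((pattern, set(sources)))
    return exclusive, shared


# Source assignments as annotated on the slide.
assignments = [
    (("?drug", "drugbank:drugCategory", "category:micronutrient"), {"drugbank"}),
    (("?drug", "drugbank:casRegistryNumber", "?id"), {"drugbank"}),
    (("?keggDrug", "rdf:type", "kegg:Drug"), {"kegg"}),
    (("?keggDrug", "bio2rdf:xRef", "?id"), {"kegg"}),
    (("?keggDrug", "purl:title", "?title"), {"kegg", "dbpedia", "chebi"}),
]
exclusive, shared = group_patterns(assignments)
```

Each exclusive group can be sent to its endpoint as one subquery, which lets the endpoint evaluate the local join instead of shipping intermediate results to the federator.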

12. Join Order Optimization (Source Selection → Join Optimization → Query Execution)
   Dynamic programming with statistics-based cost estimation
   Join implementations: bind join / hash join
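Dynamic programming for join ordering builds the cheapest plan for every subset of the inputs from the cheapest plans of its parts. The sketch below illustrates the technique under a deliberately simple, assumed cost model (cost is the sum of estimated intermediate result sizes, with a fixed join selectivity); the slides do not specify SPLENDID's actual cost model, the cardinalities are invented, and the choice between bind join and hash join is left out.

```python
from itertools import combinations


def best_plan(relations, selectivity=0.001):
    """Find the cheapest join tree by dynamic programming over subsets.

    relations: dict name -> (estimated cardinality, set of variables).
    Returns (cost, cardinality, variables, plan), where plan is a nested tuple.
    """
    names = sorted(relations)
    best = {}
    for name in names:
        card, variables = relations[name]
        best[frozenset([name])] = (0.0, float(card), frozenset(variables), name)

    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            # Try every split of the subset into two non-empty parts.
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lcost, lcard, lvars, lplan = best[left]
                    rcost, rcard, rvars, rplan = best[right]
                    # Assumed estimate: shared variables give a join,
                    # otherwise the result is a cross product.
                    factor = selectivity if lvars & rvars else 1.0
                    out = lcard * rcard * factor
                    cost = lcost + rcost + out
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, out, lvars | rvars, (lplan, rplan))
    return best[frozenset(names)]


# Hypothetical cardinalities for the three subqueries of the example query.
relations = {
    "drugbank": (50, {"?drug", "?id"}),
    "kegg": (1000, {"?keggDrug", "?id"}),
    "title": (100000, {"?keggDrug", "?title"}),
}
cost, card, variables, plan = best_plan(relations)
```

With these invented numbers the search joins the two small, selective groups first and brings in the large title pattern last; a different cost model or different statistics could change that order.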

13. Evaluation
   FedBench Evaluation Suite: Life Science + Cross Domain Data; different query characteristics
   Measuring: #data sources selected; query execution time
   Orthogonal state-of-the-art approaches:

                        | DARQ                    | AliBaba     | FedX                         | SPLENDID
   Statistics           | ServiceDesc             | -           | -                            | VoiD
   Source Selection     | Statistics (predicates) | All sources | ASK queries                  | Statistics + ASK queries
   Query Optimization   | DynProg                 | Heuristics  | Heuristics                   | DynProg
   Query Execution      | Bind join               | Bind join   | Bound Join + parallelization | Bind Join + Hash Join

14. Evaluation: Source Selection (Source Selection → Join Optimization → Query Execution). Figure labels: owl:sameAs, rdf:type

15. Evaluation: Query Optimization (Source Selection → Join Optimization → Query Execution)

16. Conclusion
   Publish more VoiD descriptions!
   VoiD-based query federation is efficient
   What next? Combination with FedX; improving estimation and cost model; integrating SPARQL 1.1 features

Editor's Notes

#3: Pre-selected linked datasets; transparent query federation
