Slideshows by User: alasdair_gray

Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources
Tue, 09 Oct 2018 · /slideshow/using-a-jupyter-notebook-to-perform-a-reproducible-scientific-analysis-over-semantic-web-sources/118871595
In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practices to overcome these issues.

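The reproducibility issue described above can be mitigated by recording query provenance alongside the results. Below is a minimal sketch, assuming the SPARQLWrapper library; the UniProt endpoint and query are illustrative, not taken from the paper. The notebook cell captures the endpoint, the query, and the execution timestamp together with the result, so a later re-run can be compared against the recorded state.

```python
from datetime import datetime, timezone

from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint and query are illustrative; any live SPARQL endpoint will do.
ENDPOINT = "https://sparql.uniprot.org/sparql"
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT (COUNT(?protein) AS ?n)
WHERE { ?protein a up:Protein . }
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Record when and where the query ran: live endpoints change over time,
# so the same notebook can give different answers on different days.
provenance = {
    "endpoint": ENDPOINT,
    "executed": datetime.now(timezone.utc).isoformat(),
    "count": results["results"]["bindings"][0]["n"]["value"],
}
print(provenance)
```
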
Bioschemas Community: Developing profiles over Schema.org to make life sciences resources more findable
Wed, 02 May 2018 · /slideshow/bioschemas-community-developing-profiles-over-schemaorg-to-make-life-sciences-resources-more-findable/95688015
The Bioschemas community (http://bioschemas.org) is a loose collaboration formed by a wide range of life science resource providers and informaticians. The community is developing profiles over Schema.org to enable life science resources, such as data about a specific protein, sample, or training event, to be more discoverable on the web. While the content of well-known resources such as UniProt (for protein data) is easily discoverable, there is a long tail of specialist resources that would benefit from embedding Schema.org markup in a standardised approach. The community has developed twelve profiles for specific types of life science resources (http://bioschemas.org/specifications/), with another six at an early draft stage. For each profile, a set of use cases has been identified. These typically focus on search, but several facilitate lightweight data exchange to support data aggregators such as Identifiers.org, FAIRsharing.org, and BioSamples. The next stage of the development of a profile consists of mapping the terms used in the use cases to existing properties in Schema.org and domain ontologies. The properties are then prioritised to support the use cases, with a minimal set of about six properties identified, along with larger sets of recommended and optional properties. For each property, an expected cardinality is defined and, where appropriate, object values are specified from controlled vocabularies. Before a profile is finalised, it must first be demonstrated that resources can deploy the markup. In this talk, we will outline the progress made by the Bioschemas Community in a single year through three hackathon events. We will discuss the processes followed by the Bioschemas Community to foster collaboration, and highlight the benefits and drawbacks of using open Google documents and spreadsheets to support the community in developing the profiles. We will conclude by summarising future opportunities and directions for the community.

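As a concrete illustration of the kind of markup the profiles prescribe, here is a minimal sketch of Schema.org JSON-LD for a dataset landing page, built as a Python dictionary. The property selection and URLs are invented for illustration rather than taken from any specific Bioschemas profile; in practice the relevant profile defines which properties are minimal, recommended, or optional.

```python
import json

# Illustrative JSON-LD for a dataset landing page. The exact set of
# minimal/recommended properties is defined by the relevant Bioschemas
# profile; this sketch uses plain Schema.org terms and a hypothetical URL.
markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example protein annotation dataset",
    "description": "Annotations linking proteins to diseases.",
    "url": "https://example.org/datasets/protein-annotations",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["protein", "annotation", "life sciences"],
}

# This would be embedded in the page head as:
# <script type="application/ld+json"> ... </script>
print(json.dumps(markup, indent=2))
```
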
An Identifier Scheme for the Digitising Scotland Project
Thu, 01 Jun 2017 · /slideshow/an-identifier-scheme-for-the-digitising-scotland-project/76556741
The Digitising Scotland project is having the vital records of Scotland transcribed from images of the original handwritten civil registers. Linking the resulting dataset of 24 million vital records covering the lives of 18 million people is a major challenge requiring improved record linkage techniques. Discussions within the multidisciplinary, widely distributed Digitising Scotland project team have been hampered by the teams in each of the institutions using their own identification scheme. To enable fruitful discussions within the Digitising Scotland team, we required a mechanism for uniquely identifying each individual represented on the certificates. From the identifier it should be possible to determine the type of certificate and the role each person played. We have devised a protocol to generate a unique identifier for any individual on a certificate, without using a computer, by exploiting the National Records of Scotland's registration districts. Importantly, the approach does not rely on the handwritten content of the certificates, which reduces the risk of the content being misread and resulting in an incorrect identifier. The resulting identifier scheme has improved the internal discussions within the project. This paper discusses the rationale behind the chosen identifier scheme, and presents the format of the different identifiers. The work reported in the paper was supported by the British ESRC under grants ES/K00574X/1 (Digitising Scotland) and ES/L007487/1 (Administrative Data Research Centre - Scotland).

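The paper itself defines the identifier format; the sketch below is only a hypothetical illustration of the general idea, composing an identifier from a registration district number, a certificate type, an entry number, and the person's role, none of which relies on reading handwritten content. All field names, codes, and separators here are invented.

```python
# Hypothetical illustration of composing a person identifier from
# register metadata rather than handwritten content. The actual field
# order, codes, and separators are defined in the paper; everything
# below is invented for illustration.
CERT_TYPES = {"birth": "B", "marriage": "M", "death": "D"}
ROLES = {"child": "C", "mother": "M", "father": "F",
         "bride": "B", "groom": "G", "deceased": "D"}

def person_id(district: int, year: int, cert_type: str,
              entry: int, role: str) -> str:
    """Compose an identifier such as '644-1891-B-0123-M'."""
    return f"{district}-{year}-{CERT_TYPES[cert_type]}-{entry:04d}-{ROLES[role]}"

print(person_id(district=644, year=1891, cert_type="birth", entry=123, role="mother"))
```
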
Supporting Dataset Descriptions in the Life Sciences
Thu, 06 Apr 2017 · /slideshow/supporting-dataset-descriptions-in-the-life-sciences/74583117
Machine-processable descriptions of datasets can help make data more FAIR; that is, Findable, Accessible, Interoperable, and Reusable. However, there are a variety of metadata profiles for describing datasets, some specific to the life sciences and others more generic in their focus. Each profile has its own set of properties and requirements as to which must be provided and which are optional. Developing a dataset description for a given dataset that conforms to a specific metadata profile is a challenging process. In this talk, I will give an overview of some of the dataset description specifications that are available. I will discuss the difficulties in writing a dataset description that conforms to a profile and the tooling that I've developed to support dataset publishers in creating metadata descriptions and validating them against a chosen specification. Seminar talk given at the EBI on 5 April 2017.

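A simple form of the validation support described above is a completeness check: load the description as RDF and report which of a profile's required properties are missing. A minimal sketch using the rdflib library; the required-property list is an invented example, not any particular profile's.

```python
from rdflib import Graph, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")

# Invented example of a profile's required properties; a real profile
# defines its own list (plus cardinalities, value sets, etc.).
REQUIRED = [DCT.title, DCT.description, DCT.license]

def missing_properties(description_ttl: str, dataset_uri: str) -> list:
    """Return the required properties absent from the dataset description."""
    g = Graph()
    g.parse(data=description_ttl, format="turtle")
    subject = URIRef(dataset_uri)
    return [p for p in REQUIRED if (subject, p, None) not in g]

ttl = """
@prefix dct: <http://purl.org/dc/terms/> .
<http://example.org/dataset/1> dct:title "Example dataset" .
"""
# Title is present; description and license are reported as missing.
print(missing_properties(ttl, "http://example.org/dataset/1"))
```
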
Tutorial: Describing Datasets with the Health Care and Life Sciences Community Profile
Tue, 06 Dec 2016 · /slideshow/tutorial-describing-datasets-with-the-health-care-and-life-sciences-community-profile/69865304
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets. The goal of this tutorial is to explain the elements of the HCLS community profile and to enable users to craft and validate descriptions for datasets of interest.

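To make the profile concrete, the sketch below builds a small dataset description with rdflib. Dublin Core and DCAT are among the vocabularies the profile reuses, but the exact properties and the summary/version/distribution structure follow the profile document itself; treat this as an illustrative subset with hypothetical URIs.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

dataset = URIRef("http://example.org/dataset/example")  # hypothetical URI
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example dataset", lang="en")))
g.add((dataset, DCTERMS.description, Literal("An illustrative dataset description.", lang="en")))
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((dataset, DCTERMS.publisher, URIRef("http://example.org/org/publisher")))  # hypothetical

print(g.serialize(format="turtle"))
```
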
Validata: A tool for testing profile conformance
Thu, 01 Dec 2016 · /slideshow/validata-a-tool-for-testing-profile-conformance/69715024
Validata (http://hw-swel.github.io/Validata/) is an online web application for validating a dataset description expressed in RDF against a community profile expressed as a Shape Expression (ShEx). Additionally, it provides an API for programmatic access to the validator. Validata can be used for multiple community-agreed standards, e.g. DCAT, the HCLS community profile, or the Open PHACTS guidelines, and there are currently deployments to support each of these. Validata can be easily repurposed for different deployments by providing it with a new ShEx schema. The Validata code is available from GitHub (https://github.com/HW-SWeL/Validata). Presentation given at SDSVoc (https://www.w3.org/2016/11/sdsvoc).

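The repurposing step amounts to swapping in a different ShEx schema. Here is a minimal sketch of what such a schema looks like, held as a Python string; the shape and its property constraints are invented for illustration, and the string could be handed to any ShEx processor (Validata itself, or a library such as PyShEx).

```python
# An invented ShEx shape requiring at least one title and at most one
# license on a dataset description. Repurposing Validata for a new
# profile amounts to supplying it with a schema like this one.
SHEX_SCHEMA = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<DatasetShape> {
  dct:title xsd:string+ ;   # one or more string titles
  dct:license IRI?          # at most one license, given as an IRI
}
"""
print(SHEX_SCHEMA)
```
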
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
Wed, 30 Nov 2016 · /slideshow/the-hcls-community-profile-describing-datasets-versions-and-distributions/69672115
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets. The goal of this presentation is to give an overview of the HCLS Community Profile and explain how it extends and builds upon other approaches. Presentation given at SDSVoc (https://www.w3.org/2016/11/sdsvoc/).

Open PHACTS: The Data Today
Thu, 18 Feb 2016 · /alasdair_gray/open-phacts-the-data-today
Presentation given at the Open PHACTS project symposium. The slides give an overview of the data in the 2.0 Open PHACTS drug discovery platform and the challenges that have been faced in the Open PHACTS project to reach this stage.

Project X
Sun, 06 Dec 2015 · /slideshow/project-x-55874811/55874811
This presentation was prepared for my faculty Christmas conference. Abstract: For the last 11 months I have been working on a top secret project with a world-renowned Scandinavian industry partner. We are now moving into the exciting operational phase of this project. I have been granted an early lifting of the embargo that has stopped me talking about this work until now. I will talk about the data science behind this big data project and how semantic web technology has enabled the delivery of Project X.

Data Integration in a Big Data Context: An Open PHACTS Case Study
Wed, 23 Sep 2015 · /slideshow/data-integration-in-a-big-data-context-an-open-phacts-case-study/53121173
Keynote presentation at the EU Ambient Assisted Living Forum workshop "The Crusade for Big Data in the AAL Domain". The presentation explores the Open PHACTS project and how it overcame various Big Data challenges.

Data Integration in a Big Data Context
Wed, 02 Sep 2015 · /slideshow/data-integration-in-a-big-data-context/52355382
Data is being generated all around us – from our smart phones tracking our movement through a city to the city itself sensing various properties and reacting to various conditions. However, to maximise the potential of all this data, it needs to be combined and coerced into models that enable analysis and interpretation. In this talk I will give an overview of the techniques that I have developed for data integration: integrating streams of sensor data with background contextual data, and supporting multiple interpretations when linking data together. At the end of the talk I will give an overview of the work I will be conducting in the Administrative Data Research Centre for Scotland.

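The first technique mentioned, combining a sensor stream with background contextual data, can be pictured as a streaming join against a static table. A minimal sketch, with invented sensor identifiers and readings, that annotates each streamed reading with its sensor's location metadata:

```python
from typing import Iterable, Iterator

# Invented background contextual data: sensor id -> static metadata.
CONTEXT = {
    "s1": {"location": "George Square", "type": "air_quality"},
    "s2": {"location": "Byres Road", "type": "air_quality"},
}

def enrich(readings: Iterable[dict]) -> Iterator[dict]:
    """Join each streamed reading with the static context for its sensor."""
    for reading in readings:
        context = CONTEXT.get(reading["sensor"], {})
        yield {**reading, **context}

stream = [
    {"sensor": "s1", "no2": 41.0},
    {"sensor": "s2", "no2": 29.5},
]
for record in enrich(stream):
    print(record)
```
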
Data Linkage
Wed, 08 Apr 2015 · /slideshow/data-linkage/46761994
Many areas of scientific discovery rely on combining data from multiple data sources. However, there are many challenges in linking data. This presentation highlights these challenges in the context of using Linked Data for environmental and social science databases.

Scientific lenses to support multiple views over linked chemistry data
Mon, 20 Oct 2014 · /alasdair_gray/scientific-lenses-to-support-multiple-views-over-linked-chemistry-data
When are two entries about a small molecule in different datasets the same? If they have the same drug name, chemical structure, or some other criteria? The choice depends upon the application to which the data will be put. However, existing Linked Data approaches provide a single global view over the data with no way of varying the notion of equivalence to be applied. In this paper, we present an approach that enables applications to choose the equivalence criteria to apply between datasets, thus supporting multiple dynamic views over the Linked Data. For chemical data, we show that multiple sets of links can be automatically generated according to different equivalence criteria and published with semantic descriptions capturing their context and interpretation. This approach has been applied within a large-scale public-private data integration platform for drug discovery. To cater for different use cases, the platform allows the application of different lenses, which vary the equivalence rules to be applied based on the context and interpretation of the links.

Scientific Lenses over Linked Data: An approach to support multiple integrated views /slideshow/scientific-lenses-over-linked-data-an-approach-to-support-multiple-integrated-views/40319735
When are two entries about a concept in different datasets the same? Is it when they share the same name, the same properties, or some other criteria? The choice depends upon the application to which the data will be put. However, existing Linked Data approaches provide a single global view over the data, with no way of varying the notion of equivalence to be applied. In this presentation, I will introduce Scientific Lenses, an approach that enables applications to vary the equivalence conditions between linked datasets. They have been deployed in the Open PHACTS Discovery Platform – a large-scale data integration platform for drug discovery. To cater for different use cases, the platform allows the application of different lenses, which vary the equivalence rules to be applied based on the context and interpretation of the links.
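A minimal sketch of the query-time side of the idea, using rdflib: a lens is just a choice of which cross-dataset link predicates to trust when expanding a concept URI to its set of equivalents. The predicate sets below are illustrative assumptions, not the actual Open PHACTS lens configuration.

    # Minimal sketch: expanding a concept URI to its equivalents under a
    # chosen lens. Which link predicates each lens trusts is an illustrative
    # assumption, not the Open PHACTS configuration.
    from rdflib import Graph, URIRef, Namespace

    SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

    # Each lens accepts a different set of cross-dataset link predicates.
    LENSES = {
        "strict":  [SKOS.exactMatch],
        "relaxed": [SKOS.exactMatch, SKOS.closeMatch, SKOS.relatedMatch],
    }

    def equivalents(graph, uri, lens):
        """Follow only the predicates the lens accepts, in both directions."""
        seen, frontier = {uri}, [uri]
        predicates = LENSES[lens]
        while frontier:
            node = frontier.pop()
            for p in predicates:
                for o in graph.objects(node, p):
                    if o not in seen:
                        seen.add(o)
                        frontier.append(o)
                for s in graph.subjects(p, node):
                    if s not in seen:
                        seen.add(s)
                        frontier.append(s)
        return seen

    g = Graph()
    a = URIRef("http://example.org/chembl/123")
    b = URIRef("http://example.org/drugbank/DB01")
    c = URIRef("http://example.org/chebi/999")
    g.add((a, SKOS.exactMatch, b))
    g.add((b, SKOS.closeMatch, c))

    print(len(equivalents(g, a, "strict")))   # 2: a and b
    print(len(equivalents(g, a, "relaxed")))  # 3: a, b and c

The same query therefore returns a narrower or wider answer set depending on the lens the application selects.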

Wed, 15 Oct 2014 15:38:21 GMT
Describing Scientific Datasets: The HCLS Community Profile /slideshow/datadeschcls/39474506
Big Data presents an exciting opportunity to pursue large-scale analyses over collections of data in order to uncover valuable insights across a myriad of fields and disciplines. Yet, as more and more data are made available, researchers are finding it increasingly difficult to discover and reuse them. One problem is that data are insufficiently described to understand what they are or how they were produced. A second issue is that no single vocabulary provides all the key metadata fields required to support basic scientific use cases. A third is that data catalogues and data repositories all use different metadata standards, if they use any standard at all, which prevents easy search and aggregation of data. We therefore need a community profile to indicate what the essential metadata are and the manner in which they can be expressed. The W3C Health Care and Life Sciences Interest Group have developed such a community profile, defining the properties required to provide high-quality dataset descriptions that support finding, understanding, and reusing scientific data, i.e. making the data FAIR (Findable, Accessible, Interoperable and Re-usable – http://datafairport.org). The specification reuses many notions and vocabulary terms from Dublin Core, DCAT and VoID, with provenance and versioning information provided by PROV-O and PAV. The community profile is based around a three-tier model: the summary description captures catalogue-style metadata about the dataset; each version of the dataset is described separately, as are the various distribution formats of those versions. The resulting community profile is generic and applicable to a wide variety of scientific data. Tools are being developed to help with the creation and validation of these descriptions, and several datasets, including those from Bio2RDF, EBI and IntegBio, are already moving to release descriptions conforming to the community profile.
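As a rough illustration of the three-tier model, the following Python sketch (using rdflib) links a summary-level description to a version-level description and one distribution. The properties shown are real Dublin Core, DCAT, and PAV terms, but their selection here is a simplified assumption rather than the full set the community profile requires.

    # Rough sketch of a three-tier dataset description: summary -> version
    # -> distribution. Simplified; the full community profile mandates more
    # properties than shown here.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DCTERMS, RDF

    DCAT = Namespace("http://www.w3.org/ns/dcat#")
    PAV  = Namespace("http://purl.org/pav/")
    EX   = Namespace("http://example.org/")  # illustrative namespace

    g = Graph()
    summary = EX["mydataset"]
    version = EX["mydataset/2.1"]
    dist    = EX["mydataset/2.1.ttl"]

    # Tier 1: summary description with catalogue-style metadata.
    g.add((summary, RDF.type, DCAT.Dataset))
    g.add((summary, DCTERMS.title, Literal("My Dataset", lang="en")))
    g.add((summary, DCTERMS.description, Literal("Example dataset description.", lang="en")))

    # Tier 2: a specific version of the dataset.
    g.add((version, RDF.type, DCAT.Dataset))
    g.add((version, PAV.version, Literal("2.1")))
    g.add((version, DCTERMS.isVersionOf, summary))

    # Tier 3: one distribution (serialisation format) of that version.
    g.add((dist, RDF.type, DCAT.Distribution))
    g.add((dist, DCTERMS["format"], Literal("text/turtle")))
    g.add((version, DCAT.distribution, dist))

    print(g.serialize(format="turtle"))

Separating the three tiers means catalogue metadata is stated once at the summary level, while per-version and per-format details live with the resources they actually describe.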

Wed, 24 Sep 2014 08:40:56 GMT
SensorBench /slideshow/sensorbench/38104983
SensorBench is a benchmark suite for wireless sensor networks. The design of wireless sensor network systems sits within a multi-dimensional design space, where it can be difficult to understand the implications of specific decisions and to identify optimal solutions. SensorBench enables the systematic analysis and comparison of different techniques and platforms, enabling both developer and user communities to make well-informed choices. The benchmark identifies key variables and performance metrics, and specifies experiments that explore how different types of task perform, under each metric, as the controlled variables change. The benchmark is demonstrated by its application to representative platforms. Full details are available from http://dl.acm.org/citation.cfm?id=2618252 (DOI: 10.1145/2618243.2618252)
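The variables-by-metrics experiment design can be pictured with a small sketch: enumerate every combination of the controlled variables for each task, run it, and record every metric. The variable names, tasks, and metrics below are invented placeholders, not the actual SensorBench parameters.

    # Toy sketch of a benchmark experiment grid: every combination of
    # controlled variables is run for each task, and each metric recorded.
    # Variables, tasks and metrics are invented placeholders, not the
    # actual SensorBench parameters.
    from itertools import product

    variables = {
        "network_size": [10, 50, 100],
        "duty_cycle":   [0.1, 0.5, 1.0],
    }
    tasks = ["aggregate_query", "event_detection"]
    metrics = ["energy_mj", "latency_ms"]

    def run_experiment(task, config):
        """Stand-in for deploying a task on real hardware; returns fake metrics."""
        return {"energy_mj": 1.0, "latency_ms": 2.0}

    results = []
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        config = dict(zip(names, values))
        for task in tasks:
            measured = run_experiment(task, config)
            results.append({**config, "task": task,
                            **{m: measured[m] for m in metrics}})

    print(len(results))  # 3 sizes x 3 duty cycles x 2 tasks = 18 runs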

Mon, 18 Aug 2014 14:38:12 GMT
Data Science meets Linked Data /slideshow/data-science-meets-linked-data/36568432
What are the research and technical challenges of linked data that are relevant to data science? This presentation introduces the ideas of linked data using the BBC Sport website as an example, and then identifies several research challenges that remain to be addressed.

Wed, 02 Jul 2014 15:59:58 GMT
Sensors and Big Data for Health and Well-being /slideshow/sensors-and-big-data-for-health-and-wellbeing/36107598
Fri, 20 Jun 2014 07:48:49 GMT
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project /slideshow/scientific-lenses-over-linked-data-identity-management-in-the-open-phacts-project/35895604
Sun, 15 Jun 2014 15:29:50 GMT
Dataset Descriptions in Open PHACTS and HCLS /slideshow/dataset-descriptions-in-open-phacts-and-hcls/33024320
This presentation gives an overview of the dataset description specification developed in the Open PHACTS project (http://www.openphacts.org/). The creation of the specification was driven by a real need within the project to track the datasets used. Details are given of the dataset metadata captured and the vocabularies used to model it, together with the tools developed to enable the specification's uptake. Over the course of the last 12 months, the W3C Health Care and Life Sciences Interest Group have been developing a community profile for dataset descriptions, which draws on the ideas developed in the Open PHACTS specification; a brief overview of the forthcoming community profile is also given. This presentation was given to the Network Data Exchange project (http://www.ndexbio.org/) on 2 April 2014.
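As a hedged illustration of what tracking the datasets used can look like, the sketch below records where a loaded dataset version was retrieved from and which version it supersedes, using the PAV vocabulary that the specification builds on. The property selection is illustrative, not the specification's required set.

    # Illustrative provenance record for a dataset loaded into a platform:
    # where it came from, when, and which version it supersedes. Property
    # selection is illustrative, not the specification's required set.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import XSD

    PAV = Namespace("http://purl.org/pav/")
    EX  = Namespace("http://example.org/")  # illustrative namespace

    g = Graph()
    v2, v1 = EX["chembl/v2"], EX["chembl/v1"]

    g.add((v2, PAV.version, Literal("2.0")))
    g.add((v2, PAV.previousVersion, v1))
    g.add((v2, PAV.retrievedFrom, EX["downloads/chembl-2.0.ttl"]))
    g.add((v2, PAV.retrievedOn, Literal("2014-01-14", datatype=XSD.date)))

    print(g.serialize(format="turtle"))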

Wed, 02 Apr 2014 03:41:24 GMT