SlideShare feed for Slideshows by User: DanSullivan10 / Thu, 11 Mar 2021 17:14:05 GMT
How to Design a Modern Data Warehouse in BigQuery /slideshow/how-to-design-a-modern-data-warehouse-in-big-query-2/244235928 howtodesignamoderndatawarehouseinbigquery2-210311171406
BigQuery is an analytical database designed to scale to petabytes of data. To optimize BigQuery, we need to use practices and patterns that take advantage of its architecture.
Thu, 11 Mar 2021 17:14:05 GMT /slideshow/how-to-design-a-modern-data-warehouse-in-big-query-2/244235928 DanSullivan10@slideshare.net(DanSullivan10)
How to Design a Modern Data Warehouse in BigQuery from Dan Sullivan, Ph.D.
834 0 https://cdn.slidesharecdn.com/ss_thumbnails/howtodesignamoderndatawarehouseinbigquery2-210311171406-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
With Automated ML, is Everyone an ML Engineer? /slideshow/with-automated-ml-is-everyone-an-ml-engineer/239036123 buildingmodelsingcp-201031185232
Google AutoML, AWS SageMaker and other ML tools automate some but not all steps in machine learning workflows. Learn about problem formulation, data engineering, monitoring, and fairness assessment.
Sat, 31 Oct 2020 18:52:32 GMT /slideshow/with-automated-ml-is-everyone-an-ml-engineer/239036123 DanSullivan10@slideshare.net(DanSullivan10)
With Automated ML, is Everyone an ML Engineer? from Dan Sullivan, Ph.D.
106 0 https://cdn.slidesharecdn.com/ss_thumbnails/buildingmodelsingcp-201031185232-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Getting Started with BigQuery ML /DanSullivan10/getting-started-with-bigquery-ml bigqueryml-full-201031184939
Learn how to build machine learning models in Google Cloud BigQuery using SQL.
Sat, 31 Oct 2020 18:49:39 GMT /DanSullivan10/getting-started-with-bigquery-ml DanSullivan10@slideshare.net(DanSullivan10)
Getting Started with BigQuery ML from Dan Sullivan, Ph.D.
209 0 https://cdn.slidesharecdn.com/ss_thumbnails/bigqueryml-full-201031184939-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Google Cloud Certifications & Machine Learning /slideshow/google-cloud-certifications-machine-learning/226232006 googlecloudcertifications4-200130203332
Google Cloud Professional Data Engineer certification prepares machine learning engineers for running ML models in production. This includes DevOps tasks, such as monitoring and scaling.
Thu, 30 Jan 2020 20:33:32 GMT /slideshow/google-cloud-certifications-machine-learning/226232006 DanSullivan10@slideshare.net(DanSullivan10)
Google Cloud Certifications & Machine Learning from Dan Sullivan, Ph.D.
202 0 https://cdn.slidesharecdn.com/ss_thumbnails/googlecloudcertifications4-200130203332-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Unstructured text to structured data /slideshow/unstructured-text-to-structured-data/91688656 dan-unstructuredtexttostructureddata-180323145157
Text mining techniques like sentiment analysis, topic modeling, named entity extraction, and event extraction are used to map unstructured text to conventional data store structures.
Fri, 23 Mar 2018 14:51:57 GMT /slideshow/unstructured-text-to-structured-data/91688656 DanSullivan10@slideshare.net(DanSullivan10)
Unstructured text to structured data from Dan Sullivan, Ph.D.
1151 3 https://cdn.slidesharecdn.com/ss_thumbnails/dan-unstructuredtexttostructureddata-180323145157-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
A first look at tf idf-pdx data science meetup /slideshow/a-first-look-at-tf-idfpdx-data-science-meetup/72783594 afirstlookattf-idf-pdxdatasciencemeetup-170303181505
How to Measure Document Similarity and Build Text Classifiers: A First Look at Term Frequency-Inverse Document Frequency (TF-IDF) Representations
Text data is potentially valuable for many data science projects, but working with text is different from working with structured data. One representation of text that has worked well for many text mining and machine learning applications is the term frequency-inverse document frequency (TF-IDF) vector. In spite of the long-winded name, this method is easy to understand, performs well in many applications, and has been implemented in commonly used data science tools. This presentation will introduce TF-IDF and show examples of how to use TF-IDF for document classification and for measuring the similarity between documents. This presentation does not assume any background in text mining or natural language processing. Examples will use Python.
Fri, 03 Mar 2017 18:15:04 GMT /slideshow/a-first-look-at-tf-idfpdx-data-science-meetup/72783594 DanSullivan10@slideshare.net(DanSullivan10)
A first look at tf idf-pdx data science meetup from Dan Sullivan, Ph.D.
658 5 https://cdn.slidesharecdn.com/ss_thumbnails/afirstlookattf-idf-pdxdatasciencemeetup-170303181505-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
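The TF-IDF abstract above mentions measuring similarity between documents in Python. The slides themselves are not reproduced in this feed, so the following is only a minimal sketch of the general idea, not the presenter's code. It assumes whitespace tokenization, a smoothed IDF of the form log((1 + N) / (1 + df)) + 1, and three made-up sample documents; the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed inverse document frequency (one common variant, chosen here
    # for illustration).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the term's share of the document's tokens.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, invented for this sketch.
docs = [
    "bigquery scales to petabyte data warehouses",
    "bigquery ml builds models with sql",
    "word embeddings and neural nets for text mining",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share the term "bigquery"
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

In this sketch the first two documents share one term, so their similarity is positive, while the first and third share none, so their similarity is zero. A production setup would more likely use an existing library implementation with proper tokenization and stop-word handling.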
Text mining meets neural nets /DanSullivan10/text-mining-meets-neural-nets textminingmeetsneuralnets-151022045538-lva1-app6892
Text mining with word embeddings (word2vec) and deep learning, particularly convolutional networks, compared with TF-IDF classifiers.
Thu, 22 Oct 2015 04:55:37 GMT /DanSullivan10/text-mining-meets-neural-nets DanSullivan10@slideshare.net(DanSullivan10)
Text mining meets neural nets from Dan Sullivan, Ph.D.
4532 11 https://cdn.slidesharecdn.com/ss_thumbnails/textminingmeetsneuralnets-151022045538-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
ACID vs BASE in NoSQL: Another False Dichotomy /DanSullivan10/acid-vs-base-in-nosql-another-false-dichotomy nosqlnow2015-sullivan-acidvsbaseinnosql-150819205832-lva1-app6892
As relational and NoSQL databases continue to adopt characteristics of each other, it becomes more important to understand that ACID-BASE is a spectrum. Instead of making a binary choice between ACID and BASE, developers and designers choose a combination of varying levels of data consistency, availability and network partition tolerance. This presentation briefly describes the ACID-BASE spectrum, the CAP Theorem and how to find the right balance of trade-offs for your application.
Wed, 19 Aug 2015 20:58:31 GMT /DanSullivan10/acid-vs-base-in-nosql-another-false-dichotomy DanSullivan10@slideshare.net(DanSullivan10)
ACID vs BASE in NoSQL: Another False Dichotomy from Dan Sullivan, Ph.D.
1770 10 https://cdn.slidesharecdn.com/ss_thumbnails/nosqlnow2015-sullivan-acidvsbaseinnosql-150819205832-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Big data, bioscience and the cloud biocatalyst june 2015 sullivan /slideshow/big-data-bioscience-and-the-cloud-biocatalyst-june-2015-sullivan/49848082 bigdatabioscienceandthecloud-biocatalystjune2015sullivan-150625195550-lva1-app6892
One of the most challenging problems in bioscience is data integration. From subcellular studies to population simulations, we are faced with large volumes of difficult-to-integrate data. The presentation includes tips on getting started in big data bioscience.
Thu, 25 Jun 2015 19:55:50 GMT /slideshow/big-data-bioscience-and-the-cloud-biocatalyst-june-2015-sullivan/49848082 DanSullivan10@slideshare.net(DanSullivan10)
Big data, bioscience and the cloud biocatalyst june 2015 sullivan from Dan Sullivan, Ph.D.
764 2 https://cdn.slidesharecdn.com/ss_thumbnails/bigdatabioscienceandthecloud-biocatalystjune2015sullivan-150625195550-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property /slideshow/sullivan-big-data-tech-con-2015-boston-v3/47481831 sullivanbigdatatechcon2015bostonv3-150427161712-conversion-gate01
Text has evolved from a second-class citizen in the world of data management to a principal source of insight. In this class, you will learn ways of analyzing text (statistical, syntactic and semantic methods), common text mining tasks (classification, named entity extraction, and information extraction), and the advantages and disadvantages of various algorithms. The class begins with an overview of statistical text mining, syntactic parsing, and semantic representations. Statistical techniques will focus on n-grams and their advantages and limitations. Syntactic parsing is described along with a discussion of well-developed open-source parsers. The need for integration with structured data drives the discussion of semantic representations. Algorithms are introduced for classification with particular emphasis on term frequency – inverse document frequency (TF-IDF) representations and support vector machines (SVMs). This combination is widely used, but there are limits to the precision and recall that one can achieve. Alternative formulations, such as distributed word representations, are discussed in detail. The problem of named entity extraction is addressed using conditional random fields. New advances in applying neural networks to create distributed word representations and their advantages over TF-IDF representations are discussed. Examples will be drawn from a large-scale text mining project (approximately 25 million documents) that applies machine learning (neural networks and support vector machines) and statistical analysis. The class will also include a discussion of open-source tools for text mining that include R, Spark, NLTK and the Python scientific stack. The session will conclude with a checklist of tips for planning and managing large-scale text mining projects.
Mon, 27 Apr 2015 16:17:12 GMT /slideshow/sullivan-big-data-tech-con-2015-boston-v3/47481831 DanSullivan10@slideshare.net(DanSullivan10)
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property from Dan Sullivan, Ph.D.
1832 1 https://cdn.slidesharecdn.com/ss_thumbnails/sullivanbigdatatechcon2015bostonv3-150427161712-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Modeling with Document Database: 5 Key Patterns /slideshow/sullivan-document-data-modeling-patterns/46614932 sullivan-documentdatamodelingpatterns-150403094656-conversion-gate01
Document databases are more flexible in many ways than relational databases, and this presents both opportunities and challenges. Poorly designed document structures adversely affect performance, increase maintenance overhead, and lead to unnecessarily complex application code. This presentation describes 5 commonly used design patterns in document databases: one-to-many, many-to-many, simple table inheritance, trees and lookup patterns.
Fri, 03 Apr 2015 09:46:56 GMT /slideshow/sullivan-document-data-modeling-patterns/46614932 DanSullivan10@slideshare.net(DanSullivan10)
Modeling with Document Database: 5 Key Patterns from Dan Sullivan, Ph.D.
2493 5 https://cdn.slidesharecdn.com/ss_thumbnails/sullivan-documentdatamodelingpatterns-150403094656-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2 /DanSullivan10/sullivan-gbcb-seminar-fall-2014-limits-of-rdms-for-bioinformatics-v2 4ccfb09c-f136-4e98-9e82-16fbaf7f3b78-150129155455-conversion-gate01
Thu, 29 Jan 2015 15:54:55 GMT /DanSullivan10/sullivan-gbcb-seminar-fall-2014-limits-of-rdms-for-bioinformatics-v2 DanSullivan10@slideshare.net(DanSullivan10)
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2 from Dan Sullivan, Ph.D.
485 3 https://cdn.slidesharecdn.com/ss_thumbnails/4ccfb09c-f136-4e98-9e82-16fbaf7f3b78-150129155455-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Text Mining for Biocuration of Bacterial Infectious Diseases /slideshow/text-mining-for-biocuration-of-bacterial-infectious-diseases/43915591 sullivangbcbseminarspring2014-textmining-150126140346-conversion-gate01
Specialty gene sets, such as virulence factors and antibiotic resistance genes, are of particular interest to infectious disease researchers. Much of the information about specialty genes’ function is described in literature but unavailable as structured data in bioinformatics databases. The steadily increasing volume of literature makes it difficult to manually find relevant papers and extract assertion sentences about specialty genes. This presentation describes efforts to build and an automatic classifier for such sentences. Experiments were conducted to assess the impact of the imbalance of positive and negative examples in source documents on classification; develop a support vector machine (SVM) classifier using term frequency-inverse document frequency (TF-IDF) representation of text; and assess the marginal benefit of additional training examples on the quality of the classifier. Analysis of learning curves indicates that additional training examples will not likely improve the quality of the classifier. We discuss options for other text representation schemes to investigate in order to improve the quality of the classifier as measured by F-score.]]>

Mon, 26 Jan 2015 14:03:45 GMT /slideshow/text-mining-for-biocuration-of-bacterial-infectious-diseases/43915591 DanSullivan10@slideshare.net(DanSullivan10) Text Mining for Biocuration of Bacterial Infectious Diseases DanSullivan10
Text Mining for Biocuration of Bacterial Infectious Diseases from Dan Sullivan, Ph.D.
]]>
625 3 https://cdn.slidesharecdn.com/ss_thumbnails/sullivangbcbseminarspring2014-textmining-150126140346-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Limits of RDBMS and Need for NoSQL in Bioinformatics /slideshow/limits-of-rdbms-and-need-for-nosql-in-bioinformatics/43914626 sullivangbcbseminarfall2014-limitsofrdmsforbioinformaticsv2-150126133425-conversion-gate02
Bioinformaticians constantly face challenges with data, from large data volumes to the need to integrate diverse data types. Relational databases have a long and successful history of managing data but have been unable to meet the emerging needs of big data and highly integrated data stores. This talk discusses the limitations we face when using relational data models for bioinformatics applications. It describes the features, limitations and use cases of four alternative database models: key-value databases, document databases, wide-column data stores and graph databases. Use in bioinformatics applications is demonstrated with text mining and atherosclerosis research projects. The talk concludes with guidance on choosing an appropriate database model for varying bioinformatics requirements.]]>

Mon, 26 Jan 2015 13:34:25 GMT /slideshow/limits-of-rdbms-and-need-for-nosql-in-bioinformatics/43914626 DanSullivan10@slideshare.net(DanSullivan10) Limits of RDBMS and Need for NoSQL in Bioinformatics DanSullivan10
Limits of RDBMS and Need for NoSQL in Bioinformatics from Dan Sullivan, Ph.D.
]]>
3329 9 https://cdn.slidesharecdn.com/ss_thumbnails/sullivangbcbseminarfall2014-limitsofrdmsforbioinformaticsv2-150126133425-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
https://cdn.slidesharecdn.com/profile-photo-DanSullivan10-48x48.jpg?cb=1644694204 Principal Engineer and data architect at PEAK6 Technologies, as well as a former research scientist and systems architect at a major life science research institute. Experience leading the design and implementation of innovative applications and research efforts in both commercial and academic environments. Author of Official Google Cloud Certified Professional Engineer, Professional Architect, and Associate Cloud Engineer Study Guides. dansullivanlearning.com https://cdn.slidesharecdn.com/ss_thumbnails/howtodesignamoderndatawarehouseinbigquery2-210311171406-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/how-to-design-a-modern-data-warehouse-in-big-query-2/244235928 How to Design a Modern... https://cdn.slidesharecdn.com/ss_thumbnails/buildingmodelsingcp-201031185232-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/with-automated-ml-is-everyone-an-ml-engineer/239036123 With Automated ML, is ... https://cdn.slidesharecdn.com/ss_thumbnails/bigqueryml-full-201031184939-thumbnail.jpg?width=320&height=320&fit=bounds DanSullivan10/getting-started-with-bigquery-ml Getting Started with B...