際際滷shows by user CarolynDuby (際際滷Share feed)

Enterprise data science at scale (Tue, 05 Dec 2017)
/slideshow/enterprise-data-science-at-scale/83410626
Introducing Data Science at Scale with Spark, Jupyter, Zeppelin, and IBM DSX.
Unlocking insights in streaming data (Fri, 27 Oct 2017)
/slideshow/unlocking-insights-in-streaming-data/81294991
Presentation from the Future of Data Boston Meetup on Oct 24, 2017. Streaming data is rich with insights, but those insights can be hard to reach because streaming applications are difficult to develop and deploy. During this presentation we show how to build and deploy a complex streaming application in a few minutes using open source tools. First we build an application using Streaming Analytics Manager and Schema Registry that ingests data into Apache Druid. Then we use Apache Superset to build beautiful, informative dashboards.
Data Science at Scale with Apache Spark and Zeppelin Notebook (Thu, 19 Oct 2017)
/CarolynDuby/data-science-at-scale-with-apache-spark-and-zeppelin-notebook
How to get started with data science at scale: use Apache Spark to clean, analyze, explore, and build models on large data sets, and use Zeppelin to record the analysis so that techniques can be peer reviewed and reused.
Best Practices for Data at Scale - Global Data Science Conference (Thu, 19 Oct 2017)
/slideshow/best-practices-for-data-at-scale-global-data-science-conference/80999831
Practical advice for successful projects using data at scale, across the project lifecycle.
Boston Future of Data Meetup, May 2017: Spark Introduction with Credit Card Fraud Demo (Fri, 12 May 2017)
/slideshow/boston-future-of-data-meetup-may-2017-spark-introduction-with-credit-card-fraud-demo/75932577
Presented at the Future of Data Meetup in Boston in May 2017. An introduction to Apache Spark, followed by a sample credit card fraud demo using Spark, NiFi, Storm, and HBase.
ODSC East 2017: Reproducible Research at Scale with Apache Zeppelin and Spark (Fri, 05 May 2017)
/slideshow/odsc-east-2017-reproducible-research-at-scale-with-apache-zeppelin-and-spark/75718472
How to use Zeppelin and Spark to document your research. Reproducible research documents not just the findings of a study but the exact code required to produce those findings. Reproducibility lets study authors reliably repeat their analysis, or accelerate new findings by applying the same techniques to new data. The increased transparency allows peers to quickly understand a study's methods and compare them to other studies, and can lead to higher levels of trust and interest, and eventually more citations of your work.

Big data introduces new challenges for reproducible research. As our data universe expands and the open data movement grows, more data is available than ever to analyze, and the possible combinations are infinite. Data cleaning and feature extraction often involve lengthy sequences of transformations, and the space allotted for publications is not adequate to describe all the details well enough for others to review and reproduce them. Fortunately, the open source community is addressing this need with Apache Spark, Zeppelin, and Hadoop. Apache Spark 2.0 makes it even simpler and faster to harness the power of a Hadoop computing cluster to clean, analyze, explore, and train machine learning models on large data sets. Zeppelin web-based notebooks capture code and interactive visualizations and make them easy to share.

After this session you will be able to create a reproducible data science pipeline over large data sets using Spark, Zeppelin, and a Hadoop distributed computing cluster. Learn how to combine Spark with other supported interpreters to codify your results, from cleaning to exploration to feature extraction and machine learning. Discover how to share your notebooks and data with others using the cloud. This talk covers Spark and shows examples, but it is not intended to be a complete tutorial on Spark.
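The core idea the abstract argues for, that an analysis should be a single piece of code which regenerates its findings exactly, does not depend on Spark itself. A minimal plain-Python sketch (not from the talk; the data are simulated under a hypothetical seed) shows what "reproducible" means in practice: pin every source of variation in code, then verify that reruns agree.

```python
import random
import statistics

def run_study(seed: int = 42) -> dict:
    """End-to-end analysis: simulate, clean, summarize.

    Every source of variation (here, the RNG seed) is pinned in code,
    so rerunning the function reproduces the findings exactly.
    """
    rng = random.Random(seed)                 # pinned seed: the whole point
    raw = [rng.gauss(10.0, 2.0) for _ in range(1000)]
    raw[::100] = [float("nan")] * 10          # inject some bad readings
    cleaned = [x for x in raw if x == x]      # drop NaNs (NaN != NaN)
    return {
        "n": len(cleaned),
        "mean": round(statistics.mean(cleaned), 3),
        "stdev": round(statistics.stdev(cleaned), 3),
    }

first = run_study()
second = run_study()
assert first == second                        # identical on every rerun
print(first)
```

A Zeppelin notebook scales this same discipline up: the paragraphs record the full transformation sequence over cluster-sized data, so a peer can rerun the notebook and arrive at the published numbers.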
Providence Future of Data Meetup: Apache Metron Open Source Cybersecurity Platform (Tue, 27 Dec 2016)
/slideshow/providence-future-of-data-meetup-apache-metron-open-source-cybersecurity-platform/70480686
An overview of Apache Metron, an open source platform for ingesting, enriching, triaging, and storing diverse cybersecurity feeds. Metron is built on top of Hadoop and scales horizontally on commodity hardware.