Slideshows by User: pacoid

SlideShare feed for Slideshows by User: pacoid / Fri, 09 Mar 2018 16:29:19 GMT

Human in the loop: a design pattern for managing teams working with ML /slideshow/human-in-the-loop-a-design-pattern-for-managing-teams-working-with-ml-90169772/90169772
Strata CA 2018-03-08 https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223 Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
Fri, 09 Mar 2018 16:29:19 GMT pacoid@slideshare.net (pacoid)
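
The active learning loop described above can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the `KeywordModel` toy classifier, the `run_iteration` helper, the 0.75 confidence threshold, and the sample texts and labels are all hypothetical stand-ins for a real ML pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordModel:
    """Toy classifier (hypothetical): votes by counting known keywords."""
    positive: set = field(default_factory=set)   # words that suggest "python"
    negative: set = field(default_factory=set)   # words that suggest "cooking"

    def predict(self, text):
        """Return (label, confidence), with confidence in [0, 1]."""
        words = text.lower().split()
        pos = sum(w in self.positive for w in words)
        neg = sum(w in self.negative for w in words)
        total = pos + neg
        if total == 0:
            return None, 0.0                     # no evidence: fully uncertain
        label = "python" if pos >= neg else "cooking"
        return label, max(pos, neg) / total

    def learn(self, text, label):
        """Fold one human-labeled example back into the model."""
        target = self.positive if label == "python" else self.negative
        target.update(text.lower().split())


def run_iteration(model, items, ask_expert, threshold=0.75):
    """Accept confident predictions; refer the exceptions to a human expert."""
    accepted, referred = [], []
    for text in items:
        label, confidence = model.predict(text)
        if confidence >= threshold:
            accepted.append((text, label))       # automated path
        else:
            human_label = ask_expert(text)       # exception goes to a person
            model.learn(text, human_label)       # the decision improves the model
            referred.append((text, human_label))
    return accepted, referred


model = KeywordModel(positive={"pandas", "jupyter"}, negative={"oven", "flour"})
expert_answers = {"numpy array basics": "python"}    # stands in for a human
items = ["pandas dataframe tutorial", "bake flour in the oven", "numpy array basics"]
accepted, referred = run_iteration(model, items, expert_answers.__getitem__)
```

After one pass, the two confident items are handled automatically, the unfamiliar one is referred to the expert, and the model then classifies similar text such as "numpy array primer" without help.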

Human in the loop: a design pattern for managing teams working with ML from Paco Nathan

Human-in-the-loop: a design pattern for managing teams that leverage ML /slideshow/humanintheloop-a-design-pattern-for-managing-teams-that-leverage-ml/83469606
Strata Singapore 2017 session talk 2017-12-06 https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611

Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.

This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects (including available open source projects) as well as management perspectives for how to apply HITL:

* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?

In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Wed, 06 Dec 2017 08:30:50 GMT pacoid@slideshare.net (pacoid)

Human-in-the-loop: a design pattern for managing teams that leverage ML from Paco Nathan

Human-in-a-loop: a design pattern for managing teams which leverage ML /slideshow/humaninaloop-a-design-pattern-for-managing-teams-which-leverage-ml-82181008/82181008
Big Data Spain, 2017-11-16 https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml

Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.

This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:

* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?

In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Thu, 16 Nov 2017 19:31:57 GMT pacoid@slideshare.net (pacoid)

Human-in-a-loop: a design pattern for managing teams which leverage ML from Paco Nathan

Humans in a loop: Jupyter notebooks as a front-end for AI /slideshow/humans-in-a-loop-jupyter-notebooks-as-a-frontend-for-ai/79147844
JupyterCon NY 2017-08-24 https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html

Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.

The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.

Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.

This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive: in other words, shared documents on which the machines and people are collaborators.
Fri, 25 Aug 2017 09:51:58 GMT pacoid@slideshare.net (pacoid)
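
One way to picture a notebook acting as configuration file, data sample, and structured log at once is sketched below. It is an assumption-laden illustration and does not use or reproduce the `nbtransom` API: the notebook is built as a plain dict in the nbformat 4 JSON layout, and the `hitl` metadata key, the helper names, and the sample items are all hypothetical.

```python
import json


def new_notebook():
    """Return a minimal notebook dict in the nbformat 4 JSON layout."""
    return {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def add_question(nb, item_id, text, guess, confidence):
    """Machine side: log an uncertain item as a markdown cell for review."""
    nb["cells"].append({
        "cell_type": "markdown",
        # Cell metadata is free-form JSON; the "hitl" key is our own convention.
        "metadata": {"hitl": {"id": item_id, "guess": guess,
                              "confidence": confidence, "label": None}},
        "source": f"**{item_id}**: {text}\n\nModel guess: `{guess}`",
    })


def record_label(nb, item_id, label):
    """Human side: store the expert's decision on the matching cell."""
    for cell in nb["cells"]:
        meta = cell["metadata"].get("hitl")
        if meta and meta["id"] == item_id:
            meta["label"] = label
            return True
    return False


def labeled_examples(nb):
    """Machine side: collect decided items as new training examples."""
    metas = (cell["metadata"].get("hitl") for cell in nb["cells"])
    return [(m["id"], m["label"]) for m in metas if m and m["label"] is not None]


nb = new_notebook()
add_question(nb, "doc-17", "Go concurrency patterns", "golang", 0.55)
add_question(nb, "doc-42", "Playing Go with neural networks", "golang", 0.51)
record_label(nb, "doc-42", "game-of-go")

# The notebook is plain JSON, so it can be saved, shared, and reloaded.
round_trip = json.loads(json.dumps(nb))
examples = labeled_examples(round_trip)
```

Because the shared document is ordinary JSON, both the pipeline and the expert can read and write it, and undecided items simply stay in the notebook until someone labels them.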

Humans in a loop: Jupyter notebooks as a front-end for AI from Paco Nathan

Humans in the loop: AI in open source and industry /slideshow/humans-in-the-loop-ai-in-open-source-and-industry/78766802
Nike Tech Talk, Portland, 2017-08-10 https://niketechtalks-aug2017.splashthat.com/

O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.

This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.

Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly's uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do. In particular, we'll show two open source projects in Python from O'Reilly's AI team:

• pytextrank, built atop spaCy, NetworkX, and datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom, leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
Fri, 11 Aug 2017 15:05:19 GMT pacoid@slideshare.net (pacoid)
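
To give a feel for the kind of graph algorithm mentioned above, here is a heavily simplified TextRank-style keyword ranker. It is a sketch only and is not how `pytextrank` is implemented: it skips the spaCy parsing step entirely, takes a pre-tokenized list, and the `textrank_keywords` name, window size, and sample tokens are assumptions made for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over a co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)

    # Power iteration: each word shares its score equally among its neighbors.
    rank = {word: 1.0 for word in graph}
    for _ in range(iterations):
        rank = {
            word: (1 - damping)
            + damping * sum(rank[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return sorted(rank, key=rank.get, reverse=True)


tokens = ["graph", "nlp", "graph", "text", "graph", "rank"]
ranked = textrank_keywords(tokens, window=1)
```

In this tiny example "graph" co-occurs with every other word, so it comes out as the top-ranked keyword; a real pipeline would first filter and lemmatize tokens before building the graph.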

Humans in the loop: AI in open source and industry from Paco Nathan

Computable Content /slideshow/computable-content/77159175
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, the talk covers "Oriole" tutorials, which combine video with Jupyter notebooks and Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx. https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859 https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
Thu, 22 Jun 2017 00:48:25 GMT pacoid@slideshare.net (pacoid)

Computable Content from Paco Nathan

]]> 678 5 https://cdn.slidesharecdn.com/ss_thumbnails/computablecontent-170622004825-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post

http://activitystrea.ms/schema/1.0/posted

Computable Content: Lessons Learned /slideshow/computable-content-lessons-learned/76339257
Strata UK 2017. Computable content leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based.
Thu, 25 May 2017 13:35:51 GMT pacoid@slideshare.net(pacoid)

SF Python Meetup: TextRank in Python /slideshow/sf-python-meetup-textrank-in-python/71979383
See 2020 update: https://derwen.ai/s/h88s SF Python Meetup, 2017-02-08 https://www.meetup.com/sfpython/events/237153246/ PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than the results of simple keyword extraction. PyTextRank integrates use of `TextBlob` and `SpaCy` for NLP analysis of texts, including full parse, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
Thu, 09 Feb 2017 21:23:25 GMT pacoid@slideshare.net(pacoid)
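
The TextRank idea in the description above (rank terms by running a graph algorithm over the text) can be illustrated with a minimal sketch. This is not PyTextRank's API or implementation: the function name `textrank_keywords`, the `window`, `damping`, and `iterations` parameters, and the sample tokens are all assumptions made for illustration, and the sketch ranks single pre-tokenized words rather than parsed keyphrases.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with a PageRank-style score over a co-occurrence graph.

    Hypothetical helper for illustration only; not part of PyTextRank.
    """
    # Build an undirected graph: two distinct words are linked when they
    # appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration: each word repeatedly shares its score equally
    # among its neighbors, mixed with a constant (1 - damping) term.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up token stream in which "graph" co-occurs with every other word.
tokens = "spark graph mesos graph docker graph jupyter".split()
ranked = textrank_keywords(tokens)
print(ranked)
```

In this toy input the best-connected word ranks first. A fuller pipeline, as the description suggests, would first parse the text and then assemble the ranked terms into keyphrases.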

Use of standards and related issues in predictive analytics /slideshow/use-of-standards-and-related-issues-in-predictive-analytics/65062434
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
Tue, 16 Aug 2016 21:34:04 GMT pacoid@slideshare.net(pacoid)

Data Science in 2016: Moving Up /slideshow/data-science-in-2016-moving-up/53959314
A keynote presentation for Big Data Spain 2015 in Madrid, 2015-10-15 http://www.bigdataspain.org/program/
Thu, 15 Oct 2015 06:59:36 GMT pacoid@slideshare.net(pacoid)

Data Science Reinvents Learning? /slideshow/data-science-reinvents-learning/52032394
Presented 2015-08-24 at SF Bay ACM, held at the eBay south campus in San Jose. http://meetup.com/SF-Bay-ACM/events/221693508/ Project Jupyter https://jupyter.org/ evolved from IPython notebooks, and now supports a wide variety of programming language back-ends. Notebooks have proven to be effective tools in Data Science, providing convenient packages for what Don Knuth called "literate programming" in the 1980s: code plus exposition in markdown. Results of running the code appear in-line as interactive graphics -- all packaged as collaborative, web-based documents. Some have said that the introduction of cloud-based notebooks is nearly as fundamental a change in software practice as the introduction of spreadsheets. O'Reilly Media has been considering the question, "What comes after books and video?" Or, as one might ask more pointedly, what comes after Kindle? To that end we have collaborated with Project Jupyter to integrate notebooks into our content management process, allowing authors to generate articles, tutorials, reports, and other media products as notebooks that also incorporate video segments. Code dependencies are containerized using Docker, and all of the content gets managed in Git repositories. We have added another layer, an open source project called Thebe, which provides a kind of "media player" for embedding the containerized notebooks into web pages.
Tue, 25 Aug 2015 05:00:43 GMT pacoid@slideshare.net(pacoid)

Jupyter for Education: Beyond Gutenberg and Erasmus /slideshow/jupyter-for-education-beyond-gutenberg-and-erasmus/50988595
PyData Seattle 2015 sponsored talk about O'Reilly Learning
Mon, 27 Jul 2015 21:28:15 GMT pacoid@slideshare.net(pacoid)

GalvanizeU Seattle: Eleven Almost-Truisms About Data /slideshow/galvanizeu-seattle-eleven-almosttruisms-about-data/50986686
http://www.meetup.com/Seattle-Data-Science/events/223445403/ Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study the technologies, the processes, and the people involved.
Mon, 27 Jul 2015 20:24:45 GMT pacoid@slideshare.net(pacoid)

Microservices, containers, and machine learning /slideshow/microservices-containers-and-machine-learning-50862677/50862677
http://www.oscon.com/open-source-2015/public/schedule/detail/41579 In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities. Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets, then we run machine learning on a Spark cluster to find insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data, based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
Thu, 23 Jul 2015 20:56:48 GMT pacoid@slideshare.net(pacoid)

GraphX: Graph analytics for insights about developer communities /slideshow/graphx-graph-analytics-for-insights-about-developer-communities/49169806
Tue, 09 Jun 2015 12:14:30 GMT pacoid@slideshare.net(pacoid)

1 Graph Analytics in Spark /slideshow/graph-analytics-in-spark/49018093 graphanalytics-150605034355-lva1-app6891
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472 Big Brains meetup hosted by BloomReach, 2015-06-04 Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.]]>
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472 Big Brains meetup hosted by BloomReach, 2015-06-04 Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.]]> Fri, 05 Jun 2015 03:43:55 GMT /slideshow/graph-analytics-in-spark/49018093 pacoid@slideshare.net(pacoid) Graph Analytics in Spark pacoid https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472 Big Brains meetup hosted by BloomReach, 2015-06-04 Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself. 
<img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/graphanalytics-150605034355-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds" />

Graph Analytics in Spark from Paco Nathan

]]> 22115 7 https://cdn.slidesharecdn.com/ss_thumbnails/graphanalytics-150605034355-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post

http://activitystrea.ms/schema/1.0/posted

1 Apache Spark and the Emerging Technology Landscape for Big Data /slideshow/apache-spark-and-the-emerging-technology-landscape-for-big-data/47701208 acoruna-150503125933-conversion-gate01
Keynote presentation at Universidade da Coruña on 2015-05-27 for the Apache Spark tutorial]]>
Sun, 03 May 2015 12:59:32 GMT /slideshow/apache-spark-and-the-emerging-technology-landscape-for-big-data/47701208 pacoid@slideshare.net(pacoid) Apache Spark and the Emerging Technology Landscape for Big Data pacoid <img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/acoruna-150503125933-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds" />

Apache Spark and the Emerging Technology Landscape for Big Data from Paco Nathan

]]> 3305 4 https://cdn.slidesharecdn.com/ss_thumbnails/acoruna-150503125933-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post

http://activitystrea.ms/schema/1.0/posted

1 QCon São Paulo: Real-Time Analytics with Spark Streaming /pacoid/qcon-so-paulo-realtime-analytics-with-spark-streaming spstreaming-150326061042-conversion-gate01
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26 http://qconsp.com/presentation/real-time-analytics-spark-streaming This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale. The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale. We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app.]]>
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26 http://qconsp.com/presentation/real-time-analytics-spark-streaming This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale. The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale. We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app.]]> Thu, 26 Mar 2015 06:10:42 GMT /pacoid/qcon-so-paulo-realtime-analytics-with-spark-streaming pacoid@slideshare.net(pacoid) QCon São Paulo: Real-Time Analytics with Spark Streaming pacoid "Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26 http://qconsp.com/presentation/real-time-analytics-spark-streaming This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale. The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale. We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app. 
<img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/spstreaming-150326061042-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds" />

QCon São Paulo: Real-Time Analytics with Spark Streaming from Paco Nathan

]]> 19733 20 https://cdn.slidesharecdn.com/ss_thumbnails/spstreaming-150326061042-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post

http://activitystrea.ms/schema/1.0/posted

1 Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More /slideshow/strata-2015-data-preview-spark-data-visualization-yarn-and-more/44281964 sparkcampwebcast-150204164754-conversion-gate01
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289]]>
Wed, 04 Feb 2015 16:47:54 GMT /slideshow/strata-2015-data-preview-spark-data-visualization-yarn-and-more/44281964 pacoid@slideshare.net(pacoid) Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More pacoid <img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/sparkcampwebcast-150204164754-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds" />

Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More from Paco Nathan

]]> 3703 6 https://cdn.slidesharecdn.com/ss_thumbnails/sparkcampwebcast-150204164754-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post

http://activitystrea.ms/schema/1.0/posted

1 A New Year in Data Science: ML Unpaused /pacoid/a-new-year-in-data-science-ml-unpaused ddtxkeynote-150110141622-conversion-gate01
Data Day Texas 2015 keynote talk http://datadaytexas.com/]]>
Sat, 10 Jan 2015 14:16:22 GMT /pacoid/a-new-year-in-data-science-ml-unpaused pacoid@slideshare.net(pacoid) A New Year in Data Science: ML Unpaused pacoid <img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/ddtxkeynote-150110141622-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds" />

A New Year in Data Science: ML Unpaused from Paco Nathan

]]> 20900 9 https://cdn.slidesharecdn.com/ss_thumbnails/ddtxkeynote-150110141622-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post

http://activitystrea.ms/schema/1.0/posted

1 https://cdn.slidesharecdn.com/profile-photo-pacoid-48x48.jpg?cb=1712680364 Known as a "player/coach", with core expertise in data science, natural language processing, machine learning, cloud computing; 35+ years tech industry experience, ranging from Bell Labs to early-stage start-ups. Co-chair Rev. Advisor for Amplify Partners, Deep Learning Analytics, Primer, Data Spartan, Recognai. Recent roles: Director, Learning Group @ O'Reilly Media; Director, Community Evangelism @ Databricks and Apache Spark. Cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise. derwen.ai/paco https://cdn.slidesharecdn.com/ss_thumbnails/hitlstrata-180309162919-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/human-in-the-loop-a-design-pattern-for-managing-teams-working-with-ml-90169772/90169772 Human in the loop: a d... https://cdn.slidesharecdn.com/ss_thumbnails/hitlpxn-171206083050-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/humanintheloop-a-design-pattern-for-managing-teams-that-leverage-ml/83469606 Human-in-the-loop: a d... https://cdn.slidesharecdn.com/ss_thumbnails/hitlpxn-171116193157-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/humaninaloop-a-design-pattern-for-managing-teams-which-leverage-ml-82181008/82181008 Human-in-a-loop: a des...