Slideshows by User: CliffClick

SlideShare feed for Slideshows by User: CliffClick / Mon, 15 Sep 2014 12:23:41 GMT

Building a Big Data Machine Learning Platform
/slideshow/2014-09-jzone/39109505
H2O - it's open source, in-memory, big data, clustered computing: Math at Scale. We have the world's fastest Logistic Regression (by a lot!), the world's first (and fastest) distributed Gradient Boosted Method (GBM), plus Random Forest, PCA, KMeans++, etc. We also have R's "plyr"-style data munging at scale, including ddply (Group-By for you SQL users) and much of R's expressive coding style.

We built H2O, an open-source platform for working with in-memory distributed data. On top of H2O we then built state-of-the-art predictive modeling and analytics (e.g. GLM and Logistic Regression, GBM, Random Forest, Neural Nets, and PCA, to name a few) that are 1000x faster than the disk-bound alternatives and 100x faster than R (we love R, but it's too slow on big data!). We can run R expressions on tera-scale datasets, or munge data from Scala and Python. We build our newest algorithms in a few weeks, start to finish, because the platform makes Big Math easy. We routinely test on 100 GB datasets and have customers using 1 TB datasets.

This talk is about the platform, coding style, and API that let us seamlessly deal with datasets from 1 KB to 1 TB without changing a line of code, and that let us use clusters ranging from your laptop to 100-server clusters with many TB of RAM and hundreds of CPUs.
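The ddply-style Group-By mentioned above can be illustrated without H2O itself. What follows is a minimal sketch in plain Python, not H2O's API: the function names (`map_chunk`, `reduce_partials`, `group_mean`) and the chunk layout are assumptions made up for this example. It shows one common way a group-by mean can be computed over data split into chunks, where each chunk yields a small partial result and the partial results are merged. This is the property that lets the same code run on one chunk or on many.

```python
from collections import defaultdict


def map_chunk(rows):
    """Compute partial per-key [sum, count] for one chunk of (key, value) rows."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return partial


def reduce_partials(a, b):
    """Merge partial result b into a; sums and counts simply add up."""
    for key, (s, n) in b.items():
        a[key][0] += s
        a[key][1] += n
    return a


def group_mean(chunks):
    """Group-by mean over any number of chunks (hypothetical helper)."""
    total = defaultdict(lambda: [0.0, 0])
    for chunk in chunks:
        total = reduce_partials(total, map_chunk(chunk))
    return {key: s / n for key, (s, n) in total.items()}


# Two chunks standing in for data spread across machines (illustrative data).
chunks = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0), ("b", 6.0), ("c", 5.0)],
]
means = group_mean(chunks)
```

Because the merge step only adds sums and counts, the result does not depend on how the rows are split into chunks, so the calling code stays the same whether the data fits in one chunk or is spread over many.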

Building a Big Data Machine Learning Platform from Cliff Click
