Scalable Machine Learning using R and Azure HDInsight - Parashar
You can find more at https://gallery.cortanaintelligence.com/experiments
Prepare, Model, Operationalize
Details - https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
What is R?
• The most popular statistical programming language
• A data visualization tool
• Open source
• 3+ million users
• Taught in most universities
• Thriving user groups worldwide
• 9000+ contributed packages
• New and recent grads use it
Language
Platform
Community
Ecosystem
• Rich application & platform integration
• In-memory operation
• Lack of parallelism
• Lack of guaranteed support / no SLA
• Any code/package that works today with R will work in R Server.
• Ideal for parameter sweeps, simulation, scoring.
• Transformations: rxDataStep(); Statistics: rxChiSquaredTest(); Algorithms: rxLinMod(); Parallelism: rxSetComputeContext()
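The in-memory and parallelism limits listed above are usually addressed by processing data in chunks and combining per-chunk summaries. The sketch below illustrates that general idea in plain Python for a one-predictor linear model. It is not how rxLinMod() is implemented; the `Moments` class, the `chunks` helper, and the toy data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running sums that are sufficient to fit y = intercept + slope * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Fold one chunk of rows into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other):
        """Combine the sums from two chunks (or from two workers)."""
        return Moments(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self):
        """Return (intercept, slope) computed from the accumulated sums."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def chunks(xs, ys, size):
    """Yield successive (xs, ys) slices of at most `size` rows."""
    for i in range(0, len(xs), size):
        yield xs[i:i + size], ys[i:i + size]


# Toy data generated from y = 2 + 3x, so the fit should recover those values.
xs = [float(i) for i in range(10)]
ys = [2.0 + 3.0 * x for x in xs]

total = Moments()
for cx, cy in chunks(xs, ys, size=4):
    # Each chunk is summarized on its own, then merged into the total.
    total = total.merge(Moments().update(cx, cy))

intercept, slope = total.fit()
print(intercept, slope)
```

Only one chunk needs to be in memory at a time, and because `merge` is order-independent, each chunk could be summarized on a different worker before the results are combined.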
• Provisions Azure compute resources with Spark installed and configured.
• Data is stored in Azure Blob storage (wasb://) or Azure Data Lake Store (adl://)
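The two URI schemes above follow the forms wasb://<container>@<account>.blob.core.windows.net/<path> and adl://<account>.azuredatalakestore.net/<path>. As a small illustration, the sketch below splits such a URI into its parts using the Python standard library. The helper name `describe_storage_uri` and the account, container, and file names are made up for the example.

```python
from urllib.parse import urlparse


def describe_storage_uri(uri):
    """Split a wasb:// or adl:// URI into store type, container, host, and path."""
    parts = urlparse(uri)
    if parts.scheme in ("wasb", "wasbs"):
        # Blob storage puts the container before the "@" and the account host after it.
        container, _, host = parts.netloc.partition("@")
        return {"store": "blob", "container": container,
                "host": host, "path": parts.path}
    if parts.scheme == "adl":
        # Data Lake Store has no container; the host identifies the account.
        return {"store": "data_lake", "container": None,
                "host": parts.netloc, "path": parts.path}
    raise ValueError(f"unsupported storage scheme: {parts.scheme!r}")


# Hypothetical example locations, not taken from the slides.
blob = describe_storage_uri(
    "wasb://mycontainer@myaccount.blob.core.windows.net/data/flights.csv")
lake = describe_storage_uri(
    "adl://myaccount.azuredatalakestore.net/data/flights.csv")
print(blob)
print(lake)
```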
[Diagram: RStudio → HDInsight Gateway → R process on Edge Node (R Server) → Data in Distributed Storage]
[Diagram: RStudio → HDInsight Gateway → Master R process on Edge Node (R Server) → Apache YARN and Spark → Worker R processes on Data Nodes → Data in Distributed Storage]
R Server (single thread, local): 471 sec
R Server on HDInsight (4 nodes): 144 sec (-70%)
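The arithmetic behind the two timings can be checked directly. The sketch below derives the speedup and the reduction in run time from the 471 sec and 144 sec figures. The per-node efficiency is only a rough indicator, since the baseline is a single thread and the slides do not say how many cores each node has.

```python
local_seconds = 471.0    # R Server, single thread, local
cluster_seconds = 144.0  # R Server on HDInsight
nodes = 4

speedup = local_seconds / cluster_seconds          # how many times faster
reduction = 1.0 - cluster_seconds / local_seconds  # fraction of run time saved
per_node_efficiency = speedup / nodes              # rough: ignores cores per node

print(f"speedup: {speedup:.2f}x")
print(f"run time reduction: {reduction:.1%}")
print(f"speedup per node: {per_node_efficiency:.2f}")
```

The reduction works out to about 69.4%, which the slide rounds to 70%, and the speedup is roughly 3.3x on 4 nodes.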
https://www.microsoft.com/en-us/cloud-platform/r-server
https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
https://gallery.cortanaintelligence.com/experiments
https://www.visualstudio.com/vs/rtvs/
https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/introducing-microsoft-r-server-9-1-release/
https://blogs.technet.microsoft.com/machinelearning/2016/12/07/introducing-microsoft-r-server-9-0/
https://blogs.msdn.microsoft.com/rserver/2017/04/19/microsoft-ml-on-spark-and-hadoop/
https://msdn.microsoft.com/en-us/microsoft-r/scaler/packagehelp/rxcomputecontext
https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/04/19/new-features-in-9-1-microsoft-r-server-with-sparklyr-interoperability/
https://msdn.microsoft.com/en-us/microsoft-r/scaler-spark-getting-started
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-r-server-get-started