5. What is R?
Language / Platform
- The most popular statistical programming language
- A data visualization tool
- Open source
Community
- 3+ million users
- Taught in most universities
- Thriving user groups worldwide
- New and recent grads use it
Ecosystem
- 9000+ contributed packages
- Rich application & platform integration
7.
- Any code/package that works today with R will work in R Server.
- Ideal for parameter sweeps, simulation, scoring.
- Transformations: rxDataStep(); Statistics: rxChiSquaredTest(); Algorithms: rxLinMod(); Parallelism: rxSetComputeContext()
9.
- Provisions Azure compute resources with Spark installed and configured.
- Data is stored in Azure Blob storage (wasb://) or Azure Data Lake Store (adl://)
10. [Diagram. Labels: RStudio; HDInsight Gateway; R Server, R process on Edge Node; Data in Distributed Storage]
11. [Diagram. Labels: RStudio; HDInsight Gateway; R process on Edge Node; R Server, Master R process on Edge Node; Apache YARN and Spark; Worker R processes on Data Nodes; Data in Distributed Storage]
14.
Configuration                              Elapsed time
R Server (single thread on local)          471 sec
R Server on HDInsight (4 nodes)            144 sec (-70%)
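
The -70% figure can be checked directly from the two timings on this slide. A minimal sketch in Python, used here only for the arithmetic; the variable names are illustrative and do not come from the slides:

```python
# Elapsed times quoted on the slide, in seconds.
local_single_thread_sec = 471
hdinsight_4_nodes_sec = 144

# Relative change in elapsed time (negative means the cluster run was faster).
relative_change = (
    hdinsight_4_nodes_sec - local_single_thread_sec
) / local_single_thread_sec

# Speedup factor: how many times faster the cluster run was.
speedup = local_single_thread_sec / hdinsight_4_nodes_sec

print(f"change in elapsed time: {relative_change:.1%}")
print(f"speedup: {speedup:.2f}x")
```

The change works out to about -69.4%, which the slide rounds to -70%, and corresponds to a speedup of roughly 3.27x. Since the baseline is a single local thread and the comparison is a 4-node cluster, this is an end-to-end comparison rather than a per-node scaling measurement.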