Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Jonathan Seidman
This document describes a talk on interfacing Hadoop and R for distributed data analysis. It introduces Hadoop and R, discusses options for running R on Hadoop's distributed platform including the authors' prototypes, and provides an example use case of analyzing airline on-time performance data using Hadoop Streaming and R code. The authors are data engineers from Orbitz who have built prototypes for user segmentation and analyzing airline and hotel booking data on Hadoop using R.
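The Hadoop Streaming pattern mentioned above can be sketched roughly as follows. The talk itself uses R; this sketch swaps in Python purely to illustrate the streaming contract, in which a mapper emits tab-separated key/value lines and a reducer receives them sorted by key. The column positions (`CARRIER_COL`, `DELAY_COL`) and the function names are hypothetical, since the summary does not describe the dataset layout or the authors' code.

```python
from itertools import groupby

# Hypothetical column positions; the actual airline dataset layout is not given in the summary.
CARRIER_COL = 0
DELAY_COL = 1


def map_lines(lines):
    """Mapper step: emit 'carrier<TAB>delay' for each parseable CSV record."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        try:
            delay = float(fields[DELAY_COL])
        except (IndexError, ValueError):
            continue  # skip header rows, short rows, and missing values such as 'NA'
        yield f"{fields[CARRIER_COL]}\t{delay}"


def reduce_lines(sorted_lines):
    """Reducer step: average the delays per carrier; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for carrier, group in groupby(pairs, key=lambda kv: kv[0]):
        delays = [float(value) for _, value in group]
        yield f"{carrier}\t{sum(delays) / len(delays):.2f}"
```

In a real streaming job, each function would be wrapped in its own script that reads `sys.stdin` and prints to standard output, and Hadoop would perform the sort between the two steps. Locally, the same flow can be imitated by calling `sorted()` on the mapper output before passing it to the reducer.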
Why R? A Brief Introduction to the Open Source Statistics Platform
Syracuse University
This document discusses the statistical programming language R. It describes R as an open source platform for statistics, data management, and graphics, made up of a core program plus thousands of add-in packages. It then compares R with other popular statistical software packages, noting that R is used by more analysts. Finally, it highlights some advantages of R, including its emphasis on reproducibility: data transformations are written as code rather than performed by hand.
This hands-on course guides users through a range of programming features in R, the open-source statistical software. Topics covered include indexing, loops, conditional branching, S3 classes, and debugging. Full workshop materials are available from http://projects.iq.harvard.edu/rtc/r-prog
The Agenda for the Webinar:
1. Introduction to Python.
2. Python and Big Data.
3. Python and Data Science.
4. Key features of Python and their usage in Business Analytics.
5. Business Analytics with Python: Real-World Use Cases.