In this course, students learned what is expected of a data scientist and how to use PySpark (the Python API for Apache Spark) to meet those expectations. The course assignments included log mining, textual entity recognition, and collaborative filtering exercises that taught students how to manipulate data sets using parallel processing with PySpark.
Introduction to Big Data with Apache Spark
Anthony D. Joseph
Professor in Electrical Engineering and Computer Science, University of California, Berkeley
Technical Advisor, Databricks
HONOR CODE CERTIFICATE
Vassilios Rendoumis
successfully completed and received a passing grade in
CS100.1x: Introduction to Big Data with Apache Spark
a course of study offered by BerkeleyX, an online learning
initiative of The University of California, Berkeley through edX.
Issued July 10, 2015. Verify the authenticity of this certificate at https://verify.edx.org/cert/dee8ce780ec24912930188e8bb5f2982