Databricks Community Cloud
By: Robert Sanders
Page: 2
Databricks Community Cloud
• Free/Paid Standalone Spark Cluster
• Online Notebook
  • Python
  • R
  • Scala
  • SQL
• Tutorials and Guides
• Shareable Notebooks
Page: 3
Why is it useful?
• Learning about Spark
• Testing different versions of Spark
• Rapid Prototyping
• Data Analysis
• Saved Code
• Others
Page: 4
Forums
https://forums.databricks.com/
Page: 5
Login/Sign Up
https://community.cloud.databricks.com/login.html
Page: 6
Home Page
Page: 7
Active Clusters
Page: 8
Create a Cluster - Steps
1. From the Active Clusters page, click the + Create Cluster button
2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click Create Cluster
5. Wait for the Cluster to start up and be in a Running state
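Once the cluster is Running and a notebook is attached to it, you can confirm which Spark version you are on; a minimal Scala check, assuming the sc SparkContext that Databricks provides in every notebook:
> // Print the Spark version of the attached cluster
> sc.version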
Page: 9
Create a Cluster
Page: 10
Active Clusters
Page: 11
Active Clusters – Spark Cluster UI - Master
Page: 12
Workspaces
Page: 13
Create a Notebook - Steps
1. Right click within a Workspace and click Create -> Notebook
2. Fill in the Name
3. Select the programming language
4. Select the running cluster you've created that you want to attach to the Notebook
5. Click the Create button
Page: 14
Create a Notebook
Page: 15
Notebook
Page: 16
Using the Notebook
Page: 17
Using the Notebook – Code Snippets
> sc
> sc.parallelize(1 to 5).collect()
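Building on the snippet above, a slightly fuller RDD example you can paste into a cell; a minimal sketch using only core Spark APIs:
> // Distribute the numbers 1 to 5 across the cluster,
> // square each one, and combine the results on the driver
> val squared = sc.parallelize(1 to 5).map(n => n * n)
> squared.reduce(_ + _) // res: 55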
Page: 18
Using the Notebook - Shortcuts
Shortcut         Action
Shift + Enter    Run selected cell and move to next cell
Ctrl + Enter     Run selected cell
Option + Enter   Run selected cell and insert cell below
Ctrl + Alt + P   Create cell above current cell
Ctrl + Alt + N   Create cell below selected cell
Page: 19
Tables
Page: 20
Create a Table - Steps
1. From the Tables section, click + Create Table
2. Select the Data Source (the steps below assume you're using File as the Data Source)
3. Upload a file from your local file system
   1. Supported file types: CSV, JSON, Avro, Parquet
4. Click Preview Table
5. Fill in the Table Name
6. Select the File Type and other Options depending on the File Type
7. Change Column Names and Types as desired
8. Click Create Table
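The upload wizard can also be bypassed from a notebook; a hedged sketch that reads an already-uploaded CSV into a DataFrame and registers it as a temporary table (the path /FileStore/tables/got.csv and the spark-csv options shown are illustrative assumptions):
> // Read a headered CSV into a DataFrame with the spark-csv package
> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/got.csv")
> // Register it so it can be queried with sqlContext.sql(...)
> df.registerTempTable("got")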
Page: 21
Create a Table – Upload File
Page: 22
Create a Table – Configure Table
Page: 23
Create a Table – Review Table
Page: 24
Notebook – Access Table
Page: 25
Notebook – Access Table – Code Snippets
> sqlContext
> sqlContext.sql("show tables").collect()
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
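The same table can also be reached through the DataFrame API instead of a SQL string; a minimal equivalent sketch:
> // Load the registered table directly as a DataFrame and inspect the first rows
> val got = sqlContext.table("got")
> got.limit(10).show()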
Page: 26
Notebook – Display
Page: 27
Notebook – Data Cleaning for Charting
Page: 28
Notebook – Plot Options
Page: 29
Notebook – Charting
Page: 30
Notebook – Display and Charting – Code Snippets
> display(got)
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
> import org.apache.spark.sql.functions._
> val allegiancesCleanupUDF = udf[String, String](_.toLowerCase().replace("house ", ""))
> val isDeathUDF = udf{ deathYear: Integer => if(deathYear != null) 1 else 0}
> val gotCleaned = got.filter("Allegiances != 'None'")
    .withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances"))
    .withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)
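To produce something chart-friendly, aggregate before calling display; a hedged sketch reusing the columns created above (the bar chart itself is then configured through the notebook's Plot Options):
> // Total deaths per allegiance; render as a bar chart via Plot Options
> val deathsByHouse = gotCleaned.groupBy("Allegiances").agg(sum("isDeath").alias("deaths"))
> display(deathsByHouse)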
Page: 31
Publish Notebook - Steps
1. While in a Notebook, click Publish on the top right
2. Click Publish on the pop-up
3. Copy the link and send it out
Page: 32
Publish Notebook