An overview of the Databricks Community Cloud platform offered by Databricks at: https://community.cloud.databricks.com/
Provides step-by-step instructions on how to create a Spark Standalone Cluster and how to use notebooks.
Create a Cluster - Steps
1. From the Active Clusters page, click the + Create Cluster button
2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click Create Cluster
5. Wait for the Cluster to start up and be in a Running state
Create a Notebook - Steps
1. Right-click within a Workspace and click Create -> Notebook
2. Fill in the Name
3. Select the programming language
4. Select the running cluster you've created that you want to attach to the Notebook
5. Click the Create button
Using the Notebook - Shortcuts
Shortcut          Action
Shift + Enter     Run selected cell and move to the next cell
Ctrl + Enter      Run selected cell
Option + Enter    Run selected cell and insert a cell below
Ctrl + Alt + P    Create a cell above the current cell
Ctrl + Alt + N    Create a cell below the selected cell
Create a Table - Steps
1. From the Tables section, click + Create Table
2. Select the Data Source (the steps below assume you're using File as the Data Source)
3. Upload a file from your local file system (supported file types: CSV, JSON, Avro, Parquet)
4. Click Preview Table
5. Fill in the Table Name
6. Select the File Type and other options depending on the file type
7. Change Column Names and Types as desired
8. Click Create Table
Notebook Display and Charting Code Snippets
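> val got = sqlContext.sql("select * from got")  // load the got table into a DataFrame
> display(got)  // render the DataFrame as an interactive table / chart
> got.limit(10).collect()  // bring the first 10 rows back to the driver
> import org.apache.spark.sql.functions._
> // lowercase the allegiance and strip the "house " prefix
> val allegiancesCleanupUDF = udf[String, String](_.toLowerCase().replace("house ", ""))
> // 1 if a death year is recorded, 0 if the value is null
> val isDeathUDF = udf { deathYear: Integer => if (deathYear != null) 1 else 0 }
> val gotCleaned = got.filter("Allegiances != 'None'")
    .withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances"))
    .withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)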
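To check the cleanup logic without a running cluster, the two UDF bodies can be pulled out as plain Scala functions and applied to an in-memory list. This is a minimal sketch, assuming a simplified row shape (name, allegiance, optional death year); the case classes, the sample rows and `deathsByAllegiance` are hypothetical and not part of the original notebook. `Option[Int]` stands in for the nullable `Integer` that `isDeathUDF` receives, and the explicit null check mirrors the fact that a Spark SQL `!=` filter also drops rows whose value is null.

```scala
// Plain-Scala model of the notebook's cleanup pipeline (no Spark needed).
// Row shapes and sample values are hypothetical; only the column names
// Allegiances and "Death Year" come from the notebook snippet.
final case class Character(name: String, allegiances: String, deathYear: Option[Int])

final case class CleanedCharacter(name: String, allegiances: String, isDeath: Int)

object GotCleanup {
  // Same logic as allegiancesCleanupUDF: lowercase, then drop the "house " prefix
  def cleanAllegiance(raw: String): String =
    raw.toLowerCase().replace("house ", "")

  // Same logic as isDeathUDF: 1 if a death year is recorded, 0 otherwise
  def isDeath(deathYear: Option[Int]): Int =
    if (deathYear.isDefined) 1 else 0

  // Mirrors got.filter("Allegiances != 'None'").withColumn(...).withColumn(...)
  def clean(rows: Seq[Character]): Seq[CleanedCharacter] =
    rows
      .filter(r => r.allegiances != null && r.allegiances != "None")
      .map(r => CleanedCharacter(r.name, cleanAllegiance(r.allegiances), isDeath(r.deathYear)))

  // Hypothetical aggregate: deaths per allegiance, the kind of series a bar chart might plot
  def deathsByAllegiance(rows: Seq[CleanedCharacter]): Map[String, Int] =
    rows.groupBy(_.allegiances).map { case (allegiance, group) => allegiance -> group.map(_.isDeath).sum }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows, for illustration only
    val sample = Seq(
      Character("character-1", "House Stark", Some(299)),
      Character("character-2", "Stark", None),
      Character("character-3", "None", Some(300)),
      Character("character-4", "House Lannister", None)
    )
    val cleaned = clean(sample)
    cleaned.foreach(println)
    println(deathsByAllegiance(cleaned))
  }
}
```

Running `GotCleanup.main` prints the three rows that survive the filter, followed by the per-allegiance death counts. Keeping the logic in named functions also makes it easy to unit-test before wrapping it in a UDF.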
Publish Notebook - Steps
1. While in a Notebook, click Publish at the top right
2. Click Publish in the pop-up
3. Copy the link and share it