An Invitation to Spark GraphFrames
Big Data Department, Nagato Kasaki
March 23, 2016, Dogenzaka LT Festival

Self-introduction
- Nagato Kasaki
- Since April 2014: DMM.com Labo
  - Building the Hadoop platform
  - Developing recommendations with Spark MLlib, GraphX, spark.ml, and GraphFrames
- Favorite languages
  - SQL
  - Cypher

What is GraphFrames?
- GraphFrames
  - http://graphframes.github.io/
  - An Apache Spark package for distributed graph processing
  - Integrates Spark GraphX with DataFrames (Spark SQL)
  - Released by Databricks on March 3, 2016
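
Why GraphFrames?
[Figure: chart of graph databases and graph processing products, with axes "productivity (ease of writing processing logic)" and "scalability"; GraphFrames is marked on the chart]
* This is the author's personal impression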
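
Advantages of GraphFrames
- High-level API
  - Distributed graph processing takes only a few lines of code
- Graph data is easy to build
  - Graph-structured data can be created easily from tabular data such as an RDB or DataFrames
- Blue ocean
https://www.google.co.jp/search?q=graphframes&ie=utf-8&oe=utf-8&hl=ja (as of March 23, 2016)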

Trying GraphFrames
- As with Spark, APIs are available for Scala, Java, and Python
- Try it interactively in the Spark shell
- Supports Spark 1.4 and later
  - The latest version is recommended to take full advantage of DataFrames
# Download Spark
$ wget http://ftp.jaist.ac.jp/pub/apache/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
$ tar xzvf spark-1.6.0-bin-hadoop2.6.tgz
# Launch spark-shell with the graphframes package
$ spark-1.6.0-bin-hadoop2.6/bin/spark-shell --packages graphframes:graphframes:0.1.0-spark1.6

GraphFrames: Creating a graph
// Import the graphframes package
scala> import org.graphframes._
import org.graphframes._
// Create the DataFrame that holds the vertices
scala> val v = sqlContext.createDataFrame(List(
| (0L, "user", "u1"),
| (1L, "user", "u2"),
| (2L, "item", "i1"),
| (3L, "item", "i2"),
| (4L, "item", "i3"),
| (5L, "item", "i4")
| )).toDF("id", "type", "name")
v: org.apache.spark.sql.DataFrame = [id: bigint, type: string, name: string]
[Figure: user vertices u1, u2 and item vertices i1, i2, i3, i4]

GraphFrames: Creating a graph (continued)
// Create the DataFrame that holds the edges
scala> val e = sqlContext.createDataFrame(List(
| (0L, 2L, "purchase"),
| (0L, 3L, "purchase"),
| (0L, 4L, "purchase"),
| (1L, 3L, "purchase"),
| (1L, 4L, "purchase"),
| (1L, 5L, "purchase")
| )).toDF("src", "dst", "type")
e: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint, type: string]
// Create the GraphFrame
scala> val g = GraphFrame(v, e)
g: org.graphframes.GraphFrame = GraphFrame(v:[id: bigint, type: string, name: string],
e:[src: bigint, dst: bigint, type: string])
[Figure: the purchase log drawn as edges from users u1, u2 to items i1, i2, i3, i4]

GraphFrames: Item recommendation example
// Example query for items to recommend
scala> g.find(
| " (a)-[]->(x); (b)-[]->(x);" +
| " (b)-[]->(y); !(a)-[]->(y)"
| ).groupBy(
| "a.name", "y.name"
| ).count().show()
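+----+----+-----+
|name|name|count|
+----+----+-----+
|  u1|  i4|    2|
|  u2|  i1|    2|
+----+----+-----+
[Figure: the motif on the purchase graph. (a) and (b) are users who bought a common item (x); an item (y) that (b) bought but (a) has not bought yet is recommended to (a)]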
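To make the motif easier to follow, here is a minimal sketch of the same matching logic on plain Scala collections. It does not use GraphFrames and says nothing about how GraphFrames evaluates motifs; `MotifSketch`, `purchases`, and `recommend` are names invented for this illustration.

```scala
object MotifSketch {
  // Purchase edges (user -> item), same data as the edge DataFrame `e`.
  val purchases: Seq[(String, String)] = Seq(
    "u1" -> "i1", "u1" -> "i2", "u1" -> "i3",
    "u2" -> "i2", "u2" -> "i3", "u2" -> "i4"
  )

  // Count, per (user a, item y), the matches of
  // (a)-[]->(x); (b)-[]->(x); (b)-[]->(y); !(a)-[]->(y)
  def recommend(edges: Seq[(String, String)]): Map[(String, String), Int] = {
    val edgeSet = edges.toSet
    val matches = for {
      (a, x)  <- edges                 // (a)-[]->(x)
      (b, x2) <- edges                 // (b)-[]->(x)
      if x2 == x
      (b2, y) <- edges                 // (b)-[]->(y)
      if b2 == b && !edgeSet((a, y))   // !(a)-[]->(y)
    } yield (a, y)
    matches.groupBy(identity).map { case (key, ms) => key -> ms.size }
  }
}
```

Calling `MotifSketch.recommend(MotifSketch.purchases)` yields a count of 2 for both `(u1, i4)` and `(u2, i1)`, because each pair is reached through the two shared items i2 and i3.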

GraphFrames: Using the sample graphs (1)
// Create a star graph
scala> val star = examples.Graphs.star(5)
// Show the triplets
scala> star.triplets.show()
+------------+----------+--------+
|        edge|       src|     dst|
+------------+----------+--------+
|[1,0,edge-1]|[1,node-1]|[0,root]|
|[2,0,edge-2]|[2,node-2]|[0,root]|
|[3,0,edge-3]|[3,node-3]|[0,root]|
|[4,0,edge-4]|[4,node-4]|[0,root]|
|[5,0,edge-5]|[5,node-5]|[0,root]|
+------------+----------+--------+
[Figure: star graph in which node-1 to node-5 (ids 1 to 5) each point to root (id 0) through edge-1 to edge-5]
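As a rough illustration of what a triplet is, the sketch below rebuilds the same star-shaped data with plain Scala case classes and joins each edge with its two endpoint vertices. `TripletsSketch`, `Vertex`, `Edge`, `star`, and `triplets` are hypothetical names for this example and are not the library's own code.

```scala
object TripletsSketch {
  case class Vertex(id: Long, name: String)
  case class Edge(src: Long, dst: Long, name: String)

  // Star graph shaped like the table above:
  // n leaf vertices, each with one edge pointing to the root (id 0).
  def star(n: Int): (Seq[Vertex], Seq[Edge]) = {
    val leaves = (1 to n).map(i => Vertex(i.toLong, s"node-$i"))
    val edges  = (1 to n).map(i => Edge(i.toLong, 0L, s"edge-$i"))
    (Vertex(0L, "root") +: leaves, edges)
  }

  // A triplet is an edge together with its source and destination vertices.
  def triplets(vertices: Seq[Vertex], edges: Seq[Edge]): Seq[(Edge, Vertex, Vertex)] = {
    val byId = vertices.map(v => v.id -> v).toMap
    edges.map(e => (e, byId(e.src), byId(e.dst)))
  }
}
```

With `star(5)`, `triplets` returns five entries, one per edge, in the same (edge, src, dst) layout as the table.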

GraphFrames: PageRank example
// Compute PageRank on the star graph
scala> val pr = star.pageRank.resetProbability(0.1).tol(0.01).run()
// Show the PageRank scores
scala> pr.vertices.show()
+---+-------+--------+
| id|v_attr1|pagerank|
+---+-------+--------+
|  0|   root|    0.55|
|  1| node-1|     0.1|
|  2| node-2|     0.1|
|  3| node-3|     0.1|
|  4| node-4|     0.1|
|  5| node-5|     0.1|
+---+-------+--------+
[Figure: star graph with each vertex labeled with its PageRank score: 0.55 for the root (id 0) and 0.1 for each of the vertices 1 to 5]
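The scores above are consistent with the unnormalized update rule rank(v) = reset + (1 - reset) * sum of rank(u) / outDegree(u) over the incoming edges u -> v. The following sketch assumes that rule and runs a fixed number of iterations instead of a tolerance-based stop; it is a plain Scala illustration with invented names (`PageRankSketch`, `pageRank`), not the GraphFrames or GraphX implementation.

```scala
object PageRankSketch {
  // Star graph from the slide: edges i -> 0 for i = 1..5.
  val edges: Seq[(Long, Long)] = (1L to 5L).map(i => (i, 0L))
  val vertices: Seq[Long] = 0L to 5L

  // Assumed update rule (unnormalized):
  //   rank(v) = reset + (1 - reset) * sum over edges (u -> v) of rank(u) / outDegree(u)
  def pageRank(reset: Double, iterations: Int): Map[Long, Double] = {
    val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }
    var ranks: Map[Long, Double] = vertices.map(v => v -> reset).toMap
    for (_ <- 1 to iterations) {
      // Sum the rank each vertex receives along its incoming edges.
      val received = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, cs) => dst -> cs.map(_._2).sum }
      ranks = vertices.map { v =>
        v -> (reset + (1 - reset) * received.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

Under this rule the leaves have no incoming edges and stay at 0.1, while the root receives 0.1 from each of the five leaves: 0.1 + 0.9 * 0.5 = 0.55.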

GraphFrames: Using the sample graphs (2)
// Create a sample graph of friend relationships
scala> val friends = examples.Graphs.friends
// Show the triplets
scala> friends.triplets.show()
+------------+--------------+--------------+
|        edge|           src|           dst|
+------------+--------------+--------------+
|[a,b,friend]|  [a,Alice,34]|    [b,Bob,36]|
|[b,c,follow]|    [b,Bob,36]|[c,Charlie,30]|
|[c,b,follow]|[c,Charlie,30]|    [b,Bob,36]|
|[f,c,follow]|  [f,Fanny,36]|[c,Charlie,30]|
|[e,f,follow]| [e,Esther,32]|  [f,Fanny,36]|
|[e,d,friend]| [e,Esther,32]|  [d,David,29]|
|[d,a,friend]|  [d,David,29]|  [a,Alice,34]|
|[a,e,friend]|  [a,Alice,34]| [e,Esther,32]|
+------------+--------------+--------------+
[Figure: the friends sample graph with vertices a (Alice, 34), b (Bob, 36), c (Charlie, 30), d (David, 29), e (Esther, 32), f (Fanny, 36), g (Gabby, 60) connected by friend and follow edges; g has no edges]

GraphFrames: Computing shortest distances
// Compute the shortest distance from every user to user "a"
scala> val d1 = friends.shortestPaths.landmarks(Seq("a")).run()
// Show the result
scala> d1.show()
+---+-------+---+-----------+
| id|   name|age|  distances|
+---+-------+---+-----------+
|  f|  Fanny| 36|      Map()|
|  g|  Gabby| 60|      Map()|
|  a|  Alice| 34|Map(a -> 0)|
|  b|    Bob| 36|      Map()|
|  c|Charlie| 30|      Map()|
|  d|  David| 29|Map(a -> 1)|
|  e| Esther| 32|Map(a -> 2)|
+---+-------+---+-----------+
[Figure: the friends graph with landmark "a" highlighted; the vertices that can reach "a" are labeled with their distance: a -> 0 on a, a -> 1 on d, a -> 2 on e]

GraphFrames: Computing shortest distances (continued)
// Compute the shortest distance from every user to users "a" and "c"
scala> val d2 = friends.shortestPaths.landmarks(Seq("a", "c")).run()
// Show the result
scala> d2.show()
+---+-------+---+-------------------+
| id|   name|age|          distances|
+---+-------+---+-------------------+
|  f|  Fanny| 36|        Map(c -> 1)|
|  g|  Gabby| 60|              Map()|
|  a|  Alice| 34|Map(a -> 0, c -> 2)|
|  b|    Bob| 36|        Map(c -> 1)|
|  c|Charlie| 30|        Map(c -> 0)|
|  d|  David| 29|Map(a -> 1, c -> 3)|
|  e| Esther| 32|Map(a -> 2, c -> 2)|
+---+-------+---+-------------------+
[Figure: the friends graph with landmarks "a" and "c" highlighted; each vertex is labeled with its distances to the landmarks it can reach, matching the table]
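The distances in these tables can be reproduced by hand with a breadth-first search that starts at each landmark and walks the edges backwards. The sketch below does this on plain Scala collections, using the edge list from the triplets table; `ShortestPathsSketch` and its members are names made up for this illustration, and the approach is not claimed to be how GraphFrames computes the result.

```scala
object ShortestPathsSketch {
  // Directed edges (src -> dst) of the friends sample graph.
  val edges: Seq[(String, String)] = Seq(
    "a" -> "b", "b" -> "c", "c" -> "b", "f" -> "c",
    "e" -> "f", "e" -> "d", "d" -> "a", "a" -> "e"
  )
  val vertices: Seq[String] = Seq("a", "b", "c", "d", "e", "f", "g")

  // Hop counts from every vertex that can reach `landmark`:
  // breadth-first search from the landmark over reversed edges.
  def distancesTo(landmark: String): Map[String, Int] = {
    val predecessors = edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1) }
    var dist = Map(landmark -> 0)
    var frontier = Seq(landmark)
    var level = 0
    while (frontier.nonEmpty) {
      level += 1
      val next = frontier
        .flatMap(v => predecessors.getOrElse(v, Seq.empty))
        .distinct
        .filterNot(dist.contains)
      dist = dist ++ next.map(_ -> level)
      frontier = next
    }
    dist
  }

  // For each vertex, a map landmark -> distance holding only reachable landmarks.
  def shortestPaths(landmarks: Seq[String]): Map[String, Map[String, Int]] = {
    val perLandmark = landmarks.map(l => l -> distancesTo(l))
    vertices.map { v =>
      v -> perLandmark.collect { case (l, d) if d.contains(v) => l -> d(v) }.toMap
    }.toMap
  }
}
```

For example, `ShortestPathsSketch.shortestPaths(Seq("a", "c"))("d")` gives `Map(a -> 1, c -> 3)`, and the entry for "g" is empty because "g" has no edges.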

GraphFrames: Searching for the shortest path
// Search for the shortest path from user "d" to user "c"
scala> val path = friends.bfs.fromExpr("id = 'd'").toExpr("id = 'c'").run()
// Show the result (one row, split into two tables to fit the slide)
scala> path.show()
+------------+------------+------------+
|        from|          e0|          v1|
+------------+------------+------------+
|[d,David,29]|[d,a,friend]|[a,Alice,34]|
+------------+------------+------------+
+------------+----------+------------+--------------+
|          e1|        v2|          e2|            to|
+------------+----------+------------+--------------+
|[a,b,friend]|[b,Bob,36]|[b,c,follow]|[c,Charlie,30]|
+------------+----------+------------+--------------+
[Figure: the friends graph with the path d (David, 29) -friend-> a (Alice, 34) -friend-> b (Bob, 36) -follow-> c (Charlie, 30) highlighted]
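For readers who want to trace the search by hand, here is a minimal breadth-first search over the same edge list in plain Scala. It returns a single shortest path as a list of vertex ids, which is a simplification of the table above; `BfsSketch` and `shortestPath` are hypothetical names, and this is not the GraphFrames implementation.

```scala
object BfsSketch {
  // Directed edges (src, dst, relationship) of the friends sample graph.
  val edges: Seq[(String, String, String)] = Seq(
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"), ("f", "c", "follow"),
    ("e", "f", "follow"), ("e", "d", "friend"), ("d", "a", "friend"), ("a", "e", "friend")
  )

  // One shortest path from `from` to `to` as vertex ids, or None if unreachable.
  def shortestPath(from: String, to: String): Option[List[String]] = {
    val successors = edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }

    // Each frontier entry is a path stored in reverse (most recent vertex first).
    @annotation.tailrec
    def loop(frontier: List[List[String]], visited: Set[String]): Option[List[String]] =
      frontier.find(_.head == to) match {
        case Some(path) => Some(path.reverse)
        case None if frontier.isEmpty => None
        case None =>
          val extended = for {
            path <- frontier
            n    <- successors.getOrElse(path.head, Seq.empty)
            if !visited(n)
          } yield n :: path
          // Keep one path per newly reached vertex.
          val next = extended.groupBy(_.head).values.map(_.head).toList
          loop(next, visited ++ next.map(_.head))
      }

    loop(List(List(from)), Set(from))
  }
}
```

`BfsSketch.shortestPath("d", "c")` returns `Some(List(d, a, b, c))`, the same three-hop path as in the table, while a start vertex with no outgoing edges, such as "g", yields `None`.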

GraphFrames: Other features
- GraphFrames User Guide
http://graphframes.github.io/user-guide.html

GraphFrames: Use cases
- On-Time Flight Performance with GraphFrames for Apache Spark
https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-spark-graphframes.html

GraphFrames vs. Neo4j
Source: http://www.slideshare.net/SparkSummit/graphframes-graph-queries-in-spark-sql-by-ankur-dave

GraphFrames and Spark 2.0
Source: http://www.slideshare.net/databricks/2016-spark-summit-east-keynote-matei-zaharia
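
Summary of GraphFrames
- A high-level API for distributed graph processing
  - High productivity
  - Fast distributed graph processing
- Released only this month
  - Features and available information are still limited
  - Looking forward to its future development and adoption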
