2GP: A Two-Phase Graph Partitioner
Hadi Sinaee
Supervisor: Margo Seltzer
We Use Graphs To Model Relationships
6
7
[Figures: example graphs with node degrees and the resulting degree distributions (axes: Degree vs. Freq.)]
Common Algorithms We Use On Graphs!
8
9
PageRank, Community Detection, BFS & DFS
*All images are taken from Wikipedia.
We Frequently Need to Traverse Neighbors!
What Is the Problem?
10
11
- They may not fit on a single machine!
- Algorithms get slow!
12
Break Them Apart and Process in Parallel!
14
- A Graph
- K Units of Computation
[Diagram: the input graph is split into K subgraphs, Part-1 through Part-K, one per unit of computation]
How to Measure the Partitioning Quality?
15
16
Balanced Partitions: #Nodes per partition, or #Edges per partition
# Edge-Cuts: e.g., Edge-Cuts: 2 vs. Edge-Cuts: 3
Communication Volume (CV): e.g., CV: 1 + 1 + 2 = 4 vs. CV: 1 + 1 + 1 + 1 + 2 = 6
[Figure: two example partitionings of the same graph, annotated with each node's contribution to CV]
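To make the two cut metrics concrete, here is a minimal Python sketch (not code from the thesis): it counts edge-cuts and a METIS-style total communication volume for a given vertex-to-partition mapping. The toy graph and the `adj`/`edges`/`part` names are assumptions for illustration.

```python
def edge_cuts(edges, part):
    """Count edges whose endpoints fall into different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def communication_volume(adj, part):
    """For each vertex, count the distinct foreign partitions that hold at
    least one of its neighbors, then sum (METIS-style total communication volume)."""
    return sum(
        len({part[u] for u in adj[v] if part[u] != part[v]})
        for v in adj
    )

# Toy graph: a path 0-1-2-3 split into partitions {0, 1} and {2, 3}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
edges = [(0, 1), (1, 2), (2, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cuts(edges, part))            # 1 (the edge 1-2 is cut)
print(communication_volume(adj, part))   # 2 (vertices 1 and 2 each talk to one foreign partition)
```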
17
There are different partitioning algorithms!
In-Memory vs. Streaming
18
In-Memory: (1) load the whole input graph into DRAM, (2) partition it.
- High quality partitions!
- Slow partitioning!
- High memory demand!
Streaming: (1) load the input graph as a stream, (2) partition vertices as they arrive.
- Low quality partitions!
- Fast partitioning!
- Low memory demand!
19
[Chart: Partition Quality (# edge-cuts or communication volume) vs. Time. In-Memory Partitioners: high quality, slow. Streaming Partitioners: lower quality, fast.]
Can we combine the best of both worlds?
A partitioner with the partition quality of in-memory and the speed of streaming.
20
Streaming partitioners are fast, but they lack the neighborhood information of nodes!
21
22
As Much as We Can!
[Diagram: feed as much of the input graph as possible to an In-Memory Partitioner; the rest goes through a Streaming Partitioner]
2GP
24
[Diagram: the 2GP pipeline. A segment graph is handed to the In-Memory Partitioner (METIS); the remaining vertices go through the Streaming Partitioner.]
25
METIS (In-Memory)
26
- Smaller graphs are easier to partition.
- METIS has three phases:
  - Coarsening
  - Partitioning
  - Un-coarsening
METIS produces high-quality partitions, but it gets really slow and cannot handle large graphs!
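As a hedged illustration of driving METIS from Python, here is a minimal sketch using the pymetis bindings on a toy six-vertex graph (assuming `pip install pymetis`; 2GP itself may invoke the METIS library differently).

```python
# Minimal sketch: partition a small undirected graph into 2 parts with METIS.
import pymetis

adjacency = [
    [1, 2],       # neighbors of vertex 0
    [0, 2, 3],    # neighbors of vertex 1
    [0, 1],       # neighbors of vertex 2
    [1, 4, 5],    # neighbors of vertex 3
    [3, 5],       # neighbors of vertex 4
    [3, 4],       # neighbors of vertex 5
]

n_cuts, membership = pymetis.part_graph(2, adjacency=adjacency)
print(n_cuts)       # number of cut edges
print(membership)   # partition id for each vertex
```

pymetis.part_graph returns the edge-cut count and a per-vertex partition id, which is all the downstream streaming phase needs from the in-memory phase.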
FENNEL
27
Streaming
FENNEL
28
Streaming
- FENNEL is a vertex-based partitioner.
- The input format is an adjacency list; we place only the source vertex of each list.
- FENNEL calculates a score for each partition.
- Scores are based on the number of the vertex's neighbors already present in each partition.
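A minimal Python sketch of this greedy scoring rule, based on my reading of the published FENNEL heuristic rather than the thesis code: each arriving vertex goes to the partition that maximizes (neighbors already there) minus a size-based balance penalty. The constants `gamma` and `alpha` and the toy stream are assumptions.

```python
import math

def fennel_partition(stream, k, n, m, gamma=1.5):
    """Greedy FENNEL-style streaming placement (illustrative sketch).

    stream: iterable of (vertex, [neighbors]) adjacency lists in arrival order.
    n, m:   number of vertices and edges, used for the balance-penalty weight.
    Returns a dict mapping each vertex to a partition id in [0, k)."""
    alpha = math.sqrt(k) * m / (n ** gamma)   # penalty weight (FENNEL heuristic)
    assignment = {}                           # vertex -> partition id
    sizes = [0] * k                           # current vertex count per partition

    for v, neighbors in stream:
        best_p, best_score = 0, float("-inf")
        for p in range(k):
            gain = sum(1 for u in neighbors if assignment.get(u) == p)
            penalty = alpha * gamma * sizes[p] ** (gamma - 1)
            if gain - penalty > best_score:
                best_p, best_score = p, gain - penalty
        assignment[v] = best_p                # place only the source vertex of the list
        sizes[best_p] += 1
    return assignment

# Toy usage: six vertices, seven undirected edges, two partitions.
stream = [(0, [1, 2]), (1, [0, 2, 3]), (2, [0, 1]),
          (3, [1, 4, 5]), (4, [3, 5]), (5, [3, 4])]
print(fennel_partition(stream, k=2, n=6, m=7))
```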
29
2GP
[Diagram: the 2GP pipeline. In-Memory Partitioner: METIS. Streaming Partitioner: FENNEL.]
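To show how the two phases might fit together, here is a rough, heavily hedged sketch in which METIS partitions an in-memory segment and its assignments seed a FENNEL-style streaming pass over the remaining vertices. The slides do not spell out the segment selection or the hand-off, so the `segment` parameter, the seeding step, and all constants below are assumptions, not the actual 2GP implementation.

```python
import math
import pymetis   # assumed available, as in the METIS sketch above

def two_phase_partition(adj, segment, k, gamma=1.5):
    """Sketch of a 2GP-style flow: METIS on `segment`, FENNEL-style streaming
    for the rest. `adj` is a dict vertex -> neighbor list (undirected, symmetric);
    `segment` is the set of vertices kept in memory for phase 1 (how to choose
    it is discussed on the following slides)."""
    n = len(adj)
    m = sum(len(vs) for vs in adj.values()) // 2
    alpha = math.sqrt(k) * m / (n ** gamma)

    # Phase 1: in-memory partitioning of the segment graph with METIS.
    seg_ids = sorted(segment)
    index = {v: i for i, v in enumerate(seg_ids)}
    seg_adj = [[index[u] for u in adj[v] if u in segment] for v in seg_ids]
    _, membership = pymetis.part_graph(k, adjacency=seg_adj)

    assignment = {v: membership[index[v]] for v in seg_ids}
    sizes = [0] * k
    for p in assignment.values():
        sizes[p] += 1

    # Phase 2: stream the remaining vertices through a FENNEL-style scorer
    # that starts from the partition state left behind by phase 1.
    for v in adj:
        if v in assignment:
            continue
        best_p = max(
            range(k),
            key=lambda p: sum(1 for u in adj[v] if assignment.get(u) == p)
                          - alpha * gamma * sizes[p] ** (gamma - 1),
        )
        assignment[v] = best_p
        sizes[best_p] += 1
    return assignment
```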
Let's Put 2GP to the Test!
30
Let's pass the most information to the streaming partitioner!
31
Let's go with higher degrees!
Communication Volume (CV)
High-Degree Ordering
32
- See the effect on CV for different numbers of partitions!
- We chose the Twitter graph with 41M nodes and 1.2B edges.
Why Did It FAIL?
33
High-Degree Ordering (HDO) tends to create highly connected graphs!
Partitioning highly connected graphs is harder!
Communication Volume (CV)
Low-Degree Ordering (LDO)
34
Lesson 1:
LDO is better than HDO when using 2GP!
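To make the two orderings concrete, here is a tiny hypothetical helper that selects the in-memory segment by degree order; the name `pick_segment`, the `budget` parameter, and the policy itself are illustrative assumptions, not the thesis implementation.

```python
def pick_segment(adj, budget, ordering="low"):
    """Choose which vertices go to the in-memory phase (illustrative only).

    ordering="high": High-Degree Ordering (HDO), descending degree.
    ordering="low":  Low-Degree Ordering (LDO), ascending degree.
    budget: how many vertices the in-memory partitioner can hold."""
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=(ordering == "high"))
    return set(ranked[:budget])
```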
Observation!
35
There are many nodes with less than 50% of their neighbors partitioned!
36
Delay the partitioning of any high-degree node that has less than 50% of its neighbors already partitioned!
A node counts as high-degree if its degree exceeds (average node degree in the graph) * FIXED_FACTOR.
FENNEL-DelayPart (Streaming)
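Here is a hedged sketch of this delayed-partitioning policy layered on the FENNEL-style scorer above. The 50% threshold and the FIXED_FACTOR-based high-degree test come from the slide; placing all deferred nodes in a single second pass is my simplifying assumption.

```python
import math

def fennel_delaypart(stream, k, n, m, fixed_factor=2.0, gamma=1.5):
    """FENNEL with delayed partitioning (illustrative sketch, not the thesis code).

    stream: iterable of (vertex, [neighbors]) in arrival order.
    A node is 'high-degree' if its degree > (average degree) * fixed_factor;
    such a node is deferred while fewer than 50% of its neighbors are placed."""
    avg_degree = 2 * m / n
    alpha = math.sqrt(k) * m / (n ** gamma)
    assignment, sizes, deferred = {}, [0] * k, []

    def place(v, neighbors):
        best_p = max(
            range(k),
            key=lambda p: sum(1 for u in neighbors if assignment.get(u) == p)
                          - alpha * gamma * sizes[p] ** (gamma - 1),
        )
        assignment[v] = best_p
        sizes[best_p] += 1

    for v, neighbors in stream:
        placed = sum(1 for u in neighbors if u in assignment)
        high_degree = len(neighbors) > avg_degree * fixed_factor
        if high_degree and neighbors and placed / len(neighbors) < 0.5:
            deferred.append((v, neighbors))   # delay: neighborhood mostly unknown
        else:
            place(v, neighbors)

    for v, neighbors in deferred:             # second pass over the delayed nodes
        place(v, neighbors)
    return assignment
```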
Communication Volume (CV)
Lowest-Degree Ordering with Delayed Partitioning
37
Lesson 2:
Delaying the partitioning of high-degree nodes improves CV!
38
2GP
[Diagram: the 2GP pipeline with In-Memory Partitioner: METIS and Streaming Partitioner: FENNEL-DelayPart.]
Experiments
39
Environment
- 28 logical CPUs (14 cores, 2 threads per core)
- Intel(R) Xeon(R) W-2275 CPU @ 3.30GHz
- 130GB DRAM
Datasets (#nodes, #edges)
- Dblp-cite (12K, 49K)
- Dimacs9-USA (23M, 28M)
- Twitter (41M, 1.2B)
Baselines
- METIS (in-memory)
- FENNEL (streaming)
Metrics
Measuring the following across different numbers of partitions:
- Communication Volume (CV)
- Partitioning Time
Communication Volume (CV)
40
(8.4K, 6.8K)
41
Dblp-cite (12K, 49K)
42
Dblp-cite (12K, 49K)
43
Dimacs9-USA (23M, 28M)
44
Twitter (41M, 1.2B)
2GP produces better partitions than FENNEL even with a small % of the input graph given to it!
Partitioning Time
45
46
Dblp-cite (12K, 49K)
47
Dimacs9-USA (23M, 28M)
48
Twitter (41M, 1.2B)
2GP is faster than FENNEL for a small % of the input graph!
The overhead comes from the repartitioning step, which can be mitigated by adjusting how we choose high-degree nodes.
Performance
49
- Won Quality
- Won Time
51
Dblp-cite (12K, 49K)
52
Dimacs9-USA (23M, 28M) Twitter (41M, 1.2B)
1. There is a trade-off between CV and Time!
2. Skewed degree distributions are harder to optimize for!
Conclusion
53
- We can leverage the best of both worlds of in-memory and streaming partitioners.
- 2GP can be used as an alternative to FENNEL when partitioning large graphs,
  - achieving up to 70% improvement for some datasets.
- The delayed re-partitioning policy helps produce better communication volume.
- Lowest-degree ordering helps a streaming partitioner partition high-degree nodes better.
Collaboration
Editor's Notes
  • #2: OK, thanks for coming to today's talk! I want to present Applic…. It was published last year at SIGMOD. I find it interesting because I sort of had a similar idea, which these guys implemented! For this talk, I'll first start with graph partitioning itself: I'll briefly talk about how people usually do partitioning and how we decide which partitioning strategy is a good one. After that, I'll add the missing part of this scenario, which is the application: how could knowledge about the application change partitioning?
  • #15: Usually we have a graph and an environment that has k units of computation. The goal here is to utilize all these resources. Why? Besides the fact that we've paid for them, we want to increase parallelism! So we take a graph, create a set of sub-graphs, and call each one a partition. Next, we run our algorithm on each of them. For example, we could assign each of these partitions to a CPU, or to different nodes in a cluster network. Now, the questions are: how do we do the partitioning, and which partitioning is considered a good one?