Graph partitioning is a crucial problem for processing and analyzing large graphs. The two main classes of graph partitioners are in-memory partitioners and streaming partitioners. In-memory partitioners require that the entire graph be read into memory before being partitioned into some number, $k$, of partitions. Conversely, a streaming partitioner reads the graph one edge at a time and immediately places the source and destination vertices in a partition, unless they have already been placed. While in-memory partitioners produce higher-quality partitions, they require a large amount of memory and are slow, which makes them less practical for larger graphs. Streaming partitioners can partition large graphs quickly. However, since they lack more information about the overall graph, their placement decisions are not as good as in-memory partitioners, leading to lower-quality partitions.
We implemented a two-phase partitioning algorithm (\texttt{2GP}) that combines the best of both worlds. \texttt{2GP} has two phases: first, it buffers a specific amount of an input graph and partitions it using an in-memory partitioner. Then, it partitions the remaining graph using a streaming partitioner. \texttt{2GP} achieves better partition quality than a state-of-the-art streaming partitioner with significantly better performance. We then show that re-ordering the input graph can improve partition quality still further, without taking significantly more time.
1 of 53
Download to read offline
More Related Content
Master Thesis - UBC.pptx
1. 2GP: A Two-Phase Graph Partitioner
Hadi Sinaee
Supervisor: Margo Seltzer
2. 2GP: A Two-Phase Graph Partitioner
Hadi Sinaee
Supervisor: Margo Seltzer
3. 2GP: A Two-Phase Graph Partitioner
Hadi Sinaee
Supervisor: Margo Seltzer
4. 2GP: A Two-Phase Graph
Partitioner
Hadi Sinaee
Supervisor: Margo Seltzer
5. 2GP: A Two-Phase Graph Partitioner
Hadi Sinaee
Supervisor: Margo Seltzer
26. METIS
26
In-Memory
- Smaller graphs are easier to
partition
- It has three phases:
- Coarsening
- Partitioning
- Un-coarsening
METIS produces high quality partitions!
But it gets really slow and cannot handle
large graphs!
28. FENNEL
28
Streaming
- FENNEL is a vertex-based partitioner.
- Input format is an adjacency list and we place only the source vertex for each list.
- Calculate a score for each partition
- Scores are based on the number of available neighbors in each partition
31. Lets pass the most information to streaming partitioner!
31
Lets go with higher degrees!
32. Communication Volume (CV)
High-Degree Ordering
32
- See the effect of CV for different number of partitions!
- We choose Twitter graph with 41M nodes and 1.2B edges
33. Why Did It FAIL?
33
HDO tends to create highly connected graphs!
Partitioning highly connected graphs are harder!
36. 36
Delay partitioning of any high-degree node that less than 50% of its neighbors are
partitioned!
(Average node degree in the graph) *
FIXED_FACTOR
FENNEL
- DelayPart
Streamin
g
48. 48
Twitter (41M, 1.2B)
2GP is faster than Fennel for small % of input
graph!
The overhead comes from the repartition part which can be mitigated by adjusting
how we choose high-degree nodes.
52. 52
Dimacs9-USA (23M, 28M) Twitter (41M, 1.2B)
1. There is a trade-off between CV and Time!
2. Skewed degree dist are harder to optimize
for!
53. Conclusion
53
- We can leverage the best-of-both worlds of in-memory
and streaming partitioners.
- 2GP can be used as the alternative to FENNEL when
partitioning large graphs
- Achieving up to 70% improvement for some
datasets.
- Delayed re-partitioning policy helps to produce better
communication volume.
- Lowest-degree ordering helps to partition higher degree
nodes better when using a streaming partitioner.
Collaboration
Editor's Notes
#2: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?
#3: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?
#4: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?
#5: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?
#6: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?
#14: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?
#15: Usually we have a graph and an environment which has k units of computation. The goal here is to utilize all these resources. Why? Besides the fact weve paid for that, we want to increase the parallelism!
So, we take a graph and create a set of sub-graphs and call them partition. Next, we run our algorithm on each them.
For example, we could assign each of theses partitions to a cpu to run our algorithm.
Or, we could assign it to different nodes in a cluster network!
Now! The questions here are that how do we do the partitionin? And which partitioning is considered a good one!
#24: OK,
Thanks for coming to todays talk!
I want to present Applic . It was published last yeat at SIGMOD. I find it interesting bec I sort of had a similar idea which these guys implemented!
Well, for this talk, first ill start with the graph partitioning itself
I briefly talk about How people usually do partitioning? And how we decide which partitioning strategy is a good one?
After that, Ill add a missing part to this scenario which is the application: How having a knowledge about the application could change partitioning?