Slideshows by User: ytian1
SlideShare feed for Slideshows by User: ytian1 (Fri, 25 Aug 2017 21:33:56 GMT)
Big Data Analytics: From SQL to Machine Learning and Graph Analysis /slideshow/big-data-analytics-from-sql-to-machine-learning-and-graph-analysis/79162658 bigdaskeynoteonline-170825213356
Dr Yuanyuan Tian's Keynote speech at the BigDas Workshop, SIGKDD'2017, August 2017.
Posted Fri, 25 Aug 2017 21:33:56 GMT by Yuanyuan Tian (ytian1@slideshare.net)
Giraph++: From "Think Like a Vertex" to "Think Like a Graph" /slideshow/giraph-from-think-like-a-vertex-to-think-like-a-graph/70831226 giraphvldb14-170109183454
To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions and employ a "think like a vertex" programming model to support iterative graph computation. This vertex-centric model is easy to program and has proven useful for many graph algorithms. However, this model hides the partitioning information from the users, and thus prevents many algorithm-specific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we propose a new "think like a graph" programming paradigm. Under this graph-centric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph++, based on Apache Giraph, an open-source implementation of Pregel. We explore the applicability of the graph-centric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on well-partitioned data.
Posted Mon, 09 Jan 2017 18:34:54 GMT by Yuanyuan Tian (ytian1@slideshare.net)
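The contrast drawn in the Giraph++ abstract can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions: it is not the Giraph or Giraph++ API, the function names (`vertex_centric_cc`, `graph_centric_cc`) are hypothetical, and connected components by minimum-label propagation is chosen here purely as an example algorithm. The vertex-centric version sends a message over every edge, while the graph-centric version resolves labels inside each partition locally and sends messages only over edges that cross partitions.

```python
from collections import defaultdict


def vertex_centric_cc(vertices, edges):
    """Pregel-style min-label propagation: every edge carries a message."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    label = {v: v for v in vertices}
    active = set(vertices)
    supersteps = messages = 0
    while active:
        supersteps += 1
        inbox = defaultdict(list)
        for v in active:
            for n in adj[v]:
                inbox[n].append(label[v])
                messages += 1
        active = set()
        for v, msgs in inbox.items():
            low = min(msgs)
            if low < label[v]:
                label[v] = low
                active.add(v)
    return label, supersteps, messages


def graph_centric_cc(vertices, edges, part):
    """Partition-aware variant: local fixpoint first, messages only on cut edges.

    `part` maps each vertex to a partition id (an assumption of this sketch).
    """
    label = {v: v for v in vertices}
    internal = [(u, v) for u, v in edges if part[u] == part[v]]
    cut = [(u, v) for u, v in edges if part[u] != part[v]]
    boundary = {x for edge in cut for x in edge}
    sent = {}  # last label each boundary vertex sent across a cut edge
    supersteps = messages = 0
    while True:
        supersteps += 1
        # Local phase: propagate labels inside partitions without messages.
        changed = True
        while changed:
            changed = False
            for u, v in internal:
                low = min(label[u], label[v])
                if label[u] != low or label[v] != low:
                    label[u] = label[v] = low
                    changed = True
        # Messaging phase: only boundary vertices with a new label send.
        senders = {v for v in boundary if sent.get(v) != label[v]}
        inbox = defaultdict(list)
        for u, v in cut:
            for src, dst in ((u, v), (v, u)):
                if src in senders:
                    inbox[dst].append(label[src])
                    messages += 1
        for v in senders:
            sent[v] = label[v]
        improved = False
        for v, msgs in inbox.items():
            low = min(msgs)
            if low < label[v]:
                label[v] = low
                improved = True
        if not improved:
            return label, supersteps, messages
```

On a well-partitioned input, such as a path of eight vertices split into two contiguous halves, the graph-centric version needs far fewer messages and supersteps than the vertex-centric one, which is the effect the abstract attributes to exposing the partition structure.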
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and Enterprise Data Warehouse (EDW) /ytian1/building-a-hybrid-warehouse-efficient-joins-between-data-stored-in-hdfs-and-enterprise-data-warehouse-edw bigjoinlunchtalknobackups-170109182402
With the advent of big data, the enterprise analytics landscape has dramatically changed. HDFS has become an important data repository for all business analytics. Enterprises are using various big data technologies to process data and derive actionable insights. HDFS serves as the storage where other distributed processing frameworks, such as Hadoop and Spark, access and operate on large volumes of data. At the same time, enterprise data warehouses (EDWs) continue to support critical business analytics. EDWs are usually shared-nothing parallel databases that support complex SQL processing, updates, and transactions. As a result, they manage up-to-date data and support various business analytics tools, such as reporting and dashboards. A new generation of applications has emerged, requiring access to and correlation of data stored in HDFS and EDWs. This has created the need for a special federation between Hadoop-like big data platforms and EDWs, which we call the hybrid warehouse. In this talk, we identify the best hybrid warehouse architecture by studying various algorithms to join database and HDFS tables.
Posted Mon, 09 Jan 2017 18:24:02 GMT by Yuanyuan Tian (ytian1@slideshare.net)
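To make the idea of joining a database table with an HDFS table concrete, here is a minimal in-memory sketch. It is an illustration only, not necessarily one of the algorithms studied in the talk: it assumes a semijoin-style reduction in which the database side's join keys are used to filter the HDFS rows before they are shipped for the final join. The helper names (`semijoin_reduce`, `hash_join`) and the sample rows are hypothetical.

```python
from collections import defaultdict


def semijoin_reduce(hdfs_rows, db_rows, key):
    """Keep only HDFS rows whose join key also appears on the database side."""
    db_keys = {row[key] for row in db_rows}
    return [row for row in hdfs_rows if row[key] in db_keys]


def hash_join(left, right, key):
    """Plain hash join: build an index on `left`, probe it with `right`."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


# Hypothetical sample data: a small database table and a larger HDFS log.
db_rows = [{"cust": 1, "name": "a"}, {"cust": 2, "name": "b"}]
hdfs_rows = [
    {"cust": 1, "clicks": 5},
    {"cust": 3, "clicks": 7},
    {"cust": 1, "clicks": 2},
]

# Only the reduced HDFS rows would need to cross the network in this sketch.
reduced = semijoin_reduce(hdfs_rows, db_rows, "cust")
joined = hash_join(db_rows, reduced, "cust")
```

The reduction does not change the join result; it only shrinks the number of rows that have to move between the two systems, which is the kind of cost a hybrid warehouse design has to weigh.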
Big Graph Analytics Systems (Sigmod16 Tutorial) /slideshow/big-graph-analytics-systems-sigmod16-tutorial/63791236 sigmod16tutorial-160706202924
In recent years we have witnessed a surging interest in developing Big Graph processing systems. To date, tens of Big Graph systems have been proposed. This tutorial provides a timely and comprehensive review of existing Big Graph systems, and summarizes their pros and cons from various perspectives. We start from the existing vertex-centric systems, in which a programmer thinks intuitively like a vertex when developing parallel graph algorithms. We then introduce systems that adopt other computation paradigms and execution settings. The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on. We also highlight future research opportunities on Big Graph analytics.
Posted Wed, 06 Jul 2016 20:29:24 GMT by Yuanyuan Tian (ytian1@slideshare.net)
Scalable Topic-Specific Influence Analysis on Microblogs /slideshow/scalable-topicspecific-influence-analysis-on-microblogs/41535590 fldauc-141113183411-conversion-gate02
Invited talks at UC Santa Barbara and UC Santa Cruz, May 2013.
Posted Thu, 13 Nov 2014 18:34:11 GMT by Yuanyuan Tian (ytian1@slideshare.net)
Dr. Yuanyuan Tian is a Principal Scientist at Microsoft Gray Systems Lab (GSL), and an ACM Distinguished Member. Before Microsoft, she was a Principal Research Staff Member at IBM Almaden Research Center. She obtained her Ph.D. in computer science from the University of Michigan. Her research interests include HTAP, SQL-on-Hadoop, big data federation, graph analytics platforms, and large-scale systems for machine learning. She has published two books and over 40 articles in top database venues with 4500+ citations. Dr. Tian has served on the editorial board for the new encyclopedia for Big Data, as an Associate Editor for VLDB Journal and PVLDB, and chaired various tracks in top database con
humming80.github.io/