Slideshows by User: hlshih / Fri, 27 Sep 2013 12:38:41 GMT SlideShare feed for Slideshows by User: hlshih
Real-time Big Data Analytics Engine using Impala /slideshow/hit2013-impala-0925etu/26625797 hit2013impala0925etu-130927123841-phpapp01
Cloudera Impala is an open-source engine, released under the Apache License, that enables real-time, interactive analytical SQL queries over data stored in HBase or HDFS. The work was inspired by Google's Dremel paper, which is also the basis for Google BigQuery. Impala provides access to the same unified storage platform through its own distributed query engine and does not use MapReduce. It also uses the same metadata, SQL syntax (HiveQL-like), ODBC driver, and user interface (Hue Beeswax) as Hive. Beyond the traditional Hadoop approach, which aims to provide a low-cost solution for resiliency and batch-oriented distributed data processing, more and more effort in the Big Data world is going into finding the right solution for ad hoc, fast queries and real-time processing of large datasets. This presentation explores how to run interactive queries in Impala, the advantages of the approach, its architecture, and how it optimizes data systems, and it includes a practical performance analysis.

Fri, 27 Sep 2013 12:38:41 GMT /slideshow/hit2013-impala-0925etu/26625797 hlshih@slideshare.net(hlshih)
Real-time Big Data Analytics Engine using Impala from Jason Shih
8727 4 https://cdn.slidesharecdn.com/ss_thumbnails/hit2013impala0925etu-130927123841-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Petabye scale data challenge /slideshow/petabye-scale-data-challenge/15234590 petabyescaledatachallenge05312010-121118133313-phpapp02
Sun, 18 Nov 2012 13:33:10 GMT /slideshow/petabye-scale-data-challenge/15234590 hlshih@slideshare.net(hlshih)
Petabye scale data challenge from Jason Shih
870 5 https://cdn.slidesharecdn.com/ss_thumbnails/petabyescaledatachallenge05312010-121118133313-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
High performance computing - building blocks, production & perspective /slideshow/high-performance-computing-building-blocks-production-perspective/15234578 highperformancecomputing-buildingblocksproductionperspective-121118133217-phpapp01
Sun, 18 Nov 2012 13:32:15 GMT /slideshow/high-performance-computing-building-blocks-production-perspective/15234578 hlshih@slideshare.net(hlshih)
High performance computing - building blocks, production & perspective from Jason Shih
16975 10 https://cdn.slidesharecdn.com/ss_thumbnails/highperformancecomputing-buildingblocksproductionperspective-121118133217-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Hpc, grid and cloud computing - the past, present, and future challenge /slideshow/hpc-grid-and-cloud-computing-the-past-present-and-future-challenge/15234484 hpcgridandcloudcomputing1103-121118132005-phpapp01
Sun, 18 Nov 2012 13:20:03 GMT /slideshow/hpc-grid-and-cloud-computing-the-past-present-and-future-challenge/15234484 hlshih@slideshare.net(hlshih)
Hpc, grid and cloud computing - the past, present, and future challenge from Jason Shih
1670 4 https://cdn.slidesharecdn.com/ss_thumbnails/hpcgridandcloudcomputing1103-121118132005-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Hadoop security overview_hit2012_1117rev /slideshow/hadoop-security-overviewhit20121117rev/15225798 hadoopsecurityoverviewhit20121117rev-121117153606-phpapp01
Overview of Hadoop security (revised from a presentation at Hadoop in Taiwan 2012). Covers detailed configuration of a security infrastructure leveraging Kerberos, along with extensive LDAP integration aimed at fast exchange of cluster information. An introduction to the Etu Appliance is included at the end of the slides.

Sat, 17 Nov 2012 15:36:03 GMT /slideshow/hadoop-security-overviewhit20121117rev/15225798 hlshih@slideshare.net(hlshih)
Hadoop security overview_hit2012_1117rev from Jason Shih
2775 7 https://cdn.slidesharecdn.com/ss_thumbnails/hadoopsecurityoverviewhit20121117rev-121117153606-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
https://cdn.slidesharecdn.com/profile-photo-hlshih-48x48.jpg?cb=1676909869 Having been actively involved in the WLCG grid computing project for the past 8 years, and in the design of high-performance/high-throughput computing infrastructure for a variety of research projects, I am passionate about large-scale distributed computing, high-performance computing, and computational science applications. I have worked alongside collaborators from Asia-Pacific region-wide grid initiatives and operation centers to facilitate distributed computing design and highly available, reliable computing and storage resources for researchers by leveraging system automation, grid middleware, monitoring system frameworks, and Tier1 operation experience. Toward resource integration with cloud infrastructure/platf...