SlideShare feed for Slideshows by User: ni_po / Mon, 26 Aug 2019 19:05:17 GMT
Benchmarking Elastic Cloud Big Data Services under SLA Constraints /slideshow/benchmarking-elastic-cloud-big-data-services-under-sla-constraints/166621842 benchmarkingwithslafinal-split-190826190518
We introduce an extension for TPC benchmarks that addresses the requirements of big data processing in cloud environments. We call it the Elasticity Test and evaluate it under TPCx-BB (BigBench). First, the Elasticity Test incorporates an approach to generate real-world query submission patterns with distinct data scale factors, based on major industrial cluster logs. Second, a new metric is introduced based on Service Level Agreements (SLAs) that takes the quality-of-service requirements of each query into consideration. Experiments with Apache Hive and Spark on the cloud platforms of three major vendors validate our approach by comparing it to the current TPCx-BB metric. Results show how systems that fail to meet SLAs under concurrency, due to queuing or degraded performance, score lower on the new metric. On the other hand, elastic systems meet a higher percentage of SLAs and thus are rewarded in the new metric. Such systems can scale compute workers up and down according to the demands of a varying workload and can thus save dollar costs.
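The abstract does not define the SLA-based metric itself, but the "percentage of SLAs met" it refers to can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `QueryRun` record, the per-query SLA targets, and the timings are hypothetical, and the talk's actual metric may weight or combine queries differently.

```python
from dataclasses import dataclass


@dataclass
class QueryRun:
    name: str
    sla_seconds: float      # hypothetical per-query SLA target
    elapsed_seconds: float  # measured time, including any queuing delay


def sla_compliance(runs):
    """Return the fraction of runs that finished within their SLA."""
    if not runs:
        raise ValueError("no query runs supplied")
    met = sum(1 for r in runs if r.elapsed_seconds <= r.sla_seconds)
    return met / len(runs)


# Made-up sample data: q02 misses its target, e.g. because it was queued.
runs = [
    QueryRun("q01", 60.0, 45.0),
    QueryRun("q02", 60.0, 90.0),
    QueryRun("q03", 120.0, 110.0),
    QueryRun("q04", 30.0, 30.0),
]

print(f"{sla_compliance(runs):.0%} of queries met their SLA")
```

Under this reading, a system that queues queries during a load spike sees its elapsed times grow and its compliance fraction drop, while a system that adds workers in time keeps the fraction high.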

Mon, 26 Aug 2019 19:05:17 GMT /slideshow/benchmarking-elastic-cloud-big-data-services-under-sla-constraints/166621842 ni_po@slideshare.net(ni_po)
Benchmarking Elastic Cloud Big Data Services under SLA Constraints from Nicolas Poggi
393 1 https://cdn.slidesharecdn.com/ss_thumbnails/benchmarkingwithslafinal-split-190826190518-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Correctness and Performance of Apache Spark SQL /ni_po/correctness-and-performance-of-apache-spark-sql sparksummiteucorrectnessandperformanceofapachesparksql-190123083132
Presented at the Spark Summit EU 2018.

Wed, 23 Jan 2019 08:31:32 GMT /ni_po/correctness-and-performance-of-apache-spark-sql ni_po@slideshare.net(ni_po)
Correctness and Performance of Apache Spark SQL from Nicolas Poggi
150 1 https://cdn.slidesharecdn.com/ss_thumbnails/sparksummiteucorrectnessandperformanceofapachesparksql-190123083132-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
State of Spark in the cloud (Spark Summit EU 2017) /slideshow/state-of-spark-in-the-cloud-spark-summit-eu-2017/81201290 cloud-bigbenchsparksummiteu2017-171025161500
Originally presented at: https://spark-summit.org/eu-2017/events/the-state-of-apache-spark-in-the-cloud/ Cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and upgrade management. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements in performance and the release of v2, making it challenging to keep production services up to date, compatible, and stable both on-premises and in the cloud. Nicolas Poggi evaluates the out-of-the-box support for Spark and compares the offerings, reliability, scalability, and price-performance of major PaaS providers, including Azure HDInsight, Amazon Web Services EMR, and Google Dataproc, with an on-premises commodity cluster as baseline. Nicolas uses BigBench, the brand new standard (TPCx-BB) for big data systems, with both Spark and Hive implementations for benchmarking the systems. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal to stress Spark libraries (SparkSQL, DataFrames, MLlib, etc.). The work is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines and BigBench. The ALOJA project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization. Nicolas highlights how to easily repeat the benchmarks through ALOJA and how advanced users can benefit from BigBench to optimize their Spark cluster. The work is a continuation of a paper to be published at the IEEE Big Data 16 conference.

Wed, 25 Oct 2017 16:15:00 GMT /slideshow/state-of-spark-in-the-cloud-spark-summit-eu-2017/81201290 ni_po@slideshare.net(ni_po)
State of Spark in the cloud (Spark Summit EU 2017) from Nicolas Poggi
105 1 https://cdn.slidesharecdn.com/ss_thumbnails/cloud-bigbenchsparksummiteu2017-171025161500-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
The state of Hive and Spark in the Cloud (July 2017) /slideshow/the-state-of-hive-and-spark-in-the-cloud-july-2017/78116258 thestateofhiveandsparkinthecloud-170721093856
Originally presented at the BDOOP and Spark Barcelona meetup groups: http://meetu.ps/3bwCTM Cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and upgrade management. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements in performance and the release of v2, making it challenging to keep production services up to date, compatible, and stable both on-premises and in the cloud. The talk compares:
• The performance of both v1 and v2 for Spark and Hive
• PaaS cloud services: Azure HDInsight, Amazon Web Services EMR, Google Cloud Dataproc
• Out-of-the-box support for Spark and Hive versions from providers
• PaaS reliability, scalability, and price-performance of the solutions
All comparisons use BigBench, the new Big Data benchmark standard. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal to stress Spark libraries (SparkSQL, DataFrames, MLlib, etc.).
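The abstract does not say how price-performance is calculated. One common reading, used here purely as an assumption, is the dollar cost of a benchmark run under pay-as-you-go pricing: the cluster's hourly price multiplied by the run's duration. The provider names, prices, and run times below are made up for illustration and are not results from the talk.

```python
def run_cost(price_per_hour, elapsed_seconds):
    """Dollar cost of one benchmark run on a pay-as-you-go cluster."""
    return price_per_hour * elapsed_seconds / 3600.0


# Hypothetical clusters: (price in dollars per hour, benchmark run time in seconds)
clusters = {
    "provider_a": (8.0, 5400.0),
    "provider_b": (12.0, 3000.0),
}

costs = {name: run_cost(price, secs) for name, (price, secs) in clusters.items()}
cheapest = min(costs, key=costs.get)

for name, cost in sorted(costs.items()):
    print(f"{name}: ${cost:.2f} per run")
print(f"lowest cost per run: {cheapest}")
```

In this made-up case the cluster with the higher hourly price is still cheaper per run because it finishes sooner, which is why price and performance are compared together rather than separately.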

Fri, 21 Jul 2017 09:38:56 GMT /slideshow/the-state-of-hive-and-spark-in-the-cloud-july-2017/78116258 ni_po@slideshare.net(ni_po)
The state of Hive and Spark in the Cloud (July 2017) from Nicolas Poggi
1076 6 https://cdn.slidesharecdn.com/ss_thumbnails/thestateofhiveandsparkinthecloud-170721093856-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
The state of Spark in the cloud /slideshow/the-state-of-spark-in-the-cloud/76341015 thestateofsparkinthecloud-170525141208
Originally presented at Strata EU 2017: https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/57631 Cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and upgrade management. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements in performance and the release of v2, making it challenging to keep production services up to date, compatible, and stable both on-premises and in the cloud. Nicolas Poggi evaluates the out-of-the-box support for Spark and compares the offerings, reliability, scalability, and price-performance of major PaaS providers, including Azure HDInsight, Amazon Web Services EMR, Google Dataproc, and Rackspace Cloud Big Data, with an on-premises commodity cluster as baseline. Nicolas uses BigBench, the brand new standard (TPCx-BB) for big data systems, with both Spark and Hive implementations for benchmarking the systems. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal to stress Spark libraries (SparkSQL, DataFrames, MLlib, etc.). The work is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines and BigBench. The ALOJA project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization. Nicolas highlights how to easily repeat the benchmarks through ALOJA and how advanced users can benefit from BigBench to optimize their Spark cluster. The work is a continuation of a paper to be published at the IEEE Big Data 16 conference. (A preprint copy can be obtained here.)

Thu, 25 May 2017 14:12:08 GMT /slideshow/the-state-of-spark-in-the-cloud/76341015 ni_po@slideshare.net(ni_po)
The state of Spark in the cloud from Nicolas Poggi
1356 6 https://cdn.slidesharecdn.com/ss_thumbnails/thestateofsparkinthecloud-170525141208-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Using BigBench to compare Hive and Spark (Long version) /slideshow/using-bigbench-to-compare-hive-and-spark-75667382/75667382 usingbigbenchtocomparehiveandsparklongv3-170504090941
BigBench is the brand new standard (TPCx-BB) for benchmarking and testing Big Data systems. The BigBench specification describes several application use cases combining the need for SQL queries, Map/Reduce, user code (UDF), Machine Learning, and even streaming. From the available implementation, we can test different framework combinations, such as Hadoop+Hive (with Mahout) and Spark (SparkSQL+MLlib), in their different versions and configurations, helping us to spot problems and possible optimizations of our data stacks. This talk first introduces BigBench and which problems it can solve. It then presents benchmark results for versions 1 and 2 of both Hive and Spark under distinct configurations, including Tez, Mahout, and MLlib. Experiments are run on Cloud and On-Prem clusters with different numbers of nodes and data scales, taking into account interactive and batch usage. Results are further classified by use case, showing where each platform shines (or doesn't), and why, based on performance metrics and logfile analysis. The talk concludes with the main findings and the scalability and limits of each framework. Originally presented at: https://dataworkssummit.com/munich-2017/sessions/using-bigbench-to-compare-hive-and-spark-versions-and-features/

Thu, 04 May 2017 09:09:41 GMT /slideshow/using-bigbench-to-compare-hive-and-spark-75667382/75667382 ni_po@slideshare.net(ni_po)
Using BigBench to compare Hive and Spark (Long version) from Nicolas Poggi
]]>
753 5 https://cdn.slidesharecdn.com/ss_thumbnails/usingbigbenchtocomparehiveandsparklongv3-170504090941-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Using BigBench to compare Hive and Spark (short version) /ni_po/using-bigbench-to-compare-hive-and-spark usingbigbenchtocomparehiveandspark-170205120522
BigBench is the brand new standard for benchmarking and testing Big Data systems. This talk first introduces BigBench and the problems it can solve. It then presents benchmark results for both Hive and Spark in their respective 1 and 2 versions under different configurations. Results are further classified by use case, showing where each platform shines (or doesn't), and why, based on performance metrics and log-file analysis. The talk concludes with the main findings, the scalability, and the limits of each framework.]]>

Sun, 05 Feb 2017 12:05:22 GMT /ni_po/using-bigbench-to-compare-hive-and-spark ni_po@slideshare.net(ni_po) Using BigBench to compare Hive and Spark (short version) ni_po
Using BigBench to compare Hive and Spark (short version) from Nicolas Poggi
]]>
414 3 https://cdn.slidesharecdn.com/ss_thumbnails/usingbigbenchtocomparehiveandspark-170205120522-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Accelerating HBase with NVMe and Bucket Cache /slideshow/accelerating-hbase-with-nvme-and-bucket-cache/71776388 acceleratinghbasewithnvmeandbucketcache-170205115914
The Non-Volatile Memory Express (NVMe) standard promises an order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM in terms of TB/$. This talk evaluates the use cases and benefits of NVMe drives in Big Data clusters with HBase and Hadoop HDFS. First, we benchmark the different drives using system-level tools (FIO) to get the maximum expected values for each device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. Finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark. In summary, while the NVMe drives show up to 8x speedup in best-case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system-level caching to measure the maximum benefits.]]>

Sun, 05 Feb 2017 11:59:14 GMT /slideshow/accelerating-hbase-with-nvme-and-bucket-cache/71776388 ni_po@slideshare.net(ni_po) Accelerating HBase with NVMe and Bucket Cache ni_po
Accelerating HBase with NVMe and Bucket Cache from Nicolas Poggi
]]>
1242 6 https://cdn.slidesharecdn.com/ss_thumbnails/acceleratinghbasewithnvmeandbucketcache-170205115914-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
The state of SQL-on-Hadoop in the Cloud /slideshow/the-state-of-sqlonhadoop-in-the-cloud/65570272 2016-08-315-160831235351
With the increase of Hadoop offerings in the Cloud, users face many decisions: which Cloud provider and VMs to choose, how to size the cluster, which storage type to use, or even whether to move to fully managed Platform-as-a-Service (PaaS) Hadoop. As the answer is always "it depends on your data and usage", this talk gives participants an overview of the different PaaS solutions from the leading Cloud providers, highlighting the main results of benchmarking their SQL-on-Hadoop (i.e., Hive) services with the ALOJA benchmarking project. It compares their current offerings in terms of readiness, architectural differences, and cost-effectiveness (performance-to-price) against entry-level Hadoop-based deployments. It also briefly presents how to replicate the results and create custom benchmarks from internal apps, so that users can make their own decisions about choosing the right provider for their particular data needs. ]]>

Wed, 31 Aug 2016 23:53:51 GMT /slideshow/the-state-of-sqlonhadoop-in-the-cloud/65570272 ni_po@slideshare.net(ni_po) The state of SQL-on-Hadoop in the Cloud ni_po
The state of SQL-on-Hadoop in the Cloud from Nicolas Poggi
]]>
824 4 https://cdn.slidesharecdn.com/ss_thumbnails/2016-08-315-160831235351-thumbnail.jpg?width=120&height=120&fit=bounds presentation 000000 http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
sudoers: Benchmarking Hadoop with ALOJA /slideshow/sudoers-benchmarking-hadoop-with-aloja/53645990 sudoersbenchmarkinghadoopwithaloja-151007130128-lva1-app6891
Presentation for the sudoers Barcelona group, Oct 06 2015, on benchmarking Hadoop with the ALOJA open source benchmarking platform. The presentation was mostly a live demo; these slides are posted for the people who could not attend. http://lanyrd.com/2015/sudoers-barcelona-october/]]>

Wed, 07 Oct 2015 13:01:28 GMT /slideshow/sudoers-benchmarking-hadoop-with-aloja/53645990 ni_po@slideshare.net(ni_po) sudoers: Benchmarking Hadoop with ALOJA ni_po
sudoers: Benchmarking Hadoop with ALOJA from Nicolas Poggi
]]>
696 7 https://cdn.slidesharecdn.com/ss_thumbnails/sudoersbenchmarkinghadoopwithaloja-151007130128-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Benchmarking Hadoop and Big Data /slideshow/benchmarking-hadoop/49002081 benchmarkinghadoop-nicolaspoggi-150604175558-lva1-app6892
The slides from the BDOOP meetup group presentation on Benchmarking Big Data systems with use cases from Hadoop]]>

Thu, 04 Jun 2015 17:55:58 GMT /slideshow/benchmarking-hadoop/49002081 ni_po@slideshare.net(ni_po) Benchmarking Hadoop and Big Data ni_po
Benchmarking Hadoop and Big Data from Nicolas Poggi
]]>
3324 2 https://cdn.slidesharecdn.com/ss_thumbnails/benchmarkinghadoop-nicolaspoggi-150604175558-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Vagrant + Docker provider [+Puppet] /slideshow/n-poggi-vagrantdocker/37358875 npoggivagrant-docker-140725100147-phpapp02
An ongoing presentation for the Docker workshop on how to integrate Docker into Vagrant as a provider, in order to remove the requirement of having a VM and speed up development environments. It also features Puppet as the configuration management system. The code can be found at: https://github.com/npoggi/vagrant-docker]]>

Fri, 25 Jul 2014 10:01:46 GMT /slideshow/n-poggi-vagrantdocker/37358875 ni_po@slideshare.net(ni_po) Vagrant + Docker provider [+Puppet] ni_po
Vagrant + Docker provider [+Puppet] from Nicolas Poggi
]]>
17069 6 https://cdn.slidesharecdn.com/ss_thumbnails/npoggivagrant-docker-140725100147-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
The case for Hadoop performance /slideshow/the-case-for-hadoop-performance/35206403 thecaseforhadoopperformance-140528044642-phpapp01
The case for Hadoop performance: a short presentation on the need for BDOOP, the Big Data Operations On Performance group. It presents the case for why analyzing Hadoop performance is complex but important for devising cost-effective infrastructures. Originally presented at the Barcelona BDOOP meetup group: http://www.meetup.com/BDOOP-BigData-Operations-On-Perfomance-Barcelona/]]>

Wed, 28 May 2014 04:46:42 GMT /slideshow/the-case-for-hadoop-performance/35206403 ni_po@slideshare.net(ni_po) The case for Hadoop performance ni_po
The case for Hadoop performance from Nicolas Poggi
]]>
628 2 https://cdn.slidesharecdn.com/ss_thumbnails/thecaseforhadoopperformance-140528044642-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
https://cdn.slidesharecdn.com/profile-photo-ni_po-48x48.jpg?cb=1574108386 Nico is an IT researcher and developer with a focus on the performance and scalability of data-intensive and Web applications. He is always looking for new Internet-scale challenges, with interests ranging from Cloud and data-center optimization to Machine Learning and human-computer interaction. His mission is to advance open Internet technologies, making them more reachable and, most importantly, more efficient to host and operate. He currently leads an open source research project (ALOJA) on upcoming architectures for data processing at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center (http://www.bscmsrc.eu/). Previously has been involved in different Web and communication... personals.ac.upc.edu/npoggi/ https://cdn.slidesharecdn.com/ss_thumbnails/benchmarkingwithslafinal-split-190826190518-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/benchmarking-elastic-cloud-big-data-services-under-sla-constraints/166621842 Benchmarking Elastic C... https://cdn.slidesharecdn.com/ss_thumbnails/sparksummiteucorrectnessandperformanceofapachesparksql-190123083132-thumbnail.jpg?width=320&height=320&fit=bounds ni_po/correctness-and-performance-of-apache-spark-sql Correctness and Perfor... https://cdn.slidesharecdn.com/ss_thumbnails/cloud-bigbenchsparksummiteu2017-171025161500-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/state-of-spark-in-the-cloud-spark-summit-eu-2017/81201290 State of Spark in the ...