SlideShare feed for slideshows by user: ergherh

Designing a Horizontally Scalable Event-Driven Big Data Architecture with Apache Spark
Ricardo Fanjul Fandio, Tue, 02 Oct 2018
/slideshow/designing-a-horizontally-scalable-eventdriven-big-data-architecture-with-apache-spark/117854790
Traditional data architectures are not enough to handle the huge amounts of data generated by millions of users. In addition, the diversity of data sources is increasing every day: distributed file systems and relational, columnar-oriented, document-oriented, and graph databases. Letgo has been growing quickly over the last few years. Because of this, we needed to improve the scalability of our data platform and endow it with further capabilities, such as dynamic infrastructure elasticity, real-time processing, and real-time complex event processing. In this talk, we will dive deeper into our journey: we started from a traditional data architecture with ETL and Redshift, and today we have built an event-oriented, horizontally scalable data architecture. We will explain in detail the path from event ingestion with Kafka / Kafka Connect to its processing in streaming and batch with Spark. On top of that, we will discuss how we use Spark Thrift Server / Hive Metastore as the glue to exploit all our data sources (HDFS, S3, Cassandra, Redshift, MariaDB) in a unified way from any point of our ecosystem, using technologies like Jupyter, Zeppelin, Superset, and others. We will also describe how to build ETLs with pure Spark SQL, using Airflow for orchestration. Along the way, we will highlight the challenges we found and how we solved them, and we will share many useful tips for those who also want to start this journey in their own companies.
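As a rough illustration of the ingestion path mentioned in the abstract, the sketch below shows what consuming such an event topic with Spark Structured Streaming and landing it as Parquet could look like. It is a minimal sketch, not the talk's actual pipeline: the broker address, topic name, event schema, and S3 paths are illustrative assumptions.

```python
# Minimal sketch: consuming an event topic with Spark Structured Streaming.
# Broker address, topic name, schema and output paths are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("event-ingestion-sketch")
         .getOrCreate())

# Hypothetical envelope for the events published to Kafka.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("user_id", StringType()),
    StructField("occurred_at", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")   # assumed broker
       .option("subscribe", "events")                      # assumed topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes; parse the JSON value into columns.
events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Land the parsed events as Parquet (paths are illustrative).
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start())

query.awaitTermination()
```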

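The "glue" role of Spark Thrift Server / Hive Metastore described above can be pictured as registering each source as a table in a shared catalog and then joining them with plain SQL from any client (Thrift Server, Jupyter, Zeppelin, Superset). The following is a minimal sketch under that assumption; table names, hosts, credentials, and join columns are made up for illustration.

```python
# Minimal sketch: registering external sources in a shared Hive Metastore so
# any Spark SQL client can query them together. All identifiers are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("unified-catalog-sketch")
         .enableHiveSupport()          # persist table definitions in the metastore
         .getOrCreate())

# Parquet data on S3 registered as an external table.
spark.sql("""
  CREATE TABLE IF NOT EXISTS events
  USING parquet
  LOCATION 's3a://example-bucket/events/'
""")

# A MariaDB table exposed through Spark's JDBC data source.
spark.sql("""
  CREATE TABLE IF NOT EXISTS users
  USING jdbc
  OPTIONS (
    url 'jdbc:mysql://mariadb:3306/app',
    dbtable 'users',
    user 'reader',
    password '***'
  )
""")

# Once both live in the metastore, any client connected to the Spark Thrift
# Server can join them with plain SQL.
spark.sql("""
  SELECT u.country, count(*) AS events
  FROM events e
  JOIN users u ON e.user_id = u.id
  GROUP BY u.country
""").show()
```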
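For the "pure Spark SQL" ETLs orchestrated with Airflow, one plausible shape (not necessarily the one used at Letgo) is a DAG whose task submits a small runner script that executes a .sql file with spark.sql(). The DAG id, paths, connection id, and the run_sql.py helper are hypothetical, and the import path assumes Airflow 2.x with the Apache Spark provider installed.

```python
# Minimal sketch: an Airflow DAG that runs a pure-Spark-SQL ETL step each day.
# DAG id, connection id, script path and query file are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2018, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # The submitted application is assumed to read a .sql file and run it with
    # spark.sql(), so all transformation logic stays in SQL.
    aggregate_events = SparkSubmitOperator(
        task_id="aggregate_events",
        application="/opt/etl/run_sql.py",                       # hypothetical runner script
        application_args=["/opt/etl/sql/daily_events.sql", "{{ ds }}"],
        conn_id="spark_default",
        name="daily-events-etl",
    )
```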
Letgo Data Platform: A global overview
Ricardo Fanjul Fandio, Wed, 18 Apr 2018
/slideshow/letgo-data-platform-a-global-overview-94261968/94261968
How to develop a Big Data platform around Spark.
