Slideshows by User: aseigneurin

Data Quality Monitoring in Realtime and at Scale
/slideshow/data-quality-monitoring-in-realtime-and-at-scale/80498775
Published: Thu, 05 Oct 2017 14:26:55 GMT

Kafka has become extremely popular for streaming data, but it imposes very few constraints on the format of the data being streamed. Because we wanted all of our Data Engineers and Data Scientists to use the data in our Kafka clusters, we soon faced the challenge of keeping data quality as high as possible. We developed a tool to monitor the quality of the streams in real time, and we had to make it scalable and fault-tolerant. In this talk, we will look at the technical difficulties we encountered with our Kafka Streams implementation, and at how we carried out a major rewrite of the application to make it scale.

Data Quality Monitoring in Realtime and at Scale from Alexis Seigneurin

0712_Seigneurin
/aseigneurin/0712seigneurin
Published: Sat, 12 Nov 2016 17:24:09 GMT

0712_Seigneurin from Alexis Seigneurin

Lessons Learned: Using Spark and Microservices
/slideshow/lessons-learned-using-spark-and-microservices/65325893
Published: Wed, 24 Aug 2016 16:20:08 GMT

Lessons Learned: Using Spark and Microservices to Empower Data Scientists and Data Engineers

Lessons Learned: Using Spark and Microservices from Alexis Seigneurin

Data Science meets Software Development
/aseigneurin/data-science-meets-software-development
Published: Thu, 27 Aug 2015 14:17:45 GMT

I work in a Data Innovation Lab with a horde of Data Scientists. Data Scientists gather data, clean it, apply Machine Learning algorithms and produce results, all with specialized tools (Dataiku, Scikit-Learn, R...). These processes run on a single machine, on data that is fixed in time, and have no constraints on execution speed. With my fellow Developers, our goal is to bring these processes to production. Our constraints are very different: we want the code to be versioned, tested, deployed automatically, and to produce logs. We also need it to run in production on distributed architectures (Spark, Hadoop), with fixed versions of languages and frameworks (Scala...), and on data that changes every day. In this talk, I will explain how we Developers work hand in hand with Data Scientists to shorten the path to running data workflows in production.

Data Science meets Software Development from Alexis Seigneurin

Spark (v1.3) - Presentation (French)
https://fr.slideshare.net/slideshow/spark-prsentation-franais/48343845
Published: Tue, 19 May 2015 16:15:39 GMT

A Spark presentation updated for Spark 1.3 (DataFrames).

Spark (v1.3) - Presentation (French) from Alexis Seigneurin

Spark - Ippevent 19-02-2015
https://fr.slideshare.net/slideshow/spark-ippevent-19022015/44924051
Published: Fri, 20 Feb 2015 06:52:48 GMT

Spark - Ippevent 19-02-2015 from Alexis Seigneurin

Spark - Alexis Seigneurin (French)
https://fr.slideshare.net/slideshow/spark-alexis-seigneurin-franais/43748427
Published: Wed, 21 Jan 2015 10:43:55 GMT

A presentation on Spark, in French.

Spark - Alexis Seigneurin (French) from Alexis Seigneurin

Spark - Alexis Seigneurin (English)
/slideshow/spark-alexis-seigneurin-english/43748401
Published: Wed, 21 Jan 2015 10:43:11 GMT

Spark presentation in English.

Spark - Alexis Seigneurin (English) from Alexis Seigneurin

Spark, or How to Process Data at Lightning Speed
https://fr.slideshare.net/slideshow/spark-ou-comment-traiter-des-donnes-la-vitesse-de-lclair/42633658
Published: Fri, 12 Dec 2014 03:06:10 GMT

Spark is part of the new generation of Hadoop-based data processing frameworks. It makes aggressive use of memory to deliver processing times up to 100 times faster than Hadoop. In this session, we will cover the principles of data processing (notably MapReduce) and the options available for setting up a cluster (ZooKeeper, Mesos…). We will review the various modules the framework provides, in particular Spark Streaming for processing continuous data streams. Presentation given at Ippon on December 11, 2014.

Spark, or How to Process Data at Lightning Speed from Alexis Seigneurin

Profile: Data Engineer - Kafka, Spark, Scala...