© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Part 2 of 3: Setting up a pipeline
Getting started with streaming analytics
Javier Ramirez
AWS Developer Advocate
@supercoco9
Agenda
Building and running a basic Apache Kafka + Apache Flink pipeline locally
Deploying to Amazon MSK + Kinesis Data Analytics
Adding aggregations and using Elasticsearch and Kibana for the dashboards
Replacing our Kafka input with Kinesis Data Streams
Ingestion/in-stream storage: Apache Kafka
A distributed streaming platform
Concepts:
Producers
Topics
Brokers
Consumers
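The four concepts above can be sketched with a toy in-memory model. This is not the Kafka client API: `ToyBroker`, `produce`, and `consume` are hypothetical names invented for illustration, and a real broker adds partitions, replication, and persistence. The sketch only tries to show that a topic behaves like an append-only log and that each consumer tracks its own read position.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker; not the Kafka API."""

    def __init__(self):
        # Each topic is an append-only list of records.
        self.topics = defaultdict(list)
        # Each (consumer, topic) pair remembers the next offset to read.
        self.offsets = defaultdict(int)

    def produce(self, topic, record):
        """Producer side: append a record and return its offset."""
        self.topics[topic].append(record)
        return len(self.topics[topic]) - 1

    def consume(self, consumer, topic, max_records=10):
        """Consumer side: read from this consumer's offset and advance it."""
        start = self.offsets[(consumer, topic)]
        batch = self.topics[topic][start:start + max_records]
        self.offsets[(consumer, topic)] = start + len(batch)
        return batch


broker = ToyBroker()
for reading in ("21.5", "21.7", "22.0"):
    broker.produce("sensor-readings", reading)

# Two independent consumers read the same topic at their own pace.
first_batch = broker.consume("dashboard", "sensor-readings", max_records=2)
second_batch = broker.consume("dashboard", "sensor-readings")
archiver_batch = broker.consume("archiver", "sensor-readings")
print(first_batch, second_batch, archiver_batch)
```

Reading does not remove records from the topic, which is why the second consumer still sees every record.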
Stream Processing: Apache Flink
Stateful computation over Data Streams
Concepts:
Job Manager/Workers
Source
DataStream
Transforms/Operators
TableAPI/SQL
Sinks
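A minimal sketch of how a source, operators, and a sink chain together, written with plain Python generators rather than the Flink API. All function names here are hypothetical and chosen for illustration; the point is the difference between a stateless transform and a stateful one that remembers a value per key across records.

```python
from collections import defaultdict


def source(events):
    """Source: emits (key, value) records one at a time."""
    for event in events:
        yield event


def keep_positive(stream):
    """Stateless operator: drop records whose value is not positive."""
    for key, value in stream:
        if value > 0:
            yield key, value


def running_sum_per_key(stream):
    """Stateful operator: keeps one running sum per key across records."""
    state = defaultdict(int)
    for key, value in stream:
        state[key] += value
        yield key, state[key]


def collect_sink(stream):
    """Sink: here the results are simply gathered into a list."""
    return list(stream)


events = [("a", 1), ("b", 5), ("a", -3), ("a", 2), ("b", 1)]
results = collect_sink(running_sum_per_key(keep_positive(source(events))))
print(results)
```

In a real engine the per-key state would also be checkpointed and distributed across workers; this sketch keeps it in a local dictionary.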
Dashboard: Elasticsearch with Kibana
Elasticsearch is a distributed JSON-based search and analytics engine. Kibana gives shape to your data.
https://www.elastic.co/kibana
Wikimedia has a live interactive dashboard powered by Kibana at https://wikimedia.biterg.io/
Concepts:
Master Node
Data Nodes
Shard
Index
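To make the Index and Shard concepts more concrete: an index is split into shards, and each document is routed to one of them by hashing a routing value (by default the document ID). The sketch below is a simplified model under stated assumptions. Elasticsearch uses a Murmur3 hash and its actual formula has additional factors, whereas this example uses MD5 from the standard library only to get a stable hash. `route_to_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import hashlib


def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document by hashing its ID (simplified model)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_primary_shards


num_shards = 3
placement = {}
for doc_id in ("order-1", "order-2", "order-3", "order-4"):
    placement[doc_id] = route_to_shard(doc_id, num_shards)
print(placement)
```

Because the shard is derived from the hash modulo the shard count, the same ID always lands on the same shard, and changing the shard count would change where documents are expected to live.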
Amazon Kinesis Data Streams
Easy administration and low cost
Real-time, elastic performance
Secure, durable storage
Available to multiple real-time analytics applications
Amazon Kinesis Data Streams
Data streams are made of Shards
Each Shard ingests up to 1 MB/sec and up to 1,000 records/sec
Each Shard emits up to 2 MB/sec
All data is stored for 24 hours to 7 days
Scale Kinesis data streams by splitting or merging Shards
Replay data within the 24-hour to 7-day retention window (time-based seek)
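The per-shard limits above translate directly into a sizing calculation. The following is a rough sketch, assuming the three limits quoted on the slide are the only constraints; the workload numbers and the `shards_needed` helper are hypothetical.

```python
import math


# Per-shard limits quoted on the slide.
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
EGRESS_MB_PER_SEC = 2.0


def shards_needed(ingest_mb_s: float, records_s: float, egress_mb_s: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(ingest_mb_s / INGEST_MB_PER_SEC),
        math.ceil(records_s / INGEST_RECORDS_PER_SEC),
        math.ceil(egress_mb_s / EGRESS_MB_PER_SEC),
    )


# Hypothetical workload: 5,000 records/sec of 0.5 KB each, read by 2 consumers.
records_per_sec = 5000
ingest_mb = records_per_sec * 0.5 / 1024   # about 2.44 MB/sec
egress_mb = ingest_mb * 2                  # both consumers read everything
required = shards_needed(ingest_mb, records_per_sec, egress_mb)
print(required)  # 5
```

In this example the record rate is the binding limit: throughput alone would need 3 shards, but 5,000 records/sec needs 5. Splitting or merging shards moves the stream toward the computed count as the workload changes.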
Getting Started
https://ci.apache.org/projects/flink/flink-docs-stable/
Apache Flink official docs
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Elasticsearch official docs
https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html
Getting started with Apache Kafka/Amazon MSK
https://aws.amazon.com/kinesis/
Amazon Kinesis Services for streaming data
https://aws.amazon.com/elasticsearch-service/
Amazon Elasticsearch Service
https://kafka.apache.org/documentation/
Apache Kafka official docs
Thanks
Javier Ramirez
AWS Developer Advocate
@supercoco9
