© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Part 3 of 3: Deep Dive
Getting started with streaming analytics
Javier Ramirez
AWS Developer Advocate
@supercoco9
Agenda
Working with Avro and schemas
Time windows, session windows, and accumulators
Complex Event Processing
Streaming analytics with SQL
Ingestion/in-stream storage: Apache Kafka
A distributed streaming platform
Concepts:
Producers
Topics
Brokers
Consumers
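To make the four concepts concrete, here is a toy in-memory sketch. It is not the Kafka client API: `Broker`, `produce`, and `consume` are hypothetical names, and a real topic is split into partitions spread over several brokers. It only illustrates that a topic behaves like an append-only log and that each consumer group keeps its own read offset.

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for a Kafka broker: one append-only log per topic."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> list of records
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def produce(self, topic, record):
        """Producer side: append a record and return its offset in the log."""
        self.topics[topic].append(record)
        return len(self.topics[topic]) - 1

    def consume(self, group, topic, max_records=10):
        """Consumer side: read from the group's offset, then advance it."""
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_records]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


broker = Broker()
for reading in (21.5, 21.7, 22.1):
    broker.produce("temperatures", reading)

first = broker.consume("dashboard", "temperatures", max_records=2)
rest = broker.consume("dashboard", "temperatures")
other_group = broker.consume("alerts", "temperatures")
```

Records are not removed when read, so the second group (`alerts`) still sees the whole log from the beginning.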
Apache Avro
Apache Avro is a data serialization system.
Avro provides:
• Rich data structures.
• A compact, fast, binary data format.
• A container file, to store persistent data.
• Remote procedure call (RPC).
• Simple integration with dynamic languages. Code generation is not required.
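A minimal sketch of why the binary format is compact. The `Reading` schema and the helper names are hypothetical, and only two primitive types are handled. It follows the Avro binary encoding rules: a long is zig-zag encoded as a variable-length integer, a string is its byte length followed by UTF-8 bytes, and a record is just its field values in schema order, with no field names. In practice you would use an Avro library rather than hand-rolled code like this.

```python
import json

# Hypothetical schema for illustration; Avro schemas are written in JSON.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "sensor", "type": "string"},
    {"name": "value",  "type": "long"}
  ]
}
""")


def encode_long(n):
    """Avro long: zig-zag encoding, then a variable-length integer."""
    z = (n << 1) ^ (n >> 63)           # zig-zag: small negatives become small positives
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"string": encode_string, "long": encode_long}


def encode_record(schema, record):
    """Avro record: field values concatenated in schema order, without names."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]]) for field in schema["fields"]
    )


record = {"sensor": "t1", "value": 42}
payload = encode_record(SCHEMA, record)
json_size = len(json.dumps(record).encode("utf-8"))
```

The encoded record takes 4 bytes, against a few dozen for the same record as JSON text. The trade-off is that a reader needs the schema to decode it.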
Confluent Schema Registry
https://docs.confluent.io/current/schema-registry/index.html
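A sketch of how a message can reference its schema instead of carrying it. It assumes the framing described in the Confluent documentation: one magic byte (0), a 4-byte big-endian schema ID, then the Avro payload. The function names and the schema ID are hypothetical, and the registry lookup itself is left out.

```python
import struct

MAGIC_BYTE = 0  # assumed framing marker, per the Confluent wire format


def frame(schema_id, avro_payload):
    """Prefix an Avro payload with the magic byte and the schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message):
    """Split a framed message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("message is not framed with a schema ID")
    return schema_id, message[5:]


message = frame(7, b"\x04t1\x54")       # 7 is a made-up schema ID
schema_id, payload = unframe(message)
```

A consumer would use `schema_id` to fetch the writer's schema from the registry before decoding `payload`.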
Stream Processing: Apache Flink
Stateful computation over Data Streams
Concepts:
Job Manager/Workers
Source
DataStream
Transforms/Operators
Table API/SQL
Sinks
Different types of streaming windows
[Figure: timelines contrasting event time with processing time, with hourly ticks from 8:00 to 15:00, showing the input and output of each window]
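The difference between the two time notions can be sketched with one-hour tumbling windows. The events and field names below are made up for illustration. This is plain Python, not the Flink API. The same three events land in different windows depending on which timestamp drives the window assignment.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour tumbling windows


def at(hour, minute):
    """Seconds since midnight, to keep the sample data readable."""
    return hour * 3600 + minute * 60


def window_start(timestamp):
    """Start of the tumbling window that contains the timestamp."""
    return timestamp - (timestamp % WINDOW_SECONDS)


def count_per_window(events, time_field):
    """Count events per window, using the chosen notion of time."""
    counts = defaultdict(int)
    for event in events:
        counts[window_start(event[time_field])] += 1
    return dict(counts)


# Hypothetical events: the second one happened at 10:50 but arrived at 11:05.
events = [
    {"event_time": at(10, 15), "processing_time": at(10, 16)},
    {"event_time": at(10, 50), "processing_time": at(11, 5)},
    {"event_time": at(11, 20), "processing_time": at(11, 21)},
]

by_event_time = count_per_window(events, "event_time")
by_processing_time = count_per_window(events, "processing_time")
```

By event time the 10:00 window holds two events. By processing time the late arrival is counted in the 11:00 window instead.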
Apache Flink Windows
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html
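Session windows and accumulators, both listed in the agenda, can be sketched in a few lines. This is a simplified model rather than Flink's implementation: events are assumed to be sorted by timestamp, a new session starts when the gap since the previous event exceeds `gap`, and the accumulator is a running (count, total) pair. The function names and sample data are hypothetical.

```python
def add(acc, value):
    """Accumulator step: fold one value into a running (count, total) pair."""
    count, total = acc
    return count + 1, total + value


def sessionize(events, gap):
    """Group (timestamp, value) pairs into sessions.

    Returns a list of (start, end, count, total) tuples, one per session.
    """
    results = []
    current = None  # (session start, last timestamp seen, accumulator)
    for ts, value in sorted(events):
        if current is not None and ts - current[1] <= gap:
            start, _, acc = current
            current = (start, ts, add(acc, value))
        else:
            if current is not None:
                results.append((current[0], current[1], *current[2]))
            current = (ts, ts, add((0, 0), value))
    if current is not None:
        results.append((current[0], current[1], *current[2]))
    return results


clicks = [(0, 5), (20, 3), (45, 2), (200, 7), (210, 1)]
sessions = sessionize(clicks, gap=30)
```

Unlike tumbling windows, the session boundaries here depend on the data: the quiet period between 45 and 200 closes the first session.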
FlinkCEP
Complex Event Processing for Flink
FlinkCEP is the Complex Event Processing (CEP) library implemented on top of Flink.
It lets you detect event patterns in an endless stream of events, so you can get hold of what's important in your data.
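As an illustration of what detecting a pattern in a stream means, here is a hand-written matcher for one pattern: a successful login right after three or more consecutive failures by the same user. It is plain Python, not the FlinkCEP Pattern API, and the event shape and names are assumptions.

```python
from collections import defaultdict


def detect_suspicious_logins(events, failures_needed=3):
    """Yield users whose login succeeds right after N consecutive failures."""
    consecutive_failures = defaultdict(int)  # per-user pattern state
    for user, outcome in events:
        if outcome == "fail":
            consecutive_failures[user] += 1
        else:
            if consecutive_failures[user] >= failures_needed:
                yield user
            consecutive_failures[user] = 0   # any success resets the pattern


# Hypothetical interleaved stream of (user, outcome) events.
stream = [
    ("ana", "fail"), ("bob", "fail"), ("ana", "fail"),
    ("bob", "success"), ("ana", "fail"), ("ana", "success"),
]
alerts = list(detect_suspicious_logins(stream))
```

The matcher keeps a small piece of state per key and processes one event at a time, which is what allows it to run over an unbounded stream.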
Kinesis Data Analytics for SQL
• Sub-second end-to-end processing latencies
• SQL steps can be chained together in series or in parallel
• Build applications with one or hundreds of queries
• Pre-built functions range from sum and count distinct to machine learning algorithms
• Aggregations run continuously using window operators
Connect to Streaming Data Sources
• Easily connect to Kinesis Data Streams and Kinesis Data Firehose delivery streams
• Automatic schema discovery, which works for CSV and JSON data
• Supports multiple event types, arbitrary object nesting, and a single level of array nesting
[Diagram: Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose as streaming sources]
Pre-process Data Streams Using AWS Lambda
Built-in AWS Lambda integration provides flexible pre-processing ahead of SQL code for:
• Normalizing different event types
• Converting other data formats (Avro, Protobuf, ZIP) to JSON and CSV
• Custom enrichment from database tables or API calls
[Diagram: a source sends raw data to an AWS Lambda function, which passes transformed data to the SQL code of the Amazon Kinesis Data Analytics application, which writes to a destination]
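A sketch of a pre-processing function for the first use case, normalizing event types. It assumes the record contract described in the Kinesis Data Analytics documentation: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` (`Ok`, `Dropped`, or `ProcessingFailed`), and base64-encoded `data`. The event fields `device`, `deviceId`, and `value` are hypothetical.

```python
import base64
import json


def lambda_handler(event, context):
    """Normalize each incoming record to one flat JSON shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical normalization: two producers name the same field differently.
            normalized = {
                "device": payload.get("device") or payload.get("deviceId"),
                "value": float(payload["value"]),
            }
            data = base64.b64encode(json.dumps(normalized).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, KeyError, TypeError, AttributeError):
            # Return the original data and flag the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


# Local usage example with one valid and one malformed record.
sample = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"deviceId": "d1", "value": "3.5"}').decode("utf-8")},
        {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
    ]
}
response = lambda_handler(sample, None)
first_payload = json.loads(base64.b64decode(response["records"][0]["data"]))
```

Every input record gets exactly one output record with the same `recordId`, so the application can match results to inputs.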
Interactive SQL Editor
Fast, iterative development, with SQL templates in the console to get started
Window Types
• Sliding, tumbling, and custom windows
• Tumbling windows are fixed-size, and their grouped keys do not overlap
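To contrast the two main window types, here is a plain-Python sketch (not Kinesis SQL) that sums the same events both ways. The events are made up and assumed to arrive in timestamp order. The sliding version emits one result per event, covering the previous `width` seconds. The tumbling version emits one result per fixed, non-overlapping interval.

```python
from collections import deque


def sliding_sums(events, width):
    """For each (timestamp, value), sum the values from the last `width` seconds."""
    window = deque()
    total = 0
    results = []
    for ts, value in events:
        window.append((ts, value))
        total += value
        while window[0][0] <= ts - width:   # evict rows that left the window
            _, old = window.popleft()
            total -= old
        results.append((ts, total))
    return results


def tumbling_sums(events, width):
    """Sum values per fixed interval; each event belongs to exactly one window."""
    sums = {}
    for ts, value in events:
        start = ts - ts % width
        sums[start] = sums.get(start, 0) + value
    return sums


events = [(0, 1), (30, 2), (60, 4), (90, 8)]
sliding = sliding_sums(events, width=60)
tumbling = tumbling_sums(events, width=60)
```

In the sliding output an event can contribute to several results, while in the tumbling output it is counted once.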
Enrich your Data Stream using Amazon S3 Data
• Add a SQL table to your streaming application from Amazon S3
• Periodically update the table by calling the update application API
[Diagram: inside the Amazon Kinesis Data Analytics application, SQL code joins the in-application stream (from the streaming source) with the in-application table (from Amazon S3) and writes to the destination]
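The join between the stream and the reference table can be sketched as a lookup. Everything here is illustrative: the CSV text stands in for an object stored in Amazon S3, the column names are made up, and the service expresses this join in SQL rather than Python.

```python
import csv
import io

# Hypothetical reference data, standing in for a CSV object in Amazon S3.
REFERENCE_CSV = """device,site
d1,Madrid
d2,Dublin
"""


def load_reference_table(text):
    """Build the in-application table: device -> reference row."""
    return {row["device"]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(stream, table):
    """Left-join each stream record with the reference table on `device`."""
    for record in stream:
        match = table.get(record["device"], {})
        yield {**record, "site": match.get("site")}


table = load_reference_table(REFERENCE_CSV)
stream = [{"device": "d1", "value": 3.5}, {"device": "d9", "value": 1.0}]
enriched = list(enrich(stream, table))
```

Calling `load_reference_table` again with fresh data plays the role of the periodic table update. Records without a match keep flowing, with `site` set to `None`.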
Getting Started
Apache Flink official docs: https://ci.apache.org/projects/flink/flink-docs-stable/
Getting started with Apache Kafka/Amazon MSK: https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html
Amazon Kinesis Services for streaming data: https://aws.amazon.com/kinesis/
Amazon Elasticsearch Service: https://aws.amazon.com/elasticsearch-service/
Apache Kafka official docs: https://kafka.apache.org/documentation/
Thanks
Javier Ramirez
AWS Developer Advocate
@supercoco9
