Table of Contents
1. Abstract
2. Architecture
3. Tools
4. Configurations
5. Code Snippets
6. Screenshots
7. References
Table of Figures
1 Log Analysis using Kafka Streaming
2 Log Analysis Web Page with Statistics
3 Top Endpoints
4 Frequent IP Addresses
5 Frequent IP Addresses Last Window
6 Spark Environment
7 Spark Jobs triggered during execution
8 RDD Storage
9 Streaming Statistics
10 Streaming Statistics after burst input
Abstract
This project analyzes logs streamed into Apache Spark through Apache Kafka. An interactive web
page reports the number of logs streamed in total and in the most recent time window, the count
of each response code, the most frequent IP addresses, and the top endpoints by request
frequency.
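The statistics listed above can be illustrated without Spark. The following is a minimal plain-Java sketch, assuming the logs are Apache Common Log Format access-log lines (the report does not state the format). The classes `AccessLog` and `LogStats` are hypothetical and are not the project's code: they parse each line and compute all-time counts per response code, IP address, and endpoint.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical parsed record for one access-log line, assuming Apache Common Log Format.
class AccessLog {
    // ip ident user [timestamp] "method endpoint protocol" status bytes
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+)");

    final String ipAddress;
    final String endpoint;
    final int responseCode;

    AccessLog(String ipAddress, String endpoint, int responseCode) {
        this.ipAddress = ipAddress;
        this.endpoint = endpoint;
        this.responseCode = responseCode;
    }

    // Returns Optional.empty() for lines that do not match the assumed format.
    static Optional<AccessLog> parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new AccessLog(m.group(1), m.group(6), Integer.parseInt(m.group(8))));
    }
}

// All-time statistics over a list of parsed logs (plain Java, no Spark).
class LogStats {
    // Parse every line, dropping malformed ones.
    static List<AccessLog> parseAll(List<String> lines) {
        return lines.stream()
            .map(AccessLog::parse)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .collect(Collectors.toList());
    }

    // Count how many logs share each key (response code, IP address, endpoint, ...).
    static <K> Map<K, Long> countBy(List<AccessLog> logs, Function<AccessLog, K> key) {
        return logs.stream().collect(Collectors.groupingBy(key, Collectors.counting()));
    }

    // The n most frequent keys, highest count first; ties broken alphabetically.
    static List<String> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

For example, `LogStats.topN(LogStats.countBy(logs, l -> l.ipAddress), 10)` would give the ten most frequent IP addresses.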
Design and architecture:
Figure 1: Log Analysis using Kafka Streaming
Tools Used:
• Scala 2.10
• Java 8
• Apache Spark 1.5.2
• Apache Kafka 0.8.2.0 (Scala 2.10 build, kafka_2.10-0.8.2.0)
• Ubuntu Linux Server
Configurations:
Setting up a Multi-broker Kafka Cluster:
Start ZooKeeper
Kafka ships with a reasonable default ZooKeeper configuration for our simple use case. The
following command launches a local ZooKeeper instance.
bin/zookeeper-server-start.sh config/zookeeper.properties
Note: By default, the ZooKeeper server listens on *:2181/tcp.
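Before starting the brokers, it can help to confirm that ZooKeeper is accepting connections on port 2181. Below is a minimal sketch using only `java.net.Socket`. The `PortCheck` class is hypothetical, and a successful connection only shows that some process is listening on the port, not that it is a healthy ZooKeeper server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP server accepts connections on host:port.
class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Fails with an IOException (e.g. connection refused or timeout) if nothing is listening.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, `PortCheck.isListening("localhost", 2181, 1000)` should return `true` once ZooKeeper has started.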
Configure and start the Kafka brokers
We will create two Kafka brokers whose configurations are based on the default
config/server.properties. Apart from the settings below, the brokers' configurations are
identical.
The first broker:
Create the config file for broker 1
cp config/server.properties config/server1.properties
Edit config/server1.properties and replace the existing config values as follows:
broker.id=1
port=9092
log.dirs=/tmp/kafka-logs-1
The second broker:
Create the config file for broker 2
cp config/server.properties config/server2.properties
Edit config/server2.properties and replace the existing config values as follows:
broker.id=2
port=9093
log.dirs=/tmp/kafka-logs-2
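The per-broker edits above amount to copying the base configuration and overriding three keys. The sketch below expresses that with `java.util.Properties`. `BrokerConfig` and its port/directory convention (port `9091 + broker.id`, directory `/tmp/kafka-logs-<id>`) are assumptions chosen to match the two brokers above. It writes `log.dirs`, the key used in the default config/server.properties, because Kafka reads `log.dirs` in preference to `log.dir` when both are present.

```java
import java.util.Properties;

// Hypothetical helper reproducing the manual edits: each broker starts from the shared
// base config and overrides only its id, listener port and log directory.
class BrokerConfig {
    static Properties forBroker(Properties base, int brokerId) {
        Properties props = new Properties();
        props.putAll(base);  // every other setting stays identical across brokers
        props.setProperty("broker.id", Integer.toString(brokerId));
        // Assumed convention matching the brokers above: broker 1 -> 9092, broker 2 -> 9093.
        props.setProperty("port", Integer.toString(9091 + brokerId));
        // log.dirs takes precedence over log.dir, so set log.dirs and drop any stray log.dir.
        props.remove("log.dir");
        props.setProperty("log.dirs", "/tmp/kafka-logs-" + brokerId);
        return props;
    }
}
```

The result could be saved with `Properties.store`, e.g. `BrokerConfig.forBroker(base, 1).store(new FileWriter("config/server1.properties"), null)`. Note that `store` does not keep the comments or key order of the original file.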
Now you can start each Kafka broker in a separate console:
Start the first broker in its own terminal session:
bin/kafka-server-start.sh config/server1.properties
Start the second broker in its own terminal session:
bin/kafka-server-start.sh config/server2.properties
Create a Kafka topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic topicOne --partitions 3 --replication-factor 2
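With `--partitions 3 --replication-factor 2` on two brokers, every partition gets a replica on both brokers. The sketch below shows a simplified round-robin placement to make that concrete. It is not Kafka's exact assignment algorithm, which also randomizes the starting broker and the follower offset, and `ReplicaAssignment` is a hypothetical class. It also mirrors a check Kafka performs at topic creation: the replication factor cannot exceed the number of available brokers.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified round-robin replica placement (illustrative only, not Kafka's algorithm).
class ReplicaAssignment {
    // Returns, for each partition, the list of broker ids holding a replica;
    // the first entry plays the role of the preferred leader.
    static List<List<Integer>> assign(List<Integer> brokers, int partitions, int replicationFactor) {
        if (replicationFactor < 1 || replicationFactor > brokers.size()) {
            throw new IllegalArgumentException("replication factor " + replicationFactor
                + " must be between 1 and the number of brokers (" + brokers.size() + ")");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Consecutive brokers, starting one further along for each partition,
                // so replicas of a partition always land on distinct brokers.
                replicas.add(brokers.get((p + r) % brokers.size()));
            }
            result.add(replicas);
        }
        return result;
    }
}
```

For brokers 1 and 2, this places partition 0 on [1, 2], partition 1 on [2, 1] and partition 2 on [1, 2], so preferred leadership alternates between the brokers.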
Commands:
Start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka Server (Broker):
bin/kafka-server-start.sh config/server.properties
Create topics:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicOne
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicTwo
Start Producer:
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic topicOne
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic topicTwo
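To exercise the pipeline without a real web server, the console producer can read log lines from standard input. The generator below is a hypothetical helper, not part of the project. It prints synthetic lines in Apache Common Log Format (an assumed format) from a fixed seed, so runs are reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical test-data generator for feeding kafka-console-producer.sh via stdin.
class LogLineGenerator {
    private static final String[] IPS = {"10.0.0.1", "10.0.0.2", "10.0.0.3"};
    private static final String[] ENDPOINTS = {"/index.html", "/login", "/api/items"};
    private static final int[] RESPONSE_CODES = {200, 200, 200, 404, 500}; // weighted towards 200

    static List<String> generate(int count, long seed) {
        Random random = new Random(seed);  // fixed seed -> identical output on every run
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String ip = IPS[random.nextInt(IPS.length)];
            String endpoint = ENDPOINTS[random.nextInt(ENDPOINTS.length)];
            int code = RESPONSE_CODES[random.nextInt(RESPONSE_CODES.length)];
            int bytes = random.nextInt(5000);
            // Common Log Format line with an illustrative, fixed date
            lines.add(String.format(Locale.ROOT,
                "%s - - [21/Jul/2014:10:00:%02d -0800] \"GET %s HTTP/1.1\" %d %d",
                ip, i % 60, endpoint, code, bytes));
        }
        return lines;
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        for (String line : generate(count, 42L)) {
            System.out.println(line);
        }
    }
}
```

After compiling it, its output can be piped into the producer using the broker ports configured earlier: `java LogLineGenerator 1000 | bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic topicOne`.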
Spark command to run KafkaLogAnalyzerApplication:
Note: Adjust the JAR path to match your project layout.
bin/spark-submit --class "com.cs696.bigdata.loganalyzer.KafkaLogAnalyzerApplication" \
  --master local[20] \
  projectFinal/app/java8/target/uber-log-analysis-1.0.jar \
  --output_html_file /tmp/log_stats.html
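spark-submit passes every argument after the application JAR (here `--output_html_file /tmp/log_stats.html`) to the application's `main` method, and `--master local[20]` runs Spark locally with 20 worker threads. The project's flag handling (`LogAnalyzerFlags`) is not shown in this report. The following is therefore only a hypothetical minimal parser for `--name value` pairs, illustrating how such an argument could be read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal parser for "--name value" application arguments.
class SimpleFlags {
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Unexpected argument: " + arg);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + arg);
            }
            // Store the flag name without its leading "--" and consume the next argument as its value.
            flags.put(arg.substring(2), args[++i]);
        }
        return flags;
    }
}
```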
Code Snippets:
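Integrating Apache Kafka with the Log Analyzer Application:
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Logs are streamed in through Apache Kafka from multiple brokers. The broker list and the
// comma-separated topic list come from the application's flags (LogAnalyzerFlags);
// jssc is the application's JavaStreamingContext.
HashSet<String> topicsSet = new HashSet<String>(
    Arrays.asList(LogAnalyzerFlags.getInstance().getTopics().split(",")));
HashMap<String, String> kafkaParams = new HashMap<String, String>();
// The direct (receiver-less) Kafka API locates the brokers through metadata.broker.list
kafkaParams.put("metadata.broker.list", LogAnalyzerFlags.getInstance().getBrokers());

// Create a pair input DStream of (key, message) records from the brokers and topics
JavaPairInputDStream<String, String> logRecords = KafkaUtils.createDirectStream(
    jssc,
    String.class,          // key type
    String.class,          // message type
    StringDecoder.class,   // key decoder
    StringDecoder.class,   // message decoder
    kafkaParams,
    topicsSet
);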
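The "last time window" statistics come from a sliding window over the stream. The plain-Java sketch below shows the bookkeeping conceptually, assuming the window slides by one batch at a time: each batch's count is added, and counts older than the window are subtracted. `SlidingWindowCounter` is hypothetical and is not the project's code. In Spark Streaming itself, `window` or `reduceByKeyAndWindow` on a DStream such as `logRecords` would do this, and the window and slide durations must be multiples of the batch interval.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Conceptual sliding-window counter over per-batch log counts (one-batch slide assumed).
class SlidingWindowCounter {
    private final int windowBatches;                        // window length, in batches
    private final Deque<Long> batchCounts = new ArrayDeque<>();
    private long windowTotal = 0;                           // logs in the current window
    private long allTimeTotal = 0;                          // logs since the stream started

    SlidingWindowCounter(int windowBatches) {
        if (windowBatches <= 0) {
            throw new IllegalArgumentException("window must cover at least one batch");
        }
        this.windowBatches = windowBatches;
    }

    // Record one batch's log count and evict batches that slid out of the window.
    void addBatch(long count) {
        batchCounts.addLast(count);
        windowTotal += count;
        allTimeTotal += count;
        while (batchCounts.size() > windowBatches) {
            windowTotal -= batchCounts.removeFirst();  // subtract the expired batch
        }
    }

    long windowTotal() { return windowTotal; }
    long allTimeTotal() { return allTimeTotal; }
}
```

For example, with a 10-second batch interval and a window of 3 batches, `windowTotal()` reports the logs seen in the last 30 seconds while `allTimeTotal()` keeps growing.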
Screenshots:
Figure 2: Log Analysis Web Page with Statistics
Figure 3: Top Endpoints
Figure 4: Frequent IP Addresses
Figure 5: Frequent IP Addresses Last Window
Figure 6: Spark Environment
Figure 7: Spark Jobs triggered during execution
Figure 8: RDD Storage
Figure 9: Streaming Statistics
Figure 10: Streaming Statistics after burst input
References:
• http://spark.apache.org/docs/latest/streaming-kafka-integration.html
• http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
• https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/streaming.html

