Table of Contents
1. Abstract
2. Architecture
3. Tools
4. Configurations
5. Code Snippets
6. Screenshots
7. References
Table of Figures
1 Log Analysis using Kafka Streaming
2 Log Analysis Web Page with Statistics
3 Top Endpoints
4 Frequent IP Addresses
5 Frequent IP Addresses Last Window
6 Spark Environment
7 Spark Jobs triggered during execution
8 RDD Storage
9 Streaming Statistics
10 Streaming Statistics after burst input
Abstract:
This project analyzes logs streamed into Spark through Kafka. It provides an interactive web
page that shows the number of logs streamed over all time and in the last time window, response
code counts, frequent IP addresses, and the top endpoints ranked by request frequency.
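The statistics listed above can be illustrated without Spark or Kafka. The following is a minimal sketch, not the project's actual code: the class name LogStatsSketch, its methods, and the sample log lines are hypothetical, and it assumes the logs are in Apache Common Log Format, which this report does not state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the project's actual classes.
class LogStatsSketch {
    // Assumed Apache Common Log Format:
    //   ip ident user [timestamp] "METHOD endpoint protocol" status bytes
    private static final Pattern LOG_LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+) [^\"]*\" (\\d{3}) \\S+");

    final Map<String, Long> responseCodes = new HashMap<>();
    final Map<String, Long> ipAddresses = new HashMap<>();
    final Map<String, Long> endpoints = new HashMap<>();
    long totalLines = 0;

    /** Parses one line and updates the counters; returns false if the line does not match. */
    boolean add(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return false;
        }
        totalLines++;
        ipAddresses.merge(m.group(1), 1L, Long::sum);
        endpoints.merge(m.group(2), 1L, Long::sum);
        responseCodes.merge(m.group(3), 1L, Long::sum);
        return true;
    }

    /** Returns up to n keys with the highest counts, most frequent first. */
    static List<String> top(Map<String, Long> counts, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        LogStatsSketch stats = new LogStatsSketch();
        stats.add("10.0.0.1 - - [01/Dec/2015:10:00:00 -0800] \"GET /index.html HTTP/1.1\" 200 1024");
        stats.add("10.0.0.1 - - [01/Dec/2015:10:00:01 -0800] \"GET /index.html HTTP/1.1\" 200 1024");
        stats.add("10.0.0.2 - - [01/Dec/2015:10:00:02 -0800] \"GET /missing HTTP/1.1\" 404 0");
        System.out.println("Total lines: " + stats.totalLines);
        System.out.println("Response codes: " + stats.responseCodes);
        System.out.println("Top IP: " + top(stats.ipAddresses, 1));
        System.out.println("Top endpoint: " + top(stats.endpoints, 1));
    }
}
```

In the streaming application the same kind of counting would be applied per batch or per window rather than over an in-memory list; the sketch only shows the parsing and aggregation idea.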
Design and architecture:
Figure 1: Log Analysis using Kafka Streaming
Tools Used:
• Scala 2.10
• Java 8
• Apache Spark 1.5.2
• Apache Kafka 2.10.-
• Ubuntu Linux Server
Setting up a Multi-broker Kafka Cluster:
Start ZooKeeper
Kafka ships with a reasonable default ZooKeeper configuration for our simple use case. The
following command launches a local ZooKeeper instance.
bin/zookeeper-server-start.sh config/zookeeper.
Note: By default, the ZooKeeper server listens on *:2181/tcp.
Configure and start the Kafka brokers
We will create two Kafka brokers whose configurations are based on the default
config/server.properties. Apart from the settings below, the configurations of the brokers are
identical.
The first broker:
Create the config file for broker 1
cp config/server.properties config/server1.properties
Edit config/server1.properties and replace the existing config values as follows:
The second broker:
Create the config file for broker 2
cp config/server.properties config/server2.properties
Edit config/server2.properties and replace the existing config values as follows:
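The specific values for each broker are not reproduced in this report. As a rough sketch under stated assumptions, the helper below prints the kind of per-broker overrides that such a setup typically needs. The property names broker.id, port, and log.dirs are those found in the default server.properties of Kafka 0.8-era releases (newer releases configure listeners instead of port). The ports 9091 and 9092 are inferred from the producer's broker list later in this report, and the log directory paths are purely hypothetical, as is the class BrokerConfigSketch itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the values are assumptions, not the report's elided settings.
class BrokerConfigSketch {

    /** Builds the settings that must differ between brokers running on one machine. */
    static Map<String, String> overridesFor(int brokerId, int port) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("broker.id", String.valueOf(brokerId));        // must be unique per broker
        overrides.put("port", String.valueOf(port));                 // must not clash on one host
        overrides.put("log.dirs", "/tmp/kafka-logs-" + brokerId);    // assumed path, one per broker
        return overrides;
    }

    /** Renders the overrides as key=value lines in properties-file style. */
    static String render(Map<String, String> overrides) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("# config/server1.properties");
        System.out.print(render(overridesFor(1, 9091)));
        System.out.println("# config/server2.properties");
        System.out.print(render(overridesFor(2, 9092)));
    }
}
```

The point of the sketch is only that each broker on the same host needs its own ID, port, and log directory; the actual values should be taken from the project's own configuration files.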
Now you can start each Kafka broker in a separate console:
Start first broker in its own terminal session:
bin/kafka-server-start.sh config/server1.properties
Start second broker in its own terminal session:
bin/kafka-server-start.sh config/server2.properties
Create a Kafka topic :
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic topicOne --partitions 3 --replication-factor 2
Start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka Server (Broker):
bin/kafka-server-start.sh config/server.properties
Create topics:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicOne
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicTwo
Start Producer:
bin/kafka-console-producer.sh --broker-list localhost:9091,localhost:9092 --topic topicOne
bin/kafka-console-producer.sh --broker-list localhost:9091,localhost:9092 --topic topicTwo
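The console producer reads messages from standard input, one per line, so test data can be piped into it. The following is a minimal sketch of a generator for synthetic log lines. Everything in it is hypothetical: the class name SampleLogGenerator, the sample IP addresses and endpoints, the fixed timestamp, and the assumption that the application expects Apache Common Log Format lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical test-data generator; not part of the project.
class SampleLogGenerator {
    private static final String[] IPS = {"10.0.0.1", "10.0.0.2", "10.0.0.3"};
    private static final String[] ENDPOINTS = {"/index.html", "/about", "/missing"};
    private static final int[] STATUSES = {200, 200, 404, 500};

    /** Formats one line in the assumed Common Log Format, using a fixed timestamp. */
    static String logLine(String ip, String endpoint, int status, int bytes) {
        return String.format(Locale.ROOT,
            "%s - - [01/Dec/2015:10:00:00 -0800] \"GET %s HTTP/1.1\" %d %d",
            ip, endpoint, status, bytes);
    }

    /** Generates count pseudo-random lines; the same seed always yields the same lines. */
    static List<String> generate(int count, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add(logLine(
                IPS[random.nextInt(IPS.length)],
                ENDPOINTS[random.nextInt(ENDPOINTS.length)],
                STATUSES[random.nextInt(STATUSES.length)],
                random.nextInt(5000)));
        }
        return lines;
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        for (String line : generate(count, 42L)) {
            System.out.println(line);
        }
    }
}
```

Once compiled, its output could be piped into the producer, for example:
java SampleLogGenerator 100 | bin/kafka-console-producer.sh --broker-list localhost:9091,localhost:9092 --topic topicOne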
Spark command to execute KafkaLogAnalyzerApplication:
Note: Locate the jar file according to your project hierarchy.
bin/spark-submit --class "com.cs696.bigdata.loganalyzer.KafkaLogAnalyzerApplication" --master local[20] projectFinal/app/java8/target/uber-log-analysis-1.0.jar --output_html_file
Code Snippets:
Integrating Apache Kafka with Log Analyzer Application:
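// Logs are streamed in through Apache Kafka using multiple brokers, which are configured in the
// producer.properties file under the config directory.
// The right-hand sides below are truncated in the source; the completions are assumptions based on
// the Spark 1.5 direct-stream API, with topics, brokers and jssc assumed to be defined earlier.
HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", brokers);
// Create a pair input DStream from Kafka with the given brokers and topics
JavaPairInputDStream<String, String> logRecords = KafkaUtils.createDirectStream(
    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
    kafkaParams, topicsSet);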
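The topic set and parameter map handed to the direct stream are plain Java collections, so that part of the setup can be tried without a Spark or Kafka installation. The following is a minimal sketch under assumptions: the class KafkaParamsSketch and its method names are hypothetical, the comma-separated input format is assumed, and metadata.broker.list is the broker-list key accepted by the Spark 1.x direct-stream integration rather than something this report states.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper showing only the stdlib part of the stream setup.
class KafkaParamsSketch {

    /** Splits a comma-separated topic list into a set, ignoring blanks and duplicates. */
    static Set<String> topicsSet(String topics) {
        Set<String> set = new HashSet<>();
        for (String topic : topics.split(",")) {
            String trimmed = topic.trim();
            if (!trimmed.isEmpty()) {
                set.add(trimmed);
            }
        }
        return set;
    }

    /** Builds the parameter map carrying the broker list (assumed key name). */
    static Map<String, String> kafkaParams(String brokers) {
        Map<String, String> params = new HashMap<>();
        params.put("metadata.broker.list", brokers);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(topicsSet("topicOne,topicTwo"));
        System.out.println(kafkaParams("localhost:9091,localhost:9092"));
    }
}
```

The broker addresses and topic names in main match the producer commands used earlier in this report.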
Screenshots:
Figure 2: Log Analysis Web Page with Statistics
Figure 3: Top Endpoints
Figure 4: Frequent IP Addresses
Figure 5: Frequent IP Addresses Last Window
Figure 6: Spark Environment
Figure 7: Spark Jobs triggered during execution
Figure 8: RDD Storage
Figure 9: Streaming Statistics
Figure 10: Streaming Statistics after burst input
References:
• http://spark.apache.org/docs/latest/streaming-kafka-integration.html
• http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-
• https://databricks.gitbooks.io/databricks-spark-reference-

