
CC 2.0 by Per Olesen | http://flic.kr/p/7pVCgZ

CC 2.0 by Franck BLAIS | http://flic.kr/p/cwVnSy

CC 2.0 by John Steven Fernandez | http://flic.kr/p/a8uTzz

CC 2.0 by Ian Carroll | http://flic.kr/p/6NWoGm

CC 2.0 by Perry French | http://flic.kr/p/8wDMJS

CC 2.0 by John Mitchell | http://flic.kr/p/5UaPg8

March 8, 2013

Before we started designing a blueprint solution, we first asked ourselves:

1  Who would be asked to answer questions like this?
2  Who is this person?
3  What tools does this person expect to use?
4  What is the typical skill set of this person?
5  How do they work?

Preparation

How do we answer these questions?


At a high level of abstraction the answer is simple. We need a data
management system with three pieces: ingest, store and process.

Data Source -> Data Ingestion -> Data Storage -> Data Processing

Traditional Data Management System Approach



We take this basic architecture and replace the generic terms by mapping
them onto the Hadoop ecosystem.

Data Source -> Flume -> HDFS -> Hive, Impala -> BI/Analysis/Reporting

With this Hadoop architecture a data scientist should be able to answer
the questions without any programming environment, using familiar BI,
analysis and reporting tools.

Blueprint for a Data Management System with Hadoop



1  Two WiFi access points to simulate two different stores, with
OpenWRT (a Linux-based firmware for routers) installed
2  Flume to move all log messages to HDFS without any manual
intervention (no transformation, no filtering)
3  A 4-node CDH4 cluster
4  Pentaho Data Integration's graphical designer for data
transformation, parsing, filtering and loading into the warehouse
5  Hive as the data warehouse system on top of Hadoop to project
structure onto the data
6  Impala for querying data from Hive in real time
7  A tool to visualize the results

Setup

Ingredients

CC 2.0 by Qi Wei Fong | http://flic.kr/p/7w8vfq


The plot indicates that about 85% of the visits were detected in store
number one and about 15% in store number two. One might conclude that
store number one is in a much better location with more occasional
customers.

But let's gain more insight by analysing the number of unique visitors.

Analysis Result

Visits for stores number one & two


This plot gives us more detail about the customers. It turns out that
the 135 visits in store number one were caused by just 9 unique
visitors, while store number two saw 5 unique visitors.
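The distinction between visits and unique visitors can be sketched in a few lines of Python. This is only an illustration: the record layout (store, device id) and the sample values are assumptions made up for the example, not the data or the tooling used in the case study, where the counts came from queries against Hive and Impala.

```python
from collections import Counter

# Hypothetical visit records: (store, device_id). Values are illustrative only.
visits = [
    ("store_1", "aa:01"),
    ("store_1", "aa:01"),
    ("store_1", "aa:02"),
    ("store_2", "bb:01"),
    ("store_2", "bb:01"),
]


def visits_per_store(records):
    """Count every visit, including repeat visits by the same device."""
    return Counter(store for store, _ in records)


def unique_visitors_per_store(records):
    """Count each device at most once per store."""
    seen = {}
    for store, device in records:
        seen.setdefault(store, set()).add(device)
    return {store: len(devices) for store, devices in seen.items()}


total = visits_per_store(visits)
unique = unique_visitors_per_store(visits)
```

A store with many visits but few unique visitors, as in the plot above, shows up as a large gap between the two counts.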

Analysis Result

Unique visitors


This plot indicates that we have more returning than new users in both
stores. In store number two we didn't see a single new user over the
past 4 days.

It's probably a good idea to start a marketing campaign aimed at new
customers, e.g. by giving out vouchers for the first purchase.
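One possible way to separate new from returning users is to compare each device's first sighting with the start of the reporting window. The sketch below assumes that definition, and the record layout and dates are hypothetical; the slides do not state how the classification was actually done.

```python
from datetime import date

# Hypothetical (device_id, visit_date) records for one store.
visits = [
    ("aa:01", date(2013, 3, 1)),
    ("aa:01", date(2013, 3, 6)),
    ("aa:02", date(2013, 3, 7)),
]


def split_new_returning(records, window_start):
    """Return (new, returning) device sets for visits on or after window_start.

    Assumption: a device is "new" if its first recorded visit falls inside
    the window, and "returning" if it was already seen before the window.
    """
    first_seen = {}
    for device, day in records:
        if device not in first_seen or day < first_seen[device]:
            first_seen[device] = day
    active = {device for device, day in records if day >= window_start}
    new = {device for device in active if first_seen[device] >= window_start}
    return new, active - new


new, returning = split_new_returning(visits, date(2013, 3, 5))
```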

Analysis Result

New vs. returning users


The plot for the last 4 days shows that the visit duration in store
number one was evenly distributed, while the distribution in store
number two shows some peaks.

We can also see that visitors tend to stay in store number one much
longer.
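A comparison like this boils down to grouping visit durations by store. The following sketch assumes that each visit has already been reduced to a (store, duration in seconds) pair; the values are invented for illustration and are not taken from the plot.

```python
from statistics import mean

# Hypothetical (store, visit_duration_in_seconds) records.
visits = [
    ("store_1", 600),
    ("store_1", 900),
    ("store_2", 120),
]


def mean_duration_per_store(records):
    """Group durations by store and return the mean duration of each group."""
    by_store = {}
    for store, seconds in records:
        by_store.setdefault(store, []).append(seconds)
    return {store: mean(values) for store, values in by_store.items()}


durations = mean_duration_per_store(visits)
```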

Analysis Result

Visit duration over the past 4 days


A lot of useful information can be derived from this plot.

1.  There is a repeating pattern of step-ins and step-outs
within a short period of time.
2.  There was a step-out of store number one and a step-in
into store number two within just 28 seconds.
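The time between a step-out and the next step-in of one user can be computed by pairing consecutive visits. The sketch below assumes each visit is available as a (store, step-in, step-out) record; the timestamps are made up so that the gap happens to match the 28 seconds mentioned above.

```python
from datetime import datetime

# Hypothetical visits of one user: (store, step_in, step_out).
visits = [
    ("store_1", datetime(2013, 3, 1, 10, 0, 0), datetime(2013, 3, 1, 10, 5, 0)),
    ("store_2", datetime(2013, 3, 1, 10, 5, 28), datetime(2013, 3, 1, 10, 9, 0)),
]


def gaps_between_visits(records):
    """Return (from_store, to_store, gap_seconds) for each consecutive visit pair."""
    ordered = sorted(records, key=lambda record: record[1])  # sort by step-in time
    gaps = []
    for previous, following in zip(ordered, ordered[1:]):
        prev_store, _, prev_out = previous
        next_store, next_in, _ = following
        gaps.append((prev_store, next_store, (next_in - prev_out).total_seconds()))
    return gaps


gaps = gaps_between_visits(visits)
```

Very short gaps between different stores are the kind of pattern the second observation describes.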

Analysis Result

Avg. Duration Between Visits of one particular user


CC 2.0 by Aurelien Guichard | http://flic.kr/p/cjg9yw


1  Presentation, Video and Post Series

•  http://bit.ly/YgtIMK
2  http://sentric.ch
3  http://www.bigdata-usergroup.ch
4  http://about.me/jpkoenig

Links


WMFRA # 46: Case Study - In-Store Analysis
