CC 2.0 by Per Olesen | http://flic.kr/p/7pVCgZ
CC 2.0 by Franck BLAIS | http://flic.kr/p/cwVnSy
CC 2.0 by John Steven Fernandez | http://flic.kr/p/a8uTzz
CC 2.0 by Ian Carroll | http://flic.kr/p/6NWoGm
CC 2.0 by Perry French | http://flic.kr/p/8wDMJS
CC 2.0 by John Mitchell | http://flic.kr/p/5UaPg8
March 8, 2013


Before designing a blueprint solution, we first asked ourselves:

1 Who would be asked to answer questions like this?
2 Who is this person?
3 What tools does this person expect to use?
4 What is the typical skill set of this person?
5 How does this person work?

Preparation

How do we answer these questions?



At a high level of abstraction the answer is simple. We need a data
management system with three parts: ingest, store and process.


Data Source -> Data Ingestion -> Data Storage -> Data Processing
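
The three parts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record format "store,device" and the function names ingest, store and process are hypothetical, and a plain list stands in for the storage layer.

```python
from collections import Counter


def ingest(source):
    """Ingestion: pull raw records from the source without changing them."""
    return list(source)


def store(raw_records, storage):
    """Storage: append the raw records to a store (a list stands in here)."""
    storage.extend(raw_records)
    return storage


def process(storage):
    """Processing: derive an aggregate, here the number of records per store."""
    return Counter(record.split(",")[0] for record in storage)


# Hypothetical raw records in the assumed form "store,device".
source = ["store1,aa:01", "store1,aa:02", "store2,aa:01"]
storage = store(ingest(source), [])
counts = process(storage)
```

Keeping the stages separate means each one can later be swapped for a real component without touching the other two.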




Traditional Data Management System Approach


So, how do we answer these questions as a

We take this basic architecture and replace the generic terms by
mapping them onto the Hadoop ecosystem.

Data Source -> Flume -> HDFS -> Hive, Impala -> BI/Analysis/Reporting


With this Hadoop architecture, a data scientist should be able to
answer the questions without a programming environment, using
familiar BI, analysis and reporting tools.


Blueprint for a Data Management System with Hadoop

So, how do we answer these questions as a

1 Two WiFi access points running OpenWRT, a Linux-based firmware for
   routers, to simulate two different stores
2 Flume to move all log messages to HDFS without any manual
   intervention (no transformation, no filtering)
3 A 4-node CDH4 cluster
4 Pentaho Data Integration's graphical designer for data
   transformation, parsing, filtering and loading into the warehouse
5 Hive as the data warehouse system on top of Hadoop to project
   structure onto the data
6 Impala for querying data from Hive in real time
7 A tool to visualize the results
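
To make the parsing and filtering step concrete, here is a minimal Python sketch of turning raw log lines into structured records. The slides do not show the access point log format, so the line layout (timestamp, store, device address, event), the Sighting type and the parse_line helper are all assumptions made for illustration. In the setup described above this work is done in Pentaho Data Integration, not in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Sighting:
    """One structured record derived from a raw log line (hypothetical schema)."""
    timestamp: datetime
    store: str
    device: str
    event: str


def parse_line(line: str) -> Optional[Sighting]:
    """Parse one line of the assumed format; return None for lines to filter out."""
    parts = line.split()
    if len(parts) != 4:
        return None
    ts, store, device, event = parts
    try:
        timestamp = datetime.fromisoformat(ts)
    except ValueError:
        return None
    # Normalize the device address so the same device always compares equal.
    return Sighting(timestamp, store, device.lower(), event)


# Hypothetical raw lines; the second one is dropped by the filter.
raw = [
    "2013-03-04T10:15:02 store1 AA:BB:CC:DD:EE:01 step-in",
    "malformed line",
    "2013-03-04T10:20:40 store1 AA:BB:CC:DD:EE:01 step-out",
]
sightings = [s for s in map(parse_line, raw) if s is not None]
```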

Setup

Ingredients
CC 2.0 by Qi Wei Fong | http://flic.kr/p/7w8vfq

The plot indicates that about 85% of the visits were detected in store
number one and about 15% in store number two. One might conclude that
store number one is in a much better location, with more occasional
customers.
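
A share like this is a simple count per store divided by the total. The following Python sketch shows the arithmetic on toy data. The visit records and the visit_share helper are hypothetical, and the counts were chosen only to reproduce an 85/15 split. They are not the data behind the plot.

```python
from collections import Counter


def visit_share(visits):
    """Return the percentage of visits per store, rounded to whole percent."""
    per_store = Counter(store for store, _device in visits)
    total = sum(per_store.values())
    return {store: round(100 * n / total) for store, n in per_store.items()}


# Toy data: 17 visits in store one and 3 in store two (illustrative only).
visits = [("store1", "dev-a")] * 17 + [("store2", "dev-b")] * 3
share = visit_share(visits)
```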




But let's gain more insight by analysing the number of unique visitors.




Analysis Result

Visits for stores number one & two

This plot gives us more details about the customers. It turns out
that the 135 visits in store number one were caused by just 9
unique visitors, while store number two saw 5 unique visitors.
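
Counting unique visitors means counting distinct devices per store instead of counting visits. A minimal Python sketch, assuming each visit is a (store, device) pair. The toy records and the unique_visitors helper are hypothetical. Only the figures 135 and 9 come from the slide.

```python
def unique_visitors(visits):
    """Count distinct devices per store."""
    devices = {}
    for store, device in visits:
        devices.setdefault(store, set()).add(device)
    return {store: len(seen) for store, seen in devices.items()}


# Toy data: five visits, but only three distinct devices.
visits = [
    ("store1", "dev-a"),
    ("store1", "dev-a"),
    ("store1", "dev-b"),
    ("store2", "dev-c"),
    ("store2", "dev-c"),
]
unique = unique_visitors(visits)

# With the slide's figures for store number one:
# 135 visits spread over 9 unique visitors.
avg_visits_per_visitor = 135 / 9
```

With the numbers from the slide, each unique visitor in store number one accounts for 15 visits on average, which is why the raw visit count overstates the store's reach.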




Analysis Result

Unique visitors

This plot indicates that we have more returning than new users in both
stores. In store number two we didn't see a single new user over the
past 4 days.




It's probably a good idea to start a marketing campaign aimed at
new customers, e.g. giving out vouchers for the first purchase.
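
The slides do not define how new and returning users were told apart. One plausible rule, sketched here in Python as an assumption, is that a device counts as new if its first sighting in a store falls inside the observation window, and as returning if it was first seen earlier. The records, dates and the new_vs_returning helper are hypothetical.

```python
from datetime import date


def new_vs_returning(visits, window_start):
    """Count new and returning devices per store for visits on or after window_start."""
    # First day on which each (store, device) pair was ever seen.
    first_seen = {}
    for store, device, day in visits:
        key = (store, device)
        if key not in first_seen or day < first_seen[key]:
            first_seen[key] = day

    result = {}
    for store, device, day in visits:
        if day < window_start:
            continue
        bucket = result.setdefault(store, {"new": set(), "returning": set()})
        kind = "new" if first_seen[(store, device)] >= window_start else "returning"
        bucket[kind].add(device)
    return {s: {k: len(v) for k, v in b.items()} for s, b in result.items()}


# Toy data covering a few days before and inside the window.
visits = [
    ("store1", "dev-a", date(2013, 3, 1)),
    ("store1", "dev-a", date(2013, 3, 5)),
    ("store1", "dev-d", date(2013, 3, 2)),
    ("store1", "dev-d", date(2013, 3, 6)),
    ("store1", "dev-b", date(2013, 3, 6)),
    ("store2", "dev-c", date(2013, 3, 2)),
    ("store2", "dev-c", date(2013, 3, 7)),
]
summary = new_vs_returning(visits, window_start=date(2013, 3, 4))
```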


Analysis Result

New vs. returning users

The plot for the last 4 days shows that the visit duration in store
number one was evenly distributed, while the distribution in store
number two shows some peaks.




We can also see that visitors tend to stay in store number one
much longer.
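
Visit duration can be derived by pairing each step-in with the following step-out of the same device in the same store. The Python sketch below assumes events arrive as (timestamp, store, device, kind) tuples. That layout, the sample timestamps and both helper functions are hypothetical, and the slides do not say how visits were actually delimited.

```python
from datetime import datetime


def visit_durations(events):
    """Pair step-in and step-out events; return (store, device, seconds) per visit."""
    open_visits = {}
    durations = []
    for ts, store, device, kind in sorted(events):
        key = (store, device)
        if kind == "step-in":
            open_visits[key] = ts
        elif kind == "step-out" and key in open_visits:
            start = open_visits.pop(key)
            durations.append((store, device, (ts - start).total_seconds()))
    return durations


def average_duration(durations):
    """Average visit duration in seconds per store."""
    totals = {}
    for store, _device, seconds in durations:
        total, count = totals.get(store, (0.0, 0))
        totals[store] = (total + seconds, count + 1)
    return {store: total / count for store, (total, count) in totals.items()}


# Toy events: a 30 minute visit in store one, a 5 minute visit in store two.
events = [
    (datetime(2013, 3, 4, 10, 0, 0), "store1", "dev-a", "step-in"),
    (datetime(2013, 3, 4, 10, 30, 0), "store1", "dev-a", "step-out"),
    (datetime(2013, 3, 4, 11, 0, 0), "store2", "dev-b", "step-in"),
    (datetime(2013, 3, 4, 11, 5, 0), "store2", "dev-b", "step-out"),
]
durations = visit_durations(events)
averages = average_duration(durations)
```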


Analysis Result

Visit duration over the past 4 days

A lot of useful information can be derived from this plot.




1. There is a repeating pattern of step-ins and step-outs
    within a short period of time.
2. There was a step-out of store number one and a step-in
    into store number two within just 28 seconds.
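
The time between visits is the gap from one step-out to the next step-in of the same user. A minimal Python sketch, assuming one user's visits are given as (store, step_in, step_out) tuples. The timestamps and the gaps_between_visits helper are hypothetical. Only the 28 second gap between store number one and store number two mirrors the slide.

```python
from datetime import datetime


def gaps_between_visits(visits):
    """Return (from_store, to_store, seconds) for each pair of consecutive visits."""
    ordered = sorted(visits, key=lambda visit: visit[1])
    gaps = []
    for (prev_store, _, prev_out), (next_store, next_in, _) in zip(ordered, ordered[1:]):
        gaps.append((prev_store, next_store, (next_in - prev_out).total_seconds()))
    return gaps


# Toy visits of one particular user on a single day.
visits = [
    ("store1", datetime(2013, 3, 4, 10, 0, 0), datetime(2013, 3, 4, 10, 5, 0)),
    ("store2", datetime(2013, 3, 4, 10, 5, 28), datetime(2013, 3, 4, 10, 7, 0)),
    ("store1", datetime(2013, 3, 4, 10, 7, 40), datetime(2013, 3, 4, 10, 9, 0)),
]
gaps = gaps_between_visits(visits)
average_gap = sum(seconds for _, _, seconds in gaps) / len(gaps)
```

Very short gaps between two different stores, like the first one here, are the kind of detail an average alone would hide.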

Analysis Result

Avg. Duration Between Visits of one particular user




CC 2.0 by Aurelien Guichard | http://flic.kr/p/cjg9yw



1 Presentation, Video and Post Series: http://bit.ly/YgtIMK
2 http://sentric.ch
3 http://www.bigdata-usergroup.ch
4 http://about.me/jpkoenig




Links


WMFRA # 46: Case Study - In-Store Analysis
