際際滷

際際滷Share a Scribd company logo
User Behaviour Tracking
Track - Store - Process
!
//Florian Pfeiffer - Head of Data&Infrastructure - gutefrage.net

!
Datageeks
Vision
Lets build our own Google
Analytics
Why

analytics does sampling
we want the (raw) data
Ideas,Thoughts&Goals
fast / minimal impact on page loading time
high availability
track user over multiple platforms
storage engine? -> hbase
Infrastructure
Numbers!
10-20ms Response Time per pixel
record for now: ~2500 concurrent reqs
1,5 billion entries in Hbase
10 Nodes in Hadoop Cluster
Serving Infrastructure

Loadbalancers & RR DNS
nginx with empty_gif module (~2ms)
data is written to log鍖le
Storing Infrastructure
every nginx node has 鍖ume-ng
鍖ume ingests log鍖le
AsyncHBaseSink with custom Serializer
direct writes to HBase
why 鍖ume?

we had it already in production ;)
Storm might be an interesting alternative
HBase rowkey design
Why?

You can scan through all data and use 鍖lters
for selecting speci鍖c data
But scanning with start & stop row speeds
things up (a lot)
HBase rowkey design
Do I need a fast user or a fast timespan
lookup?
User - clientid,ts<,connectionId>
Timespan - ts,clientid<,connectionId>
Inverse Timestamps
Data in HBase is stored lexicographicaly
sorted
Normal TS - scan would yield oldest results
鍖rst
Inverse TS - newer entries come 鍖rst (and you
can cancel the scan if you have enough data)
Cross Domain Tracking
(Flash)Cookies
Fingerprinting
Etag
HTML5 Storage
The olden times
or
Cookies

Easy to drop a 3rd party cookie with userId
on different websites
Gets more and more blocked (Safari, FF..)
Fingerprinting
Yields interesting results on desktop, dif鍖cult
on e.g. iPhone
invisible to user
Last resort if everything else fails?
Etag

Quite new, based on browser cache
sounds interesting
HTML5 Storage

Store data in local HTML5 storage
Retrieve data with Cross Domain Messaging
Store data

e.g. UserId, SessionId, GeoIP, URL, action,
data
Batch Processing

Calculate how many users are active on
platform A and also on B
Get Traf鍖c of all Questions belonging to
Channel X sorted by Country
Now to something
completely 糸庄韓韓艶姻艶稼岳
demo
Recommendations
with Myrrix
Myrrix

Evolution: taste -> mahout -> myrrix (-> oryx)
Recommender based on ALS
Recommendations @
GF.net
User emit signals on questions
view, like, gives answer, answer is voted best
Application sends signals through RabbitMQ
to recommendation servers
YEAH
but what happens, when a new user signs up?
?
Fetch data from tracking
and feed it into myrrix
Collecting&Storing data
works great
using & processing is another thing ;)
Datageeks

More Related Content

Similar to Datageeks (20)

UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
Christopher Curtin
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
Krug Fat Client
Krug Fat ClientKrug Fat Client
Krug Fat Client
Paul Klipp
Aerospike for machine learning
Aerospike for machine learningAerospike for machine learning
Aerospike for machine learning
Aerospike
HTML5 Technical Executive Summary
HTML5 Technical Executive SummaryHTML5 Technical Executive Summary
HTML5 Technical Executive Summary
Gilad Khen
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk
Scaling 101
Scaling 101Scaling 101
Scaling 101
Chris Finne
Scaling 101 test
Scaling 101 testScaling 101 test
Scaling 101 test
Rashmi Sinha
Would Mr. Spok choose Open Source
Would Mr. Spok choose Open SourceWould Mr. Spok choose Open Source
Would Mr. Spok choose Open Source
vlcinsky
2011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v52011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v5
Samuel Rash
Html5 workshop part 1
Html5 workshop part 1Html5 workshop part 1
Html5 workshop part 1
NAILBITER
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)
Amazon Web Services Korea
Html5
Html5Html5
Html5
Zahin Omar Alwa
Server Monitoring (Scaling while bootstrapped)
Server Monitoring  (Scaling while bootstrapped)Server Monitoring  (Scaling while bootstrapped)
Server Monitoring (Scaling while bootstrapped)
Ajibola Aiyedogbon
Making it fast: Zotonic & Performance
Making it fast: Zotonic & PerformanceMaking it fast: Zotonic & Performance
Making it fast: Zotonic & Performance
Arjan
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
Alluxio, Inc.
HTML5 - The Python Angle (PyCon Ireland 2010)
HTML5 - The Python Angle (PyCon Ireland 2010)HTML5 - The Python Angle (PyCon Ireland 2010)
HTML5 - The Python Angle (PyCon Ireland 2010)
Kevin Gill
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
TechYugadi IT Solutions & Consulting
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Alluxio, Inc.
Treasure Data and OSS
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSS
N Masahiro
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
Christopher Curtin
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
Krug Fat Client
Krug Fat ClientKrug Fat Client
Krug Fat Client
Paul Klipp
Aerospike for machine learning
Aerospike for machine learningAerospike for machine learning
Aerospike for machine learning
Aerospike
HTML5 Technical Executive Summary
HTML5 Technical Executive SummaryHTML5 Technical Executive Summary
HTML5 Technical Executive Summary
Gilad Khen
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk
Scaling 101 test
Scaling 101 testScaling 101 test
Scaling 101 test
Rashmi Sinha
Would Mr. Spok choose Open Source
Would Mr. Spok choose Open SourceWould Mr. Spok choose Open Source
Would Mr. Spok choose Open Source
vlcinsky
2011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v52011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v5
Samuel Rash
Html5 workshop part 1
Html5 workshop part 1Html5 workshop part 1
Html5 workshop part 1
NAILBITER
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)
Amazon Web Services Korea
Server Monitoring (Scaling while bootstrapped)
Server Monitoring  (Scaling while bootstrapped)Server Monitoring  (Scaling while bootstrapped)
Server Monitoring (Scaling while bootstrapped)
Ajibola Aiyedogbon
Making it fast: Zotonic & Performance
Making it fast: Zotonic & PerformanceMaking it fast: Zotonic & Performance
Making it fast: Zotonic & Performance
Arjan
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
Alluxio, Inc.
HTML5 - The Python Angle (PyCon Ireland 2010)
HTML5 - The Python Angle (PyCon Ireland 2010)HTML5 - The Python Angle (PyCon Ireland 2010)
HTML5 - The Python Angle (PyCon Ireland 2010)
Kevin Gill
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Alluxio, Inc.
Treasure Data and OSS
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSS
N Masahiro

Recently uploaded (20)

STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMARSTRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
Sarathkumar Narsupalli
Columbia Weather Systems - Product Overview
Columbia Weather Systems - Product OverviewColumbia Weather Systems - Product Overview
Columbia Weather Systems - Product Overview
Columbia Weather Systems
All-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-World
All-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-WorldAll-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-World
All-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-World
Safe Software
Transactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptxTransactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptx
Maysam Mousa
AI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting HiringAI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting Hiring
Beyond Chiefs
GDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AI
GDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AIGDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AI
GDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AI
James Anderson
Commit Conf 2025 Bitnami Charts with Kubescape
Commit Conf 2025 Bitnami Charts with KubescapeCommit Conf 2025 Bitnami Charts with Kubescape
Commit Conf 2025 Bitnami Charts with Kubescape
Alfredo Garc鱈a Lavilla
The Future of Materials: Transitioning from Silicon to Alternative Metals
The Future of Materials: Transitioning from Silicon to Alternative MetalsThe Future of Materials: Transitioning from Silicon to Alternative Metals
The Future of Materials: Transitioning from Silicon to Alternative Metals
anupriti
Smarter RAG Pipelines: Scaling Search with Milvus and Feast
Smarter RAG Pipelines: Scaling Search with Milvus and FeastSmarter RAG Pipelines: Scaling Search with Milvus and Feast
Smarter RAG Pipelines: Scaling Search with Milvus and Feast
Zilliz
HHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptx
HHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptxHHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptx
HHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptx
HampshireHUG
San Francisco Atlassian ACE - Mar 27 2025.pdf
San Francisco Atlassian ACE - Mar 27 2025.pdfSan Francisco Atlassian ACE - Mar 27 2025.pdf
San Francisco Atlassian ACE - Mar 27 2025.pdf
Matt Doar
Top Tips to Get Your Data AI-Ready
Top Tips to Get Your Data AI-Ready   Top Tips to Get Your Data AI-Ready
Top Tips to Get Your Data AI-Ready
Precisely
2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf
2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf
2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf
Ivan Tang
Microsoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdfMicrosoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdf
Abhishek Agarwal
Getting the Best of TrueDEM April News & Updates
Getting the Best of TrueDEM  April News & UpdatesGetting the Best of TrueDEM  April News & Updates
Getting the Best of TrueDEM April News & Updates
panagenda
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing EnvironmentsAutomated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Pablo G坦mez Abajo
ScotSecure Cyber Security Summit 2025 Edinburgh
ScotSecure Cyber Security Summit 2025 EdinburghScotSecure Cyber Security Summit 2025 Edinburgh
ScotSecure Cyber Security Summit 2025 Edinburgh
Ray Bugg
The effectiveness of ai powered educational tools in enhancing academic perfo...
The effectiveness of ai powered educational tools in enhancing academic perfo...The effectiveness of ai powered educational tools in enhancing academic perfo...
The effectiveness of ai powered educational tools in enhancing academic perfo...
aebhpmqaocxhydmajf
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
DianaGray10
APAC Solutions Challenge Info Session.pdf
APAC Solutions Challenge Info Session.pdfAPAC Solutions Challenge Info Session.pdf
APAC Solutions Challenge Info Session.pdf
GDG on Campus Monash
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMARSTRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
Sarathkumar Narsupalli
Columbia Weather Systems - Product Overview
Columbia Weather Systems - Product OverviewColumbia Weather Systems - Product Overview
Columbia Weather Systems - Product Overview
Columbia Weather Systems
All-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-World
All-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-WorldAll-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-World
All-Data, Any-AI Integration: FME & Amazon Bedrock in the Real-World
Safe Software
Transactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptxTransactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptx
Maysam Mousa
AI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting HiringAI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting Hiring
Beyond Chiefs
GDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AI
GDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AIGDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AI
GDG Cloud Southlake #41: Shay Levi: Beyond the Hype:How Enterprises Are Using AI
James Anderson
Commit Conf 2025 Bitnami Charts with Kubescape
Commit Conf 2025 Bitnami Charts with KubescapeCommit Conf 2025 Bitnami Charts with Kubescape
Commit Conf 2025 Bitnami Charts with Kubescape
Alfredo Garc鱈a Lavilla
The Future of Materials: Transitioning from Silicon to Alternative Metals
The Future of Materials: Transitioning from Silicon to Alternative MetalsThe Future of Materials: Transitioning from Silicon to Alternative Metals
The Future of Materials: Transitioning from Silicon to Alternative Metals
anupriti
Smarter RAG Pipelines: Scaling Search with Milvus and Feast
Smarter RAG Pipelines: Scaling Search with Milvus and FeastSmarter RAG Pipelines: Scaling Search with Milvus and Feast
Smarter RAG Pipelines: Scaling Search with Milvus and Feast
Zilliz
HHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptx
HHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptxHHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptx
HHUG-04-2025-Close-more-deals-from-your-existing-pipeline-FOR SLIDESHARE.pptx
HampshireHUG
San Francisco Atlassian ACE - Mar 27 2025.pdf
San Francisco Atlassian ACE - Mar 27 2025.pdfSan Francisco Atlassian ACE - Mar 27 2025.pdf
San Francisco Atlassian ACE - Mar 27 2025.pdf
Matt Doar
Top Tips to Get Your Data AI-Ready
Top Tips to Get Your Data AI-Ready   Top Tips to Get Your Data AI-Ready
Top Tips to Get Your Data AI-Ready
Precisely
2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf
2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf
2025-04-05 - Block71 Event - The Landscape of GenAI and Ecosystem.pdf
Ivan Tang
Microsoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdfMicrosoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdf
Abhishek Agarwal
Getting the Best of TrueDEM April News & Updates
Getting the Best of TrueDEM  April News & UpdatesGetting the Best of TrueDEM  April News & Updates
Getting the Best of TrueDEM April News & Updates
panagenda
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing EnvironmentsAutomated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Pablo G坦mez Abajo
ScotSecure Cyber Security Summit 2025 Edinburgh
ScotSecure Cyber Security Summit 2025 EdinburghScotSecure Cyber Security Summit 2025 Edinburgh
ScotSecure Cyber Security Summit 2025 Edinburgh
Ray Bugg
The effectiveness of ai powered educational tools in enhancing academic perfo...
The effectiveness of ai powered educational tools in enhancing academic perfo...The effectiveness of ai powered educational tools in enhancing academic perfo...
The effectiveness of ai powered educational tools in enhancing academic perfo...
aebhpmqaocxhydmajf
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
DianaGray10
APAC Solutions Challenge Info Session.pdf
APAC Solutions Challenge Info Session.pdfAPAC Solutions Challenge Info Session.pdf
APAC Solutions Challenge Info Session.pdf
GDG on Campus Monash

Datageeks