
Mining Behaviour Models
from User-Intensive Web
Applications
Carlo Ghezzi
carlo.ghezzi@polimi.it
Politecnico di Milano, Italy (IT)
Mauro Pezzè
mauro.pezze@usi.ch
Università della Svizzera Italiana, Lugano (CH)
Michele Sama
michele@swiftkey.net
Touchtype Ltd, UK
Giordano Tamburrelli
giordano.tamburrelli@usi.ch
Università della Svizzera Italiana, Lugano (CH)

Modern Web Applications: scalability, privacy, security, users
• Millions of interactions per day
• Manage sensitive data
• Secure economic transactions
• Capture/measure user behaviours

• Users’ behaviours cannot be
predicted at design time.
• Only released applications allow
us to collect statistics
• Multiple and heterogeneous
navigational behaviours that
depend on several factors
• Behaviours may unpredictably
change over time
User behaviours

• Monitoring+analysis/mining
• Little support from a general software engineering perspective
Related work
Google Analytics, Link Prediction, Web Caching

• General abstraction to support software engineers
• Automated and non-ambiguous analysis tool
• Support for different user classes
• Other key features:
• extensibility (domain specific analysis)
• incrementality
• applicable to legacy systems
What is missing

• Exploit formal models to capture and quantitatively analyse
user behaviors
• Focus on RESTful architectures
• Based on log-file mining, hence applicable to legacy systems
Formal Methods + Web Development
Our Idea

• User classes
• Give semantics to events
in the log file
• Infer user-behaviour
models (DTMC)
• Query the models
Ingredients

A real-world case study
• Small example, but general enough:
• URL with parameters
• URL with parametric structure
URL                          Description
/home/                       Homepage of findyourhouse.com
/anncs/sales/                First page showing the sales announcements
/anncs/sales/?page=<n>       n-th page of sales announcements
/anncs/sales/<id>/           Detailed view of a sales announcement
/anncs/renting/              First page showing the renting announcements
/anncs/renting/?page=<n>     n-th page of renting announcements
/anncs/renting/<id>/         Detailed view of a renting announcement
/search/                     Page containing the results of a search
/admin/.../                  Website’s control panel
/admin/login/                Login page that gives access to the control panel
/contacts/                   Form to contact a sales agent
/contacts/submit/            Confirmation that the contact form has been submitted
                             Page that describes the website terms of use

• A set of atomic propositions (AP) gives semantics to the
entries in the log
• Declarative approach: @BearFilter
URLs ➔ Atomic Propositions
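@BearFilter(regex = "^/anncs/sales/(\\w+)/$")
public static Proposition filterSales(LogLine line) {
    // every detail page of a sales announcement maps to the same proposition
    return new Proposition("sales_anncs");
}
@BearFilter(regex = "^/admin/login/$")
public static Proposition filterLogin(LogLine line) {
    // a 302 redirect after the login request is read as a successful login;
    // assumes getHTTPStatusCode() returns the status code as a String
    if ("302".equals(line.getHTTPStatusCode()))
        return new Proposition("login_success");
    else
        return new Proposition("login_fail");
}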
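The filters above are meant to be discovered and applied automatically. A minimal sketch of such a dispatcher follows, assuming reflection over annotated static methods; `UrlFilter`, `Entry`, `Filters` and `FilterEngine` are hypothetical stand-ins for `@BearFilter`, `LogLine` and BEAR's engine, not its actual API, and propositions are plain strings here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for @BearFilter: a regex over the requested URL.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface UrlFilter {
    String regex();
}

// Hypothetical minimal log entry: only the fields the filters read.
final class Entry {
    final String url;
    final String status;
    Entry(String url, String status) { this.url = url; this.status = status; }
}

// The two filters of the slide, rewritten against the stand-in types.
final class Filters {
    @UrlFilter(regex = "^/anncs/sales/(\\w+)/$")
    static String sales(Entry e) { return "sales_anncs"; }

    @UrlFilter(regex = "^/admin/login/$")
    static String login(Entry e) {
        return "302".equals(e.status) ? "login_success" : "login_fail";
    }
}

final class FilterEngine {
    // Invokes every annotated method of `holder` whose regex matches the URL
    // and collects the propositions they return.
    static List<String> propositions(Class<?> holder, Entry e) {
        List<String> out = new ArrayList<>();
        for (Method m : holder.getDeclaredMethods()) {
            UrlFilter f = m.getAnnotation(UrlFilter.class);
            if (f == null || !Pattern.matches(f.regex(), e.url)) continue;
            try {
                m.setAccessible(true);
                out.add((String) m.invoke(null, e)); // static method: no receiver
            } catch (IllegalAccessException | InvocationTargetException ex) {
                throw new IllegalStateException(ex);
            }
        }
        return out;
    }
}
```

A URL that no filter matches (for example `/home/` here) simply yields no propositions, which keeps filters independent of each other.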

URLs ➔ Atomic Propositions
URL                          Atomic propositions
/home/                       homepage
/anncs/sales/                sales_page, page_1
/anncs/sales/?page=<n>       sales_page, page_n
/anncs/sales/<id>/           sales_anncs
/anncs/renting/              renting_page, page_1
/anncs/renting/?page=<n>     renting_page, page_n
/anncs/renting/<id>/         renting_anncs

• Code fragments called classifiers to specify user classes
• Declarative approach: @BearClassifier
Identify User Classes
@BearClassifier(name = "userAgent")
public static String UserAgentClassifier(LogLine logline) {
    // the user class is keyed by the browser/device string of the request
    return logline.getAgent();
}
Example user class: {(userAgent = "Mozilla/5.0..."), (location = "Boston")}
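One way to read a user class is as the map from classifier names to the values the classifiers return for a log line. The sketch below assumes that reading; `LogRecord` and `UserClasses` are hypothetical names, and the `location` field (for instance derived from the IP address) is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical log record carrying the attributes the classifiers read.
final class LogRecord {
    final String agent;
    final String location; // assumption: e.g. resolved from the client IP
    LogRecord(String agent, String location) { this.agent = agent; this.location = location; }
}

final class UserClasses {
    // Applies every named classifier to the record; the resulting
    // name -> value map identifies the user class the record belongs to.
    static Map<String, String> classify(Map<String, Function<LogRecord, String>> classifiers,
                                        LogRecord r) {
        Map<String, String> userClass = new TreeMap<>(); // sorted keys: canonical form
        classifiers.forEach((name, classifier) -> userClass.put(name, classifier.apply(r)));
        return userClass;
    }
}
```

Two log lines with equal classifier values produce equal maps, so the map can serve as the key under which one model per user class is kept.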

• BEAR infers a set of DTMCs
• Sequential and incremental
process
• An independent DTMC for
each user class
Infer the models
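A minimal sketch of sequential, incremental inference, assuming each DTMC is estimated by counting observed transitions between atomic propositions and normalising per source state (a maximum-likelihood estimate; BEAR's exact estimator may differ). `DtmcCounter` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DtmcCounter {
    // counts.get(from).get(to) = number of observed transitions from -> to
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Incremental update: fold one more user trace into the counts.
    void addTrace(List<String> trace) {
        for (int i = 0; i + 1 < trace.size(); i++) {
            counts.computeIfAbsent(trace.get(i), k -> new HashMap<>())
                  .merge(trace.get(i + 1), 1, Integer::sum);
        }
    }

    // Estimated transition probability: observed frequency of from -> to
    // among all transitions leaving `from` (0 if `from` was never left).
    double probability(String from, String to) {
        Map<String, Integer> row = counts.get(from);
        if (row == null) return 0.0;
        int total = row.values().stream().mapToInt(Integer::intValue).sum();
        return row.getOrDefault(to, 0) / (double) total;
    }
}
```

New traces only update counts, so the model can be refreshed as the log grows without reprocessing the entries already seen.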

IP TIMESTAMP URL
1.1.1.1 - [20/Dec/2013:15:35:02] - /home/
2.2.2.2 - [20/Dec/2013:15:35:07] - /admin/login/
1.1.1.1 - [20/Dec/2013:15:35:12] - /anncs/sales/1756/
2.2.2.2 - [20/Dec/2013:15:35:19] - /admin/edit/
Infer the models
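Before inference, the interleaved entries have to be split into per-user traces. The sketch below groups them by IP, a simplification assumed here (real sessions may also need timeouts or cookies), and assumes the log is already in chronological order, as in the excerpt above; `Sessionizer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Sessionizer {
    // Groups "IP - [TIMESTAMP] - URL" lines by IP, preserving file order
    // within each group (assumed chronological).
    static Map<String, List<String>> tracesByIp(List<String> lines) {
        Map<String, List<String>> traces = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(" - ");
            if (parts.length != 3) continue; // skip malformed lines
            traces.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                  .add(parts[2].trim());
        }
        return traces;
    }
}
```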

Infer the models
incrementality

• Rewards: domain-specific
metrics of interest
• Number of announcements
displayed
• DB Queries
Annotating the models
extensibility

• Probabilistic Computation Tree Logic (PCTL)
augmented with rewards
• BEAR Properties = scope + PCTL formula
Specifying the properties
{userAgent = "(.*)Mozilla(.*)"}P=?[F contact_requested]
{userAgent = "(.*)(Android|iOS)(.*)"}R=?[F end]
generality
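To make the two example properties concrete, the sketch below evaluates P=?[F target] and R=?[F end] on a small DTMC by plain value iteration, the textbook technique. In BEAR the verification itself is delegated to PRISM, so `PctlSketch` is only an illustration; the DTMC is given as hypothetical maps of transition probabilities and state rewards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class PctlSketch {
    // P=?[F target]: iterating from 0 converges from below to the
    // probability of eventually reaching a target state from `start`.
    static double reach(Map<String, Map<String, Double>> dtmc, Set<String> target,
                        String start, int iterations) {
        if (target.contains(start)) return 1.0;
        Map<String, Double> x = new HashMap<>();
        for (int k = 0; k < iterations; k++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Map<String, Double>> row : dtmc.entrySet()) {
                double v = 0.0;
                for (Map.Entry<String, Double> t : row.getValue().entrySet()) {
                    double succ = target.contains(t.getKey()) ? 1.0 : x.getOrDefault(t.getKey(), 0.0);
                    v += t.getValue() * succ;
                }
                next.put(row.getKey(), v);
            }
            x = next;
        }
        return x.getOrDefault(start, 0.0);
    }

    // R=?[F target]: expected reward collected in non-target states before the
    // first visit to a target. Assumes the target is reached with probability 1
    // (otherwise PCTL defines the value as infinite; this sketch does not check).
    static double expectedReward(Map<String, Map<String, Double>> dtmc, Map<String, Double> reward,
                                 Set<String> target, String start, int iterations) {
        if (target.contains(start)) return 0.0;
        Map<String, Double> y = new HashMap<>();
        for (int k = 0; k < iterations; k++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Map<String, Double>> row : dtmc.entrySet()) {
                if (target.contains(row.getKey())) continue; // targets contribute 0
                double v = reward.getOrDefault(row.getKey(), 0.0);
                for (Map.Entry<String, Double> t : row.getValue().entrySet()) {
                    v += t.getValue() * y.getOrDefault(t.getKey(), 0.0);
                }
                next.put(row.getKey(), v);
            }
            y = next;
        }
        return y.getOrDefault(start, 0.0);
    }
}
```

A reward such as "announcements displayed" would be supplied as the `reward` map, one value per state.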

Querying the models
automation
• Scope identifies the set of
relevant DTMCs among the
inferred models
• BEAR analysis engine
composes the selected DTMCs into
a single one
• PCTL verification performed
with PRISM on the composed
model
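Scope selection can be sketched as a filter over user classes. The sketch assumes a class is in scope when every regex in the scope fully matches the corresponding classifier value, and that an empty scope `{}` selects every class; `Scope` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class Scope {
    // True when every (classifier -> regex) pair of the scope fully matches
    // the value that user class has for that classifier.
    static boolean inScope(Map<String, String> scope, Map<String, String> userClass) {
        for (Map.Entry<String, String> e : scope.entrySet()) {
            String value = userClass.get(e.getKey());
            if (value == null || !Pattern.matches(e.getValue(), value)) return false;
        }
        return true; // an empty scope imposes no constraint
    }

    // The user classes (hence the DTMCs) relevant to a query.
    static List<Map<String, String>> select(Map<String, String> scope,
                                            List<Map<String, String>> classes) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> c : classes) {
            if (inScope(scope, c)) out.add(c);
        }
        return out;
    }
}
```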

Model Composition
• Union of the sets of states of the input DTMCs
• Law of total probability to compute transitions
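Written out, the law of total probability gives, for states s and t and user classes i: P(s → t) = Σ_i P(class i | s) · P_i(s → t). The sketch below assumes the input models are raw transition counts and that P(class i | s) is the share of departures from s observed in class i; `Composer` is hypothetical, and BEAR's engine may weight classes differently.

```java
import java.util.List;
import java.util.Map;

final class Composer {
    // Each model: from -> (to -> observed count) for one user class.
    // A state present in any model is handled, so the composed state set
    // is the union of the input state sets.
    static double composed(List<Map<String, Map<String, Integer>>> models, String s, String t) {
        int totalOut = 0;
        for (Map<String, Map<String, Integer>> m : models) totalOut += outCount(m, s);
        if (totalOut == 0) return 0.0;
        double p = 0.0;
        for (Map<String, Map<String, Integer>> m : models) {
            int out = outCount(m, s);
            if (out == 0) continue;
            double weight = out / (double) totalOut;                    // P(class i | s)
            double local = m.get(s).getOrDefault(t, 0) / (double) out;  // P_i(s -> t)
            p += weight * local;
        }
        return p;
    }

    // Number of observed departures from s in one model.
    private static int outCount(Map<String, Map<String, Integer>> m, String s) {
        Map<String, Integer> row = m.get(s);
        return row == null ? 0 : row.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

With this particular weighting the result coincides with pooling all counts before normalising, which is a quick way to sanity-check the composition.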

• Detecting navigational anomalies:
• A difference between the actual and the expected user navigation
actions.
• Comparing the BEAR models with the site map:
{}P =?[(X s_i)]{s_j}
• Measuring behaviours and attitudes
• {}P =?[(F sales_anncs) & (!(F renting_anncs))]
• {(?!(.*)(Android|iOS))(.*)}R=?[F end {sales_anncs}]
BEAR at work
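Assuming s_i and s_j each label a single state, {}P =?[(X s_i)]{s_j} is the one-step probability of moving from s_j to s_i, so the comparison with the site map can be sketched directly on the transition table: flag pages users reach from s_j although the site map offers no such link. `NavigationCheck` and its `threshold` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NavigationCheck {
    // Transitions actually taken (probability above `threshold`) that the
    // site map does not offer as a link from the source page.
    static List<String> unexpectedTransitions(Map<String, Map<String, Double>> dtmc,
                                              Map<String, Set<String>> siteMap,
                                              double threshold) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> row : dtmc.entrySet()) {
            Set<String> links = siteMap.getOrDefault(row.getKey(), Set.of());
            for (Map.Entry<String, Double> t : row.getValue().entrySet()) {
                if (t.getValue() > threshold && !links.contains(t.getKey())) {
                    out.add(row.getKey() + " -> " + t.getKey());
                }
            }
        }
        Collections.sort(out); // deterministic order for reporting
        return out;
    }
}
```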

BEAR: performance
• Variable number of states
• Variable length of log file

BEAR: performance
• Variable number of DTMCs
• Variable number of states

• More expressive formalisms
• Self-adaptive applications
Summary
• Formal analysis of user
behaviours in web apps
• Validation on a real case study
• Ongoing validation on a
mobile app


BEAR: Mining Behaviour Models from User-Intensive Web Applications
