ITMO RecSys course. Autumn2014. Lecture1
Recommender Systems
Lecture 1: Introduction
Andrey Danilchenko
18 October 2014
Outline
 Introduction 
 Collaborative filtering 
 Content-based & hybrid methods 
 Evaluation
Recommender Systems (RSs) are 
software tools and techniques 
providing suggestions for items to 
be of use to a user 
F. Ricci 
Introduction
[Figure: number of papers in the RS field according to Google Scholar (as of 2014-10-17)]
We live in the era of recommender systems!
Classification of RSs
[Diagram: available data (user history, content, tags & metadata) and the method families built on it (collaborative, content-based, hybrid)]
Data
 Ratings (explicit feedback)
   Unary (like)
   Binary (like/dislike)
   Numeric (stars)
 Action history (implicit feedback)
 Tags, metadata
 Reviews
 Friends (community-based RS)
RS problem statements
 Predict 
 Recommend 
 Similar
Collaborative filtering 
 Neighborhood methods 
 Matrix factorization methods
Collaborative filtering 
Neighborhood methods
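Idea of the method (user-based)
How did similar users rate the item?
$$\hat{r}_{ui} = \frac{1}{|N_i(u)|} \sum_{v \in N_i(u)} r_{vi}$$
Here N_i(u) is the set of users most similar to u who have rated item i.
Weight each neighbor's contribution:
$$\hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} w_{uv}\, r_{vi}}{\sum_{v \in N_i(u)} |w_{uv}|}$$
Normalize the ratings:
$$\hat{r}_{ui} = h^{-1}\left( \frac{\sum_{v \in N_i(u)} w_{uv}\, h(r_{vi})}{\sum_{v \in N_i(u)} |w_{uv}|} \right)$$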
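The weighted user-based prediction can be sketched in a few lines. This is a minimal illustration, not the lecturer's code: the dense matrix layout with `np.nan` for missing ratings, the function name `predict_user_based`, and the toy numbers are all assumptions.

```python
import numpy as np

def predict_user_based(ratings, weights, u, i, k=2):
    """Weighted-average prediction of user u's rating for item i.

    ratings: 2-D array, np.nan marks a missing rating (assumed layout).
    weights: square user-user similarity matrix w[u, v] (assumed given).
    """
    # Candidate neighbors: users other than u who rated item i.
    rated = [v for v in range(ratings.shape[0])
             if v != u and not np.isnan(ratings[v, i])]
    # N_i(u): the k candidates most similar to u.
    neighbors = sorted(rated, key=lambda v: weights[u, v], reverse=True)[:k]
    num = sum(weights[u, v] * ratings[v, i] for v in neighbors)
    den = sum(abs(weights[u, v]) for v in neighbors)
    return num / den if den > 0 else np.nan

# Toy data (made up for illustration).
ratings = np.array([[5.0, 3.0, np.nan],
                    [4.0, 2.0, 4.0],
                    [1.0, 5.0, 2.0]])
weights = np.array([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
prediction = predict_user_based(ratings, weights, u=0, i=2)
```

With these toy values the more similar user 1 dominates the prediction, which is the point of weighting.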
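Which similarity measure to use?
 Cosine similarity
$$\cos(u, v) = \frac{\sum_{i \in I_{uv}} r_{ui}\, r_{vi}}{\sqrt{\sum_{i \in I_u} r_{ui}^2 \sum_{j \in I_v} r_{vj}^2}}$$
 Pearson correlation
$$PC(u, v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u} (r_{ui} - \bar{r}_u)^2 \sum_{j \in I_v} (r_{vj} - \bar{r}_v)^2}}$$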
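Both measures fit in a few lines of NumPy. The sketch below assumes each user's ratings are stored as a vector with `np.nan` for unrated items. The numerator runs over co-rated items and the norms over each user's own rated items. The function names are illustrative.

```python
import numpy as np

def cosine_sim(ru, rv):
    """Cosine similarity; np.nan marks items the user has not rated."""
    both = ~np.isnan(ru) & ~np.isnan(rv)            # co-rated items I_uv
    num = np.sum(ru[both] * rv[both])
    den = np.sqrt(np.nansum(ru ** 2) * np.nansum(rv ** 2))
    return num / den if den > 0 else 0.0

def pearson_sim(ru, rv):
    """Pearson correlation as the cosine of mean-centered rating vectors."""
    return cosine_sim(ru - np.nanmean(ru), rv - np.nanmean(rv))
```

Unlike the cosine, the Pearson correlation removes each user's average rating first, so a generous rater and a strict rater with the same preferences still come out as similar.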
How to normalize ratings?
 Mean centering
$$h(r_{ui}) = r_{ui} - \bar{r}_u$$
 Z-score
$$h(r_{ui}) = \frac{r_{ui} - \bar{r}_u}{\sigma_u}$$
 Percentile
$$h(r_{ui}) = \frac{|\{ j \in I_u : r_{uj} \le r_{ui} \}|}{|I_u|}$$
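As a small illustration, the three normalizations can be applied to one user's rating vector. This is a sketch with made-up ratings. It assumes the population standard deviation (`np.std` default) for the z-score, which the slides do not specify.

```python
import numpy as np

def mean_center(r):
    return r - np.mean(r)

def z_score(r):
    sigma = np.std(r)  # population standard deviation (an assumption)
    return (r - np.mean(r)) / sigma if sigma > 0 else np.zeros_like(r)

def percentile(r):
    # Share of the user's ratings that are <= each rating.
    return np.array([np.mean(r <= x) for x in r])

r_u = np.array([2.0, 4.0, 4.0, 5.0])  # one user's ratings (made-up values)
centered = mean_center(r_u)
scaled = z_score(r_u)
ranks = percentile(r_u)
```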
Collaborative filtering 
Matrix factorization methods
Best rank-k approximation
Theorem:
If only the k largest singular values (with their singular vectors) are kept in the matrix λ of the SVD of A, the result is the best rank-k approximation of the matrix A.
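The theorem can be checked numerically. The sketch below uses a random dense matrix, which is an assumption (a real rating matrix is sparse and mostly unobserved). It truncates the SVD to k components and compares the Frobenius error with the discarded singular values.

```python
import numpy as np

def best_rank_k(A, k):
    """Rank-k approximation of A from its truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
A2 = best_rank_k(A, 2)

# The Frobenius error equals the root of the sum of squared discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
error = np.linalg.norm(A - A2)
expected_error = np.sqrt(np.sum(s[2:] ** 2))
```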
Baseline predictors
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i$$
Error function:
$$\underset{b_*}{\arg\min} \sum_{(u,i) \in R} (r_{ui} - \mu - b_u - b_i)^2 + \lambda \left( \sum_{u \in U} b_u^2 + \sum_{i \in I} b_i^2 \right)$$
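One simple way to fit the baseline model is to alternate closed-form updates of the user and item biases. Setting the derivative of the error function to zero gives b_u = Σ_i (r_ui - μ - b_i) / (λ + n_u), and symmetrically for b_i. The sketch below assumes ratings come as (user, item, rating) triples. The function name and the choice of alternating updates are illustrative, since the slides do not prescribe an optimizer here.

```python
import numpy as np

def fit_baseline(triples, n_users, n_items, lam=1.0, n_iters=20):
    """Fit mu, b_u, b_i by alternating closed-form bias updates (lam > 0)."""
    users = np.array([t[0] for t in triples])
    items = np.array([t[1] for t in triples])
    r = np.array([t[2] for t in triples], dtype=float)
    mu = r.mean()
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    nu = np.bincount(users, minlength=n_users)   # ratings per user
    ni = np.bincount(items, minlength=n_items)   # ratings per item
    for _ in range(n_iters):
        bu = np.bincount(users, weights=r - mu - bi[items],
                         minlength=n_users) / (lam + nu)
        bi = np.bincount(items, weights=r - mu - bu[users],
                         minlength=n_items) / (lam + ni)
    return mu, bu, bi

# Toy data: user 0 rates higher than user 1, item 0 is liked more than item 1.
data = [(0, 0, 5), (0, 1, 4), (1, 0, 2), (1, 1, 1)]
mu, bu, bi = fit_baseline(data, n_users=2, n_items=2)
```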
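SVD
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$$
Error function:
$$\underset{p_*, q_*, b_*}{\arg\min} \sum_{(u,i) \in R} \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right)$$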
Neighborhood (item-based)
Model:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(u,i)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(u,i)} s_{ij}} = b_{ui} + \sum_{j \in S^k(u,i)} \theta^u_{ij} (r_{uj} - b_{uj})$$
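Neighborhood (optimization)
Item-based model:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(u,i)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(u,i)} s_{ij}} = b_{ui} + \sum_{j \in S^k(u,i)} \theta^u_{ij} (r_{uj} - b_{uj})$$
Model with learned weights:
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj})$$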
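Neighborhood (optimization + implicit)
Item-based model:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(u,i)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(u,i)} s_{ij}} = b_{ui} + \sum_{j \in S^k(u,i)} \theta^u_{ij} (r_{uj} - b_{uj})$$
Model with learned weights and implicit feedback:
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj}) + \sum_{j \in N(u)} c_{ij}$$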
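Neighborhood (normalization)
Without normalization:
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj}) + \sum_{j \in N(u)} c_{ij}$$
Normalized by the size of the rating sets:
$$\hat{r}_{ui} = b_{ui} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} c_{ij}$$
Restricted to the k nearest neighbors of item i:
$$\hat{r}_{ui} = b_{ui} + |R^k(i,u)|^{-\frac{1}{2}} \sum_{j \in R^k(i,u)} \omega_{ij} (r_{uj} - b_{uj}) + |N^k(i,u)|^{-\frac{1}{2}} \sum_{j \in N^k(i,u)} c_{ij}$$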
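SVD again
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$$
Error function:
$$\underset{p_*, q_*, b_*}{\arg\min} \sum_{(u,i) \in R} \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right)$$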
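Asymmetric-SVD
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, x_j + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right)$$
Error function:
$$\underset{q_*, x_*, y_*, b_*}{\arg\min} \sum_{(u,i) \in R} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( \|q_i\|^2 + b_u^2 + b_i^2 + \sum_{j \in R(u)} \|x_j\|^2 + \sum_{j \in N(u)} \|y_j\|^2 \right)$$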
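SVD++
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right)$$
Error function:
$$\underset{p_*, q_*, y_*, b_*}{\arg\min} \sum_{(u,i) \in R} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 + \sum_{j \in N(u)} \|y_j\|^2 \right)$$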
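Integrated model
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right) + |R^k(i,u)|^{-\frac{1}{2}} \sum_{j \in R^k(i,u)} \omega_{ij} (r_{uj} - b_{uj}) + |N^k(i,u)|^{-\frac{1}{2}} \sum_{j \in N^k(i,u)} c_{ij}$$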
And how do we optimize all of this?
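SGD optimization of the SVD model
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$$
Error function:
$$\underset{p_*, q_*, b_*}{\arg\min} \sum_{(u,i) \in R} \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right)$$
Update rules for gradient descent, with e_{ui} = r_{ui} - \hat{r}_{ui}:
$$b_u \leftarrow b_u + \gamma_1 (e_{ui} - \lambda_1 b_u)$$
$$b_i \leftarrow b_i + \gamma_1 (e_{ui} - \lambda_1 b_i)$$
$$p_u \leftarrow p_u + \gamma_2 (e_{ui}\, q_i - \lambda_2 p_u)$$
$$q_i \leftarrow q_i + \gamma_2 (e_{ui}\, p_u - \lambda_2 q_i)$$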
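The update rules translate almost line by line into code. The sketch below is an illustration under assumptions: ratings come as (user, item, rating) triples, and the learning rates, regularization weights, number of epochs, and toy data are arbitrary values not taken from the slides.

```python
import numpy as np

def sgd_svd(triples, n_users, n_items, k=2, gamma1=0.01, gamma2=0.01,
            lam1=0.05, lam2=0.05, n_epochs=200, seed=0):
    """Fit mu, b_u, b_i, P, Q with stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in triples])
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_epochs):
        for idx in rng.permutation(len(triples)):
            u, i, r = triples[idx]
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # e_ui
            bu[u] += gamma1 * (e - lam1 * bu[u])
            bi[i] += gamma1 * (e - lam1 * bi[i])
            pu_old = P[u].copy()                          # use the old p_u in the q_i update
            P[u] += gamma2 * (e * Q[i] - lam2 * P[u])
            Q[i] += gamma2 * (e * pu_old - lam2 * Q[i])
    return mu, bu, bi, P, Q

def train_rmse(triples, mu, bu, bi, P, Q):
    errs = [r - (mu + bu[u] + bi[i] + P[u] @ Q[i]) for u, i, r in triples]
    return float(np.sqrt(np.mean(np.square(errs))))

# Toy ratings (made up for illustration).
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
rmse_before = train_rmse(data, *sgd_svd(data, 3, 3, n_epochs=0))
rmse_after = train_rmse(data, *sgd_svd(data, 3, 3))
```

The comparison here is on training data only. A real experiment would measure RMSE on held-out ratings.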
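Ridge regression
Model:
$$y_i \approx w^T x_i, \qquad w^T w \approx 0$$
Error function:
$$\underset{w}{\arg\min} \left( \lambda\, w^T w + \sum_{i=1}^{n} (w^T x_i - y_i)^2 \right)$$
Exact solution:
$$w = (\lambda I + X^T X)^{-1} X^T y = (\lambda I + A)^{-1} d$$
$$A = X^T X, \qquad d = X^T y$$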
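The exact solution is a single linear solve. A minimal sketch follows. The function name and toy data are illustrative, and `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (lam*I + X^T X)^-1 X^T y."""
    A = X.T @ X
    d = X.T @ y
    return np.linalg.solve(lam * np.eye(X.shape[1]) + A, d)

# With X = I the solution is y / (1 + lam).
X = np.eye(2)
y = np.array([2.0, 4.0])
w = ridge_fit(X, y, lam=1.0)
```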
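ALS optimization of the SVD model
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$$
Error function:
$$\underset{p_*, q_*, b_*}{\arg\min} \sum_{(u,i) \in R} \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right)$$
P-step:
$$p_u = (\lambda n_u I + A_u)^{-1} d_u$$
$$A_u = Q[u]^T Q[u] = \sum_{i : (u,i) \in R} q_i q_i^T$$
$$d_u = Q[u]^T r_u = \sum_{i : (u,i) \in R} r_{ui}\, q_i$$
Q-step:
$$q_i = (\lambda n_i I + A_i)^{-1} d_i$$
$$A_i = P[i]^T P[i] = \sum_{u : (u,i) \in R} p_u p_u^T$$
$$d_i = P[i]^T r_i = \sum_{u : (u,i) \in R} r_{ui}\, p_u$$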
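Each ALS step is a ridge regression per user or per item, as the sketch below shows. It is a simplification under assumptions: the bias terms are left out so that r_ui is approximated by p_u^T q_i alone, ratings come as (user, item, rating) triples, and the hyperparameters and toy data are arbitrary.

```python
import numpy as np

def als(triples, n_users, n_items, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares for r_ui ~ p_u^T q_i (biases omitted)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in triples:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def solve(obs, F):
        # obs: list of (row index into F, rating); returns (lam*n*I + A)^-1 d.
        rows = F[[j for j, _ in obs]]
        r = np.array([x for _, x in obs], dtype=float)
        A = rows.T @ rows
        d = rows.T @ r
        return np.linalg.solve(lam * len(obs) * np.eye(F.shape[1]) + A, d)

    for _ in range(n_iters):
        for u, obs in by_user.items():    # P-step: Q is fixed
            if obs:
                P[u] = solve(obs, Q)
        for i, obs in by_item.items():    # Q-step: P is fixed
            if obs:
                Q[i] = solve(obs, P)
    return P, Q

# Toy ratings (made up for illustration).
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = als(data, n_users=3, n_items=3)
fit_rmse = float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in data])))
zero_rmse = float(np.sqrt(np.mean([r ** 2 for _, _, r in data])))
```

Because each step solves its subproblem exactly, the regularized objective never increases from one step to the next.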
Comparison of models by RMSE
Model              50 factors   100 factors   200 factors   Best
Item-based kNN     -            -             -             0.9406
Neighborhood       -            -             -             0.9002
SVD                0.9046       0.9025        0.9009        0.9009
Asymmetric SVD     0.9037       0.9013        0.9000        0.9000
SVD++              0.8952       0.8924        0.8911        0.8911
Integrated model   0.8877       0.8870        0.8868        0.8868
on Netflix Prize data
BULLSHIT!
Content-based methods
Tag-based methods 
True content-based methods
Content-based methods
Tag-based methods
Let's use tags!
Ways to generate tags
 User-generated 
 Web-mining 
 Expert-generated 
 Metadata
Similarity by tags (co-occurrence) 
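Data:
tag clouds T_i and T_j
Similarity measures:
 Jaccard
$$\frac{|T_i \cap T_j|}{|T_i \cup T_j|}$$
 Dice
$$\frac{2\,|T_i \cap T_j|}{|T_i| + |T_j|}$$
 Ochiai
$$\frac{|T_i \cap T_j|}{\sqrt{|T_i|\,|T_j|}}$$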
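With tag clouds represented as Python sets, the three co-occurrence measures are one-liners. The sketch ignores tag weights, which is an assumption (a real tag cloud usually carries counts), and the example tags are made up.

```python
import math

def jaccard(ti, tj):
    union = ti | tj
    return len(ti & tj) / len(union) if union else 0.0

def dice(ti, tj):
    total = len(ti) + len(tj)
    return 2 * len(ti & tj) / total if total else 0.0

def ochiai(ti, tj):
    denom = math.sqrt(len(ti) * len(tj))
    return len(ti & tj) / denom if denom else 0.0

a = {"rock", "indie", "90s"}
b = {"rock", "indie", "pop", "female vocalists"}
scores = (jaccard(a, b), dice(a, b), ochiai(a, b))
```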
Similarity by tags (LSA) 
 Decompose the Items x Tags matrix with SVD
 Similarity measures: cosine similarity, etc.
[Diagram: Items x Tags ≈ (Items x Item features) x λ x (Tag features x Tags)]
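The LSA approach can be sketched with a toy item-tag count matrix. The matrix values, the number of latent features k, and the function names are assumptions for illustration. Items are compared by cosine similarity in the latent feature space.

```python
import numpy as np

def lsa_item_vectors(M, k):
    """Item coordinates in a k-dimensional latent space from the truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows are items, columns are tags (toy counts, made up).
M = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 1, 3, 2]], dtype=float)
V = lsa_item_vectors(M, k=2)
sim_01 = cosine(V[0], V[1])   # items sharing the same tags
sim_02 = cosine(V[0], V[2])   # items with disjoint tags
```

Items 0 and 1 share their tags and end up close in the latent space, while items 0 and 2 do not.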
Tag vandalism
Tags for Paris Hilton
Last.fm, May 2013
Tag vandalism: how to fight it?
Corrected tags for Paris Hilton
 User listening habits
 Filter tags by similarity
Content-based methods
True content-based methods
Example: music
 Spectral centroid 
 Spectral flatness 
 Spectral skewness 
 Spectral kurtosis 
 Zero-Crossing Rate (ZCR) 
 Mel Frequency Cepstrum Coefficients (MFCCs) 
 Instrumentation 
 Rhythm 
 Harmony 
 Structure 
 Intensity 
 Genre 
 Mood 
(features listed from low-level to high-level)
Hybrid methods
Classification of methods
 Weighted 
 Switching 
 Mixed 
 Cascade
Evaluation
How can the quality of an RS be measured?
 Offline test 
 User study 
 Online experiment
Offline evaluation
 Prediction accuracy
   RMSE
   MAE
 Usage prediction accuracy
   Precision/recall @N
   F1
   AUC
 Ranking accuracy
   DPM
   DCG
   Average Reciprocal Hit Rank (ARHR)
 Coverage
   Catalog coverage
   Sales diversity
   Gini index
   Shannon entropy
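A few of the offline metrics are simple enough to sketch directly. The function names and toy inputs below are illustrative. `precision_recall_at_n` assumes a ranked list of recommended items and a set of items the user actually found relevant.

```python
import math

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def precision_recall_at_n(ranked, relevant, n):
    """Precision and recall over the top-n recommended items (n > 0)."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / n, recall

predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
errors = (rmse(predicted, actual), mae(predicted, actual))
pr = precision_recall_at_n(["a", "b", "c", "d"], {"a", "d", "e"}, n=2)
```

RMSE penalizes the single large error more heavily than MAE does, which is why the two values differ on the same predictions.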
User study 
 Confidence 
 Trust 
 Novelty 
 Diversity 
 Serendipity 
 Robustness 
 Adaptivity 
 Scalability
Comparing is easier!
Online study methods 
 A-B testing 
 Team-Draft Interleaving (TDI)
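Interleaving mixes the outputs of two rankers into one list and credits each click to the ranker that contributed the clicked item. The sketch below assumes that TDI refers to team-draft interleaving as usually described: the team with fewer picks chooses next, and ties are broken by a coin flip. The function names, the fallback when one ranking runs out, and the toy rankings are assumptions.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Build an interleaved list and record which team contributed each item."""
    rng = rng or random.Random(0)
    interleaved, team = [], {}
    count = {"A": 0, "B": 0}
    rankings = {"A": ranking_a, "B": ranking_b}
    while len(interleaved) < length:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if count["A"] != count["B"]:
            side = "A" if count["A"] < count["B"] else "B"
        else:
            side = rng.choice(["A", "B"])
        pick = next((d for d in rankings[side] if d not in team), None)
        if pick is None:
            # Fallback (an assumption): let the other team pick if this one ran out.
            side = "B" if side == "A" else "A"
            pick = next((d for d in rankings[side] if d not in team), None)
            if pick is None:
                break
        interleaved.append(pick)
        team[pick] = side
        count[side] += 1
    return interleaved, team

def credit_clicks(team, clicked):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for d in clicked:
        if d in team:
            wins[team[d]] += 1
    return wins

mixed, team = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"], length=4)
wins = credit_clicks(team, clicked=["d2", "d4"])
```

Unlike A-B testing, every user sees results from both rankers, so fewer impressions are typically needed to detect a preference.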
Andrey Danilchenko
Developer
Good luck!
