Thomas Debeauvais

Bart Knijnenburg

                    Big Data
                    A critical appraisal
+                                 2


       The wonders of Big Data

       The Perils of Big Data

       User Experiments

       A Note on Privacy

The Wonders of Big Data
How Big Data will put the personal back in e-commerce
+                                                          4

    Large vs small datasets

       Everything is significant!

       Data from most/all of your customers
           More than just an educated guess
           This is what really happens!

       Large datasets can improve business intelligence
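To see why "everything is significant" in a large dataset, hold a tiny difference between two groups fixed and let the sample grow. The sketch below is a two-sample z-test that assumes known unit variance. The effect size of 0.01 standard deviations is hypothetical. The p-value collapses as n grows, even though the effect stays practically negligible:

```python
import math

def two_sample_z(effect_size, n_per_group):
    """z statistic and two-sided p-value for a difference in means of
    `effect_size` standard deviations between two groups of equal size,
    assuming known unit variance in both groups."""
    # Standard error of the difference between two means with sigma = 1
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_size / standard_error
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# The same hypothetical, practically negligible effect at growing sample sizes
for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z(0.01, n)
    print(f"n = {n:>9,} per group: z = {z:5.2f}, p = {p:.2g}")
```

At a million users per group, significance by itself says little. What remains open is whether the effect is big enough to matter for the business.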
+                                                                     5

    The Netflix challenge

       Recommendations seen as Netflix's strongest asset

       $1M prize for predictions 10% better (lower RMSE) than Netflix's Cinematch

       2006-2009

       Data: 18k movies, 500k users, 100M ratings
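The challenge measured progress as a reduction in root mean square error (RMSE) between predicted and actual ratings, as described in the note for slide 6. Below is a minimal sketch of that metric and of the "10% better" criterion. The numbers are hypothetical toy values, not Netflix data:

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and observed ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction; 0.10 corresponds to '10% better'."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical toy ratings and predictions (not Netflix data)
actual = [4, 3, 5, 2, 4]
baseline_pred = [3.5, 3.5, 3.5, 3.5, 3.5]   # a constant "predict 3.5" baseline
model_pred = [4.2, 3.1, 4.4, 2.6, 3.7]      # a hypothetical better model

gain = relative_improvement(rmse(baseline_pred, actual), rmse(model_pred, actual))
print(f"relative improvement: {gain:.1%}")
```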
+                                                                             6

    The Netflix challenge

       Netflix's rationale:
           Improve our ability to connect people to the movies they love
           Improve recommendations = improve satisfaction and retention
           Small R&D team, slow progress
           $1M will pay for itself

       Based on Padhraic Smyth's report at
+                                                                        7

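    Matrix approximation

         Distinguish noise from signal: variance and eigenvalues

         Singular value decomposition
             $\mathrm{Ratings}_{m \times n} = U_{m \times n} \, E_{n \times n} \, V_{n \times n}$

         Rank-k approximation
             $\mathrm{Ratings}_{m \times n} \approx U_{m \times k} \, E_{k \times k} \, V_{k \times n}$

         [Figure: the Ratings matrix (m users × n movies) drawn as the product of U (m users × k), E (k × k) and V (k × n movies)]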
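The rank-k approximation can be sketched with NumPy's SVD. The sketch assumes a fully observed ratings matrix. Real rating matrices are mostly empty, and the Netflix Prize models typically fit the factors on the observed ratings only, so plain SVD is a simplification here. The toy matrix is hypothetical:

```python
import numpy as np

def rank_k_approximation(ratings, k):
    """Rank-k approximation Ratings ≈ U_k E_k V_k of a fully observed matrix
    (rows = users, columns = movies), via the truncated SVD."""
    # Thin SVD: numpy returns singular values in descending order and the
    # right factor already transposed, so Vt[:k, :] has the k × n shape of V above
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def movie_coordinates(ratings, k=2):
    """One k-dimensional point per movie: the columns of V_k (cf. 'Plot of V with k=2')."""
    _, _, Vt = np.linalg.svd(ratings, full_matrices=False)
    return Vt[:k, :].T

# Hypothetical toy matrix: 4 users × 3 movies, every rating observed
R = np.array([[5, 4, 1],
              [4, 5, 1],
              [1, 1, 5],
              [2, 1, 4]], dtype=float)

for k in (1, 2, 3):
    err = np.linalg.norm(R - rank_k_approximation(R, k))
    print(f"k = {k}: Frobenius error {err:.3f}")
```

A small k keeps the dominant structure (the "signal") and drops the directions with small singular values, which the slide treats as noise.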
+                                        8
             Plot of V with k=2

             [Figure: movies plotted on the two factors of V; labeled regions: "independent, quirky, critically acclaimed"; "lowbrow comedies, horror, male or adolescent audience"; "drama, serious comedy, strong female lead"]

                                                         [Koren et al. 2009]
+                                        9

    Bias is information

                          [Smyth 2010]
+                                                                     10


       Matrix decomposition
           Meaningful movie categories!
           For example: lowbrow, quirky, indie, strong female lead

       Older movies are rated higher
           So ...?
           Should we recommend older movies more often or less often?
           Why are they rated higher?
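One common way to make "bias is information" concrete is a baseline predictor $\hat r_{ui} = \mu + b_u + b_i$: the global mean plus a user offset and an item offset, along the lines of the baseline predictors used in the Netflix Prize work. The sketch below estimates the offsets with simple shrinkage toward zero. The regularization constants and toy ratings are hypothetical. A positive item bias records *that* an older movie is rated higher. It does not say *why*, and answering that takes domain knowledge.

```python
from collections import defaultdict

def fit_baseline(ratings, reg_item=10.0, reg_user=10.0):
    """Fit r_hat(u, i) = mu + b_u + b_i from (user, item, rating) triples.
    reg_item / reg_user shrink the biases of sparsely rated items/users toward 0."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: how far an item's ratings sit above or below the global mean
    item_sum, item_count = defaultdict(float), defaultdict(int)
    for _, item, r in ratings:
        item_sum[item] += r - mu
        item_count[item] += 1
    b_i = {item: item_sum[item] / (reg_item + item_count[item]) for item in item_sum}

    # User bias: what is left after removing the global mean and the item bias
    user_sum, user_count = defaultdict(float), defaultdict(int)
    for user, item, r in ratings:
        user_sum[user] += r - mu - b_i[item]
        user_count[user] += 1
    b_u = {user: user_sum[user] / (reg_user + user_count[user]) for user in user_sum}
    return mu, b_u, b_i

def predict(mu, b_u, b_i, user, item):
    # Unknown users or items fall back to a bias of 0
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

# Hypothetical toy ratings: the older movie is rated higher by both users
ratings = [("ann", "casablanca", 5), ("bob", "casablanca", 4),
           ("ann", "new_release", 3), ("bob", "new_release", 2)]
mu, b_u, b_i = fit_baseline(ratings)
print(f"mu = {mu:.2f}, item biases = {b_i}")
```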

The Perils of Big Data
How overfitting and a lack of domain knowledge can lead to suboptimal solutions
+                                                                           12

    What about random?

       We were demonstrating our new recommender to a client.
        They were amazed by how well it predicted their preferences!

       Later we found out that we had forgotten to activate the algorithm:
        the system was giving completely random recommendations.
+               13

+                                                                           14

    Model complexity

       "Our winning entries consist of more than 100 different
        predictor sets" [Koren et al. 2009]

       Only 10% better than Netflix's own algorithm

       Intrinsic noise
           Example: children watch cartoons, Mum is recommended cartoons
           Should Netflix implement a "switch user" feature?
           Domain knowledge!
+                                                                      15

    More gotchas

       Obvious truisms and correlation fallacies
           Still present in large datasets
           Domain knowledge!

       Overfitting: simple models that make sense vs complex models
        that fit the data
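Here is a minimal illustration of that trade-off on hypothetical simulated data. A straight line (a "simple model that makes sense") and a degree-9 polynomial (a "complex model that fits the data") are both fit to the same 12 noisy points drawn from a linear trend. The complex model always fits the training points at least as well. On fresh data from the same process it typically does worse:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def true_trend(x):
    # Hypothetical data-generating process: a straight line plus noise (added below)
    return 2.0 * x + 1.0

x_train = np.linspace(0.0, 1.0, 12)
y_train = true_trend(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = rng.uniform(0.0, 1.0, 200)  # fresh data from the same process
y_test = true_trend(x_test) + rng.normal(0.0, 0.2, x_test.size)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

train_err, test_err = {}, {}
for degree in (1, 9):
    # Least-squares polynomial fit; a higher degree has more parameters to chase noise
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_err[degree] = mse(model, x_train, y_train)
    test_err[degree] = mse(model, x_test, y_test)
    print(f"degree {degree}: train MSE {train_err[degree]:.4f}, test MSE {test_err[degree]:.4f}")
```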

User Experiments
How user evaluations can be used to create meaningful experiences
+                                                           17

    Offline evaluations

           Gather rating data
           Remove 10% of the ratings of each user
           Optimize the algorithm to predict those 10%

           Predict the rating of unknown items
           Recommend items with highest predicted rating
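The offline protocol on this slide can be sketched as below. This is a minimal sketch under several assumptions. Ratings are (user, item, rating) triples. The split is random with a fixed seed. `predict` stands for whatever trained model is being evaluated; it is a hypothetical callable, not a specific library API.

```python
import random
from collections import defaultdict

def split_per_user(ratings, holdout=0.10, seed=0):
    """Hold out a fraction of *each user's* ratings as the test set.
    ratings: list of (user, item, rating) triples."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_user = defaultdict(list)
    for triple in ratings:
        by_user[triple[0]].append(triple)
    train, test = [], []
    for user_ratings in by_user.values():
        shuffled = user_ratings[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * holdout)  # e.g. 2 of a user's 20 ratings
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

def recommend(predict, user, all_items, already_rated, n=10):
    """Predict ratings for the user's unrated items and return the n highest."""
    candidates = [item for item in all_items if item not in already_rated]
    return sorted(candidates, key=lambda item: predict(user, item), reverse=True)[:n]

# Hypothetical data: 3 users, each rating the same 20 items
ratings = [(u, f"item{i}", (i % 5) + 1) for u in ("ann", "bob", "cy") for i in range(20)]
train, test = split_per_user(ratings)
print(len(train), "training ratings,", len(test), "held-out ratings")
```

The model is trained on `train` and tuned to minimize its error (e.g. RMSE) on `test`. `recommend` then covers the last two steps: predict unknown items and show the highest-rated ones.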
+                                                                                         18

    Offline evaluations

       Problem: Offline evaluations may not give the same outcome as
        online evaluations (Cosley et al., 2002; McNee et al., 2002)
           Solution: Test with real users (A/B testing)

       Problem: A higher rating does not mean a good recommendation
        (McNee et al., 2006)
           Solution: Consider other behaviors (consumption, retention)

       Problem: The algorithm accounts for only 5% of the relevance of a
        recommender system (Francisco Martin, 2009)
           Solution: A/B test other aspects (interaction, presentation)
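When an A/B test compares a behavior such as retention, the outcome per user is often yes/no. A standard way to compare two such rates is a two-proportion z-test. The sketch below assumes that users are independent and randomly assigned to variant A or B. The counts are hypothetical:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions,
    e.g. a retention rate in variant A vs variant B."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 400 of 1,000 users retained with A, 440 of 1,000 with B
z, p = two_proportion_z_test(400, 1000, 440, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```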
+                                                                                                                             19

    Online evaluations

       Testing a recommender against a random video clip system (A/B test)
         Expectation: consumption will increase
         Reality: the number of clicked clips and the total viewing time went down!

       Insight: the recommender is more effective
         More clips watched from beginning to end
         Users browse less, consume more

       [Figure: path model linking "personalized" (OSA) to perceived recommendation quality (SSA), perceived system effectiveness (EXP) and choice (EXP), and to the behaviors: number of clips watched from beginning to end, total viewing time, number of clips clicked; OSA = objective system aspects, SSA = subjective system aspects, EXP = experience]
+                                                                                  20

    Behavior vs Questionnaires

       Behavior is hard to interpret
           Relationship between behavior and satisfaction is not always trivial

       Questionnaires are a better predictor of long-term retention
           With behavioral data only, the experiment needs to run for a long time

       Questionnaire data is more robust
           Fewer participants needed
+                                                                                21

    A guide to user experiments
    http://bit.ly/recsys2011short          http://bit.ly/recsystutorialhandout

       Is my system good?
           What does good mean?
           We need to define measures

       Does my system score high on this satisfaction scale?
           What does high mean?
           We need to compare it against something

       Does my system score higher than this other system?
           Say we find that it scores higher on satisfaction... why does it?
           Apply the concept of ceteris paribus
+                                           22

    An example

       We compared three
        recommender systems
         Three different algorithms

       System effectiveness scale:
         The system has no real benefit
          for me.
         I would recommend the system
          to others.
         The system is useful.
         I can save time using the system.
         I can find better TV programs
          without the help of the system.
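Responses to a multi-item scale like this one are usually combined into one score per participant. The sketch below assumes a 1-5 agreement scale. It also assumes that the two negatively worded items ("no real benefit" and "better TV programs without the help of the system") are reverse-coded. It computes Cronbach's alpha as well, a common check that the items measure a single construct. The responses are made up:

```python
import numpy as np

# The slide's items; the True flags mark items assumed to be reverse-coded
ITEMS = [
    ("The system has no real benefit for me.", True),
    ("I would recommend the system to others.", False),
    ("The system is useful.", False),
    ("I can save time using the system.", False),
    ("I can find better TV programs without the help of the system.", True),
]

def score_scale(responses, reverse, scale_min=1, scale_max=5):
    """responses: participants × items array of Likert answers.
    Returns (per-participant scale scores, Cronbach's alpha)."""
    x = np.asarray(responses, dtype=float).copy()
    rev = np.asarray(reverse, dtype=bool)
    # Reverse-code so that a higher answer always means "more effective"
    x[:, rev] = scale_min + scale_max - x[:, rev]
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    alpha = k / (k - 1) * (1 - item_variances / total_variance)
    return x.mean(axis=1), alpha

# Hypothetical answers from four participants, in the item order above
responses = [[1, 5, 5, 4, 2],
             [4, 2, 2, 2, 4],
             [2, 4, 4, 5, 1],
             [5, 1, 2, 1, 5]]
scores, alpha = score_scale(responses, [rev for _, rev in ITEMS])
print("scores:", scores, "alpha:", round(alpha, 3))
```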
+                                                         23

    An example

          The mediating variables tell the entire story
+                                                                                                                 24

    An example

    [Figure: path model]
       Matrix Factorization recommender with explicit feedback (MF-E), versus generally most popular (GMP) (OSA)
           → (+) perceived recommendation variety (SSA)
       Matrix Factorization recommender with implicit feedback (MF-I), versus most popular (GMP) (OSA)
           → (+) perceived recommendation quality (SSA)
       perceived recommendation variety (SSA) → (+) perceived recommendation quality (SSA) → (+) perceived system effectiveness (EXP)
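The study estimated its path model as a structural equation model. A simpler way to show what a mediating variable means is the product-of-coefficients approach with two ordinary least-squares regressions. The data below are simulated and hypothetical. The condition raises perceived quality, which in turn raises perceived effectiveness, and there is no direct path from condition to effectiveness:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical simulated data loosely mirroring the path model:
# condition (0 = most-popular baseline, 1 = matrix factorization)
#   -> perceived quality (mediator) -> perceived effectiveness (outcome)
condition = rng.integers(0, 2, n).astype(float)
quality = 0.8 * condition + rng.normal(0.0, 1.0, n)
effectiveness = 0.6 * quality + rng.normal(0.0, 1.0, n)  # no direct path from condition

def ols(y, *predictors):
    """Least-squares coefficients of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

total = ols(effectiveness, condition)[1]                # c: total effect
a = ols(quality, condition)[1]                          # a: condition -> mediator
_, direct, b = ols(effectiveness, condition, quality)   # c': direct effect, b: mediator -> outcome
indirect = a * b

print(f"total {total:.3f} = direct {direct:.3f} + indirect {indirect:.3f}")
```

With OLS on the same sample, the total effect splits exactly into direct plus indirect (c = c′ + a·b). Here almost all of the effect runs through the mediator.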

A Note on Privacy
How to avoid this looming danger of our Big Data future
+                                   26

    Personalization with control
+                                                                          27

    Privacy concerns

       Second Netflix challenge

       Anonymized dataset

       Lawsuit from a Californian closeted lesbian Mum

       Netflix withdraws its second challenge

+                                           28

    Privacy directive

         companies should provide clear descriptions of [...] why they need the data, how they will use it
           Informed consent

         companies should offer consumers clear and simple choices [...] about personal data collection, use, and
           User empowerment
+                          29

    Transparency Paradox
+                                                                  30

    Control Paradox

       "bewildering tangle of options" (New York Times, 2010)

       "labyrinthian controls" (U.S. Consumer Magazine, 2012)

       Researchers asked: what do your privacy settings mean?
           86% of Facebook users got it wrong!
+                                                                         31

    Control Paradox

       Introducing an extreme sharing option
           Nothing - City - Block
           Add the option Exact
           Some will choose Exact instead of Block
           Sharing increases across the board!

       [Figure: sharing options marked E and B plotted against privacy]
+                           32

    Bounded rationality

A                         25%
B                         37%
C                         53%
D                         0%
+                                       33

    Idea: nudging

       People do not always choose
        what is best for them

       Idea: use defaults to nudge
        users in the right direction
+                                                                                     34

    What is the right direction?

       More information = better, e.g. for personalization
           Techniques to increase disclosure cause reactance in the more
            privacy-minded users

       Privacy is an absolute right
           More difficult for less privacy-minded users to enjoy the benefits that
            disclosure would provide
+                                                     35

    It depends on the user!

                What is best for consumers
                 depends upon characteristics
                 of the consumer

                An outcome that maximizes
                 consumer welfare may be
                 suboptimal for some consumers
                 in a context where there is
                 heterogeneity in preferences
                 (Smith, Goldstein & Johnson, 2009)
+                                                                  36

    Privacy Adaptation Procedure

           Personalize users' privacy settings!
           Automatic defaults in line with disclosure profile
           Using Big Data to improve Big Data privacy

       Relieves some of the burden of the privacy decision:
           The right privacy-related information
           The right amount of control

       Realistic empowerment
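The sketch below shows one way automatic defaults in line with a disclosure profile might work. Everything in it is hypothetical: the three disclosure profiles, their per-data-type sharing tendencies, and the nearest-profile matching rule are illustrative assumptions, not the procedure's actual model.

```python
# Hypothetical disclosure profiles: tendency (0-1) to share each data type
PROFILES = {
    "high disclosers": {"location": 0.9, "contacts": 0.8, "purchases": 0.9},
    "selective":       {"location": 0.2, "contacts": 0.3, "purchases": 0.8},
    "low disclosers":  {"location": 0.1, "contacts": 0.1, "purchases": 0.2},
}

def nearest_profile(observed):
    """observed: data type -> 1 (shared) or 0 (withheld), for decisions seen so far.
    Returns the profile with the smallest squared distance on those data types."""
    def distance(profile):
        return sum((profile[kind] - choice) ** 2 for kind, choice in observed.items())
    return min(PROFILES, key=lambda name: distance(PROFILES[name]))

def default_settings(observed, threshold=0.5):
    """Default each data type to 'share' only if the matched profile tends to share it.
    These are defaults: the user can still change every setting."""
    profile = PROFILES[nearest_profile(observed)]
    return {kind: tendency >= threshold for kind, tendency in profile.items()}

# A user who withheld location but shared purchase data
print(default_settings({"location": 0, "purchases": 1}))
```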
Conclusions

                 The wonders of Big Data
                  Big Data can be used to create powerful personalized
                  e-commerce experiences

                 The Perils of Big Data
                  Big Data solutions will only work if the developers have
                  an adequate amount of domain knowledge

                 User Experiments
                  Big Data solutions need to be tested on real users, with
                  a focus on user

                 A Note on Privacy
                  Big Data can raise privacy concerns, but it can at the
                  same time be used to alleviate these concerns
Questions?

Editor's Notes

  • #3: The wonders of Big Data: How Big Data will put the personal back in e-commerce. The Perils of Big Data: How overfitting and a lack of domain knowledge can lead to suboptimal solutions. User Experiments: How user evaluations can be used to create meaningful experiences. A Note on Privacy: How to avoid this looming danger of our Big Data future.
  • #6: Improvement means reducing the error in predicting user ratings. Error = root mean square error (RMSE) between the system's rating and the user's rating.
  • #10: Older movies have a higher average rating.
  • #14: Averages are understandable. Bayes and multinomial models, maybe. The leaders' models, not at all!
  • #15: Nobody will use these hybrids in a real system.
  • #17: We have a ground truth problem. It is easy to overfit models on some quirk in the data. We want to make sure we adapt to general human behavior and, ultimately, that we make our users happy. Framework for user-centric evaluation, using the example of recommender systems.
  • #18: If we just have more accurate algorithms, our recommendations will automatically be better!
  • #19: Also link to Xavier's blog posts about Netflix. Ask who knows A/B testing.
  • #20: But even that is not enough.
  • #28: Also add the Target horror story.
  • #30: I think transparency and control will not help, because people are kind of broken. Transparency should make people avoid bad privacy practices and endorse good privacy practices.
  • #32: Control is an illusion, because we can easily influence people's decisions.
  • #33: People are boundedly rational. Here is another example:
  • #34: This idea is interesting, because if people don't choose what is best for them, then why don't we just push them in the right direction?