際際滷

際際滷Share a Scribd company logo
Faceted Ranking In Collaborative Tagging Systems

   J. I. Orlicki12   P. Fierens2           J. I. Alvarez-Hamelin23
                     1 Core   Security Technologies
                                  2 ITBA

                               3 CONICET


               WEBIST 2009, Lisbon, Portugal
The Problem (Faceted Reputation)
      Which ickr photographers are the best regarding a facet, i.e.
      tag set, { sea, portugal }?
      Nodes are users/channels, edges are favorites and tags are
      associated to the favorited content.
Single Ranking (1/3)




      Basic approach, single rank and ltering. Scales well.
      Everything is biased to the richer nodes, tags don't inuence
      the ranking.
      G goes out, but why is D worstly ranked than A regarding
      {sea, portugal}? Is D better than C?
Single Ranking (2/3)




      Basic approach, single rank and ltering. Scales well.
      Everything is biased to the richer nodes, tags don't inuence
      the ranking.
      G goes out, but why is D worstly ranked than A regarding
      {sea, portugal}? Is D better than C?
Single Ranking (3/3)




      Basic approach, single rank and ltering. Scales well.
      Everything is biased to the richer nodes, tags don't inuence
      the ranking.
      G goes out, but why is D worstly ranked than A regarding
      {sea, portugal}? Is D better than C?
Edge-intersection, 1st gold standard (1/3)

       Filtering edges including the conjunction of tags.
       Adequate tag bias, slightly restrictive.
Edge-intersection, 1st gold standard (2/3)

       Filtering edges including the conjunction of tags.
       Adequate tag bias, slightly restrictive.
Edge-intersection, 1st gold standard (3/3)

       Filtering edges including the conjunction of tags.
       Adequate tag bias, slightly restrictive.
Node-intersection, 2nd gold standard (1/2)
       Filtering edges including the disjunction of tags to rank.
       Plus ltering conjuntion of nodes involved in every tag edge
       after ranking.
       Adequate tag bias, slightly irrestrictive, possibly one tag
       prevails over the other.




   c
Node-intersection, 2nd gold standard (2/2)
       Filtering edges including the disjunction of tags to rank.
       Plus ltering conjuntion of nodes involved in every tag edge
       after ranking.
       Adequate tag bias, slightly irrestrictive, possibly one tag
       prevails over the other.
The Scalability Problem
       The previous two algorithms don't scale for online queries.
       Another possibility is computing singleton facets oine, and
       later merge the results online.
       Oine time and spatial complexity will grow linearly on
       #edges  #tags per edge. Scaling nicely.


                      100000
                                                 YouTube
                                                   Flickr
                      10000

                        1000
            # edges




                         100

                          10

                           1

                         0.1
                               1   10             100       1000
                                        # tags
Singleton facets, computed oine (1/2)

      Singleton facet subgraphs used in ranking, after that only best
      K users stored, where K is small.
Singleton facets, computed oine (2/2)
      Singleton facet subgraphs used in ranking, after that only best
      K users stored, where K is small.
Probability-product


       Inspired by the probability independence rule, multiply
       PageRank probability of single tags.

                 sea        portugal                    rank!
           A     0.09         0.02            0.0018     #6
           B     0.14         0.04            0.0056     #4
           C     0.14        0.40      =     0.0560     #2
           D     0.38         0.39            0.1482     #1
           E     0.14         0.07            0.0098     #3
           F     0.09         0.05            0.0045     #5

       Possible bias towards the heaviest tag, eclipsing the others.
Rank-sum
     Lowest accumulated ordinal/position sum gets the best ranks.

                  sea        portugal              rank!
             A    #3           #6             9     #5
             B    #2           #5             7     #4
             C    #2    +      #2        =    4     #2
             D    #1           #1             2     #1
             E    #2           #3             5     #3
             F    #3           #4             7     #4

     Avoids this kind of topic drift towards one of the tags.
Winners-intersection

        Top W (small) nodes per singleton facet are used to build a
        new small graph.
        W = 500 in experiments (W = 3 in example).




         sea       portugal
    A    #3
    B    #2
    C    #2          #2       =    C
    D    #1           #1            D
    E    #2           #3            E
    F    #3
Experiments, comp. with Edge-intersection, OSim
darker is better results
More experiments (ickr)
Conclusions


      Exist approximate and scalable methods for faceted ranking in
      collaborative tagging systems.
      Functional web prototype: Egg-O-Matic

                   http://egg-o-matic.itba.edu.ar




      Loose Ends
          Using weighted graphs.
          Scientic cites dataset (real egos!).
          Industrial-sized dataset (10^7 instead   of 10^5 edges)
Prototype (1/2)
Prototype (2/2, last slide, thanks!)

More Related Content

WEBIST 2009

  • 1. Faceted Ranking In Collaborative Tagging Systems J. I. Orlicki12 P. Fierens2 J. I. Alvarez-Hamelin23 1 Core Security Technologies 2 ITBA 3 CONICET WEBIST 2009, Lisbon, Portugal
  • 2. The Problem (Faceted Reputation) Which ickr photographers are the best regarding a facet, i.e. tag set, { sea, portugal }? Nodes are users/channels, edges are favorites and tags are associated to the favorited content.
  • 3. Single Ranking (1/3) Basic approach, single rank and ltering. Scales well. Everything is biased to the richer nodes, tags don't inuence the ranking. G goes out, but why is D worstly ranked than A regarding {sea, portugal}? Is D better than C?
  • 4. Single Ranking (2/3) Basic approach, single rank and ltering. Scales well. Everything is biased to the richer nodes, tags don't inuence the ranking. G goes out, but why is D worstly ranked than A regarding {sea, portugal}? Is D better than C?
  • 5. Single Ranking (3/3) Basic approach, single rank and ltering. Scales well. Everything is biased to the richer nodes, tags don't inuence the ranking. G goes out, but why is D worstly ranked than A regarding {sea, portugal}? Is D better than C?
  • 6. Edge-intersection, 1st gold standard (1/3) Filtering edges including the conjunction of tags. Adequate tag bias, slightly restrictive.
  • 7. Edge-intersection, 1st gold standard (2/3) Filtering edges including the conjunction of tags. Adequate tag bias, slightly restrictive.
  • 8. Edge-intersection, 1st gold standard (3/3) Filtering edges including the conjunction of tags. Adequate tag bias, slightly restrictive.
  • 9. Node-intersection, 2nd gold standard (1/2) Filtering edges including the disjunction of tags to rank. Plus ltering conjuntion of nodes involved in every tag edge after ranking. Adequate tag bias, slightly irrestrictive, possibly one tag prevails over the other. c
  • 10. Node-intersection, 2nd gold standard (2/2) Filtering edges including the disjunction of tags to rank. Plus ltering conjuntion of nodes involved in every tag edge after ranking. Adequate tag bias, slightly irrestrictive, possibly one tag prevails over the other.
  • 11. The Scalability Problem The previous two algorithms don't scale for online queries. Another possibility is computing singleton facets oine, and later merge the results online. Oine time and spatial complexity will grow linearly on #edges #tags per edge. Scaling nicely. 100000 YouTube Flickr 10000 1000 # edges 100 10 1 0.1 1 10 100 1000 # tags
  • 12. Singleton facets, computed oine (1/2) Singleton facet subgraphs used in ranking, after that only best K users stored, where K is small.
  • 13. Singleton facets, computed oine (2/2) Singleton facet subgraphs used in ranking, after that only best K users stored, where K is small.
  • 14. Probability-product Inspired by the probability independence rule, multiply PageRank probability of single tags. sea portugal rank! A 0.09 0.02 0.0018 #6 B 0.14 0.04 0.0056 #4 C 0.14 0.40 = 0.0560 #2 D 0.38 0.39 0.1482 #1 E 0.14 0.07 0.0098 #3 F 0.09 0.05 0.0045 #5 Possible bias towards the heaviest tag, eclipsing the others.
  • 15. Rank-sum Lowest accumulated ordinal/position sum gets the best ranks. sea portugal rank! A #3 #6 9 #5 B #2 #5 7 #4 C #2 + #2 = 4 #2 D #1 #1 2 #1 E #2 #3 5 #3 F #3 #4 7 #4 Avoids this kind of topic drift towards one of the tags.
  • 16. Winners-intersection Top W (small) nodes per singleton facet are used to build a new small graph. W = 500 in experiments (W = 3 in example). sea portugal A #3 B #2 C #2 #2 = C D #1 #1 D E #2 #3 E F #3
  • 17. Experiments, comp. with Edge-intersection, OSim darker is better results
  • 19. Conclusions Exist approximate and scalable methods for faceted ranking in collaborative tagging systems. Functional web prototype: Egg-O-Matic http://egg-o-matic.itba.edu.ar Loose Ends Using weighted graphs. Scientic cites dataset (real egos!). Industrial-sized dataset (10^7 instead of 10^5 edges)
  • 21. Prototype (2/2, last slide, thanks!)