狠狠撸

狠狠撸Share a Scribd company logo
PageRankWhat is PageRankWhy PageRankRelated work and problemsLink Structure of the WebDefinition of PageRankDangling LinksImplementation
PageRank(cont.)What is PageRank   In order to measure the relative importance of web pages, PageRank is proposed. It is a method for computing a ranking for every web page based on the graph of the web.
PageRank(cont.)Why PageRank__The World Wide Web is very large and     heterogeneous. __Search engines on the Web must also contend    with inexperienced users and pages engineered    to manipulate search engine ranking functions.    Unlike “flat” document collections, the WorldWide Web is hypertext and provides considerable
PageRank(cont.)auxiliary information on top of the text of the web pages, such as link structure and link text. We can take advantage of the link structure of the web to produce a PageRank of every web page. It helps search engines and users quickly make sense of the vast heterogeneity of the World Wide Web.
PageRank (Cont.)Related work and problems   __Backlink counts   Problem: for example, if a web page has a link off the Yahoo home page, it may be just one link but it is very important one. This page should be ranked higher than many pages with more    backlinks but from obscure places.   __The ranks and numbers of backlinksThis covers both the case that  when a page has many backlinks and when a page has a few highly ranked backlinks. Let u be a webpage,
PageRank (Cont.)
PageRank (Cont.)     be the set of pages that point to u.       be the number of links from u and let c be a factor used for normalization, thena simplified version of PageRank:
PageRank (Cont.)Problem: may form a rank sink. Consider two web pagesthat point to each other but to no other page. And if there issome web page which points to one of them. Then, duringiteration, this loop will accumulate rank but never distributeany rank. The loop forms a sort of trap called a rank sink.
PageRank (Cont.)Link Structure of the Web___Pages are as nodes___Links are as edges (outedges and inedges)Every page has some forward links (outedges) andbacklinks (inedges). We can never know whether wehave found all the backlinks of a particular page but if wehave downloaded it, we know all of its forward links at thattime. PageRank handles both cases and everything inbetween by recursively propagating weights through thelink structure of the web.
PageRank(Cont.)Definition of PageRankWe assume page A has pages T1,…,Tn, which point to it. The parameter d is a damping factorwhich can be set between 0 and 1(usually d isset to 0.85). Also C(A) is defined as the numberof links going out of page A. The PageRank of page A is given as follows:
T1PR=0.5AT2PR=0.3T3PR=0.13245PR(A)=(1-d) + d*(PR(T1)/C(T1) + PR(T2)/C(T2) + PR(T3)/C(T3))           =0.15+0.85*(0.5/3 + 0.3/4+ 0.1/5)
PageRank(Cont.)Let A be a square matrix with the rows and columncorresponding to web pages. Let                     if there is an edge from u to v and               if not. Ifwe treat R as a vector over web pages, then wehave                             . Here E is a uniform vector.Since                  , we can rewrite this as                             . So R is an eigenvector ofwith eigenvalue d.
PageRank(Cont.)Dangling LinksDangling links are simply links that point to any page withno outgoing links. They affect the model because it is notclear where their weights should be distributed, and thereare a large number of them. Because they do not affectthe ranking of any other page directly, we simply removethem from the system until all the PageRanks arecalculated. After all the PageRanks are calculated, theycan be added back in, without affecting things significantly.
PageRank(Cont.)ImplementationSort the link structure by ParentIDRemove dangling links from the link databaseMake an initial assignment of the ranksMemory is allocated for the weights for every pageAfter the weights have converged, add the dangling links back in and recompute the rankings

More Related Content

Page Rank

  • 1. PageRankWhat is PageRankWhy PageRankRelated work and problemsLink Structure of the WebDefinition of PageRankDangling LinksImplementation
  • 2. PageRank(cont.)What is PageRank In order to measure the relative importance of web pages, PageRank is proposed. It is a method for computing a ranking for every web page based on the graph of the web.
  • 3. PageRank(cont.)Why PageRank__The World Wide Web is very large and heterogeneous. __Search engines on the Web must also contend with inexperienced users and pages engineered to manipulate search engine ranking functions. Unlike “flat” document collections, the WorldWide Web is hypertext and provides considerable
  • 4. PageRank(cont.)auxiliary information on top of the text of the web pages, such as link structure and link text. We can take advantage of the link structure of the web to produce a PageRank of every web page. It helps search engines and users quickly make sense of the vast heterogeneity of the World Wide Web.
  • 5. PageRank (Cont.)Related work and problems __Backlink counts Problem: for example, if a web page has a link off the Yahoo home page, it may be just one link but it is very important one. This page should be ranked higher than many pages with more backlinks but from obscure places. __The ranks and numbers of backlinksThis covers both the case that when a page has many backlinks and when a page has a few highly ranked backlinks. Let u be a webpage,
  • 7. PageRank (Cont.) be the set of pages that point to u. be the number of links from u and let c be a factor used for normalization, thena simplified version of PageRank:
  • 8. PageRank (Cont.)Problem: may form a rank sink. Consider two web pagesthat point to each other but to no other page. And if there issome web page which points to one of them. Then, duringiteration, this loop will accumulate rank but never distributeany rank. The loop forms a sort of trap called a rank sink.
  • 9. PageRank (Cont.)Link Structure of the Web___Pages are as nodes___Links are as edges (outedges and inedges)Every page has some forward links (outedges) andbacklinks (inedges). We can never know whether wehave found all the backlinks of a particular page but if wehave downloaded it, we know all of its forward links at thattime. PageRank handles both cases and everything inbetween by recursively propagating weights through thelink structure of the web.
  • 10. PageRank(Cont.)Definition of PageRankWe assume page A has pages T1,…,Tn, which point to it. The parameter d is a damping factorwhich can be set between 0 and 1(usually d isset to 0.85). Also C(A) is defined as the numberof links going out of page A. The PageRank of page A is given as follows:
  • 11. T1PR=0.5AT2PR=0.3T3PR=0.13245PR(A)=(1-d) + d*(PR(T1)/C(T1) + PR(T2)/C(T2) + PR(T3)/C(T3)) =0.15+0.85*(0.5/3 + 0.3/4+ 0.1/5)
  • 12. PageRank(Cont.)Let A be a square matrix with the rows and columncorresponding to web pages. Let if there is an edge from u to v and if not. Ifwe treat R as a vector over web pages, then wehave . Here E is a uniform vector.Since , we can rewrite this as . So R is an eigenvector ofwith eigenvalue d.
  • 13. PageRank(Cont.)Dangling LinksDangling links are simply links that point to any page withno outgoing links. They affect the model because it is notclear where their weights should be distributed, and thereare a large number of them. Because they do not affectthe ranking of any other page directly, we simply removethem from the system until all the PageRanks arecalculated. After all the PageRanks are calculated, theycan be added back in, without affecting things significantly.
  • 14. PageRank(Cont.)ImplementationSort the link structure by ParentIDRemove dangling links from the link databaseMake an initial assignment of the ranksMemory is allocated for the weights for every pageAfter the weights have converged, add the dangling links back in and recompute the rankings