際際滷

際際滷Share a Scribd company logo
The PageRank Citation Ranking: Bringing Order to the Web Larry Page etc. Stanford University Presented by Guoqiang Su & Wei Li
Contents Motivation Related work Page Rank & Random Surfer Model Implementation Application Conclusion
Motivation Web: heterogeneous and unstructured Free of quality control on the web Commercial interest to manipulate ranking
Related Work Academic citation analysis Link-based analysis Clustering methods of link structure Hubs & Authorities Model
Backlink Link Structure of the Web Approximation of importance / quality
PageRank Pages with lots of backlinks are important Backlinks coming from important pages convey more importance to a page Problem: Rank Sink
Rank Sink Page cycles pointed by some incoming link Problem: this loop will accumulate rank but never distribute any rank outside
Escape Term Solution: Rank Source c is maximized and  = 1 E(u) is some vector over the web pages  uniform, favorite page etc.
Matrix Notation R is the dominant eigenvector and c is the dominant eigenvalue of  because c is maximized
Computing PageRank - initialize vector over web pages loop: - new ranks sum of normalized backlink ranks      - compute normalizing factor   - add escape term   - control parameter while  - stop when converged
Random Surfer Model Page Rank corresponds to the probability distribution of a random walk on the web graphs E(u) can be re-phrased as the random surfer gets bored periodically and jumps to a different page and not kept in a loop forever
Implementation Computing resources   24 million pages   75 million URLs Memory and disk storage Weight Vector  (4 byte float)   Matrix  A  (linear access)
Implementation (Con't) Unique integer ID for each URL Sort and Remove dangling links Rank initial assignment Iteration until convergence Add back dangling links and Re-compute
Convergence Properties Graph (V, E) is an  expander  with factor    if for all (not too large) subsets S: |As|     |s| Eigenvalue separation:  Largest eigenvalue is sufficiently larger than the second-largest eigenvalue Random walk  converges fast to a limiting probability distribution on a set of nodes in the graph.
Convergence Properties (con't) PageRank computation is O(log(|V|)) due to rapidly  mixing graph  G of the web.
Personalized PageRank Rank Source E can be initialized :   uniformly over all pages:  e.g. copyright  warnings, disclaimers, mailing lists archives    result in overly high ranking   total weight on a single page,  e.g .  Netscape, McCarthy    great variation of ranks under different single pages  as rank source   and everything in-between,  e.g. server root pages    allow manipulation by commercial interests
Applications I Estimate web traffic   Server/page aliases   Link/traffic disparity, e.g. porn sites, free web-mail Backlink predictor   Citation counts have been used to predict future citations    very difficult to map the citation structure of the web completely   avoid the local maxima that citation counts get stuck in and get better performance
Applications II - Ranking Proxy Surfer's Navigation Aid Annotating links by PageRank (bar graph) Not query dependent
Issues Users are no random walkers   Content based methods Starting point distribution     Actual usage data as starting vector Reinforcing effects/bias towards main pages How about traffic to ranking pages? No query specific rank Linkage spam   PageRank favors pages that managed to get other pages to link to  them   Linkage not necessarily a sign of relevancy, only of promotion  (advertisement)
Evaluation I
Evaluation II
Conclusion PageRank is a global ranking based on the web's graph structure PageRank use backlinks information to bring order to the web PageRank can separate out representative pages as cluster center A great variety of applications
Ad

Recommended

INTRODUCCION A LA FINANZA
INTRODUCCION A LA FINANZA
guest9d0a6f
PAGERANK
PAGERANK
marco larco
Pagerank
Pagerank
Jos辿 Luis Vera Ch叩vez
Pagerank
Pagerank
alvaro luis
PAGERANKS
PAGERANKS
David
Pagerank
Pagerank
alexander
Pagerank
Pagerank
miguel angel andrade moreno
Spam derecho informatico_(2)
rodrigoosco
Internet
Vivi Pillajo
永姻艶壊艶稼岳温界庄坦稼1姻看壊壊壊看乙看稼噛温鉛艶沿庄噛温韓艶姻稼温稼糸看乙
pablin
Introduccion a las Bases de Datos
andreapguzman
仂亠从仆舒 舒弍仂舒 仗仂 仂仆仂于舒仄 亠仂亳亳 亟亳亰舒亶仆舒 "亳亰舒亶仆 于亠仆亳仆仂亶 仗仂亟从亳亳".
仂亠从仆舒 舒弍仂舒 仗仂 仂仆仂于舒仄 亠仂亳亳 亟亳亰舒亶仆舒 "亳亰舒亶仆 于亠仆亳仆仂亶 仗仂亟从亳亳".
丐舒仆 仂于舒
Tik bab1 bab3
Tik bab1 bab3
Anisyuliani
Cantares
alfre1240
LA SALUD. CLARA
Sagrario Fern叩ndez Ruiz
Ranking Web Pages
Ranking Web Pages
elliando dias
Random web surfer pagerank algorithm
Random web surfer pagerank algorithm
alexandrelevada
Page rank by university of michagain.ppt
Page rank by university of michagain.ppt
rayyverma
PageRank Algorithm In data mining
PageRank Algorithm In data mining
Mai Mustafa
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdf
rayyverma
Page Rank
Page Rank
Silvia Quimis
Page Rank
Page Rank
Silvia Quimis
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_Habib
El Habib NFAOUI
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTES
Subhajit Sahu
Page Rank
Page Rank
Kevin Mirallas
Verdad incomoda
Bryan Molina
El azufre
Bryan Molina
Videos ecologia
Bryan Molina
Hechos histricos de_la_ecologa
Bryan Molina
Cuenca
Bryan Molina

More Related Content

Viewers also liked (7)

Internet
Vivi Pillajo
永姻艶壊艶稼岳温界庄坦稼1姻看壊壊壊看乙看稼噛温鉛艶沿庄噛温韓艶姻稼温稼糸看乙
pablin
Introduccion a las Bases de Datos
andreapguzman
仂亠从仆舒 舒弍仂舒 仗仂 仂仆仂于舒仄 亠仂亳亳 亟亳亰舒亶仆舒 "亳亰舒亶仆 于亠仆亳仆仂亶 仗仂亟从亳亳".
仂亠从仆舒 舒弍仂舒 仗仂 仂仆仂于舒仄 亠仂亳亳 亟亳亰舒亶仆舒 "亳亰舒亶仆 于亠仆亳仆仂亶 仗仂亟从亳亳".
丐舒仆 仂于舒
Tik bab1 bab3
Tik bab1 bab3
Anisyuliani
Cantares
alfre1240
LA SALUD. CLARA
Sagrario Fern叩ndez Ruiz
Internet
Vivi Pillajo
永姻艶壊艶稼岳温界庄坦稼1姻看壊壊壊看乙看稼噛温鉛艶沿庄噛温韓艶姻稼温稼糸看乙
pablin
Introduccion a las Bases de Datos
andreapguzman
仂亠从仆舒 舒弍仂舒 仗仂 仂仆仂于舒仄 亠仂亳亳 亟亳亰舒亶仆舒 "亳亰舒亶仆 于亠仆亳仆仂亶 仗仂亟从亳亳".
仂亠从仆舒 舒弍仂舒 仗仂 仂仆仂于舒仄 亠仂亳亳 亟亳亰舒亶仆舒 "亳亰舒亶仆 于亠仆亳仆仂亶 仗仂亟从亳亳".
丐舒仆 仂于舒
Tik bab1 bab3
Tik bab1 bab3
Anisyuliani
Cantares
alfre1240
LA SALUD. CLARA
Sagrario Fern叩ndez Ruiz

Similar to Pagerank(2) (10)

Ranking Web Pages
Ranking Web Pages
elliando dias
Random web surfer pagerank algorithm
Random web surfer pagerank algorithm
alexandrelevada
Page rank by university of michagain.ppt
Page rank by university of michagain.ppt
rayyverma
PageRank Algorithm In data mining
PageRank Algorithm In data mining
Mai Mustafa
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdf
rayyverma
Page Rank
Page Rank
Silvia Quimis
Page Rank
Page Rank
Silvia Quimis
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_Habib
El Habib NFAOUI
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTES
Subhajit Sahu
Page Rank
Page Rank
Kevin Mirallas
Ranking Web Pages
Ranking Web Pages
elliando dias
Random web surfer pagerank algorithm
Random web surfer pagerank algorithm
alexandrelevada
Page rank by university of michagain.ppt
Page rank by university of michagain.ppt
rayyverma
PageRank Algorithm In data mining
PageRank Algorithm In data mining
Mai Mustafa
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdf
rayyverma
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_Habib
El Habib NFAOUI
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTES
Subhajit Sahu
Ad

More from Bryan Molina (7)

Verdad incomoda
Bryan Molina
El azufre
Bryan Molina
Videos ecologia
Bryan Molina
Hechos histricos de_la_ecologa
Bryan Molina
Cuenca
Bryan Molina
Google
Bryan Molina
Pagerank(2)
Pagerank(2)
Bryan Molina
Verdad incomoda
Bryan Molina
El azufre
Bryan Molina
Videos ecologia
Bryan Molina
Hechos histricos de_la_ecologa
Bryan Molina
Cuenca
Bryan Molina
Google
Bryan Molina
Ad

Recently uploaded (20)

FIDO Seminar: Perspectives on Passkeys & Consumer Adoption.pptx
FIDO Seminar: Perspectives on Passkeys & Consumer Adoption.pptx
FIDO Alliance
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
9-1-1 Addressing: End-to-End Automation Using FME
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
FIDO Seminar: Targeting Trust: The Future of Identity in the Workforce.pptx
FIDO Seminar: Targeting Trust: The Future of Identity in the Workforce.pptx
FIDO Alliance
The Future of Data, AI, and AR: Innovation Inspired by You.pdf
The Future of Data, AI, and AR: Innovation Inspired by You.pdf
Safe Software
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
Key Requirements to Successfully Implement Generative AI in Edge DevicesOpt...
Key Requirements to Successfully Implement Generative AI in Edge DevicesOpt...
Edge AI and Vision Alliance
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Alliance
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
10 Key Challenges for AI within the EU Data Protection Framework.pdf
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Nilesh Gule
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
Wenn alles versagt - IBM Tape sch端tzt, was z辰hlt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape sch端tzt, was z辰hlt! Und besonders mit dem neust...
Josef Weingand
Security Tips for Enterprise Azure Solutions
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
FIDO Seminar: Perspectives on Passkeys & Consumer Adoption.pptx
FIDO Seminar: Perspectives on Passkeys & Consumer Adoption.pptx
FIDO Alliance
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
9-1-1 Addressing: End-to-End Automation Using FME
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
FIDO Seminar: Targeting Trust: The Future of Identity in the Workforce.pptx
FIDO Seminar: Targeting Trust: The Future of Identity in the Workforce.pptx
FIDO Alliance
The Future of Data, AI, and AR: Innovation Inspired by You.pdf
The Future of Data, AI, and AR: Innovation Inspired by You.pdf
Safe Software
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
Key Requirements to Successfully Implement Generative AI in Edge DevicesOpt...
Key Requirements to Successfully Implement Generative AI in Edge DevicesOpt...
Edge AI and Vision Alliance
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Alliance
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
10 Key Challenges for AI within the EU Data Protection Framework.pdf
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Nilesh Gule
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
Wenn alles versagt - IBM Tape sch端tzt, was z辰hlt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape sch端tzt, was z辰hlt! Und besonders mit dem neust...
Josef Weingand
Security Tips for Enterprise Azure Solutions
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante

Pagerank(2)

  • 1. The PageRank Citation Ranking: Bringing Order to the Web Larry Page etc. Stanford University Presented by Guoqiang Su & Wei Li
  • 2. Contents Motivation Related work Page Rank & Random Surfer Model Implementation Application Conclusion
  • 3. Motivation Web: heterogeneous and unstructured Free of quality control on the web Commercial interest to manipulate ranking
  • 4. Related Work Academic citation analysis Link-based analysis Clustering methods of link structure Hubs & Authorities Model
  • 5. Backlink Link Structure of the Web Approximation of importance / quality
  • 6. PageRank Pages with lots of backlinks are important Backlinks coming from important pages convey more importance to a page Problem: Rank Sink
  • 7. Rank Sink Page cycles pointed by some incoming link Problem: this loop will accumulate rank but never distribute any rank outside
  • 8. Escape Term Solution: Rank Source c is maximized and = 1 E(u) is some vector over the web pages uniform, favorite page etc.
  • 9. Matrix Notation R is the dominant eigenvector and c is the dominant eigenvalue of because c is maximized
  • 10. Computing PageRank - initialize vector over web pages loop: - new ranks sum of normalized backlink ranks - compute normalizing factor - add escape term - control parameter while - stop when converged
  • 11. Random Surfer Model Page Rank corresponds to the probability distribution of a random walk on the web graphs E(u) can be re-phrased as the random surfer gets bored periodically and jumps to a different page and not kept in a loop forever
  • 12. Implementation Computing resources 24 million pages 75 million URLs Memory and disk storage Weight Vector (4 byte float) Matrix A (linear access)
  • 13. Implementation (Con't) Unique integer ID for each URL Sort and Remove dangling links Rank initial assignment Iteration until convergence Add back dangling links and Re-compute
  • 14. Convergence Properties Graph (V, E) is an expander with factor if for all (not too large) subsets S: |As| |s| Eigenvalue separation: Largest eigenvalue is sufficiently larger than the second-largest eigenvalue Random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
  • 15. Convergence Properties (con't) PageRank computation is O(log(|V|)) due to rapidly mixing graph G of the web.
  • 16. Personalized PageRank Rank Source E can be initialized : uniformly over all pages: e.g. copyright warnings, disclaimers, mailing lists archives result in overly high ranking total weight on a single page, e.g . Netscape, McCarthy great variation of ranks under different single pages as rank source and everything in-between, e.g. server root pages allow manipulation by commercial interests
  • 17. Applications I Estimate web traffic Server/page aliases Link/traffic disparity, e.g. porn sites, free web-mail Backlink predictor Citation counts have been used to predict future citations very difficult to map the citation structure of the web completely avoid the local maxima that citation counts get stuck in and get better performance
  • 18. Applications II - Ranking Proxy Surfer's Navigation Aid Annotating links by PageRank (bar graph) Not query dependent
  • 19. Issues Users are no random walkers Content based methods Starting point distribution Actual usage data as starting vector Reinforcing effects/bias towards main pages How about traffic to ranking pages? No query specific rank Linkage spam PageRank favors pages that managed to get other pages to link to them Linkage not necessarily a sign of relevancy, only of promotion (advertisement)
  • 22. Conclusion PageRank is a global ranking based on the web's graph structure PageRank use backlinks information to bring order to the web PageRank can separate out representative pages as cluster center A great variety of applications