
Tushar Joshi
Software Architect / JUG Leader / NetBeans Dream
Team member
Works@Persistent Systems

 The list of search results comes from a
database
 The list is processed by algorithms
that sort it by
 Importance
 PageRank
 Relevance
 The search results are not real time; they
are served from a cache
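
As a rough illustration of the PageRank criterion named above, here is a minimal sketch of the simplified textbook form of the algorithm (power iteration with a damping factor). The toy link graph, the function name pagerank, and the parameter values are assumptions made for this example; it is not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outward links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web used only for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph, page c collects links from three pages and ends up first in ordered, while page d, which nothing links to, ends up last.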

 Googol means 1 x 10^100, or in other words 1
followed by 100 zeros
 The name Google was coined from it
 Google became a verb in the dictionary, meaning
to search the internet using the Google search
engine

Crawler or Robots or
Spider
Database,
Cache,
Page-store
Interface, Presentation

Internet

Search Crawler

Cache, Indexes

Keywords sent for
Search

List of pages from
Cache

Sorting with relevance
and presentation
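
The query flow above (keywords in, pages looked up in the cache, results sorted by relevance and presented) can be sketched roughly as follows. The hand-built INDEX, the TITLES table, and the scoring rule (count of matching keyword occurrences) are all assumptions for illustration; a real engine combines many more signals.

```python
# Hypothetical cache index: keyword -> {page ID: number of occurrences}.
INDEX = {
    "search": {1: 1, 2: 1},
    "crawler": {2: 1},
    "google": {1: 1},
    "pasta": {3: 1},
}

# Hypothetical titles used for presentation.
TITLES = {1: "Google history", 2: "How crawlers work", 3: "Pasta recipes"}


def search(query):
    """Look up each keyword in the cached index and sort pages by score."""
    scores = {}
    for word in query.lower().split():
        for page_id, count in INDEX.get(word, {}).items():
            scores[page_id] = scores.get(page_id, 0) + count
    # Highest score first; ties are broken by page ID for a stable order.
    ranked = sorted(scores, key=lambda page_id: (-scores[page_id], page_id))
    return [TITLES[page_id] for page_id in ranked]


results = search("search crawler")
```

Note that search never touches the live web: it only reads the prebuilt index, which is why results can lag behind the pages themselves.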

 Software Engine
 Runs continuously
 Scans web pages using links to navigate
 Backbone of any search engine
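
A minimal sketch of how a crawler navigates by following links, assuming a breadth-first strategy. TOY_WEB and fetch are hypothetical stand-ins for the real web and an HTTP download, so the example runs without network access. A production crawler would also run continuously, revisit pages, and respect robots.txt; this sketch stops when it runs out of new links.

```python
from collections import deque

# Hypothetical toy web: URL -> (page text, outgoing links).
TOY_WEB = {
    "http://example.com/": (
        "home",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": (
        "page b",
        ["http://example.com/", "http://example.com/missing"],
    ),
}


def fetch(url):
    """Stand-in for an HTTP download; returns None for unknown pages."""
    return TOY_WEB.get(url)


def crawl(seed, max_pages=100):
    """Breadth-first crawl from seed; returns a page store of URL -> text."""
    page_store = {}
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # URLs already queued, to avoid loops
    while frontier and len(page_store) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue  # broken link: skip it
        text, links = page
        page_store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return page_store


store = crawl("http://example.com/")
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, as the home page and page b do here.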

 Makes special index entries for web pages
 ID for each page downloaded
 List of all the words appearing in the web page
 Sorting web page IDs in barrels of keywords
 Assigning properties to web pages
 Outward Links
 Inward Links
 Meta Tags, Headings, Structure
 Calculated PageRank
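
The index entries listed above can be sketched as plain dictionaries. In this sketch a "barrel" is simplified to a sorted list of page IDs per keyword; the PAGES data, the build_index name, and the field names are assumptions for illustration, and meta tags, headings, and rank calculation are left out.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> (text, outward links).
PAGES = {
    "http://example.com/": (
        "Search engines crawl the web",
        ["http://example.com/a"],
    ),
    "http://example.com/a": (
        "Crawlers follow links on the web",
        ["http://example.com/"],
    ),
}


def build_index(pages):
    """Build simplified index entries for a set of downloaded pages."""
    # Assign an ID to each downloaded page.
    page_ids = {url: i for i, url in enumerate(sorted(pages), start=1)}

    words = {}                  # page ID -> list of words on the page
    barrels = defaultdict(set)  # keyword -> IDs of pages containing it
    outward = {}                # page ID -> IDs of pages it links to
    inward = defaultdict(set)   # page ID -> IDs of pages linking to it

    for url, (text, links) in pages.items():
        pid = page_ids[url]
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        words[pid] = tokens
        for token in tokens:
            barrels[token].add(pid)
        # Only links to pages we have actually downloaded get an ID.
        outward[pid] = [page_ids[link] for link in links if link in page_ids]
        for target in outward[pid]:
            inward[target].add(pid)

    return {
        "page_ids": page_ids,
        "words": words,
        "barrels": {word: sorted(ids) for word, ids in barrels.items()},
        "outward": outward,
        "inward": {pid: sorted(src) for pid, src in inward.items()},
    }


index = build_index(PAGES)
```

Looking up a keyword then becomes a single dictionary access, for example index["barrels"]["web"], rather than a scan over every stored page.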

 Storage of web page cache
 Huge amount of data
 Stored across multiple storage hardware and
computer assemblies using proprietary
storage software
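
The slide does not describe how the proprietary storage software works. As one generic way to spread a page cache over several machines, here is a sketch of hash-based sharding. The ShardedPageStore class, the shard_for helper, and the use of in-memory dictionaries in place of storage nodes are all assumptions for illustration, not Google's design.

```python
import hashlib


def shard_for(page_id, num_shards):
    """Map a page ID to a shard number using a stable hash."""
    digest = hashlib.sha256(str(page_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedPageStore:
    """Toy page store; each dict stands in for one storage node."""

    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, page_id, html):
        shard = self.shards[shard_for(page_id, len(self.shards))]
        shard[page_id] = html

    def get(self, page_id):
        shard = self.shards[shard_for(page_id, len(self.shards))]
        return shard.get(page_id)


page_cache = ShardedPageStore(num_shards=4)
for pid in range(1, 11):
    page_cache.put(pid, f"<html>page {pid}</html>")
```

Because the shard is computed from the page ID alone, any machine can work out where a page lives without consulting a central directory.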

 http://www.google.com/intl/en/about/company/history.html
 http://en.wikipedia.org/wiki/History_of_Google
 http://infolab.stanford.edu/~backrub/google.html
 http://www.youtube.com/watch?v=BNHR6IQJGZs


How Google Search Works By Tushar Joshi
