Tushar Joshi works as a software architect and JUG leader at Persistent Systems. He is also a member of the NetBeans Dream Team. The document discusses the key components and processes of a basic search engine, including crawlers that scan web pages and build an index, sorting and ranking of search results by relevance and by importance metrics such as PageRank, and serving results from a cache rather than searching the web in real time. It also provides some background on the origins and meaning of the name "Google".
2. The list of search results comes from a
database
The list is processed by algorithms
that sort it by
Importance
PageRank
Relevance
The search results are not real time; they
come from a cache
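The slide above can be illustrated with a minimal sketch, assuming a tiny precomputed index in place of the real database. The names `INDEX`, `PAGE_RANK`, `search`, and `rank_weight`, and the scoring formula itself, are hypothetical and chosen only for illustration; they are not how any particular search engine weights its results.

```python
# Hypothetical precomputed data standing in for the search engine's database.
# Nothing here is fetched at query time: the query only reads stored data.
INDEX = {
    "search": {1: 3, 2: 1},  # keyword -> {page ID: occurrences of the keyword}
    "engine": {1: 2, 3: 4},
}
PAGE_RANK = {1: 0.5, 2: 0.3, 3: 0.2}  # page ID -> precomputed importance


def search(query, rank_weight=10.0):
    """Return the IDs of pages matching any query word, best match first."""
    scores = {}
    for word in query.lower().split():
        for page_id, count in INDEX.get(word, {}).items():
            # Relevance: how often the query words occur on the page.
            scores[page_id] = scores.get(page_id, 0.0) + count
    for page_id in scores:
        # Importance: add the (assumed) weighted precomputed rank.
        scores[page_id] += rank_weight * PAGE_RANK.get(page_id, 0.0)
    # Highest score first; ties are broken by page ID for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))


print(search("search engine"))  # prints [1, 3, 2]
```

Because the query only reads stored data, the answer reflects the state of the web at the time the pages were last indexed.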
3. Googol means 1 x 10^100, or in other words 1
followed by 100 zeros
Used to coin the name Google
Google became a verb in the dictionary, meaning
to search the internet using the Google search
engine
4. Crawlers or Robots or
Spiders
Database,
Cache,
Page-store
Interface, Presentation
7. Software Engine
Runs continuously
Scans web pages using links to navigate
Backbone of any search engine
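As a rough illustration of scanning pages by following links, here is a minimal sketch of a breadth-first crawl. It assumes a small in-memory dictionary (`WEB`) in place of real HTTP requests; the names `WEB`, `fetch_links`, `crawl`, and `max_pages` are hypothetical. A production crawler runs continuously and revisits pages, whereas this sketch stops once no unvisited links remain.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> URLs that the page links to.
WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return WEB.get(url, [])


def crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first from seed; return URLs in visiting order."""
    visited = []
    seen = {seed}          # URLs already queued, so no page is visited twice
    queue = deque([seed])  # frontier of URLs waiting to be scanned
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("a.example", fetch_links))
```

Passing `fetch_links` as a parameter keeps the traversal logic separate from how pages are actually downloaded.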
8. Makes special index entries for web pages
ID for each page downloaded
List of all the words appearing in web page
Sorting web page IDs in barrels of keywords
Assigning properties to web pages
Outward Links
Inward Links
Meta Tags, Headings, Structure
Calculated Page Rank
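The index entries listed above can be sketched as follows. This is a simplified illustration, not the actual indexer of any search engine: the sample `PAGES`, the function names `build_index` and `page_rank`, the damping factor of 0.85, and the fixed iteration count are all assumptions. Meta tags, headings, and structure are left out to keep the sketch short.

```python
import re
from collections import defaultdict

# Hypothetical downloaded pages: URL -> (page text, outward links).
PAGES = {
    "a.example": ("Search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("Crawlers follow links", ["c.example"]),
    "c.example": ("The web is big", ["a.example"]),
}


def build_index(pages):
    """Assign page IDs, word lists, keyword barrels, and link properties."""
    page_ids = {url: i for i, url in enumerate(sorted(pages), start=1)}
    words, outward = {}, {}
    barrels = defaultdict(list)  # keyword -> page IDs in ascending order
    inward = defaultdict(list)   # page ID -> IDs of pages linking to it
    for url, (text, links) in sorted(pages.items()):
        pid = page_ids[url]
        words[pid] = re.findall(r"[a-z0-9]+", text.lower())
        for word in set(words[pid]):
            barrels[word].append(pid)
        # Links to pages that were never downloaded are ignored here.
        outward[pid] = [page_ids[link] for link in links if link in page_ids]
        for target in outward[pid]:
            inward[target].append(pid)
    return page_ids, words, dict(barrels), outward, dict(inward)


def page_rank(outward, damping=0.85, iterations=50):
    """Simplified PageRank-style iteration over the outward-link graph."""
    n = len(outward)
    rank = {pid: 1.0 / n for pid in outward}
    for _ in range(iterations):
        new_rank = {pid: (1.0 - damping) / n for pid in outward}
        for pid, targets in outward.items():
            # A page with no outward links spreads its rank over all pages.
            receivers = targets if targets else list(outward)
            share = damping * rank[pid] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


page_ids, words, barrels, outward, inward = build_index(PAGES)
ranks = page_rank(outward)
print(barrels["web"], inward[3])  # prints [1, 3] [1, 2]
```

In this sample, page 3 (`c.example`) ends up with the highest rank because it is the only page with two inward links.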
9. Storage of web page cache
Huge amount of data
Stored across multiple storage hardware and
computer assemblies using proprietary
storage software
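The slides do not describe how the proprietary storage software distributes data. One common, generic way to spread a large page cache across machines is hash-based sharding; the sketch below assumes that approach purely for illustration. The class `ShardedPageStore` and its methods are hypothetical, and plain dictionaries stand in for separate storage machines.

```python
import hashlib


class ShardedPageStore:
    """Toy page cache that spreads pages over several in-memory shards."""

    def __init__(self, shard_count):
        # Each dict stands in for one storage machine (an assumption).
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, url):
        # A stable hash, so the same URL always maps to the same shard.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, url, html):
        self.shards[self._shard_for(url)][url] = html

    def get(self, url):
        """Return the cached page, or None if it was never stored."""
        return self.shards[self._shard_for(url)].get(url)


store = ShardedPageStore(shard_count=4)
store.put("a.example", "<html>cached copy of page A</html>")
print(store.get("a.example"))
```

A simple modulo scheme like this has to move most entries when the number of shards changes, which is one reason real systems tend to use more elaborate placement strategies.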