This document provides an overview of search engines, including what they are, how they work, and the evolution of major search engines over time. It discusses how search engines use web crawlers to index web pages and how they developed ranking algorithms to return relevant results. Key points include:
- Search engines allow users to find information on the internet through keyword searches. They index web pages using crawlers and return ranked results based on relevance and popularity.
- Major early search engines included AltaVista, Yahoo, Ask Jeeves, and others. Google revolutionized search in 1998 with its PageRank algorithm that analyzed backlinks.
- Search engine algorithms consider many on-page and off-page
3. What is a Search Engine?
- Search engines are the key to finding specific information on the vast expanse of the World Wide Web.
- Without search engines, it would be virtually impossible to locate anything on the Web without knowing a specific URL.
- Search engines use automated software (known as robots or spiders) that follows links on websites, harvesting information as it goes.
- Search engines are also known as answer machines. When a person performs an online search, the search engine scours its corpus of billions of documents and does two things:
1.) It returns only those results that are relevant or useful to the searcher's query;
2.) It ranks those results according to the popularity of the websites serving the information.
- It is both relevance and popularity that the process of SEO is meant to influence.
4. What is SEO?
- Imagine for a minute that you are a librarian.
- People across the world depend on you for the exact information they need.
- For this you need a system that knows what's inside every book and how books relate to each other.
- The system needs to take in a lot of information and send out the best answers to the questions.
- Search engines like Google and Yahoo are the librarians of the Internet.
5.
- Their systems collect information about every page on the web so they can help people find exactly what they are looking for.
- Every search engine has a secret recipe, called an algorithm, for turning all that information into useful search results.
- Pages with higher rankings are found by more people.
- The key to a higher ranking is making sure your website has the ingredients search engines need for their algorithms. This is called Search Engine Optimization.
7. Search Engine Optimization
- The process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by search engines.
8. Birth of Search Engines
- The concept of hypertext and a memory extension came to life in July 1945, when Vannevar Bush's "As We May Think" was published in The Atlantic Monthly.
- He urged scientists to work together to help build a body of knowledge for all mankind.
- He then proposed the idea of a virtually limitless, fast, reliable, extensible, associative memory storage and retrieval system. Bush named this device the "memex".
9.
- Ted Nelson created Project Xanadu in 1960 and coined the term "hypertext" in 1963. Much of the inspiration for the WWW was drawn from his work, which is why he is sometimes called a father of search engines.
- ARPANET is the network that eventually led to the Internet.
- Its packet switching was based on concepts and designs by, among others, the American scientists Leonard Kleinrock and Paul Baran.
- ARPANET was an early packet-switching network and the first network to implement the TCP/IP protocol suite.
- ARPANET was operated by the military during the two decades of its existence, until 1990.
11.
- The first Internet search engine was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal.
- Archie was a database of file names from public FTP sites, which it matched against the user's query.
- Later, Veronica was developed; it served the same purpose as Archie but worked on plain text files.
- Soon another user interface named Jughead appeared. Both were used for files sent via Gopher, which was created as an alternative to Archie by Mark McCahill at the University of Minnesota in 1991.
12.
- Tim Berners-Lee designed and built the first web browser and editor, called WorldWideWeb, and the first web server, httpd.
- The first website was http://info.cern.ch/, put online on August 6, 1991.
- In 1994, Berners-Lee founded the World Wide Web Consortium at the Massachusetts Institute of Technology.
- In 1993, Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB.
- ALIWEB crawled meta information and allowed users to submit the pages they wanted indexed, with their own page descriptions.
13. What is a Computer Bot?
- Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce.
- The term "bot" on the Internet is usually used to describe anything that interfaces with the user or that collects data.
- Search engines use "spiders", which search the web for information. They read the contents of pages for indexing and also record the links.
- Another example is chatterbots, which attempt to act like a human and communicate with humans on a given topic.
15. Primitive Web Search
- By December 1993, three full-fledged bot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider.
- JumpStation and the WWW Worm gathered the titles and URLs of web pages and retrieved them using a simple linear search.
- The problem with JumpStation and the WWW Worm was that they listed results in the order in which they found them and provided no discrimination.
- The RBSE spider implemented a ranking system.
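The difference between an unranked linear search and a ranked one can be sketched in a few lines of Python. This is only an illustration: the slides do not describe how the RBSE spider actually ranked results, so counting how often the term appears in the title is an assumed stand-in, and the `Page` record and both function names are hypothetical.

```python
from typing import NamedTuple


class Page(NamedTuple):
    """A record holding only what the early engines gathered: title and URL."""
    title: str
    url: str


def linear_search(pages: list[Page], term: str) -> list[Page]:
    """Scan every record in discovery order; hits are returned unranked."""
    needle = term.lower()
    return [p for p in pages if needle in p.title.lower() or needle in p.url.lower()]


def ranked_search(pages: list[Page], term: str) -> list[Page]:
    """Same scan, then order hits by term frequency in the title (assumed ranking)."""
    needle = term.lower()
    hits = linear_search(pages, term)
    return sorted(hits, key=lambda p: p.title.lower().count(needle), reverse=True)
```

With the unranked version, a page found early always appears first, however weak the match; the ranked version moves the stronger match to the top.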
16. ALTA VISTA
- AltaVista brought numerous search tips and advanced search features to the web.
- It had nearly unlimited bandwidth for the time, was the first to allow natural language queries, offered advanced searching techniques, and allowed users to add or delete their own URL within 24 hours.
- Due to mismanagement and fear of result manipulation, AltaVista was largely driven into irrelevancy.
17. WEB CRAWLER
- On April 20, 1994, Brian Pinkerton of the University of Washington released WebCrawler.
- It was the first crawler to index entire pages.
- In 1997, Excite bought out WebCrawler.
18.
- Lycos, the next major search development, was designed at Carnegie Mellon University around July 1994 by Michael Mauldin.
- Lycos provided ranked relevance retrieval, prefix matching, and word proximity.
- In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word "surf".
- By November 1996, Lycos had indexed over 60 million documents, more than any other web search engine.
19.
- In April 1994, David Filo and Jerry Yang created the Yahoo! Directory as a collection of their favourite web pages.
- As their number of links grew, they had to reorganize it into a searchable directory.
- On September 26, 2014, Yahoo! announced it would close the Yahoo! Directory at the end of 2014.
20.
- LookSmart was founded in 1995.
- LookSmart was too dependent on MSN, and in 2003 Microsoft announced that it was dropping LookSmart, which basically killed LookSmart's business model.
21.
- The Inktomi Corporation came about on May 20, 1996, with its search engine HotBot.
- It failed to develop a profitable business model and sold out to Yahoo! for approximately $235 million in December 2002.
22. Ask.com (formerly Ask Jeeves):
- In April 1997, Ask Jeeves was launched as a natural language search engine. It used human editors to match search queries.
- In 2001, Ask Jeeves bought Teoma to replace the DirectHit search technology.
- On March 21, 2005, Barry Diller's IAC agreed to acquire Ask Jeeves for 1.85 billion dollars.
- In 2006, Ask Jeeves was renamed Ask.
24. MICROSOFT
- In 1998, MSN Search was launched, but Microsoft did not get serious about search until after Google proved the business model.
- On September 11, 2006, Microsoft launched its in-house search technology as the Live Search product.
- On June 1, 2009, Microsoft launched Bing, a new search service that changed the search landscape by placing inline suggestions for related searches directly in the result set.
- For example, when one searches for "credit cards", Bing suggests related phrases like "credit card types", "apply for credit cards", "advice on credit cards", etc.
25. Yahoo!
- Getting into search: Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites.
- Yahoo! purchased Inktomi in December 2002. Overture purchased AllTheWeb and AltaVista in 2003, and Yahoo! then acquired Overture in July 2003 and combined the technologies from the various search companies it had bought to make a new search engine.
- On March 20, 2005, Yahoo! purchased Flickr, a popular photo-sharing site.
- On December 9, 2005, Yahoo! purchased Del.icio.us, a social bookmarking site.
- Yahoo! also made a strong push to promote Yahoo! Answers, a popular free community-driven question answering service.
- On July 29, 2009, Yahoo! decided to give up on search and signed a 10-year deal to syndicate Bing ads and algorithmic results on its website.
26. GOOGLE
- In 1995, Larry Page met Sergey Brin at Stanford.
- By January 1996, Larry and Sergey had begun collaborating on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website.
- A year later, their unique approach to link analysis was earning BackRub a growing reputation.
- BackRub ranked pages using citation notation. In the PageRank algorithm, links count as votes: what matters is how many people link to you and how trustworthy those links are.
- In 1998, Google was launched. Sergey tried to shop their PageRank technology, but nobody was interested in buying or licensing their search technology at that time.
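The "links count as votes" idea can be illustrated with a small power-iteration sketch. This is a simplified textbook form, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the handling of pages without outbound links, and the function name `pagerank` are all assumptions introduced here for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Estimate page scores where each link passes on a share of the linker's score."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

In a toy graph where pages "a" and "b" both link to "c" and "c" links back to "a", "c" collects the most votes, and "a" outranks "b" because its single inbound link comes from a highly ranked page. That is the "how trustworthy those links are" part of the slide.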
28. The Main Parts of a Search Engine
- Spider (or "web crawler")
- Indexer
- Search software (an algorithm)
29. 1. Web Crawling
- A web crawler (also known as a web spider or web robot) is a program or automated script that browses the World Wide Web in a methodical, automated manner. This process is called web crawling or spidering.
- Many legitimate sites, in particular search engines, use spidering as a means of providing up-to-date data.
- Web crawlers are mainly used to create a copy of all the visited pages, and are also used for automating maintenance tasks on a website, such as checking links or validating HTML code.
- For example, imagine the World Wide Web as a network of stops in a big city subway system. Each stop is a unique document (usually a web page, but sometimes a PDF, JPG, or other file). The search engines need a way to "crawl" the entire city and find all the stops along the way, so they use the best path available: links.
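The subway analogy maps naturally onto a breadth-first traversal. The sketch below is a minimal illustration that never touches the network: `TOY_WEB` is a made-up in-memory link graph, and `crawl` and its `get_links` parameter are hypothetical names. A real crawler would fetch each page over HTTP, extract its links, and respect politeness rules.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit each URL once, following links as they are found."""
    visited: list[str] = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real crawler would fetch and store the page here
        for link in get_links(url):
            if link not in seen:  # never queue the same stop twice
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web": each stop lists the stops it links to.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["archive", "about"],
    "archive": [],
}

order = crawl("home", lambda url: TOY_WEB.get(url, []))
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as "about" and "home" do here.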
30.
- Links allow the search engines' automated robots, called "crawlers" or "spiders," to reach the many billions of interconnected documents on the web.
- Once the engines find these pages, they decipher the code from them and store selected pieces in massive databases, to be recalled later when needed for a search query.
- The monstrous storage facilities hold thousands of machines processing large quantities of information very quickly.
- When a person performs a search at any of the major engines, they demand results instantaneously; even a one- or two-second delay can cause dissatisfaction, so the engines work hard to provide answers as fast as possible.
31. Indexing
- Search engine indexing is the process by which a search engine collects, parses, and stores data for its own use.
- The actual search engine index is the place where all the data the search engine has collected is stored. It is the index that provides the results for search queries, and it is the pages stored within the index that appear on the search engine results page.
- Without an index, the search engine would need considerable time and effort each time a search query was initiated, because it would have to scan not only every web page or piece of data related to the keyword in the query, but every other piece of information it has access to, to ensure that it is not missing something relevant to that keyword.
- There are many different parts to a search engine index, such as design factors and data structures.
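One common data structure for this purpose is the inverted index, which maps each word to the set of documents containing it, so a query becomes a dictionary lookup instead of a scan of every page. The sketch below is a minimal in-memory version; the tokenizer, the AND-only query semantics, and the function names are simplifying assumptions, not a description of any particular engine.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Building the index costs one pass over all documents, but each later query touches only the entries for its own words, which is the time saving the slide describes.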
32.
- The design factors of a search engine index decide how the index actually works. Together they make up the working index, and they include:
- Index size: the amount of computer storage necessary to support the index.
- Storage techniques: the decision of how the information should be stored. Larger files are compressed, while smaller files are simply filtered.
- Fault tolerance: how important it is for the search engine index to be reliable.
- Lookup speed: how quickly a word can be found when the data is searched in the inverted index.
- Maintenance: an important factor as well, because the better maintained a search engine index is, the better it works.
33. What is a Search Engine Algorithm?
- A search algorithm is defined as a math formula that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions.
- In simple words, a search engine algorithm is a set of rules, or a unique formula, that the search engine uses to determine the significance or ranking of a web page; each search engine has its own set of rules.
- A search algorithm sorts results on the basis of many things, such as the location of keywords, synonyms, adjacent words, etc.
- But there are certain things that all search engine algorithms have in common.
35. Relevancy
- This is the first thing every search engine checks.
- The algorithm determines whether a web page has any relevancy at all for the particular keyword.
- The location of keywords on the page is also important for its relevancy.
- Web pages that have the keywords in the title, as well as within the headline or the first few lines of the text, will rank better for that keyword than pages that do not have these features.
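A location-weighted relevance score might look like the sketch below. The weights, the three-line cutoff for "the first few lines", and the function name `relevance` are all hypothetical values chosen for illustration; real engines do not publish their weights.

```python
import re

# Hypothetical weights: a keyword counts for more the more prominent its location.
TITLE_WEIGHT = 3.0
HEADLINE_WEIGHT = 2.0
LEAD_WEIGHT = 1.5
BODY_WEIGHT = 1.0


def _count(keyword: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = rf"\b{re.escape(keyword.lower())}\b"
    return len(re.findall(pattern, text.lower()))


def relevance(keyword: str, title: str, headline: str, body: str,
              lead_lines: int = 3) -> float:
    """Score a page for a keyword, weighting matches by where they occur."""
    lines = body.splitlines()
    lead = " ".join(lines[:lead_lines])   # the first few lines of the text
    rest = " ".join(lines[lead_lines:])   # everything after them
    return (TITLE_WEIGHT * _count(keyword, title)
            + HEADLINE_WEIGHT * _count(keyword, headline)
            + LEAD_WEIGHT * _count(keyword, lead)
            + BODY_WEIGHT * _count(keyword, rest))
```

A page that mentions the keyword once in the title, headline, and opening line scores well above a page that mentions it only once deep in the body, matching the behaviour described above.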
37. Off-Page Factors
- Another part of the algorithm that is still individual to each search engine is off-page factors.
- Off-page factors are such things as click-through measurement and linking.
- Click-through rates and the frequency of linking can indicate how relevant a web page is to actual users and visitors, and this can cause an algorithm to rank the web page higher.
- Off-page factors are harder for webmasters to craft, but can have an enormous effect on page rank, depending on the search engine algorithm.
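One way to picture how off-page signals could adjust a ranking is to treat them as a multiplier on an on-page score. The formula below is purely an assumed illustration: the weights, the logarithmic damping of link counts, and the function names are hypothetical, and actual engines keep their formulas private.

```python
import math


def click_through_rate(clicks: int, impressions: int) -> float:
    """Fraction of times a result was clicked when it was shown."""
    return clicks / impressions if impressions else 0.0


def combined_score(on_page: float, clicks: int, impressions: int,
                   inbound_links: int, ctr_weight: float = 1.0,
                   link_weight: float = 0.5) -> float:
    """Boost an on-page relevance score using off-page signals (assumed formula)."""
    ctr = click_through_rate(clicks, impressions)
    # log1p keeps a page with thousands of links from dwarfing everything else.
    boost = 1.0 + ctr_weight * ctr + link_weight * math.log1p(inbound_links)
    return on_page * boost
```

Two pages with identical on-page relevance end up ranked differently once one of them attracts more clicks and more inbound links.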
39. Search engines are classified based on content/topic, type of information, and model. They are further subcategorized as:
Content/topic:
General:
- A general search engine operates using a search algorithm. Websites listed in the search engine's directory are searched for information based on various search qualities. The goal is for the user to get relevant results with useful pages.
- Examples: bing.com, duckduckgo.com, exalead.com, google.co.in, munax.com
Metasearch Engines:
- A metasearch engine (or aggregator) is a search tool that uses other search engines' data to produce its own results from the Internet. Metasearch engines take input from a user and simultaneously send out queries to third-party search engines for results.
- Examples: Blingo, Yippy, DeeperWeb, Dogpile, Excite, HotBot, Info.com, Mamma, Metacrawler, Mobissimo, Otalo, and Skyscanner.
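The merging step of a metasearch engine can be sketched as follows. The engines here are stand-in functions, not real APIs, and the round-robin interleaving with de-duplication is just one simple merging strategy chosen for illustration. For brevity the sketch calls the engines one after another, whereas a real metasearch engine would query them concurrently.

```python
from itertools import zip_longest
from typing import Callable

# Hypothetical engine interface: takes a query, returns URLs in ranked order.
Engine = Callable[[str], list[str]]


def metasearch(query: str, engines: list[Engine], limit: int = 10) -> list[str]:
    """Query every engine, then interleave their results and drop duplicates."""
    result_lists = [engine(query) for engine in engines]
    merged: list[str] = []
    seen: set[str] = set()
    # Take the first result of each engine, then the second of each, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged[:limit]


def engine_a(query: str) -> list[str]:
    """Stand-in for a third-party engine (made-up results)."""
    return ["a.example/1", "shared.example", "a.example/2"]


def engine_b(query: str) -> list[str]:
    """Stand-in for a second third-party engine (made-up results)."""
    return ["shared.example", "b.example/1"]
```

A result returned by both engines, such as the made-up `shared.example`, appears only once in the merged list.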
40.
Geographically Limited:
- These are search engines with a limited geographical scope, meaning the engine finds search results related only to that geographical area. Country versions of global engines, such as Google.co.in, are not counted here. Some of these search engines are listed below.
- Examples: Accoona, Ansearch, Biglobe, Daum, Egerin, Goo, Leit.is, Maktoob, Miner.hu, Najdi.si, Naver, Onkosh, Rambler, Rediff, SAPO, Search.ch, Sesam, Seznam, Ziplocal, etc.
Semantic:
- Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Semantic search systems consider various points, including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching, and natural language queries, to provide relevant search results. Major web search engines like Google and Bing incorporate some elements of semantic search.
- Examples: Sophia Search, Evi, Yummy, Swoogle.
41.
Business:
- Business search helps users keep in touch with the global business world. The latest information on the dynamic market is available with just a click.
- Examples: Business.com, Getit Infoservices Private Limited, GenieKnows, GlobalSpec, Nexis (LexisNexis), Thomasnet, Justdial.
Academic Materials Only:
- Examples: BASE, CiteULike, Google Scholar, Library of Congress, Shodan, Noodle Education, SkilledUp, Chegg.
Enterprise:
- Examples: Funnelback, Jumper 2.0, Oracle Corporation, Q-Sensei, TeraText, SimilarWeb, Swiftype.
Jobs:
- Examples: Adzuna, Bixee.com, CareerBuilder.com, Craigslist, Dice.com, Eluta.com, Hotjobs.com, JobStreet.com, Incruit, Indeed.com, Glassdoor.com, LinkUp.com, Monster.com, Naukri.com.
46. Winning the Search War
- In 1998, Andy Bechtolsheim gave Google's founders $100,000 in seed funding, and in 1999 Google received $25 million from Sequoia Capital and Kleiner Perkins.
- In 1999, AOL selected Google as a search partner.
- In 2000, Google also launched its popular Google Toolbar.
- On May 1, 2002, AOL announced it would use Google to deliver its search-related ads, which was a strong turning point in Google's battle against Overture.
- In 2003, Google also launched its AdSense program, which allowed it to expand its ad network by selling targeted ads on other websites.
50.
- Crawling and indexing: Search starts with the web. It's made up of over 60 trillion individual pages and it's constantly growing. Google navigates the web by crawling, which means it follows links from page to page. Pages are sorted by their content and other factors, and Google keeps track of it all in the Index (which is over 100 million gigabytes).
- Algorithms: Algorithms look for clues to better understand what the user means. Based on these clues, relevant documents are pulled from the Index. The results are ranked according to freshness, site and page quality, safe search, user context, translation, and universal search. These results can take a variety of forms. (All this happens in about 1/8th of a second.)
- The Search Lab: The algorithms are constantly changing. These changes begin as ideas in the minds of engineers. They take these ideas and run experiments, analyze the results, tweak them, and run them again and again to get the following results:
51.
- Knowledge Graph: Provides results based on a database of real-world people, places, things, and the connections between them.
- Snippets: Shows small previews of information, such as a page's title and short descriptive text, for each search result.
- News: Includes results from online newspapers and blogs from around the world.
- Answers: Displays immediate answers and information for things such as the weather, sports scores, and quick facts.
- Videos: Shows video-based results with thumbnails so you can quickly decide which video to watch.
- Refinements: Provides features like "Advanced Search", related searches, and other search tools, all of which help one find the desired results.
- Voice search: With the Google Search app, simply say what you want and get answers spoken right back to you.
- Mobile: Includes improvements designed specifically for mobile devices such as tablets and smartphones.
52. Fighting Spam
Google fights spam 24/7 to keep the results relevant. The majority of spam removal is automatic. Other questionable documents are examined by hand, and in case any spam is detected, manual action is taken.
Types of spam:
- Unnatural links from a site: Google detected a pattern of unnatural, artificial, deceptive, or manipulative outbound links on the site. This may be the result of selling links that pass PageRank or participating in link schemes.
- Cloaking and/or sneaky redirects: The site appears to be cloaking (displaying different content to human users than is shown to search engines) or redirecting users to a different page than the one Google saw.
- Hacked site: Some pages on the site may have been hacked by a third party to display spammy content or links. Website owners should take immediate action to clean their sites and fix any security vulnerabilities.
- Hidden text and/or keyword stuffing: Some of the pages may contain hidden text and/or keyword stuffing.
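As a rough illustration of what an automatic check for keyword stuffing could look at, the sketch below measures keyword density, the share of a page's words taken up by one keyword. Google's actual detection methods are not public; the 20% threshold and the function names are hypothetical and chosen only to make the example concrete.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of the words in text that are exactly the given keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def looks_stuffed(text: str, keyword: str, threshold: float = 0.2) -> bool:
    """Flag text whose keyword density exceeds an assumed threshold."""
    return keyword_density(text, keyword) > threshold
```

Natural prose rarely repeats one word in a fifth of its positions, so a simple density check separates "cheap shoes cheap shoes cheap shoes" from an ordinary product description. A check like this would only be one weak signal among many.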
53. And that's how the Google search engine works. Behind a simple page of results is a complex system, carefully crafted and tested, to support more than one hundred billion searches each month.
55.
- A web search engine is a software system designed to search for information on the World Wide Web.
- The working process of a search engine starts with web crawling, followed by indexing and searching, which uses an algorithm to give relevant search results within a fraction of a second.
- The history of search engines begins in 1945, when Vannevar Bush proposed the visionary idea of maintaining a record of all the knowledge available to mankind, which led to an era of revolution for search engines.
- There are various types of search engines, such as metasearch, business, educational, and social engines, engines such as LookSmart, Lycos, Microsoft, and Yahoo, and human answer machines like Quora.
- Search engines save time and give us the precise and relevant information we need.
Editor's Notes
1. Search engines account for every word on the web.
2. Each page on the web has an official title.
3. Links between websites matter. When one page links to another, it is usually a recommendation to readers.
4. Words that are used in the links matter.
5. Search engines care about reputation.