SEARCH ENGINES
ABHAY KHANDALKAR
KISHOR MANE
SAIF ALI MIRZA
ESHA PALAV
DAKSHEH RATHOD
INTRODUCTION
What is a Search Engine?
• Search engines are the key to finding specific information on the vast expanse of the World Wide Web.
• Without search engines, it would be virtually impossible to locate anything on the Web without knowing a specific URL.
• Search engines use automated software (known as robots or spiders) that follows links between websites, harvesting information as it goes.
• Search engines are also known as answer machines. When a person performs an online search, the search engine scours its corpus of billions of documents and does two things:
1.) It returns only those results that are relevant or useful to the searcher's query;
2.) It ranks those results according to the popularity of the websites serving the information.
• It is both relevance and popularity that the process of SEO is meant to influence.
What is SEO?
• Imagine for a minute that you are a librarian.
• People across the world depend on you for the exact information they need.
• To do this, you need a system that knows what's inside every book and how the books relate to each other.
• The system needs to take in a lot of information and send out the best answers to people's questions.
• Search engines like Google and Yahoo! are the librarians of the Internet.
• Their systems collect information about every page on the web so they can help people find exactly what they are looking for.
• Every search engine has a secret recipe, called an algorithm, for turning all that information into useful search results.
• Pages with higher rankings are found by more people.
• The key to a higher ranking is making sure your website has the ingredients search engines need for their algorithm. This is called Search Engine Optimization.
Search Algorithm
1. Words matter
2. Titles matter
3. Links matter
4. Words in links
5. Reputation
Search Engine Optimization
• The process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by search engines.
Birth of Search Engines
• The concept of hypertext and memory extension came to life in July 1945, when Vannevar Bush's "As We May Think" was published in The Atlantic Monthly.
• He urged scientists to work together to help build a body of knowledge for all mankind.
• He then proposed the idea of a virtually limitless, fast, reliable, extensible, associative memory storage and retrieval system. Bush named this device the "memex".
• Ted Nelson created Project Xanadu in 1960 and coined the term "hypertext" in 1963. Much of the inspiration for the WWW was drawn from Nelson's work, so he is often credited as a pioneer of the ideas behind search engines.
• ARPANET is the network that eventually led to the Internet.
• Its packet switching was based on concepts and designs by American scientists Leonard Kleinrock and Paul Baran, among others.
• The ARPANET was an early packet-switching network and the first network to implement the TCP/IP protocol suite.
• The ARPANET was operated by the military during the two decades of its existence, until 1990.
EVOLUTION OF SEARCH ENGINES
• The first Internet search engine was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal.
• Archie was a database of filenames from public FTP sites, which it matched against the user's query.
• Later, Veronica was developed. It served the same purpose as Archie but worked on plain text files.
• Soon another user interface named Jughead appeared, with the same purpose as Veronica. Both were used for files sent via Gopher, which was created as an alternative to Archie by Mark McCahill at the University of Minnesota in 1991.
• Tim Berners-Lee designed and built the first web browser and editor, called WorldWideWeb, and the first web server, called httpd.
• The first website was http://info.cern.ch/, put online on August 6, 1991.
• In 1994, Berners-Lee founded the World Wide Web Consortium at the Massachusetts Institute of Technology.
• In 1993, Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB.
• ALIWEB crawled meta information and allowed users to submit the pages they wanted indexed, along with their own page descriptions.
What is a Computer Bot?
• Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce.
• On the Internet, the term "bot" is usually used to describe anything that interfaces with the user or that collects data.
• Search engines use "spiders", which search the web for information. They read the contents of pages for indexing and also record the links.
• Another example is chatterbots, which attempt to act like a human and converse with people on a given topic.
PRE MODERN SEARCH ENGINES
Primitive Web Search
• By December 1993, three full-fledged bot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider.
• JumpStation and the WWW Worm gathered the titles and URLs of web pages and retrieved them using a simple linear search.
• The problem with JumpStation and the WWW Worm was that they listed results in the order in which they found them and provided no discrimination between results.
• The RBSE spider implemented a ranking system.
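The difference between listing matches in discovery order and ranking them can be sketched in a few lines of Python. This is only an illustration: the Page class, the example.org URLs and the scoring rule (counting how often the term occurs in the title) are assumptions made for the sketch, not a description of how JumpStation, the WWW Worm or the RBSE spider actually worked.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str


def linear_search(pages, term):
    """Return matching pages in discovery order (no ranking at all)."""
    term = term.lower()
    return [p for p in pages if term in p.title.lower() or term in p.url.lower()]


def ranked_search(pages, term):
    """Return matches sorted by a toy score: occurrences of the term in the title."""
    term = term.lower()
    matches = linear_search(pages, term)
    # sorted() is stable, so pages with equal scores keep their discovery order.
    return sorted(matches, key=lambda p: p.title.lower().count(term), reverse=True)


# Hypothetical pages, listed in the order a crawler might have found them.
pages = [
    Page("http://example.org/a", "Surf shop opening hours"),
    Page("http://example.org/b", "Surf report: surf forecast and surf cams"),
    Page("http://example.org/c", "Gardening tips"),
]
unranked = [p.url for p in linear_search(pages, "surf")]
ranked = [p.url for p in ranked_search(pages, "surf")]
```

With the linear search, the first page found stays on top regardless of how well it matches; with even a crude score, the page that mentions the term most often moves to the front.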
ALTAVISTA
• AltaVista brought numerous search tips and advanced search features to the web.
• It had nearly unlimited bandwidth, was the first to allow natural language queries, offered advanced searching techniques, and allowed users to add or delete their own URL within 24 hours.
• Due to mismanagement and fear of result manipulation, AltaVista was largely driven into irrelevancy.
WEBCRAWLER
• On April 20, 1994, Brian Pinkerton of the University of Washington released WebCrawler.
• It was the first crawler that indexed entire pages.
• In 1997, Excite bought out WebCrawler.
• Lycos was the next major search development, designed at Carnegie Mellon University around July 1994 by Michael Mauldin.
• Lycos provided ranked relevance retrieval, prefix matching and word proximity.
• In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word "surf".
• By November 1996, Lycos had indexed over 60 million documents, more than any other web search engine.
• In April 1994, David Filo and Jerry Yang created the Yahoo! Directory as a collection of their favourite web pages.
• As the number of links grew, they had to reorganize it into a searchable directory.
• On September 26, 2014, Yahoo! announced it would close the Yahoo! Directory at the end of 2014.
• LookSmart was founded in 1995.
• LookSmart was too dependent on MSN. In 2003, Microsoft announced that it was dropping LookSmart, which effectively killed LookSmart's business model.
• The Inktomi Corporation came about on May 20, 1996 with its search engine HotBot.
• It failed to develop a profitable business model and sold out to Yahoo! for approximately $235 million in December 2002.
Ask.com (formerly Ask Jeeves):
• In April 1997, Ask Jeeves was launched as a natural language search engine. It used human editors to match search queries.
• In 2001, Ask Jeeves bought Teoma to replace its DirectHit search technology.
• On March 21, 2005, Barry Diller's IAC agreed to acquire Ask Jeeves for 1.85 billion dollars.
• In 2006, Ask Jeeves was renamed Ask.
MODERN SEARCH ENGINES
MICROSOFT
• In 1998, MSN Search was launched, but Microsoft did not get serious about search until after Google proved the business model.
• On September 11, 2006, Microsoft launched its in-house search technology as the Live Search product.
• On June 1, 2009, Microsoft launched Bing, a new search service that changed the search landscape by placing inline suggestions for related searches directly in the result set.
• For example, when one searches for credit cards, Bing suggests related phrases like "credit card types", "apply for credit cards", "advice on credit cards", etc.
Yahoo!
• Getting into search: Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites.
• Yahoo! purchased Inktomi in December 2002. Overture purchased AllTheWeb and AltaVista in 2003. Yahoo! then acquired Overture in July 2003 and combined the technologies from the various search companies it had bought to make a new search engine.
• On March 20, 2005, Yahoo! purchased Flickr, a popular photo sharing site.
• On December 9, 2005, Yahoo! purchased Del.icio.us, a social bookmarking site.
• Yahoo! also made a strong push to promote Yahoo! Answers, a popular free community-driven question answering service.
• On July 29, 2009, Yahoo! decided to give up on search and signed a 10-year deal to syndicate Bing ads and algorithmic results on its website.
GOOGLE
• In 1995, Larry Page met Sergey Brin at Stanford.
• By January 1996, Larry and Sergey had begun collaborating on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website.
• A year later, their unique approach to link analysis was earning BackRub a growing reputation.
• BackRub ranked pages using citation notation. In the PageRank algorithm, links count as votes: a page's rank depends on how many pages link to it and how trustworthy those linking pages are.
• In 1998, Google was launched. Sergey tried to shop their PageRank technology around, but nobody was interested in buying or licensing their search technology at that time.
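The "links count as votes" idea can be sketched as a small iterative computation. The following is a simplified illustration, not Google's production algorithm: the function name, the tiny four-page link graph, the damping factor of 0.85 (the value commonly cited from the published PageRank paper) and the fixed number of iterations are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

In this graph, page c collects the most votes and ends up ranked highest. Page a outranks b and d even though each has a single outgoing link, because a's one incoming link comes from the highly ranked c, which is the "how trustworthy those links are" part of the idea.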
How Does a Search Engine Work?
The Main Parts of a Search Engine
• Spider (or "web crawler")
• Indexer
• Search software (an algorithm)
1. Web Crawling
• A web crawler (also known as a web spider or web robot) is a program or automated script that browses the World Wide Web in a methodical, automated manner. This process is called web crawling or spidering.
• Many legitimate sites, in particular search engines, use spidering as a means of providing up-to-date data.
• Web crawlers are mainly used to create a copy of all the visited pages. They are also used to automate maintenance tasks on a website, such as checking links or validating HTML code.
• For example, imagine the World Wide Web as a network of stops in a big city subway system. Each stop is a unique document (usually a web page, but sometimes a PDF, JPG, or other file). The search engines need a way to "crawl" the entire city and find all the stops along the way, so they use the best path available: links.
• Links allow the search engines' automated robots, called "crawlers" or "spiders", to reach the many billions of interconnected documents on the web.
• Once the engines find these pages, they decipher the code from them and store selected pieces in massive databases, to be recalled later when needed for a search query.
• To hold all this data, search engine companies run massive data centers in which thousands of machines process large quantities of information very quickly.
• When a person performs a search at any of the major engines, they demand results instantaneously. Even a one- or two-second delay can cause dissatisfaction, so the engines work hard to provide answers as fast as possible.
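The crawling process described above (start at one page, follow its links, store what is found) can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions: the `crawl` function, the `fetch` callback and the in-memory FAKE_WEB dictionary are hypothetical stand-ins so the example runs without network access. A real crawler would also need politeness delays, robots.txt handling and error handling, none of which is shown here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`. `fetch(url)` returns HTML or None."""
    seen = {seed}          # URLs already queued, so no page is visited twice
    queue = deque([seed])  # frontier of URLs still to visit
    store = {}             # url -> stored copy of the page
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:   # dead link: nothing to store or follow
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# A tiny in-memory "web" used instead of real HTTP requests (assumption).
FAKE_WEB = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.org/b": '<a href="/missing">gone</a>',
}
visited = list(crawl("http://example.org/", FAKE_WEB.get))
```

In subway terms, the queue is the list of stops still to visit and the `seen` set keeps the crawler from riding in circles when pages link back to each other.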
Indexing
• Search engine indexing is the process by which a search engine collects, parses and stores data for later use.
• The search engine index is where all the data the search engine has collected is stored. The index provides the results for search queries, and only pages stored in the index can appear on the search engine results page.
• Without an index, the search engine would need considerable time and effort for every query. It would have to scan not only every web page or piece of data related to the keyword in the query, but every other piece of information it has access to, to be sure it was not missing anything relevant.
• There are many different parts to a search engine index, such as design factors and data structures.
• The design factors decide how the index actually works. Together they make up the working search engine index, and they include:
  • Index size: the amount of computer storage necessary to support the index.
  • Storage techniques: the decision about how the information should be stored. Larger files are compressed, while smaller files are simply filtered.
  • Fault tolerance: how important it is for the search engine index to be reliable.
  • Lookup speed: how quickly a word can be found when the data is searched in the inverted index.
  • Maintenance: the better maintained a search engine index is, the better it works.
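The inverted index mentioned under lookup speed can be illustrated with a short sketch. It maps each word to the set of documents containing it, so a query only touches the entries for its own words instead of scanning every document. The function names, the tokenizer and the three sample documents are assumptions made for the illustration. Real indexes also store word positions and are compressed, which this sketch leaves out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # keep only docs that have this word too
    return result


# Hypothetical documents keyed by id.
docs = {
    1: "Search engines crawl the web",
    2: "An index makes search fast",
    3: "Spiders crawl links",
}
index = build_index(docs)
```

Here `lookup(index, "crawl web")` intersects the entries for "crawl" and "web" and never looks at document 2 at all, which is the time saving the slide describes.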
What is a Search Engine Algorithm?
• A search algorithm is a procedure that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions.
• In simple words, a search engine algorithm is a set of rules, or a unique formula, that the search engine uses to determine the significance or ranking of a web page. Each search engine has its own set of rules.
• A search algorithm sorts results on the basis of many things, such as the location of keywords, synonyms, adjacent words, etc.
• But there are certain things that all search engine algorithms have in common:
  • Relevancy
  • Individual factors
  • Off-page factors
SEARCH ALGORITHM PRINCIPLES
Relevancy
• This is the first thing every search engine checks.
• The algorithm determines whether the web page has any relevancy at all for the particular keyword.
• The location of keywords on the page is also important for its relevancy.
• Web pages that have the keyword in the title, as well as in the headline or the first few lines of the text, will rank better for that keyword than pages that do not have these features.
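As a rough illustration of location-based relevancy, the sketch below gives a page points depending on where the keyword appears. The weights (3 for the title, 2 for the headline, 1 for the first three lines of body text), the dictionary layout of a page and the sample pages are all invented for this example. Real search engines do not publish their weights.

```python
def relevancy_score(page, keyword):
    """Score a page for one keyword. The weights are illustrative only."""
    keyword = keyword.lower()
    score = 0
    if keyword in page["title"].lower():
        score += 3  # keyword in the title counts most
    if keyword in page["headline"].lower():
        score += 2  # then the headline
    first_lines = " ".join(page["body"].lower().splitlines()[:3])
    if keyword in first_lines:
        score += 1  # then the first few lines of the text
    return score


# Hypothetical pages for the keyword "flights".
pages = [
    {"url": "/a", "title": "Cheap flights", "headline": "Book cheap flights today",
     "body": "Compare flights from many airlines.\nMore text."},
    {"url": "/b", "title": "Travel blog", "headline": "My summer",
     "body": "Intro.\nSecond line.\nThird line.\nLater I mention flights."},
]
ranked = sorted(pages, key=lambda p: relevancy_score(p, "flights"), reverse=True)
```

Page /b does mention the keyword, but only deep in the text, so under this toy rule it scores zero and page /a ranks first.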
Individual Factors
• A second part of search engine algorithms is the individual factors that make a particular search engine different from every other search engine out there.
• Each search engine has unique algorithms, and the individual factors of these algorithms are why a search query turns up different results on Google than on MSN or Yahoo!.
• One of the most common individual factors is the number of pages a search engine indexes.
• An engine may just have more pages indexed, or index them more frequently, but this can give different results for each search engine.
• Some search engines also penalize for spamming, while others do not.
Off-Page Factors
• Another part of algorithms that is still individual to each search engine is off-page factors.
• Off-page factors are things such as click-through measurement and linking.
• Click-through rates and the frequency of linking can indicate how relevant a web page is to actual users and visitors, and this can cause an algorithm to rank the web page higher.
• Off-page factors are harder for webmasters to craft, but they can have an enormous effect on a page's ranking, depending on the search engine algorithm.
Classified List of Search Engines
Search engines are classified by content/topic, type of information, and model. They are further subcategorized as follows.
Content/Topic
General:
• A general search engine operates using a search algorithm. The websites listed in the search engine's index are searched for information that matches the query on various criteria. The goal is for the user to get relevant results with useful pages.
• Examples: bing.com, duckduckgo.com, exalead.com, google.co.in, munax.com
Metasearch engines:
• A metasearch engine (or aggregator) is a search tool that uses other search engines' data to produce its own results from the Internet. It takes input from a user and simultaneously sends out queries to third-party search engines for results.
• Examples: Blingo, Yippy, DeeperWeb, Dogpile, Excite, HotBot, Info.com, Mamma, Metacrawler, Mobissimo, Otalo, and Skyscanner.
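How a metasearch engine might combine results from several engines can be sketched as follows. This is one possible merging rule, not the method of any engine named above: each URL scores the sum of 1/position over the result lists it appears in (a reciprocal-rank rule chosen for this illustration). The two stub engines and their example.org results are hypothetical, and for simplicity they are queried one after another rather than simultaneously.

```python
from collections import defaultdict


def metasearch(query, engines):
    """Query every engine and merge results by summed reciprocal rank."""
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / position  # higher positions contribute more
    # Sort by score (highest first), then by URL for a deterministic order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Stub engines standing in for third-party search services (assumption).
def engine_one(query):
    return ["http://example.org/x", "http://example.org/y"]


def engine_two(query):
    return ["http://example.org/y", "http://example.org/z"]


merged = metasearch("anything", [engine_one, engine_two])
```

The page returned by both engines rises to the top of the merged list, even though only one engine ranked it first.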
Geographically limited:
• These search engines have a limited geographical scope: they return only results related to a particular geographical area. Country-specific versions of global engines, such as google.co.in, are not counted here. Some of these search engines are listed below.
• Examples: Accoona, Ansearch, Biglobe, Daum, Egerin, Goo, Leit.is, Maktoob, Miner.hu, Najdi.si, Naver, Onkosh, Rambler, Rediff, SAPO, Search.ch, Sesam, Seznam, Ziplocal, etc.
Semantic:
• Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Semantic search systems consider various points, including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries, to provide relevant search results. Major web search engines like Google and Bing incorporate some elements of semantic search.
• Examples: Sophia Search, Evi, Yummy, Swoogle.
Business:
• Business search helps us keep in touch with the global world. All the latest information about the dynamic market is available with just a click.
• Examples: Business.com, Getit Infoservices Private Limited, GenieKnows, GlobalSpec, Nexis (LexisNexis), Thomasnet, Justdial.
Academic materials only:
• Examples: BASE, CiteULike, Google Scholar, Library of Congress, Shodan, Noodle Education, SkilledUp, Chegg.
Enterprise:
• Examples: Funnelback, Jumper 2.0, Oracle Corporation, Q-Sensei, TeraText, SimilarWeb, Swiftype.
Jobs:
• Examples: Adzuna, Bixee.com, CareerBuilder.com, Craigslist, Dice.com, Eluta.com, Hotjobs.com, JobStreet.com, Incruit, Indeed.com, Glassdoor.com, LinkUp.com, Monster.com, Naukri.com.
Medical:
• Examples: Bing Health, Bioinformatic Harvester, CiteAb, EB-eye, Entrez, GenieKnows, GoPubMed, Healia, Healthline, Nextbio, Quertle, Searchmedica, WebMD.
News:
• Examples: Bing News, Daylife, Google News, MagPortal, Newslookup, Nexis, Topix.net, Trapit, Yahoo! News.
Type of Information
Blog:
• Examples: Amatomu, Bloglines, BlogScope, IceRocket, Munax, Regator, Technorati.
Multimedia:
• Examples: Bing Videos, blinkx, FindSounds, Google Videos, Munax's PlayAudioVideo, Picsearch, Pixsta, Podscope, ScienceStage, SeeqPod, Songza, TinEye, TV Genius, Veveo.
Source code:
• Examples: Google Code Search, Koders, Krugle.
BitTorrent:
• Examples: BTDigg, Isohunt, Mininova, The Pirate Bay, TorrentSpy, Torrentz, Torrentus.
Maps:
• Examples: Bing Maps, Geoportail, Google Maps, MapQuest, Nokia Maps, OpenStreetMap, Wikiloc, WikiMapia, Yahoo! Maps.
Price:
• Examples: Bing Shopping, Google Shopping, Kelkoo, MySimon, PriceGrabber, PriceRunner, PriceSCAN, Pronto.com, Shopping.com, ShopWiki, Shopzilla, SwoopThat.com, TheFind.com.
Model
Privacy search engines:
• Examples: DuckDuckGo, Ixquick.
Open source search engines:
• Examples: DataparkSearch, Gigablast, Grub, ht://Dig, Isearch, Lemur Toolkit & Indri Search Engine, Lucene, Namazu, Nutch, Recoll, Sciencenet, Searchdaimon, Seeks, etc.
Social search engines:
• Examples: ChaCha Search, Delver, Eurekster, Mahalo.com, Rollyo, Search Team, Sproose, Trexy.
GOOGLE AS A SEARCH ENGINE
Winning the Search War
• In 1998, Andy Bechtolsheim gave them $100,000 in seed funding, and in 1999 Google received $25 million from Sequoia Capital and Kleiner Perkins.
• In 1999, AOL selected Google as a search partner.
• In 2000, Google also launched its popular Google Toolbar.
• On May 1, 2002, AOL announced it would use Google to deliver its search-related ads, which was a strong turning point in Google's battle against Overture.
• In 2003, Google also launched its AdSense program, which allowed it to expand its ad network by selling targeted ads on other websites.
VERTICALS GALORE
Google Maps
Google News
Google Book Search
Google Scholar
Google Blog Search
Google Base
Google Video
Google Universal Search
Email
Analytics
Radio ads
Office productivity software
Calendar
Checkout
Working of Google
• Crawling & Indexing: Search starts with the web, which is made up of over 60 trillion individual pages and is constantly growing. Google navigates the web by crawling, which means it follows links from page to page. Pages are sorted by their content and other factors, and Google keeps track of it all in "The Index", which is over 100 million gigabytes in size.
• Algorithms: Programs and formulas look for clues to better understand what the user means. Based on these clues, relevant documents are pulled from "The Index". The results are ranked using factors such as freshness, site and page quality, SafeSearch, user context, translation and universal search. These results can take a variety of forms. (All of this happens in about 1/8th of a second.)
The Search Lab: The algorithms are constantly changing. These changes begin as ideas in the minds of engineers, who run experiments, analyze the results, tweak the ideas and run them again and again. The results come in forms such as the following:
• Knowledge Graph: Provides results based on a database of real-world people, places and things, and the connections between them.
• Snippets: Shows small previews of information, such as a page's title and short descriptive text, for each search result.
• News: Includes results from online newspapers and blogs from around the world.
• Answers: Displays immediate answers and information for things such as the weather, sports scores and quick facts.
• Videos: Shows video-based results with thumbnails so you can quickly decide which video to watch.
• Refinements: Provides features like "Advanced Search", related searches and other search tools, all of which help you fine-tune your search.
• Voice search: With the Google search app, simply say what you want and get answers spoken right back to you.
• Mobile: Includes improvements designed specifically for mobile devices such as tablets and smartphones.
Fighting Spam
Google fights spam 24/7 to keep the results relevant. The majority of spam removal is automatic. Other questionable documents are examined by hand, and if any spam is detected, manual action is taken.
Types of Spam:
• Unnatural links from a site: A pattern of unnatural, artificial, deceptive or manipulative outbound links is detected on the site. This may be the result of selling links that pass PageRank or participating in link schemes.
• Cloaking and/or sneaky redirects: The site appears to be cloaking (displaying different content to human users than is shown to search engines) or redirecting users to a different page than the one Google saw.
• Hacked site: Some pages on the site may have been hacked by a third party to display spammy content or links. Website owners should take immediate action to clean their sites and fix any security vulnerabilities.
• Hidden text and/or keyword stuffing: Some of the pages may contain hidden text and/or keyword stuffing.
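A very crude way to flag possible keyword stuffing is to measure how much of a page's text is taken up by its single most frequent word. The sketch below is an assumption-laden illustration, not Google's method, which the slides do not describe: the function names, the 0.3 threshold and the two sample texts are invented for the example, and a practical check would at least ignore common words such as "the".

```python
import re
from collections import Counter


def keyword_density(text):
    """Return (most common word, its share of all words in the text)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)


def looks_stuffed(text, threshold=0.3):
    """Flag text whose most frequent word exceeds the (arbitrary) threshold."""
    _, density = keyword_density(text)
    return density > threshold


# Hypothetical page texts.
stuffed = "cheap shoes cheap shoes buy cheap shoes cheap cheap"
normal = "We sell running shoes and offer free returns within thirty days"
```

In the stuffed sample, the word "cheap" makes up more than half of the text, so `looks_stuffed(stuffed)` is true, while the normal sentence passes.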
And that's how the Google search engine works. Behind a simple page of results is a complex system, carefully crafted and tested, to support more than one hundred billion searches each month.
Web search engines ( Mr.Mirza )
• A web search engine is a software system designed to search for information on the World Wide Web.
• A search engine works by web crawling, indexing and searching, using an algorithm to return relevant results within a fraction of a second.
• The history of search engines goes back to 1945, when Vannevar Bush proposed the visionary idea of maintaining a record of all the knowledge available to mankind, which led to an era of revolution for search engines.
• There are various types of search engines, such as metasearch, business, educational and social search engines, along with engines such as LookSmart, Lycos, Microsoft and Yahoo!, and human-powered answer services like Quora.
• Search engines save time and give us the precise, relevant information we need.
Web search engines ( Mr.Mirza )

More Related Content

Web search engines ( Mr.Mirza )

  • 1. SEARCH ENGINES ABHAY KHANDALKAR KISHOR MANE SAIF ALI MIRZA ESHA PALAV DAKSHEH RATHOD
  • 3. What is a Search Engine? ? Search engines are the key to find specific information on the vast expanse of the World Wide Web. ? Without search engines, it would be virtually impossible to locate anything on the Web without knowing a specific URL. ? Search engines use automated software (known as robots or spiders) that follow links on the websites, thus harvesting information as they go. ? Search engines are also known as answer machines. When a person performs an online search, the search engine scours its corpus of billions of documents and does two things: 1.) It returns only those results that are relevant or useful to the searcher's query; 2.) It ranks those results according to the popularity of the websites serving the information. It is both relevance and popularity that the process of SEO is meant to influence.
  • 4. What is SEO? ? Imagine for a minute that you are the librarian ? People across the world depend upon you for the exact information they need ? For this we need a system to know what¡¯s inside every book & how books relate to each other. ? System needs to take a lot of information & send out the best answers for the questions. ? Search engines like Google & Yahoo are the librarians of the Internet.
  • 5. ? Their systems collect information about every page on the web so they can help people find what exactly they are looking for. ? Every search engine has a secret recipe called Algorithm for turning all that information into useful search results. ? When pages have higher Rankings they have more people finding those higher pages. ? The key to higher Ranking is making sure your website has ingredients search engines need for their Algorithm. This is called as Search Engine Optimization.
  • 6. Search Algorithm 1.Words matter 2.Titles matter 3.Links matter 4.Words in Links 5.Reputation
  • 7. Search Engine Optimization ¨C The process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by search engines.
  • 8. Birth of Search Engines ? The concept of hypertext & memory extension came to life in July 1945 when Vannevar Bush¡¯s ¡°As We May Think¡± was published in The ¡®Atlantic Monthly¡¯. ? He urged scientists to work together to help build a body of knowledge for all mankind. ? He then proposed the idea of a virtually limitless, fast, reliable, extensible, associative memory storage & retrieval system. Vannevar Bush named this device a ¡°MEMEX¡±.
  • 9. ? Ted Nelson created the Project Xanadu in 1960 & coined the term ¡°Hypertext ¡° in 1963, much of the inspiration to create the WWW was drawn from Ted¡¯s work hence he is rightly called as the father of ¡®Search Engines¡¯. ? ARPANET is the network which eventually led to the Internet. ? Packet switching was based on the concepts & designs by American scientists Leonard Kleinrock & Paul Baran of the Lincoln Laboratory. ? The ARPANET was an early packet switching network & the first network to implement the protocol suite ¡®TCP/IP¡¯. ? The ARPANET was operated by the military during the two decades of its existence until 1990.
  • 11. ? First Internet search engine created was Archie, in 1990 by Alan Emtage a student at McGill University in Montreal. ? Archie was a database of web filenames which it would match with the users query. ? Later Veronica was developed which served the same purpose as Archie but it worked on plain text files. ? Soon another user interface named Jughead appeared & both were used for sending files via Gopher which served an alternative to Archie by Mark McCahill at the university of Minnesota in 1991.
  • 12. ? Tim Berners-Lee designed & built the first web browser & editor called httpd. ? The first website built was http://info.cern.ch/ & was put online on August 6,1991. ? In 1994, Berners ¨CLee founded World Wide Web Consortium at Massachusetts Institute of Technology. ? IN 1993 Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB. ? ALIWEB crawled meta information & allowed users to submit their pages which they wanted to index with their own page descriptions.
  • 13. What is a Computer Bot? ? Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce. ? The term ¡®Bot¡¯ on the Internet is usually used to describe anything that interfaces with the user or that collects data. ? Search engines use ¡°spiders¡± which search the web for information. They read the contents of pages for indexing & also record the links. ? Another eg. Is Chatterbots which attempts to act like a human & communicate with humans on said topic.
  • 14. PRE MODERN SEARCH ENGINES
  • 15. Primitive Web Search ? By December of 1993, three full fledged bot fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider. ? JumpStation & the WWW worm gathered info about the title & URL¡¯s from webpages & retrieved these using a simple linear search. ? The problem with JumpStation & the WWW worm is that they listed results in the order that they found them & provided no discimination. ? The RBSE spider implemented a ranking system.
  • 16. ALTA VISTA ? AltaVista provided numerous search tips & advanced search features to the web. ? They had nearly unlimited bandwidth, the first to allow natural language queries, advanced searching techniques & they also allowed users to add or delete their own URL within 24 hrs. ? Due to poor mismanagement & fear of result manipulation AltaVista was largely driven into irrelevancy.
  • 17. WEB CRAWLER ? On April 20, 1994 Brian Pinkerton of the university of Washington released WebCrawler . ? It was the first crawler which indexed entire pages. ? In 1997 Excite bought out WebCrawler.
  • 18. ? Lycos was the next major search development designed at Carnegie Mellon University around July of 1994 by Michale Mauldin. ? Lycos provided ranked relevance retrieval, prefix matching & word proximity. ? In November 1996, Lycos had indexed over 60 million documents-more than any other web search engine. ? In October 1994, Lycos ranked first on Netscape¡¯s list of search engines by finding the most hits on the word ¡°surf¡±.
  • 19. ? In April 1994, David Filo & Jerry Yang created the Yahoo! Directory as a collection of their favourite web pages. ? As their no of links grew they had to reorganize & become a searchable directory. ? On September 26, 2014 Yahoo! Announced they would close the Yahoo! Directory at the end of 2014.
  • 20. ? LookSmart was founded in 1995. ? LookSmart was too dependent on MSN & in 2003 Microsoft announced that they were discontinuing LookSmart that basically killed their business model.
  • 21. ? The Inktomi Corporation came about on May20, 1996 with its search engine Hotbot. ? They failed to develop a profitable business model & sold out to Yahoo! For approx $235 million in December 2003.
  • 22. Ask.com(formerly Ask Jeeves): ? In April 1997, Ask Jeeves was launched as a natural language search engine. It used human editors to match the search queries. ? In 2001, Ask Jeeves bought Teoma to replace DirectHit search technology. ? On March 21, 2005 Barry Diller¡¯s IAC agreed to acquire Ask Jeeves for 1.85 billion dollars. ? In 2006, Ask Jeeves was renamed to Ask.
  • 24. MICROSOFT ? In 1998 MSN Search was launched, but Microsoft did not get serious about search until after Google proved the business model. ? On September 11, 2006, Microsoft launched their in house search technology Live Searchproduct. ? On June 1, 2009, Microsoft launched Bing, a new search service which changed the search landscape by placing inline search suggestions for related searches directly in the result set. ? For eg. When one searches for credit cards they will suggest related phrases like ¡°credit card types¡±, ¡°apply for credit cards¡±,¡± advice on credit cards¡±, etc.
  • 25. Yahoo!
      ◦ Getting into search: Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites.
      ◦ Yahoo! purchased Inktomi in December 2002. Overture purchased AllTheWeb and AltaVista in 2003, and Yahoo! then consumed Overture in July 2003. Yahoo! combined the technologies from the various search companies it had bought to make a new search engine.
      ◦ On March 20, 2005, Yahoo! purchased Flickr, a popular photo sharing site.
      ◦ On December 9, 2005, Yahoo! purchased Del.icio.us, a social bookmarking site.
      ◦ Yahoo! also made a strong push to promote Yahoo! Answers, a popular free community-driven question answering service.
      ◦ On July 29, 2009, Yahoo! decided to give up on search and signed a 10-year deal to syndicate Bing ads and algorithmic results on its website.
  • 26. GOOGLE
      ◦ In 1995, Larry Page met Sergey Brin at Stanford.
      ◦ By January 1996, Larry and Sergey had begun collaborating on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website.
      ◦ A year later, their unique approach to link analysis was earning BackRub a growing reputation.
      ◦ BackRub ranked pages using citation notation. In the PageRank algorithm, links count as votes: what matters is how many people link to you and how trustworthy those links are.
      ◦ In 1998, Google was launched. Sergey tried to shop their PageRank technology, but nobody was interested in buying or licensing their search technology at that time.
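The "links count as votes" idea can be illustrated with a short sketch. This is a simplified power-iteration version of PageRank run on a made-up four-page web, not Google's production algorithm. The page names, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy web, page c collects the most votes. Page a outranks b and d because its single incoming link comes from the highly ranked page c, which is the "how trustworthy those links are" part of the idea.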
  • 28. The Main Parts of a Search Engine
      ◦ Spider (or "web crawler")
      ◦ Indexer
      ◦ Search software (an algorithm)
  • 29. 1. Web Crawling
      ◦ A web crawler (also known as a web spider or web robot) is a program or automated script that browses the World Wide Web in a methodical, automated manner. This process is called web crawling or spidering.
      ◦ Many legitimate sites, in particular search engines, use spidering as a means of providing up-to-date data.
      ◦ Web crawlers are mainly used to create a copy of all the visited pages. They are also used for automating maintenance tasks on a website, such as checking links or validating HTML code.
      ◦ For example, imagine the World Wide Web as a network of stops in a big city subway system. Each stop is a unique document (usually a web page, but sometimes a PDF, JPG, or other file). The search engines need a way to "crawl" the entire city and find all the stops along the way, so they use the best path available: links.
  • 30.
      ◦ Links allow the search engines' automated robots, called "crawlers" or "spiders", to reach the many billions of interconnected documents on the web.
      ◦ Once the engines find these pages, they decipher the code from them and store selected pieces in massive databases, to be recalled later when needed for a search query.
      ◦ The monstrous storage facilities hold thousands of machines processing large quantities of information very quickly.
      ◦ When a person performs a search at any of the major engines, they demand results instantaneously; even a one- or two-second delay can cause dissatisfaction, so the engines work hard to provide answers as fast as possible.
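The link-following behaviour described above can be sketched as a breadth-first traversal. To stay self-contained, the sketch below crawls a made-up in-memory "web" (a dictionary of page names and their links) instead of fetching real URLs. The page names and the `max_pages` limit are assumptions for illustration.

```python
from collections import deque


def crawl(web, seed, max_pages=100):
    """Visit pages breadth-first by following links, starting from seed.

    web is a hypothetical in-memory stand-in for the real Web:
    it maps a page name to the list of pages that page links to.
    """
    visited = []             # pages in the order they were crawled
    seen = {seed}            # pages already queued, to avoid revisiting
    frontier = deque([seed])

    while frontier and len(visited) < max_pages:
        page = frontier.popleft()
        if page not in web:
            continue         # broken link: the target page does not exist
        visited.append(page)
        for link in web[page]:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical toy web. "orphan" has no incoming links,
# and "missing" is linked to but does not exist.
TOY_WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "missing"],
    "post-1": ["home"],
    "orphan": ["home"],
}

crawled = crawl(TOY_WEB, "home")
```

The page named "orphan" is never reached because no crawled page links to it, which mirrors the subway analogy: a stop with no line leading to it cannot be found by riding the lines.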
  • 31. Indexing
      ◦ Search engine indexing is the process of collecting, parsing and storing data for use by the search engine.
      ◦ The search engine index is the place where all the data the search engine has collected is stored. The index provides the results for search queries, and only pages stored in the index appear on the search engine results page.
      ◦ Without an index, every search query would take considerable time and effort: the search engine would have to scan not only every web page related to the keyword in the query, but also every other piece of information it has access to, to make sure it is not missing something relevant.
      ◦ A search engine index has many different aspects, such as design factors and data structures.
  • 32.
      ◦ The design factors of a search engine index decide how the index actually works. Together they determine the working of the index, and include:
      ◦ Index size: the amount of computer storage necessary to support the index.
      ◦ Storage techniques: the decision of how the information should be stored. Larger files are compressed, while smaller files are simply filtered.
      ◦ Fault tolerance: how important it is for the search engine index to be reliable.
      ◦ Lookup speed: how quickly a word can be found when the data is searched in the inverted index.
      ◦ Maintenance: the better maintained a search engine index is, the better it works.
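The inverted index mentioned under lookup speed can be sketched in a few lines. This is a minimal illustration, assuming a tiny made-up document collection and a simple "all query words must match" rule; real indexes also store positions and frequencies and apply compression.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical three-document collection.
DOCS = {
    1: "Search engines crawl the web",
    2: "An index makes search fast",
    3: "Crawlers follow links on the web",
}
INDEX = build_index(DOCS)
```

A query such as `search(INDEX, "the web")` only looks up two words in the index instead of rescanning every document, which is why lookup speed depends on the index rather than on the size of the raw pages.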
  • 33. What is a Search Engine Algorithm?
      ◦ A search algorithm is defined as a mathematical formula that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions.
      ◦ In simple words, a search engine algorithm is a set of rules, or a unique formula, that the search engine uses to determine the significance or ranking of a web page. Each search engine has its own set of rules.
      ◦ A search algorithm sorts results on the basis of many things, such as the location of keywords, synonyms and adjacent words.
      ◦ But there are certain things that all search engine algorithms have in common.
  • 34. SEARCH ALGORITHM PRINCIPLES
      ◦ Relevancy
      ◦ Individual Factors
      ◦ Off-Page Factors
  • 35. Relevancy
      ◦ This is the first thing every search engine checks.
      ◦ The algorithm determines whether the web page has any relevancy at all for the particular keyword.
      ◦ The location of keywords on the page is also important for the relevancy of that website.
      ◦ Web pages that have the keyword in the title, as well as in the headline or the first few lines of the text, will rank better for that keyword than pages that do not have these features.
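The effect of keyword location can be sketched as a weighted score. The weights, the 20-word "first few lines" window and the sample pages below are all assumptions made up for illustration; no search engine publishes its actual weights.

```python
import re

# Hypothetical weights: a keyword in the title counts most.
WEIGHTS = {"title": 3, "headline": 2, "lead": 2, "body": 1}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy_score(keyword, title, headline, body, lead_words=20):
    """Score a page for one keyword according to where the keyword appears."""
    keyword = keyword.lower()
    body_words = tokenize(body)
    score = 0
    if keyword in tokenize(title):
        score += WEIGHTS["title"]
    if keyword in tokenize(headline):
        score += WEIGHTS["headline"]
    if keyword in body_words[:lead_words]:
        score += WEIGHTS["lead"]      # keyword in the first few lines
    if keyword in body_words:
        score += WEIGHTS["body"]      # keyword anywhere in the text
    return score


# Page A uses the keyword in the title, the headline and the opening text.
score_a = relevancy_score(
    "flights",
    title="Cheap flights to Goa",
    headline="Compare flights in seconds",
    body="Book flights from many airlines at low prices.",
)

# Page B mentions the keyword only deep inside the body text.
score_b = relevancy_score(
    "flights",
    title="My travel diary",
    headline="A week on the coast",
    body="filler " * 25 + "flights were cheap",
)
```

Both pages contain the keyword, but page A scores higher because the keyword sits in the prominent locations the slide lists.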
  • 36. Individual Factors
      ◦ A second part of search engine algorithms is the individual factors that make a particular search engine different from every other search engine out there.
      ◦ Each search engine has unique algorithms, and the individual factors of these algorithms are why a search query turns up different results on Google than on MSN or Yahoo!.
      ◦ One of the most common individual factors is the number of pages a search engine indexes.
      ◦ One engine may simply have more pages indexed, or index them more frequently, and this can give different results for each search engine.
      ◦ Some search engines also penalize for spamming, while others do not.
  • 37. Off-Page Factors
      ◦ Another part of algorithms that is still individual to each search engine is the off-page factors.
      ◦ Off-page factors are things such as click-through measurement and linking.
      ◦ Click-through rates and the frequency of linking can indicate how relevant a web page is to actual users and visitors, and this can cause an algorithm to rank the web page higher.
      ◦ Off-page factors are harder for webmasters to craft, but can have an enormous effect on page ranking, depending on the search engine algorithm.
  • 38. Classified List of search engines
  • 39. Search engines are classified based on Content/Topic, Type of Information and Model. They are further subcategorized as follows.
      Content/Topic:
      ◦ General: A general search engine operates using a search algorithm. Websites listed in the search engine's directory are searched for information based on various search criteria. The goal is for the user to get relevant results with useful pages.
        Examples: bing.com, duckduckgo.com, exalead.com, google.co.in, munax.com.
      ◦ Metasearch engines: A metasearch engine (or aggregator) is a search tool that uses other search engines' data to produce its own results from the Internet. Metasearch engines take input from a user and simultaneously send out queries to third-party search engines for results.
        Examples: Blingo, Yippy, DeeperWeb, Dogpile, Excite, HotBot, Info.com, Mamma, Metacrawler, Mobissimo, Otalo and Skyscanner.
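How a metasearch engine might combine answers from several engines can be sketched as follows. The two "engines" are hypothetical stand-in functions returning fixed result lists, and the merging rule (adding 1/position for each appearance) is just one simple choice among many. The sketch also queries the engines one after another, whereas a real metasearch engine would send the queries simultaneously.

```python
from collections import defaultdict


def metasearch(query, engines):
    """Send the query to every engine and merge the ranked result lists.

    Each result earns 1/position points per engine, so results that
    several engines rank highly rise to the top of the merged list.
    """
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / position
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical third-party engines with canned results.
def engine_a(query):
    return ["a.example", "b.example", "c.example"]


def engine_b(query):
    return ["b.example", "a.example", "d.example"]


merged = metasearch("credit cards", [engine_a, engine_b])
```

Results returned by both engines (a.example and b.example) end up ahead of results that only one engine returned.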
  • 40.
      ◦ Geographically limited: These search engines have a limited geographical scope, meaning that they return only search results related to a particular geographical area. Country-specific versions of global engines, such as Google.co.in, are not counted here.
        Examples: Accoona, Ansearch, Biglobe, Daum, Egerin, Goo, Leit.is, Maktoob, Miner.hu, Najdi.si, Naver, Onkosh, Rambler, Rediff, SAPO, Search.ch, Sesam, Seznam, Ziplocal, etc.
      ◦ Semantic: Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Semantic search systems consider various points, including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries, to provide relevant search results. Major web search engines like Google and Bing incorporate some elements of semantic search.
        Examples: Sophia Search, Evi, Yummy, Swoogle.
  • 41.
      ◦ Business: Business search helps us keep in touch with the global world. All the latest information regarding the dynamic market is available with just a click.
        Examples: Business.com, Getit Infoservices Private Limited, GenieKnows, GlobalSpec, Nexis (LexisNexis), Thomasnet, Justdial.
      ◦ Academic materials only:
        Examples: BASE, CiteULike, Google Scholar, Library of Congress, Shodan, Noodle Education, SkilledUp, Chegg.
      ◦ Enterprise:
        Examples: Funnelback, Jumper 2.0, Oracle Corporation, Q-Sensei, TeraText, SimilarWeb, Swiftype.
      ◦ Jobs:
        Examples: Adzuna, Bixee.com, CareerBuilder.com, Craigslist, Dice.com, Eluta.com, Hotjobs.com, JobStreet.com, Incruit, Indeed.com, Glassdoor.com, LinkUp.com, Monster.com, Naukri.com.
  • 42.
      ◦ Medical:
        Examples: Bing Health, Bioinformatic Harvester, CiteAb, EB-eye, Entrez, GenieKnows, GoPubMed, Healia, Healthline, Nextbio, Quertle, Searchmedica, WebMD.
      ◦ News:
        Examples: Bing News, Daylife, Google News, MagPortal, Newslookup, Nexis, Topix.net, Trapit, Yahoo! News.
  • 43. Type of Information:
      ◦ Blog:
        Examples: Amatomu, Bloglines, BlogScope, IceRocket, Munax, Regator, Technorati.
      ◦ Multimedia:
        Examples: Bing Videos, blinkx, FindSounds, Google Videos, Munax's Play Audio Video, Picsearch, Pixsta, Podscope, ScienceStage, SeeqPod, Songza, TinEye, TV Genius, Veveo.
      ◦ Source code:
        Examples: Google Code Search, Koders, Krugle.
      ◦ BitTorrent:
        Examples: BTDigg, Isohunt, Mininova, The Pirate Bay, TorrentSpy, Torrentz, Torrentus.
      ◦ Maps:
        Examples: Bing Maps, Geoportail, Google Maps, MapQuest, Nokia Maps, OpenStreetMap, Wikiloc, WikiMapia, Yahoo! Maps.
      ◦ Price:
        Examples: Bing Shopping, Google Shopping, Kelkoo, MySimon, PriceGrabber, PriceRunner, PriceSCAN, Pronto.com, Shopping.com, ShopWiki, Shopzilla, SwoopThat.com, TheFind.com.
  • 44. Model:
      ◦ Privacy search engines:
        Examples: DuckDuckGo, Ixquick.
      ◦ Open source search engines:
        Examples: DataparkSearch, Gigablast, Grub, ht://Dig, Isearch, Lemur Toolkit & Indri Search Engine, Lucene, Namazu, Nutch, Recoll, Sciencenet, Searchdaimon, Seeks, etc.
      ◦ Social search engines:
        Examples: ChaCha Search, Delver, Eurekster, Mahalo.com, Rollyo, Search Team, Sproose, Trexy.
  • 45. GOOGLE AS A SEARCH ENGINE
  • 46. Winning the Search War
      ◦ Later in 1998, Andy Bechtolsheim gave the founders $100,000 in seed funding, and Google later received $25 million in funding from Sequoia Capital.
      ◦ In 1999, AOL selected Google as a search partner.
      ◦ In 2000, Google launched its popular Google Toolbar.
      ◦ On May 1, 2002, AOL announced that it would use Google to deliver its search-related ads, which was a strong turning point in Google's battle against Overture.
      ◦ In 2003, Google launched its AdSense program, which allowed it to expand its ad network by selling targeted ads on other websites.
  • 47. VERTICALS GALORE
      ◦ Google Maps
      ◦ Google News
      ◦ Google Book Search
      ◦ Google Scholar
      ◦ Google Blog Search
      ◦ Google Base
      ◦ Google Video
  • 48.
      ◦ Google Universal Search
      ◦ Email
      ◦ Analytics
      ◦ Radio ads
      ◦ Office productivity software
      ◦ Calendar
      ◦ Checkout
  • 50.
      ◦ Crawling & Indexing: Search starts with the web. It is made up of over 60 trillion individual pages and it is constantly growing. Google navigates the web by crawling, which means it follows links from page to page. Pages are sorted by their content and other factors, and Google keeps track of it all in "The Index", which is over 100 million gigabytes in size.
      ◦ Algorithms: The algorithms look for clues to better understand what the user means. Based on these clues, relevant documents are pulled from "The Index". The results are ranked according to freshness, site and page quality, safe search, user context, translation and universal search. These results can take a variety of forms. All this happens in 1/8th of a second.
      ◦ The Search Lab: The algorithms are constantly changing. These changes begin as ideas in the minds of the engineers. They take these ideas and run experiments, analyze the results, tweak them and run them again and again to get the following results:
  • 51.
      ◦ Knowledge Graph: Provides results based on a database of real-world people, places, things and the connections between them.
      ◦ Snippets: Shows small previews of information, such as a page's title and short descriptive text, for each search result.
      ◦ News: Includes results from online newspapers and blogs from around the world.
      ◦ Answers: Displays immediate answers and information for things such as the weather, sports scores and quick facts.
      ◦ Videos: Shows video-based results with thumbnails so you can quickly decide which video to watch.
      ◦ Refinements: Provides features like "Advanced Search", related searches and other search tools, all of which help one find what one is looking for.
      ◦ Voice search: With the Google search app, simply say what you want and get answers spoken right back to you.
      ◦ Mobile: Includes improvements designed specifically for mobile devices such as tablets and smartphones.
  • 52. Fighting Spam
      Google fights spam 24/7 to keep the results relevant. The majority of spam removal is automatic. Other questionable documents are examined by hand, and in case any spam is detected, manual action is taken.
      Types of spam:
      ◦ Unnatural links from a site: Google detected a pattern of unnatural, artificial, deceptive or manipulative outbound links on the site. This may be the result of selling links that pass PageRank or participating in link schemes.
      ◦ Cloaking and/or sneaky redirects: The site appears to be cloaking (displaying different content to human users than is shown to search engines) or redirecting users to a different page than the one Google saw.
      ◦ Hacked site: Some pages on the site may have been hacked by a third party to display spammy content or links. Website owners should take immediate action to clean their sites and fix any security vulnerabilities.
      ◦ Hidden text and/or keyword stuffing: Some of the pages may contain hidden text and/or keyword stuffing.
  • 53. And that's how Google's search engine works. Behind a simple page of results is a complex system, carefully crafted and tested, to support more than one hundred billion searches each month.
  • 55.
      ◦ A web search engine is a software system that is designed to search for information on the World Wide Web.
      ◦ The working process of a search engine starts with web crawling and indexing, followed by searching, which uses an algorithm to give relevant search results within a fraction of a second.
      ◦ The history of search engines goes back to 1945, when Vannevar Bush proposed the visionary idea of maintaining a record of all the knowledge available to mankind, which led to an era of revolution for search engines.
      ◦ There are various types of search engines, such as metasearch, business, educational and social search engines, along with LookSmart, Lycos, Microsoft, Yahoo! and human answer machines like Quora.
      ◦ A search engine saves time and gives us the precise and relevant information we need.

Editor's Notes

  1. 1. Search engines account for every word on the web. 2. Each page on the web has an official title. 3. Links between websites matter. When one page links to another, it is usually a recommendation to readers. 4. The words that are used in the links matter. 5. Search engines care about reputation.