This document provides an introduction to Apache Solr, an open source search engine. It first discusses what we need from a text search engine: storing and querying documents, handling natural-language complications, and highlighting search results. It then describes what Solr is, its architecture (built on Apache Lucene for indexing and searching), and how it is used, including concepts such as schemas, fields, and data types. It highlights special Solr features such as advanced search methods, scoring, grouping, highlighting, and real-time indexing, and briefly mentions competitors such as Elasticsearch.
2. Topics
What we need from a text search engine
What is Solr?
Why Solr?
Concepts and Architecture
Usage
Special Features
Competitors
3. Text Retrieval vs Database
Information: unstructured vs structured
Query: ambiguous vs well defined
Answers: relevant documents (ambiguous) vs matched documents
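To make the "relevant vs matched" distinction concrete, here is a toy Python sketch (not Solr code). A database-style lookup returns a yes/no set, while a retrieval-style lookup gives every document a graded score and ranks the results. The sample texts and the simple TF-IDF formula are illustrative assumptions; Lucene's default similarity is actually BM25.

```python
import math
from collections import Counter

# Hypothetical sample corpus, for illustration only.
DOCS = {
    "d1": "solr is a search engine built on lucene",
    "d2": "a relational database answers structured queries",
    "d3": "search engines rank documents by relevance to the query",
}


def exact_match(docs, term):
    """Database-style lookup: a document either contains the term or it does not."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())


def ranked(docs, query):
    """Retrieval-style lookup: score each document with a simple TF-IDF sum
    and return (doc_id, score) pairs, best first. Zero-score documents are dropped."""
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    results = []
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = sum(
            tf[t] * math.log(1 + n / df[t])
            for t in query.split()
            if df[t] > 0
        )
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


print(exact_match(DOCS, "engine"))                    # ['d1']
print([d for d, _ in ranked(DOCS, "search engine")])  # ['d1', 'd3']
```

Note that `d3` never matches `engine` because it only contains `engines`; that is the kind of natural-language complication (stemming) a real text search engine has to handle.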
4. What we want from a text search engine
Basic Search Features:
Store documents with fields
Query for documents
Text Search Features:
Find the most relevant documents
Handle natural-language complications (stop words, stemming, tokenization)
Highlight matching text
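The wish list above (fielded documents, queries, relevance, natural-language handling, highlighting) can be illustrated with a toy in-memory inverted index. This is a hedged Python sketch, not how Solr works internally: the stop-word list, the naive suffix-stripping "stemmer", and the hypothetical `TinyIndex` class with its `add`, `search`, and `highlight` methods are all assumptions chosen for brevity.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list and suffix list.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "by", "on"}
SUFFIXES = ("ing", "s")


def analyze(text):
    """Lowercase, split into word tokens, drop stop words, strip one suffix."""
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Strip only when a stem of at least 3 characters remains.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


class TinyIndex:
    """Toy inverted index: analyzed term -> set of (doc_id, field) postings."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, **fields):
        """Store the raw fields and index every analyzed term."""
        self.docs[doc_id] = fields
        for field, value in fields.items():
            for term in analyze(value):
                self.postings[term].add((doc_id, field))

    def search(self, query):
        """Return doc ids ordered by how many distinct query terms they contain."""
        hits = defaultdict(int)
        for term in set(analyze(query)):
            for doc_id in {d for d, _ in self.postings.get(term, ())}:
                hits[doc_id] += 1
        return sorted(hits, key=lambda d: (-hits[d], d))

    def highlight(self, doc_id, field, query, pre="<em>", post="</em>"):
        """Wrap each original word whose analyzed form matches a query term."""
        wanted = set(analyze(query))

        def mark(match):
            word = match.group(0)
            analyzed = analyze(word)
            return f"{pre}{word}{post}" if analyzed and analyzed[0] in wanted else word

        return re.sub(r"\w+", mark, self.docs[doc_id][field])


idx = TinyIndex()
idx.add("1", title="Solr", body="Solr is a search engine built on Lucene")
idx.add("2", title="Databases", body="Databases answer structured queries")
idx.add("3", title="Ranking", body="Search engines rank documents by relevance")

print(idx.search("search engines"))  # ['1', '3']
print(idx.highlight("3", "body", "search engine"))
# <em>Search</em> <em>engines</em> rank documents by relevance
```

The same `analyze` function runs at index time and at query time; keeping both sides consistent is what lets "engines" in a query match "engine" in a document.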
5. Problems with Text Search
Tokenization
Different letter representations
Similar words
Synonymous words
Word ambiguity
Stop words
Spelling errors
Spoken language
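Several of these problems are usually handled at analysis time, before terms reach the index. The Python sketch below illustrates three of them with hand-written rules: folding letter variants into one form (assuming Persian text where Arabic yeh U+064A and kaf U+0643 appear in place of Persian yeh U+06CC and keheh U+06A9), expanding synonyms from a hypothetical dictionary, and correcting a misspelled term by Levenshtein distance against a known vocabulary. Solr provides configurable analyzer filters and a spellcheck component for these jobs; the functions here are hypothetical stand-ins, not Solr APIs.

```python
# Hypothetical rule: map Arabic yeh (U+064A) and kaf (U+0643) to the Persian
# forms yeh (U+06CC) and keheh (U+06A9) so both spellings index identically.
LETTER_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})

# Hypothetical synonym groups, hard-coded for illustration.
SYNONYMS = {
    "car": {"car", "automobile", "auto"},
    "automobile": {"car", "automobile", "auto"},
    "auto": {"car", "automobile", "auto"},
}


def normalize_letters(text):
    """Fold letter variants into one canonical form."""
    return text.translate(LETTER_MAP)


def expand_synonyms(term):
    """Return the term plus any synonyms, so a query for one matches all."""
    return SYNONYMS.get(term, {term})


def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_spelling(term, vocabulary, max_distance=1):
    """Replace an unknown term with the closest vocabulary word, if close enough."""
    if term in vocabulary:
        return term
    best = min(vocabulary, key=lambda word: (edit_distance(term, word), word), default=None)
    if best is not None and edit_distance(term, best) <= max_distance:
        return best
    return term


word = "\u0643\u062A\u0627\u0628"  # "ketab" (book) typed with Arabic kaf
print(normalize_letters(word) == "\u06A9\u062A\u0627\u0628")  # True
print(sorted(expand_synonyms("car")))  # ['auto', 'automobile', 'car']
print(correct_spelling("serch", {"search", "solr", "lucene"}))  # search
```

Word ambiguity and spoken language are not covered here; they depend on context rather than on token-level rules like these.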
6. What is Solr?
An open source search engine
Written in Java
Built on (wraps) Apache Lucene
Exposes a REST-style HTTP API
Fault tolerant
Scalable
Distributable
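Because Solr is queried over HTTP, any language can act as a client. The Python sketch below builds a request for the standard `/select` handler using the common parameters `q`, `fl`, `rows`, and `wt`, plus the highlighting parameters `hl` and `hl.fl`. It assumes a local Solr on the default port 8983; the collection name `articles` and the helpers `select_url` and `search` are hypothetical. Only `search` contacts the server, so it needs a running Solr instance.

```python
import json
import urllib.request
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumes a local Solr on the default port
COLLECTION = "articles"                   # hypothetical collection name


def select_url(query, fields=("id", "title"), rows=10, highlight_field=None):
    """Build a /select URL; parameter values are URL-encoded by urlencode."""
    params = [("q", query), ("fl", ",".join(fields)), ("rows", rows), ("wt", "json")]
    if highlight_field:
        # Ask Solr to return highlighted snippets for this field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def search(query, **kwargs):
    """Send the request and return the matching documents (requires a running Solr)."""
    with urllib.request.urlopen(select_url(query, **kwargs)) as resp:
        body = json.load(resp)
    # Solr's JSON response nests the result list under response.docs.
    return body["response"]["docs"]


print(select_url("title:solr", rows=5))
# http://localhost:8983/solr/articles/select?q=title%3Asolr&fl=id%2Ctitle&rows=5&wt=json
```

Keeping URL construction separate from the network call makes the query logic easy to inspect or test without a server.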