The document covers concepts and tools related to search. It defines search and lists synonyms such as hunting, looking, and questing for information. It introduces Apache Lucene as an open source search library and outlines key features such as free text search, relevancy ranking, and near real-time indexing. It then covers text analysis and the inverted index data structure used in search, and closes with the factors that influence relevance and schema and configuration considerations in Apache Solr.
8. Open Source Search Library (Java API)
Free text search
Relevancy ranking
Faceting and filtering
Hit-term highlighting
Near real-time indexing/querying
Inverted Index
9. Free text search via
Keyword
Wildcard
Proximity
Fuzzy
Range
Geospatial
10. Free text search via
Keyword
Wildcard
o walk*
o M?ham?d
o M[ou]hamm?[ae]d
Proximity
Fuzzy
Range
Geospatial
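The patterns on this slide can be tried out with a minimal Python sketch. It assumes the usual wildcard semantics (`*` matches any run of characters, `?` matches exactly one) and treats the third pattern as an ordinary regular expression. The term list and helper names are hypothetical, and this is not how Lucene evaluates such queries internally.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern: '*' = any run of characters, '?' = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matching_terms(pattern: re.Pattern, terms):
    """Return the terms that the pattern matches in full."""
    return [t for t in terms if pattern.fullmatch(t)]

# Hypothetical indexed terms, for illustration only.
terms = ["walk", "walked", "walking", "talk", "Mohamed", "Muhammad", "Mohammed"]

prefix_hits = matching_terms(wildcard_to_regex("walk*"), terms)
single_char_hits = matching_terms(wildcard_to_regex("M?ham?d"), terms)
regex_hits = matching_terms(re.compile(r"M[ou]hamm?[ae]d"), terms)

print(prefix_hits)       # ['walk', 'walked', 'walking']
print(single_char_hits)  # ['Mohamed']
print(regex_hits)        # ['Mohamed', 'Muhammad', 'Mohammed']
```

The regular expression covers spelling variants that the fixed-length `?` pattern misses, which is the trade-off the three examples illustrate.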
13. Free text search via
Keyword
Wildcard
Proximity
Fuzzy
Range
o [* TO N]
Geospatial
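As a rough illustration of the range example, the sketch below treats `[* TO N]` as an inclusive range with an open lower bound. The field values, document ids, and the value chosen for `N` are hypothetical.

```python
from typing import Optional

def in_range(value: int, lower: Optional[int], upper: Optional[int]) -> bool:
    """Inclusive range check; None plays the role of the '*' open bound."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True

# Hypothetical numeric field values keyed by document id.
prices = {"doc0": 5, "doc1": 12, "doc2": 40}
N = 12  # hypothetical upper bound standing in for N on the slide

hits = sorted(doc for doc, price in prices.items() if in_range(price, None, N))
print(hits)  # ['doc0', 'doc1']
```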
15. Text Analysis
Convert text into searchable words
CharFilter
o Mutates single stream of text
Tokenizer
o Splits single stream of text into one or more tokens
TokenFilter
o Mutates token stream
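The three stages can be sketched as plain Python functions chained in order. This is a simplified model of the idea, not the Lucene API. The helper names and the sample input are assumptions made for illustration.

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

def analyze(text: str,
            char_filters: List[CharFilter],
            tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    """Run char filters on the raw text, tokenize once, then run token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

def strip_digits(text: str) -> str:
    """Hypothetical char filter: remove digits from the text stream."""
    return re.sub(r"[0-9]", "", text)

def split_on_whitespace(text: str) -> List[str]:
    """Hypothetical tokenizer: one token per whitespace-separated chunk."""
    return text.split()

def lowercase(tokens: List[str]) -> List[str]:
    """Hypothetical token filter: lowercase every token."""
    return [t.lower() for t in tokens]

tokens = analyze("Searching 4 Text", [strip_digits], split_on_whitespace, [lowercase])
print(tokens)  # ['searching', 'text']
```

The ordering matters: char filters see one string, the tokenizer runs exactly once, and token filters only ever see tokens.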
16. Notable Character Filters
HTML Strip
o <p>Example <a href=/test>link</a></p>
o Example link
Pattern Replace
o pattern="[^a-zA-Z]" replacement=""
o Testing123
o Testing
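Both character filters can be approximated with the Python standard library. The sketch below reproduces the two slide examples. The function names are hypothetical, and the HTML handling is far less thorough than a real HTML strip filter.

```python
import re
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_strip(text: str) -> str:
    """Drop markup and keep the text content."""
    parser = _TextCollector()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)

def pattern_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every regex match in the text stream."""
    return re.sub(pattern, replacement, text)

stripped = html_strip("<p>Example <a href=/test>link</a></p>")
replaced = pattern_replace("Testing123", "[^a-zA-Z]", "")
print(stripped)  # Example link
print(replaced)  # Testing
```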
17. Notable Tokenizers
Keyword
o Hello World!
o {Hello World!}
Whitespace
o Hello World!
o {Hello, World!}
Standard
Pattern
ICU (International Components for Unicode)
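The Keyword and Whitespace examples are simple enough to mimic directly. The sketch below also includes a regex-based splitter as a loose stand-in for the Pattern tokenizer. All function names and the sample delimiter pattern are assumptions, and the Standard and ICU tokenizers are not modelled here.

```python
import re
from typing import List

def keyword_tokenizer(text: str) -> List[str]:
    """Emit the whole input as a single token."""
    return [text]

def whitespace_tokenizer(text: str) -> List[str]:
    """Split on runs of whitespace; punctuation stays attached to tokens."""
    return text.split()

def pattern_tokenizer(text: str, delimiter_pattern: str) -> List[str]:
    """Split wherever the regex matches, dropping empty tokens."""
    return [tok for tok in re.split(delimiter_pattern, text) if tok]

print(keyword_tokenizer("Hello World!"))       # ['Hello World!']
print(whitespace_tokenizer("Hello World!"))    # ['Hello', 'World!']
print(pattern_tokenizer("a,b;c", r"[,;]"))     # ['a', 'b', 'c']
```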
18. Notable Token Filters
Lower Case
o {Hello, World!}
o {hello, world!}
Synonym
o synonyms.txt (expand=true): JPN, Japan, JN
▪ {to, Japan}
▪ {to, {Japan, JPN, JN}}
o synonyms.txt (expand=false)
▪ {to, JPN}
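The lowercase and synonym examples can be modelled with a small sketch in which each position in the token stream holds a list of terms, so that synonyms can be stacked at the same position. The representation and function names are assumptions. Only the single synonym line from the slide is handled.

```python
from typing import List

SYNONYM_GROUP = ["JPN", "Japan", "JN"]  # the one synonyms.txt line from the slide

def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase every token."""
    return [t.lower() for t in tokens]

def synonym_filter(tokens: List[str], group: List[str], expand: bool) -> List[List[str]]:
    """Return one list of terms per position.

    expand=True: a matching token is kept and the rest of the group is
    stacked at the same position.
    expand=False: a matching token is replaced by the first entry of the group.
    """
    positions = []
    for tok in tokens:
        if tok not in group:
            positions.append([tok])
        elif expand:
            positions.append([tok] + [s for s in group if s != tok])
        else:
            positions.append([group[0]])
    return positions

print(lowercase_filter(["Hello", "World!"]))                        # ['hello', 'world!']
print(synonym_filter(["to", "Japan"], SYNONYM_GROUP, expand=True))  # [['to'], ['Japan', 'JPN', 'JN']]
print(synonym_filter(["to", "Japan"], SYNONYM_GROUP, expand=False)) # [['to'], ['JPN']]
```

With `expand=False` the index stores a single canonical term, which keeps the index smaller but requires the same mapping at query time.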
19. Notable Token Filters
Word Delimiter
o {F22-Raptor}
o {F22, Raptor}
o {F, 22, Raptor}
o {F, {22, F22}, {Raptor, F22Raptor}}
Porter Stem
o {walked, walking, walks}
o {walk, walk, walk}
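The splitting steps shown for Word Delimiter can be imitated with two regular expressions. The stemmer below is a deliberately naive suffix stripper that happens to cover the slide's three words. It is not the Porter algorithm, and all function names are hypothetical.

```python
import re
from typing import List

def split_on_delimiters(token: str) -> List[str]:
    """Split on non-alphanumeric characters such as '-'."""
    return [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]

def split_on_letter_digit(token: str) -> List[str]:
    """Split on delimiters and also at letter/digit boundaries."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

def toy_stem(word: str) -> str:
    """Naive suffix stripping (NOT Porter): drop 'ing', 'ed' or 's' if a stem of 3+ letters remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(split_on_delimiters("F22-Raptor"))    # ['F22', 'Raptor']
print(split_on_letter_digit("F22-Raptor"))  # ['F', '22', 'Raptor']
print([toy_stem(w) for w in ["walked", "walking", "walks"]])  # ['walk', 'walk', 'walk']
```

The last Word Delimiter line on the slide additionally keeps concatenated forms such as F22 and F22Raptor at the same positions as the parts. This sketch leaves that out.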
20. Inverted Index
T[0] = "It is what it is"
T[1] = "what is it?"
T[2] = "it is a banana"
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"It": {0}
"it?": {1}
"what": {0, 1}
21. Inverted Index
T[0] = "It is what it is"
T[1] = "what is it?"
T[2] = "it is a banana"
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}
22. What is Relevant?
TF-IDF
o Term Frequency - Inverse Document Frequency
Boosting
o Important terms
o Signals
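To make TF-IDF and boosting concrete, the sketch below uses a textbook variant: raw term count multiplied by log(N / document frequency), with an optional per-term boost multiplier. This is an assumption chosen for clarity. Lucene's own scoring formulas differ in detail, and the three sample texts are reused from the inverted index slides.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(term: str, doc: List[str], docs: List[List[str]]) -> float:
    """Textbook TF-IDF: raw count in the document times log(N / df)."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

def score(query_terms: List[str], doc: List[str], docs: List[List[str]],
          boosts: Optional[Dict[str, float]] = None) -> float:
    """Sum of TF-IDF over the query terms, each scaled by an optional boost."""
    boosts = boosts or {}
    return sum(boosts.get(t, 1.0) * tf_idf(t, doc, docs) for t in query_terms)

docs = [tokenize(t) for t in ["It is what it is", "what is it?", "it is a banana"]]

common = tf_idf("is", docs[0], docs)      # term in every document: idf = log(1) = 0
rare = tf_idf("banana", docs[2], docs)    # term in one of three documents: log(3)
boosted = score(["banana"], docs[2], docs, boosts={"banana": 2.0})

print(common)             # 0.0
print(round(rare, 4))     # 1.0986
print(round(boosted, 4))  # 2.1972
```

A term that appears everywhere contributes nothing to the score, while a rare term dominates. A boost simply scales a term's contribution.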
29. Solr Schema
Field Definitions
o Field Type, Indexed, Stored, Multivalued, Doc Values
o Copy Fields
o Dynamic Fields
▪ <dynamicField name="*_sort" type="lowercase" />
Field Types
o Analysis Chain
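How explicit fields, dynamic fields, and copy fields interact can be illustrated with a small Python model. It is a simplification, not Solr's implementation. The explicit field names, the copy rule, and the lookup order (explicit fields first, then the first matching dynamic pattern) are assumptions for illustration. Only the `*_sort` pattern and its `lowercase` type come from the slide.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Hypothetical explicit field definitions: field name -> field type.
explicit_fields: Dict[str, str] = {"id": "string", "title": "text_general"}

# Dynamic field from the slide: any name ending in "_sort" uses the "lowercase" type.
dynamic_fields: List[Tuple[str, str]] = [("*_sort", "lowercase")]

# Hypothetical copy field rule: (source, destination).
copy_fields: List[Tuple[str, str]] = [("title", "title_sort")]

def resolve_field_type(name: str) -> Optional[str]:
    """Look up an explicit field first, then fall back to dynamic field patterns."""
    if name in explicit_fields:
        return explicit_fields[name]
    for pattern, type_name in dynamic_fields:
        if fnmatchcase(name, pattern):
            return type_name
    return None

def apply_copy_fields(doc: Dict[str, str]) -> Dict[str, str]:
    """Copy source field values into their destination fields."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            result[dest] = doc[source]
    return result

doc = apply_copy_fields({"id": "1", "title": "Search Basics"})
types = {name: resolve_field_type(name) for name in doc}
print(types)  # {'id': 'string', 'title': 'text_general', 'title_sort': 'lowercase'}
```

Each field type would then carry its own analysis chain, so `title` and `title_sort` can hold the same text but be analyzed differently.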