Using SOLR as Open-Source Search
Platform for Organizational Research
Experts Retrieval
Dr. Gan Keng Hoon
School of Computer Sciences
Universiti Sains Malaysia
GKH USM 1
7th Annual Open Source Summit 2020
Day 5 - 15th December
Talk Outline
Text Search with SOLR
What is SOLR
What is Text Search
Text Search in an Organization
Information Retrieval Concept and SOLR in Use
Demo of Organizational Research Expert Retrieval
Part I: Text Search with SOLR
What is SOLR?
SOLR
Solr is the popular, blazing-fast, open source enterprise search platform built on
Apache Lucene™.
Solr Resources
Official Website: https://lucene.apache.org/solr/
Documentation: https://lucene.apache.org/solr/resources.html
What is Text Search?
Are you thinking about Google?
Let’s start with something we are familiar with …
Users want to type in a few simple keywords and get back great results.
This involves matching query terms to documents.
Ranked Results
Return “ranked” documents for a query.
A search engine returns documents sorted in descending order by a
score that indicates the strength of the match of the document to the
query.
Ranking by relevancy is important.
Side question: how is relevancy determined?
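As one possible answer to the side question, here is a minimal sketch of score-based ranking using a toy TF-IDF formula. It is an illustration only, not Solr's scoring (Solr's default similarity is BM25); the function name `rank` and the whitespace tokenizer are assumptions made for this example.

```python
import math
from collections import Counter


def rank(query, docs):
    """Return (score, doc_index) pairs sorted by descending toy TF-IDF score."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # Rare terms get a larger weight than common ones.
                idf = math.log(1 + n_docs / df[term])
                score += tf[term] * idf
        if score > 0:
            results.append((score, idx))
    # Highest score first, as in a ranked result list.
    return sorted(results, reverse=True)
```

Documents that never mention a query term are left out, and a document that repeats a query term more often is ranked higher.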
What Else
A search engine also:
Returns images, videos, social feeds, products, etc.
Provides search suggestions, autocomplete, etc.
Gives answers, facts, files, etc.
Text Search Solution in Your Organization
How do you perform search within your organization?
The document-finding tool in your operating system.
Querying data stored in your database using SQL.
Would you implement a Google-like search engine for
your organization?
Questions for Audience
1. List one TEXT or DOCUMENT
LOOKUP/SEARCH/NAVIGATION PROBLEM that you have
encountered in your organization.
2. Is there any existing search tool or feature implemented
within your organization that you can adapt to address the
problem?
Enterprise Search
Managing search solutions within an organization or for the benefit of an
organization …
IR and Search Engines
Information Retrieval:
Relevance - Effective ranking
Evaluation - Testing and measuring
Information needs - User interaction
Search Engines:
Performance - Efficient search and indexing
Incorporating new data - Coverage and freshness
Scalability - Growing with data and users
Adaptability - Tuning for applications
Specific problems - e.g. spam
Enterprise Search Engines
Enterprise Search Features
1. Not only Texts.
Unifying structured and unstructured data.
2. Not only Search.
Search + Analytics.
3. Not for Everyone.
Not addressing a general problem, but specific to
application/domain/business needs.
Part II: IR Concept & SOLR in Use
High Level Information Retrieval Concept
[Diagram components: Documents, Indexing, Document Representation, Information Needs, Formulation, Query, Retrieval Function, Retrieved Documents, Relevance Feedback]
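To make the concept concrete, here is a minimal sketch in which indexing turns documents into a document representation (an inverted index) and a retrieval function matches a query against it. The names `build_index` and `retrieve` and the boolean AND matching are assumptions chosen for illustration; they are not how Solr is implemented.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Retrieval function: ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

For example, with documents `{1: "open source search", 2: "enterprise search platform"}`, the query `"open search"` retrieves only document 1.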
Diagram of the main components of Solr 4
Image Source: Solr in Action, Grainger & Potter
[Diagram labels: Documents, Indexing, Document Representation, Query, Retrieval Function, Retrieved Documents]
Glimpse of Solr In Use
SOLR Downloads Site
Unzipping SOLR into Directory
For Windows users, we highly recommend that you extract Solr to a directory that
doesn’t have spaces in the name; that is, avoid extracting Solr into directories like
C:\Documents and Settings or C:\Program Files. For example, use a path like
c:\solr-8.2.0 instead.
For Linux users, choose a location like /opt/solr/.
View SOLR in Your Directory
Example directory listing of the solr-8.2.0 installation after extracting the
downloaded archive on your computer. We’ll refer to the top-level
directory as $SOLR_INSTALL/ throughout the rest of the slides.
Start Solr
To start Solr, run the solr script located in the bin folder.
For example, if you placed Solr at c:\solr-8.2.0,
open a command line and enter the following:
$ cd $SOLR_INSTALL #this gets you to your solr folder
$ bin/solr start #this runs the solr script in the bin folder
Note: cd means change directory. On Windows, the script is bin\solr.cmd.
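Once the start command returns, you may want to confirm that the server answers. The sketch below asks Solr's system info endpoint on the default port 8983; the helper names `system_info_url` and `solr_is_running` are made up for this example, and the base URL assumes a local default installation.

```python
import urllib.error
import urllib.request


def system_info_url(base_url="http://localhost:8983/solr"):
    """Build the URL of Solr's system info endpoint."""
    return base_url.rstrip("/") + "/admin/info/system?wt=json"


def solr_is_running(base_url="http://localhost:8983/solr", timeout=3):
    """Return True if the endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(system_info_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not running.
        return False
```

Calling `solr_is_running()` right after `bin/solr start` should return True once the server has finished starting.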
Admin Console - http://localhost:8983/solr/
Create Your First Core
Your running server is empty.
Create your first core, called “techproduct”.
At the command prompt, from $SOLR_INSTALL, enter:
$ bin/solr create -c techproduct
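Besides the admin page, the new core can be checked from a script through Solr's CoreAdmin STATUS action. This is a sketch under assumptions: the helper names are invented, and `core_exists` assumes the JSON response carries a `status` object keyed by core name, with an empty entry for a core that does not exist.

```python
from urllib.parse import urlencode


def core_status_url(core, base_url="http://localhost:8983/solr"):
    """Build the CoreAdmin STATUS URL for one core."""
    params = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url}/admin/cores?{params}"


def core_exists(status_response, core):
    """Interpret a parsed STATUS response (assumed shape, see lead-in)."""
    return bool(status_response.get("status", {}).get(core))
```

Fetch the URL with any HTTP client, parse the JSON body, and pass the resulting dictionary to `core_exists`.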
Meet Your First Core
View Properties of The Core
* You can also Add/Rename Core at the admin page.
Add Some Example Documents
When you first start Solr, there are no documents in the index. It’s an empty server
waiting to be filled with data to search.
Let’s add some documents from the example\exampledocs directory.
What are the example file types in your
$SOLR_INSTALL\example\exampledocs?
Use Post Tool to Add Documents
For Unix users, call the Post Tool from bin:
$ bin/post -c techproduct example/exampledocs/*.xml
For Windows users, navigate to the example\exampledocs folder:
$ cd $SOLR_INSTALL\example\exampledocs
$ java -jar -Dc=techproduct post.jar *.xml
-Dc=techproduct specifies the core.
*.xml specifies the files to be added. In this case, we are adding all files of type .xml.
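The Post Tool is a convenience wrapper: the same files can be sent to the core's update handler over HTTP. The sketch below is an alternative to the Post Tool, not a description of how it works internally; the helper names are assumptions, and the base URL assumes the local default installation used in these slides.

```python
import urllib.request
from pathlib import Path


def build_update_request(core, xml_bytes, base_url="http://localhost:8983/solr"):
    """Prepare a POST of one XML update message, committing immediately."""
    url = f"{base_url}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def post_xml_files(core, folder):
    """Send every .xml file in a folder to the core (needs a running Solr)."""
    for path in sorted(Path(folder).glob("*.xml")):
        request = build_update_request(core, path.read_bytes())
        with urllib.request.urlopen(request) as response:
            print(path.name, response.status)
```

With Solr running, `post_xml_files("techproduct", "example/exampledocs")` would print one status line per file.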
Status of Added Files
What is the speed of indexing?
Let’s Search
Go to http://localhost:8983/solr/
Select techproduct core
Select Query tab
Enter *:* at the query form.
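The Query tab builds a request to the core's select handler, so the same search can be issued from a script. A minimal sketch follows; `select_url` and `num_found` are names invented for this example, and the defaults mirror Solr's own (start 0, rows 10).

```python
from urllib.parse import urlencode


def select_url(core, q="*:*", start=0, rows=10, base_url="http://localhost:8983/solr"):
    """Build a select URL; *:* matches all fields and all values."""
    params = urlencode({"q": q, "start": start, "rows": rows, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def num_found(select_response):
    """Total number of matches reported in a parsed JSON select response."""
    return select_response["response"]["numFound"]
```

Opening the URL returned by `select_url("techproduct")` in a browser shows the same JSON that the admin console displays.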
Search results from executing the find-all-documents query.
View More Search Results
In the query form,
• Change start to 0
• Change rows to 32
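The start and rows fields control paging: start is the offset of the first result and rows is the page size. As a small sketch (the function name `page_offsets` is an assumption), the start values needed to walk through a whole result set can be computed like this:

```python
def page_offsets(num_found, rows):
    """Return the start offset of every page needed to cover num_found results."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return list(range(0, num_found, rows))
```

With the default rows of 10, 32 documents need four requests; with rows set to 32 and start at 0, a single request returns them all.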
14 files were indexed, but the search *:*
found 32 documents.
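A likely explanation, offered here as an assumption since the slide leaves it as a puzzle: in Solr's XML update format, one file can wrap several doc elements inside a single add element, so the number of files and the number of documents need not match. The sketch below counts doc elements in one XML string; `count_docs` is a name made up for this example.

```python
import xml.etree.ElementTree as ET


def count_docs(xml_text):
    """Count <doc> elements (at any depth) in one Solr XML update message."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("doc"))
```

Summing `count_docs` over the files in the exampledocs folder gives the number of documents to expect from the `*:*` query.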
Part III: Demo of Organizational
Research Expert Retrieval
Organization Research Experts Retrieval
Target Users: Students, Researchers, Collaborators looking for expertise
from School of Computer Sciences, Universiti Sains Malaysia.
Data Set: Scopus publication data for all academics at the school.
Status: Prototype.
Purpose: In-house solution.
Focused search and analytics capabilities.
1. Design Document/Retrieval Unit
2. Create Solr Core
Create a new core to store the collection.
3. Perform Indexing
Perform indexing at the backend using addDocuments() from the Solarium PHP library:
https://github.com/solariumphp
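The prototype indexes through Solarium in PHP. For readers without PHP, the sketch below shows a comparable step in Python by posting a JSON array of documents to the core's update handler. It is a swapped-in alternative, not the prototype's code; the core name `experts` and the field names are hypothetical and do not come from the talk.

```python
import json
import urllib.request


def build_json_update(core, docs, base_url="http://localhost:8983/solr"):
    """Prepare a POST that adds a list of documents and commits."""
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical retrieval unit: one publication record per document.
example_docs = [
    {"id": "pub-1", "author": "A. Researcher", "title": "An example paper title"},
]
```

With Solr running and the core created, `urllib.request.urlopen(build_json_update("experts", example_docs))` would send the documents.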
4. Implement Search Front End
5. Format the Response into Results Page
6. Demo
Visit the prototype at
http://ir.cs.usm.my/exsearch4/
Try searching for “cryptography”.
Thank you
Visit our school at cs.usm.my
See our IR research work at ir.cs.usm.my
Drop me an email at khganATusm.my
