This document summarizes a presentation about using SOLR as an open-source search platform for organizational research expert retrieval. The presentation introduces SOLR and text search concepts, demonstrates how to set up and use a basic SOLR installation with sample data, and shows a prototype that indexes publication data from academics to enable expert search within an organization. The prototype crawls Scopus data, indexes it with SOLR, implements a search interface, and displays results to find research experts from the School of Computer Sciences.
1. Using SOLR as Open-Source Search Platform for Organizational Research Experts Retrieval
Dr. Gan Keng Hoon
School of Computer Sciences
Universiti Sains Malaysia
GKH USM 1
7th Annual Open Source Summit 2020
Day 5 - 15th December
2. Talk Outline
Text Search with SOLR
What is SOLR
What is Text Search
Text Search in an Organization
Information Retrieval Concept and SOLR in Use
Demo of Organizational Research Expert Retrieval
7. What is Text Search?
Are you thinking about Google?
8. Let’s start with something we are familiar with …
Users want to type in a few simple keywords and get back great results.
This involves matching query terms to documents.
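A minimal sketch of matching query terms to documents, assuming a simple in-memory inverted index. The sample documents and the function names `build_index` and `search` are invented for illustration; a real engine such as Solr also analyzes, stems, and scores text.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, not taken from the talk.
docs = {
    1: "open source search platform",
    2: "research expert retrieval",
    3: "open source research tools",
}
index = build_index(docs)
matches = search(index, "open source")
print(matches)
```

Here the query "open source" matches documents 1 and 3, because both terms appear in each of them.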
9. Ranked Results
Return “ranked” documents for a query.
A search engine returns documents sorted in descending order by a
score that indicates the strength of the match of the document to the
query.
Ranking by relevancy is important.
Side Question: How to determine the relevancy?
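One possible answer to the side question is term weighting. The sketch below uses a simplified TF-IDF score, chosen here only for illustration. It is not Solr's actual scoring, which is more elaborate (recent Solr versions default to BM25). The sample documents and the `rank` function are assumptions.

```python
import math
from collections import Counter

def rank(docs, query):
    """Score documents with a simplified TF-IDF and sort by descending score."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0 or counts[term] == 0:
                continue
            idf = math.log(1 + n_docs / df)  # rarer terms weigh more
            score += counts[term] * idf      # frequent-in-document terms weigh more
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample documents, not taken from the talk.
docs = {
    "a": "solr search search platform",
    "b": "search platform for research",
    "c": "research expert retrieval",
}
ranked = rank(docs, "solr search")
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Document "a" ranks first because it contains both query terms, and "c" is left out because it contains neither.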
10. What Else
A search engine also:
Returns images, videos, social feeds, products, etc.
Provides search suggestions, autocomplete, etc.
Gives answers, facts, files, etc.
11. Text Search Solution in Your Organization
How do you perform search within your organization?
Using the document-finding tool in your operating system.
Querying data stored in your database using SQL.
Would you implement a search engine similar to Google for your organization?
12. Questions for Audience
1. List one TEXT or DOCUMENT LOOKUP/SEARCH/NAVIGATION PROBLEM that you have encountered in your organization.
2. Is there any existing search tool or feature implemented within your organization that you can adapt to address the problem?
14. IR and Search Engines
Information Retrieval
Relevance
-Effective ranking
Evaluation
-Testing and measuring
Information needs
-User interaction
Search Engines
Performance
-Efficient search and indexing
Incorporating new data
-Coverage and freshness
Scalability
-Growing with data and users
Adaptability
-Tuning for applications
Specific problems
-e.g. Spam
Enterprise Search Engines
15. Enterprise Search Features
1. Not only Texts.
Unifying structured and unstructured data.
2. Not only Search.
Search + Analytics.
3. Not for Everyone.
Not addressing a general problem, but specific to
application/domain/business needs.
21. Unzipping SOLR into Directory
For Windows users, we highly recommend that you extract Solr to a directory that
doesn’t have spaces in the name; that is, avoid extracting Solr into directories like
C:\Documents and Settings or C:\Program Files. For example, use a path like
c:\solr-8.2.0 instead.
For Linux users, choose a location like /opt/solr/.
22. View SOLR in Your Directory
Example directory listing of the solr-8.2.0 installation after extracting the
downloaded archive on your computer. We’ll refer to the top-level
directory as $SOLR_INSTALL/ throughout the rest of the slides.
23. Start Solr
To start Solr, you need to run the solr script located in the bin folder.
For example, if you placed Solr at c:\solr-8.2.0,
open a command line and enter the following:
$ cd #this gets you to the base directory
$ cd $SOLR_INSTALL #this gets you to your solr folder
$ bin/solr start #this runs the solr script in the bin folder
Note: cd means change directory.
26. Create Your First Core
Your running server is empty.
Create your first core, called “techproduct”.
Go to the command prompt:
$ bin/solr create -c techproduct
28. View Properties of The Core
* You can also Add/Rename Core at the admin page.
29. Add Some Example Documents
When you first start Solr, there are no documents in the index. It’s an empty server
waiting to be filled with data to search.
Let’s add some documents from the example\exampledocs directory.
What are the example file types in your $SOLR_INSTALL\example\exampledocs?
30. Use Post Tool to Add Documents
For Unix users, you can call the Post Tool from bin:
$ bin/post -c techproduct example/exampledocs/*.xml
For Windows users, navigate to the example\exampledocs folder:
$ cd
$ cd $SOLR_INSTALL\example\exampledocs
$ java -jar -Dc=techproduct post.jar *.xml
-Dc=techproduct specifies the core.
*.xml gives the files to be added. In this case, we are adding all files of type .xml.
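The Post Tool sends documents to Solr over HTTP. As a rough sketch of the same idea, the code below prepares a POST of one XML update message to the core's /update handler. The helper names, the sample document, and the default base URL are assumptions for illustration, and `send` needs a running Solr server.

```python
from urllib.request import Request, urlopen

def build_update_request(core, xml_bytes, base="http://localhost:8983/solr"):
    """Prepare a POST of one XML update message to the core's /update handler."""
    return Request(
        f"{base}/{core}/update?commit=true",  # commit=true makes the document searchable
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

def send(request):
    """Send the prepared request and return the HTTP status (needs a running server)."""
    with urlopen(request) as response:
        return response.status

# Hypothetical one-document update message, not one of the bundled example files.
doc = b'<add><doc><field name="id">DEMO-1</field></doc></add>'
request = build_update_request("techproduct", doc)
print(request.get_method(), request.full_url)
```

Calling `send(request)` against a running server with a techproduct core would submit the document. The Post Tool remains the simpler choice for whole directories of files.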
31. Status of Added Files
What is the speed of indexing?
32. Let’s Search
Go to http://localhost:8983/solr/
Select the techproduct core.
Select the Query tab.
Enter *:* in the query form.
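The query form in the admin page issues an HTTP request to the core's /select handler. A minimal sketch of building the same request from code follows. The helper names `select_url` and `fetch` are invented for illustration, and `fetch` needs a running Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def select_url(core, query, base="http://localhost:8983/solr"):
    """Build the URL of a core's /select handler for the given query."""
    params = {"q": query, "wt": "json"}  # wt=json asks for a JSON response
    return f"{base}/{core}/select?{urlencode(params)}"

def fetch(url):
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(url) as response:
        return json.load(response)

url = select_url("techproduct", "*:*")
print(url)
```

With the server running, `fetch(url)["response"]["numFound"]` would report how many documents matched.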
33. Search results from executing the find-all-documents query.
34. View More Search Results
In the query form,
• Change start to 0
• Change rows to 32
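The start and rows fields map to Solr's paging parameters: start is the offset of the first result and rows is the number of results returned (10 by default). A small sketch, with the hypothetical helpers `page_params` and `paged_query`:

```python
from urllib.parse import urlencode

def page_params(page, rows=10):
    """Return Solr paging parameters for a zero-based page number."""
    return {"start": page * rows, "rows": rows}

def paged_query(query, page, rows=10):
    """Encode q, start and rows as a query string for the /select handler."""
    params = {"q": query, **page_params(page, rows)}
    return urlencode(params)

# One page of 32 rows starting at offset 0, as in the query form settings above.
first_32 = paged_query("*:*", page=0, rows=32)
# The second page of 10 rows starts at offset 10.
second_page = paged_query("*:*", page=1, rows=10)
print(first_32)
print(second_page)
```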
35. 14 files were indexed, but the search *:* found 32 documents.
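One likely explanation is that a single Solr XML update file can hold several `<doc>` elements, so the number of files and the number of indexed documents differ. The sketch below counts them. The file names and contents are invented for illustration and are not the bundled example files.

```python
import xml.etree.ElementTree as ET

def count_docs(xml_text):
    """Count the <doc> elements inside one Solr XML update message."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("doc"))

# Hypothetical update files: one holds a single document, the other holds three.
files = {
    "one.xml": '<add><doc><field name="id">A</field></doc></add>',
    "three.xml": (
        "<add>"
        '<doc><field name="id">B</field></doc>'
        '<doc><field name="id">C</field></doc>'
        '<doc><field name="id">D</field></doc>'
        "</add>"
    ),
}
total = sum(count_docs(text) for text in files.values())
print(len(files), "files,", total, "documents")
```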
36. Part III: Demo of Organizational Research Expert Retrieval
37. Organization Research Experts Retrieval
Target Users: Students, Researchers, Collaborators looking for expertise
from School of Computer Sciences, Universiti Sains Malaysia.
Data Set: Scopus publication data for all academics at the school.
Status: Prototype.
Purpose: In house solution.
Focused search and analytics capabilities.
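The slides do not describe how the prototype turns matching publications into a list of experts. As one possible approach, and purely an assumption, the sketch below counts matching publications per author. The `authors` field, the sample hits, and the `rank_experts` function are hypothetical.

```python
from collections import Counter

def rank_experts(hits):
    """Count matching publications per author and sort by descending count."""
    counts = Counter()
    for hit in hits:
        for author in hit.get("authors", []):
            counts[author] += 1
    return counts.most_common()

# Hypothetical search hits for one query, invented for illustration.
hits = [
    {"title": "Paper 1", "authors": ["Author A", "Author B"]},
    {"title": "Paper 2", "authors": ["Author A"]},
    {"title": "Paper 3", "authors": ["Author A", "Author C"]},
]
experts = rank_experts(hits)
print(experts[0])
```

Under this assumption, "Author A" would be listed first for the query, with three matching publications.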