Solr Search Engine Implementation: A Case Study
觜 伎/螳覦
豕  蠏
2016-04-25 蠍一
Lucene and Solr
Lucene is a high-performance information retrieval (IR) library.
Solr is an open-source search engine built on Apache Lucene.
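Solr Key Features 1
Full-text search
Optimized for high-volume web traffic
Standards-based open interfaces: XML, JSON, and HTTP (RESTful API)
HTML-based administration interface
Linearly scalable, with automatic index replication and automatic failover and recovery
Flexible and adaptable, with XML-based configuration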
Extensible plugin architecture
(= e.g., custom analyzers and other extensions can be developed as plugins)
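Solr Key Features 2
- Faceted search by field values, numeric ranges, date ranges, etc.
- Rich Lucene query syntax
- Highlighting of search results
- Geospatial search (e.g., results within 2 km of a given location)
- Performance optimization and caching
- Logging and monitoring
- Near real-time (NRT) indexing and replication
- Distributed search across multiple hosts using sharded indexes
- Indexing of documents sent as XML, etc. over HTTP
- Rich document parsing (PDF, Word, HTML, etc.) with Apache Tika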
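Search Indexing Architecture (DataImportHandler approach)
[Figure: a crontab shell script calls the Solr server's DataImportHandler via curl; Full-Import and Delta-Import read live data from a database view or stored-procedure call and write Solr index data into the core, with an indexing log.]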
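The diagram has cron trigger imports over HTTP with curl. As a minimal sketch of what those requests could look like, assuming the handler is registered at `/dataimport` and using a hypothetical host and core name (`localhost:8983`, `jobs`), the helper below builds DataImportHandler command URLs. `full-import`, `delta-import`, and `status` are standard DIH commands, and `clean`/`commit` are standard DIH parameters.

```python
from urllib.parse import urlencode

# Hypothetical core URL: host, port, and core name are placeholders, not from the slides.
SOLR_CORE_URL = "http://localhost:8983/solr/jobs"


def dih_url(command: str, clean: bool = False, commit: bool = True) -> str:
    """Build a DataImportHandler URL, assuming the handler is registered at /dataimport."""
    if command not in ("full-import", "delta-import", "status"):
        raise ValueError(f"unsupported command: {command}")
    params = {"command": command}
    if command != "status":
        # clean=true wipes the index before importing; usually only wanted for full imports.
        params["clean"] = "true" if clean else "false"
        params["commit"] = "true" if commit else "false"
    return f"{SOLR_CORE_URL}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    # A cron-driven shell script could pass these URLs to curl.
    print(dih_url("full-import", clean=True))
    print(dih_url("delta-import"))
```

A crontab entry might then run something like `curl -s "<delta-import URL>"` every few minutes and a full import nightly. The schedule is an assumption, not taken from the deck.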
Replication and Load Balancing
[Figure: an indexing server replicates its index to search servers 1 and 2 (service servers) behind a load balancer; clients send GET requests and receive XML/JSON responses. Environment: Linux (CentOS), WAS (Tomcat).]
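In this layout only the indexing server receives writes, and the replicated search servers share read traffic. The deck does not say which load balancer was used, so the following is only a sketch of round-robin selection that skips unhealthy replicas. Server names and the `is_healthy` callback are hypothetical.

```python
import itertools
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Minimal round-robin selector over replicated Solr search servers.

    Sketch only: server URLs are hypothetical, and `is_healthy` stands in for
    whatever health check the real load balancer performs.
    """

    def __init__(self, servers: Iterable[str], is_healthy: Callable[[str], bool]):
        self.servers = list(servers)
        if not self.servers:
            raise ValueError("need at least one server")
        self._cycle = itertools.cycle(self.servers)
        self._is_healthy = is_healthy

    def next_server(self) -> str:
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy search server available")
```

Because every search server holds a full replica of the index, any of them can answer a GET query, so plain round-robin is enough here. Sharded setups would instead need to fan a query out to every shard.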
Solr Configuration
1. SolrConfig settings
- Cache
- Replication
- DataImportHandler
2. Schema settings
- Field list (type, indexed, stored, required, multiValued)
- uniqueKey
- Analyzer settings for custom field types
3. Data-Config.xml settings
- query
- deltaImportQuery
- deltaQuery
- deletedPkQuery
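The four entity attributes work together during a delta import: `deletedPkQuery` finds rows removed since the last run, `deltaQuery` finds primary keys of changed rows, and `deltaImportQuery` fetches each changed row in full (in data-config.xml it typically references `${dih.last_index_time}` and `${dih.delta.<pk>}`). The sketch below imitates that flow against an in-memory SQLite table. The `job` table, its `updated_at`/`deleted` columns, and the dict standing in for the Solr index are all hypothetical, and the delete-then-update ordering is a simplification rather than DIH's exact internals.

```python
import sqlite3

# Hypothetical table standing in for the database view in the architecture diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER, deleted INTEGER);
INSERT INTO job VALUES (1, 'Java developer', 100, 0);
INSERT INTO job VALUES (2, 'Solr engineer', 200, 0);
INSERT INTO job VALUES (3, 'Old posting', 250, 1);
""")

# Rough analogues of the data-config.xml entity attributes.
QUERY = "SELECT id, title FROM job WHERE deleted = 0"                        # query
DELTA_QUERY = "SELECT id FROM job WHERE updated_at > ? AND deleted = 0"       # deltaQuery
DELTA_IMPORT_QUERY = "SELECT id, title FROM job WHERE id = ?"                 # deltaImportQuery
DELETED_PK_QUERY = "SELECT id FROM job WHERE updated_at > ? AND deleted = 1"  # deletedPkQuery


def full_import(db) -> dict:
    """Rebuild the whole 'index' (a dict keyed by uniqueKey) from the main query."""
    return {row[0]: {"id": row[0], "title": row[1]} for row in db.execute(QUERY)}


def delta_import(db, index: dict, last_index_time: int) -> dict:
    # 1. Drop documents whose rows were flagged as deleted since the last run.
    for (pk,) in db.execute(DELETED_PK_QUERY, (last_index_time,)).fetchall():
        index.pop(pk, None)
    # 2. Collect primary keys of changed rows, then fetch each full row.
    for (pk,) in db.execute(DELTA_QUERY, (last_index_time,)).fetchall():
        row = db.execute(DELTA_IMPORT_QUERY, (pk,)).fetchone()
        index[pk] = {"id": row[0], "title": row[1]}
    return index
```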
Category Screen
Facet queries and filter queries
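A category screen usually combines the two: facet parameters return per-category counts, and each selected category becomes an `fq` filter that narrows results without changing scores (and can be cached in Solr's filterCache). The sketch below builds such a parameter string with standard Solr parameter names. The field name `JOB_CATEGORY` is hypothetical.

```python
from typing import List
from urllib.parse import urlencode


def category_params(q: str, category_field: str, selected: List[str]) -> str:
    """Build /select parameters for a category screen (field names are hypothetical)."""
    params = [
        ("q", q or "*:*"),        # fall back to match-all when the user typed nothing
        ("rows", "0"),            # only facet counts are needed for the category menu
        ("facet", "true"),
        ("facet.field", category_field),
        ("facet.mincount", "1"),  # hide categories with zero hits
        ("wt", "json"),
    ]
    # Each selected category becomes a filter query. Assumes values contain no double quotes.
    for value in selected:
        params.append(("fq", f'{category_field}:"{value}"'))
    return urlencode(params)
```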
 覦 朱襴ろ 覃
襴ろ 覦
ASP Solr Client Development
1) Include files
- Solr_GlobalSearch.asp (Solr helper class)
- Solr_JSON_Helper.asp (JScript JSON)
2) Call the Solr_JSON_Helper() block
3) General query
Set srchHandler = New Solr_SearchHandler
srchHandler.init 危, , WAS, 貊企
srchHandler.setQuery "query"
srchHandler.setDefaultOp "AND"
srchHandler.setSortField "score desc"
srchHandler.setStart 0
srchHandler.setRows 20
srchHandler.setResultField "Job_Title,Job,JOB_CONTENT"
' Fetch results
Set searchData = ParseJSON(srchHandler.getResults())
totalCnt = searchData.response.numFound
Set itemSet = searchData.response.docs
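The internals of `Solr_SearchHandler` are not shown. Assuming its setters map onto Solr's standard `/select` parameters (`q`, `q.op`, `sort`, `start`, `rows`, `fl`, `wt`), an equivalent request and response parse might look like this Python sketch. The base URL is a placeholder.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode


def build_select_url(base: str, q: str, fields: List[str], start: int = 0, rows: int = 20) -> str:
    """Assumed mapping of the ASP helper's setters onto standard /select parameters."""
    params = {
        "q": q,                  # setQuery
        "q.op": "AND",           # setDefaultOp "AND"
        "sort": "score desc",    # setSortField
        "start": start,          # setStart
        "rows": rows,            # setRows
        "fl": ",".join(fields),  # setResultField
        "wt": "json",            # JSON response, parsed like ParseJSON in the ASP code
    }
    return f"{base}/select?{urlencode(params)}"


def parse_results(body: str) -> Tuple[int, list]:
    """Return (numFound, docs), mirroring totalCnt and itemSet in the ASP example."""
    data = json.loads(body)
    return data["response"]["numFound"], data["response"]["docs"]
```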
4) setQuery syntax
(Title:term1 AND -Title:term2 AND Content:term3)^1.5
http://www.solrtutorial.com/solr-query-syntax.html
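In the standard Lucene query parser, boolean operators must be uppercase (`AND`, `OR`, `NOT`), and characters such as `+ - ( ) : ^ " * ?` must be escaped inside user-entered terms. The helper below is a hypothetical sketch (not part of the ASP helper) that assembles a query shaped like the example above.

```python
import re
from typing import Iterable, Optional, Tuple

# Characters with special meaning in the standard Lucene query parser.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape query-parser special characters in a user-supplied term."""
    return _SPECIAL.sub(r"\\\1", term)


def boolean_query(must: Iterable[Tuple[str, str]],
                  must_not: Iterable[Tuple[str, str]] = (),
                  boost: Optional[float] = None) -> str:
    """Join required and excluded field:term clauses with uppercase AND, then optionally boost."""
    clauses = [f"{field}:{escape_term(term)}" for field, term in must]
    clauses += [f"-{field}:{escape_term(term)}" for field, term in must_not]
    query = " AND ".join(clauses)
    return f"({query})^{boost}" if boost is not None else query
```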
5) Facet query
Set srchHandler = New Solr_SearchHandler
srchHandler.init 危, , WAS, 貊企
srchHandler.setQuery "query"
srchHandler.setFacetSort 
srchHandler.setStart 0
srchHandler.setRows 0
srchHandler.setResultField 
' Facet settings
srchHandler.setFacet "true"
srchHandler.setFacetField "覈"
srchHandler.setFacetMinCount 1
srchHandler.setFacetNamedList "arrarr"
' Fetch results
Set searchData = ParseJSON(srchHandler.getResults())
totalCnt = searchData.response.numFound
Set itemSet = searchData.response.docs
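`setFacetNamedList "arrarr"` most likely maps to Solr's `json.nl=arrarr` parameter, which returns each facet field as a list of `[value, count]` pairs instead of the default flat `[value, count, value, count, ...]` list. That mapping is an assumption about the helper. The sketch below reads facet counts in either shape. The field name `CATEGORY` is hypothetical.

```python
import json
from typing import List, Tuple


def parse_facet_arrarr(body: str, field: str) -> List[Tuple[str, int]]:
    """Read one facet field from a response produced with json.nl=arrarr."""
    pairs = json.loads(body)["facet_counts"]["facet_fields"].get(field, [])
    return [(value, count) for value, count in pairs]


def parse_facet_flat(body: str, field: str) -> List[Tuple[str, int]]:
    """Read one facet field from the default json.nl=flat response."""
    flat = json.loads(body)["facet_counts"]["facet_fields"].get(field, [])
    return list(zip(flat[0::2], flat[1::2]))
```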
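Extension Plugin Development and Deployment
- Custom character-based analyzer
- Splitting Hangul syllables into jamo (initial, medial, final, compound)
- The analyzer can replace DB LIKE searches (= substring matching)
- N-gram tokenization (e.g., a compound word is split into overlapping character n-grams)
- CODE-based category search (PatternTokenizerFactory)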
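The plugin source is not shown, and a real Solr plugin would be a Java Lucene TokenFilter. As a standalone illustration only, this Python sketch shows two of the ideas listed: decomposing precomposed Hangul syllables into jamo using the Unicode arithmetic (syllable = 0xAC00 + (initial × 21 + medial) × 28 + final), and character bigrams, which let an index answer substring-style lookups that would otherwise need `LIKE '%...%'`. All function names are hypothetical.

```python
from typing import List

# Compatibility jamo tables in Unicode order (19 initials, 21 medials, 27 finals + none).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ("", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
             "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ")

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 19 * 21 * 28  # 11172 precomposed syllables


def decompose(ch: str) -> str:
    """Split one precomposed syllable into jamo; other characters pass through."""
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < HANGUL_COUNT:
        return ch
    initial, rest = divmod(code, 21 * 28)
    medial, final = divmod(rest, 28)
    return CHOSEONG[initial] + JUNGSEONG[medial] + JONGSEONG[final]


def to_jamo(text: str) -> str:
    return "".join(decompose(ch) for ch in text)


def initials(text: str) -> str:
    """Initial consonants only, e.g. for initial-consonant search."""
    out = []
    for ch in text:
        code = ord(ch) - HANGUL_BASE
        out.append(CHOSEONG[code // (21 * 28)] if 0 <= code < HANGUL_COUNT else ch)
    return "".join(out)


def char_bigrams(text: str) -> List[str]:
    """Overlapping two-character tokens; a single character is kept as one token."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]
```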
Search Accuracy Improvements
1. Default operator: AND (exact-match approach)
- e.g., a compound term AB is searched as A AND B
2. Query-time boost
- (title:keyword)^1.5 (body:keyword)
3. Index-time boost (Data Import XML configuration)
- transformer="script:boost_up"
- <field column="$docBoost" />
- row.put('$docBoost', boost_num);
4. 蟲豢 牛 蟆蠍磯 レ
- レ, , 覲牛覈  牛 レ 覩瑚鍵覦 れ 覿
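In DataImportHandler, a ScriptTransformer function such as `boost_up` receives each row as a map and may set the special `$docBoost` column, which DIH applies as the document's index-time boost. The deck does not show how `boost_num` is computed, so the rule in this Python stand-in (boosting rows whose hypothetical `IS_PREMIUM` column is `"Y"`) is purely illustrative. Note also that, as far as I know, index-time boosts were removed in Solr 7, so this technique applies to the older versions this deck targets.

```python
def boost_up(row: dict) -> dict:
    """Python stand-in for the DIH script transformer referenced as script:boost_up.

    The boost rule below is hypothetical; the source does not show how boost_num is chosen.
    """
    boost_num = 2.0 if row.get("IS_PREMIUM") == "Y" else 1.0
    row["$docBoost"] = boost_num  # DIH reads this special column as the document boost
    return row
```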
Server and Tomcat
- Enable GC logging and a heap dump on OutOfMemoryError
- Allocate sufficient JVM heap memory (about 50% of available memory)
- Fast disk I/O (e.g., SSD)
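These settings usually end up in Tomcat's `CATALINA_OPTS`. The sketch below assembles them from a memory size using the deck's 50% rule. Setting `-Xms` equal to `-Xmx` and the file paths are assumptions. The GC-logging flags shown (`-verbose:gc`, `-XX:+PrintGCDetails`, `-Xloggc`) are the Java 8 style; Java 9+ replaces them with `-Xlog:gc*`.

```python
def tomcat_java_opts(physical_mem_mb: int, dump_dir: str, gc_log: str) -> str:
    """Build JVM options for a Solr-on-Tomcat host (Java 8 style GC logging)."""
    heap_mb = physical_mem_mb // 2  # the slide's rule of thumb: about 50% of memory
    opts = [
        f"-Xms{heap_mb}m",           # assumption: fixed heap size avoids resize pauses
        f"-Xmx{heap_mb}m",
        "-XX:+HeapDumpOnOutOfMemoryError",
        f"-XX:HeapDumpPath={dump_dir}",
        "-verbose:gc",
        "-XX:+PrintGCDetails",
        f"-Xloggc:{gc_log}",
    ]
    return " ".join(opts)
```

Leaving the other half of memory unallocated is not waste: Lucene relies on the operating system's page cache to keep index files in memory, which is also why fast disk I/O matters.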
Solr 貂 (HttpWatch)
Monitoring (Solr server health check)
Mobile push notification
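A health check can call Solr's ping handler (`/admin/ping`, which answers with `"status":"OK"` when the core is healthy) on each server and send an alert for any failure. In this sketch the server URLs are placeholders and `notify` is a hypothetical callback standing in for the mobile push system, which the deck does not describe. The `fetcher` parameter allows the check to be tested without a network.

```python
import json
import urllib.request
from typing import Callable, Iterable, List


def fetch(url: str, timeout: float = 3.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_solr(ping_url: str, fetcher: Callable[[str], str] = fetch) -> bool:
    """True only if the ping handler answers with status OK; any error counts as down."""
    try:
        return json.loads(fetcher(ping_url)).get("status") == "OK"
    except Exception:
        return False


def run_check(cores: Iterable[str], notify: Callable[[str], None],
              fetcher: Callable[[str], str] = fetch) -> List[str]:
    """Ping every core URL and notify once per failing server."""
    down = [c for c in cores if not check_solr(f"{c}/admin/ping?wt=json", fetcher)]
    for core in down:
        notify(f"Solr health check failed: {core}")
    return down
```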
Troubleshooting During the Initial Build
1. ろ螳 蟒 谿殊 語ろ
- 一危 螳煙 螻, replication る
- ろ 覲  index殊   
- 螻手碓 log螳 蠍一朱 讌讌  襦 ろ襴渚 覦一
2. Server overload when a facet query was sent with the query *
- Load average rose sharply from the usual 1-2 to about 9
朱 豺
Analyzer?
Analyzer = Tokenizer + N TokenFilters
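The formula describes a pipeline: one tokenizer splits the text into tokens, then each token filter transforms the stream in order. This Python sketch mimics that structure with toy components. It is not the Lucene API, and the whitespace/lowercase/stopword components are illustrative stand-ins for factories such as those configured in schema.xml.

```python
from typing import Callable, Iterable, List, Set

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop_filter(stopwords: Set[str]) -> TokenFilter:
    return lambda tokens: [t for t in tokens if t not in stopwords]


def analyze(text: str, tokenizer: Tokenizer, filters: Iterable[TokenFilter]) -> List[str]:
    """Run the tokenizer once, then apply each filter in the configured order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens
```

Filter order matters: lowercasing before stopword removal lets a lowercase stopword list also catch "The".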
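Suggest (Prefix approach)
[Figure: at indexing time the term is indexed as a whole; at query time a Prefix Query matches the typed input.]
Suggest (Edge n-Gram approach)
[Figure: at indexing time the term is expanded into edge n-grams; at query time a Match Query looks up the typed input.]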
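The two suggest approaches trade work between indexing and querying. This sketch, with hypothetical function names, models both: a prefix scan over whole terms at query time, versus precomputed edge n-grams (like Solr's EdgeNGramFilterFactory with `minGramSize`/`maxGramSize`) that turn the lookup into an exact match.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Leading substrings of the term, from min_gram up to max_gram characters."""
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def build_edge_index(terms: Iterable[str], min_gram: int = 1, max_gram: int = 10) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = defaultdict(set)
    for term in terms:
        for gram in edge_ngrams(term, min_gram, max_gram):
            index[gram].add(term)
    return index


def suggest_prefix(terms: Iterable[str], prefix: str) -> List[str]:
    """Prefix approach: small index, but every query scans for matching prefixes."""
    return sorted(t for t in terms if t.startswith(prefix))


def suggest_edge(index: Dict[str, Set[str]], prefix: str) -> List[str]:
    """Edge n-gram approach: larger index, but the query is a single exact lookup."""
    return sorted(index.get(prefix, ()))
```

One consequence of the edge n-gram approach is that input longer than `max_gram` finds nothing, so the gram size needs to cover realistic query lengths.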
Thank you.
