Using Sphinx for Full-Text Search
Full-text?
 Full-text search is the search for a document in a text database
 based on the content of the documents themselves.
How does it work?
Why?

 • fast
 • open source
 • requires no additional software
 • designed specifically to integrate well with SQL databases
 • SphinxQL
Some statistics

~3.5 million records, ~5 GB of text
(from Wikipedia)

                          MySQL   Lucene   Sphinx
Indexing, min              1627      176       84
Index size, MB             3011     6328     2850
Match all, ms/q             286       30       22
Match phrase, ms/q         3692       29       21
Match bool top-20, ms/q      24       29       13
Features

  • high indexing speed (up to 10 MB/sec on modern CPUs)
  • high search speed (an average query takes less than 0.1 seconds
    on 2-4 GB text collections)
  • good relevance, achieved by combining phrase-proximity ranking
    with statistical ranking
  • support for several additional attributes per document at once
    (groups, timestamps, etc.)
Drawbacks

 • monolithic indexes
 • no documentation in Russian
Installation (*nix)

1.   Unpack
2.   $ ./configure
3.   $ make
4.   $ make install
So what next?
Sphinx consists of three components: an index generator, a search
engine, and a command-line search utility:
   • Index generator (indexer). It runs a query against the database,
     indexes every column in every row of the result, and ties each
     index entry to the row's primary key.

   • The search engine is a daemon called searchd.

   • The handy search utility lets you run searches from the command
     line without writing any code.
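
To make the indexer description concrete, here is a minimal Python sketch of the same idea: every text column of every row is split into words, and each word is tied to the row's primary key. This is a toy inverted index and not Sphinx's actual on-disk format. The rows, the column names and the helper names `build_index` and `search` are assumptions made up for illustration.

```python
from collections import defaultdict

def build_index(rows, key="id"):
    """Map each lowercase word to the set of primary keys whose row contains it."""
    index = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            if column == key or not isinstance(value, str):
                continue  # only text columns are indexed
            for word in value.lower().split():
                index[word].add(row[key])
    return index

def search(index, query):
    """Return the ids of rows that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical rows standing in for the result of the indexer's SQL query.
rows = [
    {"id": 1, "title": "Wind power", "text": "notes on wind turbines"},
    {"id": 2, "title": "Art history", "text": "the wind in paintings"},
    {"id": 3, "title": "Search engines", "text": "full-text indexing"},
]
index = build_index(rows)
print(sorted(search(index, "wind")))  # ids of the rows that mention "wind"
```

A real engine adds tokenization rules, morphology and word positions on top of this, but the link from a word back to a primary key is the part the slide describes.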
Configuration

        /usr/local/etc/sphinx.conf


 • sources (source)
 • indexes (index)
 • daemon config (searchd)
A bit of practice
source
source Post
{
   type               = mysql
   sql_host           = localhost
   sql_user           = sphinx
   sql_pass           = whyd0in33d1t
   sql_db             = zomg_test_forum
   sql_sock           = /tmp/mysql/mysql.sock
   sql_port           = 3306
   # With sql_query_range set, sql_query must use the $start and $end macros.
   sql_query          = SELECT id, userId, threadId, title, text FROM Posts WHERE id>=$start AND id<=$end
   sql_query_info     = SELECT * FROM Posts WHERE id=$id
   sql_query_range    = SELECT MIN(id), MAX(id) FROM Posts
   sql_range_step     = 500
   sql_query_pre      = SET NAMES utf8
   sql_field_string   = text
   # An MVA query must return (document id, value) pairs; adjust this SELECT to the real schema.
   sql_attr_multi     = uint tags from query;SELECT id FROM Tags
}
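
The sql_query_range and sql_range_step settings above make the indexer fetch documents in id windows instead of in one huge query. The Python sketch below imitates that stepping under the assumption that the main query contains the `$start` and `$end` macros. The function name `ranged_queries` and the sample id bounds are made up for illustration, and the real substitution happens inside the indexer.

```python
def ranged_queries(sql_query, min_id, max_id, step):
    """Yield sql_query once per [start, end] window of at most `step` ids."""
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        yield sql_query.replace("$start", str(start)).replace("$end", str(end))
        start = end + 1

# Hypothetical template and id bounds; MIN(id) and MAX(id) would come from sql_query_range.
template = "SELECT id, title, text FROM Posts WHERE id>=$start AND id<=$end"
for query in ranged_queries(template, 1, 1200, 500):
    print(query)
```

Each printed query covers one window of at most 500 ids, which keeps individual result sets small while a large table is indexed.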
index

index Thread
{
    type = rt
    path = /mnt/data/Thread
    rt_field = title
    rt_attr_string = _title
    rt_attr_uint = userId
    morphology = stem_enru
}

index Post
{
    type = rt
    path = /mnt/data/Post
    rt_field = title
    rt_field = text
    rt_attr_string = _title
    rt_attr_string = _text
    rt_attr_uint = threadId
    rt_attr_uint = userId
    rt_attr_multi = tags
    morphology = stem_enru
}
searchd

searchd
{
  listen      = localhost:3307:mysql41
  port        = 3312
  log         = /etc/sphinx/searchd.log
  query_log   = /etc/sphinx/query.log
  pid_file    = /etc/sphinx/searchd.pid
}
Does it work?

$ sudo /usr/local/bin/indexer --rotate --all
using config file '/usr/local/etc/sphinx.conf'...
indexing index 'Post'...
collected 8 docs, 0.0 MB
sorted 0.0 Mhits, 82.8% done
total 8 docs, 149 bytes
total 0.010 sec, 14900.00 bytes/sec, 800.00 docs/sec

$ /usr/local/bin/search wind
index 'Post': query 'wind ': returned 2 matches of 2 total in 0.000 sec
SphinxQL

1) SELECT * FROM Thread LIMIT 1,2

2) SELECT * FROM Post WHERE threadId = 1

3) SELECT * FROM Post WHERE match('art') ORDER BY @weight DESC

4) SELECT _title FROM Post WHERE match('@title art')

5) SELECT * FROM Post WHERE tags in (1,2) AND match('google')
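
Because searchd listens with the mysql41 protocol, queries like these can in principle be sent from any MySQL client library. The sketch below is a hedged Python example written against the generic DB-API 2.0 interface. The helper name `search_posts`, the selected columns, and the use of the third-party `pymysql` driver in the commented usage are assumptions and not part of the slides. The `%s` placeholder style is the one pymysql uses, and other drivers may differ.

```python
def search_posts(conn, text, limit=20):
    """Run a SphinxQL full-text query over any DB-API 2.0 connection."""
    sql = "SELECT id, threadId FROM Post WHERE MATCH(%s) LIMIT %s"
    cur = conn.cursor()
    try:
        # The driver quotes the parameters client-side before sending the query.
        cur.execute(sql, (text, limit))
        return list(cur.fetchall())
    finally:
        cur.close()

# Possible usage, assuming pymysql is installed and searchd is running with
# the "listen = localhost:3307:mysql41" setting from the searchd slide:
#
#   import pymysql
#   conn = pymysql.connect(host="127.0.0.1", port=3307)
#   print(search_posts(conn, "@title art"))
```

Passing the search text as a parameter instead of concatenating it into the SQL string avoids quoting problems when the text itself contains quotes.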
Ranking (ranker)
 SELECT * FROM test WHERE MATCH('@title hello
 @body world')
 OPTION ranker=bm25, max_matches=3000,
   field_weights=(title=10, body=3)

    • SPH_RANK_PROXIMITY_BM25 ('proximity_bm25'), the default mode:
      takes both word proximity and BM25 ranking into account

    • SPH_RANK_BM25 ('bm25'), BM25 only, as in most other search
      engines (faster than the first mode)

    • SPH_RANK_NONE ('none'), no ranking at all: the fastest mode

    • SPH_RANK_WORDCOUNT ('wordcount'), simple and fast, counts the
      number of matches
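
To show how a match-counting ranker and a BM25 ranker can disagree, here is a small Python sketch. It uses the textbook Okapi BM25 formula with the common defaults k1=1.2 and b=0.75, which is an assumption: Sphinx's internal BM25 variant and its field weighting are not reproduced here. The three-document corpus is made up for illustration.

```python
import math

def wordcount_rank(doc_words, query_words):
    """Count how many times the query words occur in the document."""
    return sum(doc_words.count(w) for w in query_words)

def bm25_rank(doc_words, query_words, corpus, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one document against a small corpus."""
    n = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n
    score = 0.0
    for w in query_words:
        df = sum(1 for d in corpus if w in d)  # documents containing w
        if df == 0:
            continue
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc_words.count(w)
        norm = tf + k1 * (1 - b + b * len(doc_words) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

# Hypothetical corpus: each document is a list of words.
corpus = [
    "hello world".split(),
    "hello hello hello there".split(),
    "nothing relevant here at all".split(),
]
query = ["hello", "world"]
for doc in corpus:
    print(wordcount_rank(doc, query), round(bm25_rank(doc, query, corpus), 3))
```

In this corpus the second document wins on raw match count because it repeats "hello", while BM25 prefers the first document, which contains both query words, including the rarer "world".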
Hooray?


1.   Installed Sphinx
2.   Configured it
3.   ....
4.   profit?
