Suffix Array
Solr 2011/12/19




       1
• (@nobu_k)
• Preferred Infrastructure (PFI FI)
• Sedue (2 )
                      2
Suffix Array
• Suffix Array (SA):
• ( ) 1
  • Sedue
• SA
  • +Sedue
• -


                               3
  • ( )
  • n-gram (q-gram)
                       4
Suffix Array
• n-gram
                5
Suffixes of mississippi
 0:   mississippi
 1:   ississippi
 2:   ssissippi
 3:   sissippi
 4:   issippi
 5:   ssippi
 6:   sippi
 7:   ippi
 8:   ppi
 9:   pi
10:   i
              6
Suffix Array
  Text order              Sorted order
 0:   mississippi       10:   i
 1:   ississippi         7:   ippi
 2:   ssissippi          4:   issippi
 3:   sissippi           1:   ississippi
 4:   issippi            0:   mississippi
 5:   ssippi             9:   pi
 6:   sippi              8:   ppi
 7:   ippi               6:   sippi
 8:   ppi                3:   sissippi
 9:   pi                 5:   ssippi
10:   i                  2:   ssissippi

                    7
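The sorted column can be reproduced by ordering the suffix start positions by the suffix each one begins. This is a minimal Python sketch that assumes tiny inputs. It compares whole suffix strings, so it is far too slow for a real index and is not meant to show how Sedue builds its array. The function name `naive_suffix_array` is hypothetical.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Each comparison slices and compares whole suffixes, so this is only
    suitable for toy inputs such as "mississippi".
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    t = "mississippi"
    for i in naive_suffix_array(t):
        print(f"{i:2d}: {t[i:]}")
```

Run as a script, it prints the sorted column of the slide, from `10: i` down to ` 2: ssissippi`.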
10:   i
 7:   ippi          • Search mississippi for "si"
 4:   issippi
 1:   ississippi    • si
 0:   mississippi
 9:   pi
 8:   ppi
 6:   sippi         <- starts with "si"
 3:   sissippi      <- starts with "si"
 5:   ssippi
 2:   ssissippi     • Matches at positions 3 and 6


                        8
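Every suffix that starts with "si" sits in one contiguous block of the sorted order. Two binary searches are therefore enough to find all matches: one for the first suffix that is ≥ the pattern and one for the first suffix that is > the pattern. This sketch assumes the whole text and array fit in memory. `find_occurrences` is a hypothetical helper name, and the naive builder is included only to keep the block self-contained.

```python
def naive_suffix_array(text: str) -> list[int]:
    # Toy construction: sort start positions by the suffix they begin.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the start positions of `pattern` in `text`, in ascending order."""
    m = len(pattern)

    def prefix(rank: int) -> str:
        # First m characters of the suffix at this rank. These prefixes are
        # non-decreasing along the suffix array, so binary search is valid.
        p = sa[rank]
        return text[p:p + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


if __name__ == "__main__":
    t = "mississippi"
    sa = naive_suffix_array(t)
    print(find_occurrences(t, sa, "si"))  # [3, 6]
```

The two loops locate the two boundaries of the block of matching suffixes (rows `6: sippi` and `3: sissippi` here). Everything between the boundaries is a hit, so the cost is two O(m log n) searches plus the number of matches.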
10:   i             SA[i]: 10 7 4 1 0 9 8 6 3 5 2
 7:   ippi          T[i]:  m i s s i s s i p p i
 4:   issippi
 1:   ississippi    i = 6: T[SA[6]] = T[8] → ppi
 0:   mississippi
 9:   pi
 8:   ppi
 6:   sippi
 3:   sissippi
 5:   ssippi
 2:   ssissippi
                       9
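In memory, the suffix array is just the integer row SA[i]. The suffixes themselves are never stored; each one is recovered through the indirection T[SA[i]]. The sketch below uses Python's `array` module as a stand-in for a packed integer array (its "i" items are typically 4 bytes, but the size is platform-dependent). The names `suffix_at_rank` and `is_sorted_suffix_array` are hypothetical.

```python
from array import array

text = "mississippi"
# Only start offsets are kept; no suffix strings are materialized.
sa = array("i", [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2])


def suffix_at_rank(text: str, sa: array, i: int) -> str:
    """Return the i-th smallest suffix by following T[SA[i]]."""
    return text[sa[i]:]


def is_sorted_suffix_array(text: str, sa: array) -> bool:
    """Check that consecutive entries really are in lexicographic suffix order."""
    return all(text[sa[k]:] < text[sa[k + 1]:] for k in range(len(sa) - 1))


if __name__ == "__main__":
    print(sa[6])                              # 8
    print(suffix_at_rank(text, sa, 6))        # ppi
    print(is_sorted_suffix_array(text, sa))   # True
    print(len(sa) * sa.itemsize, "bytes for the array")
```

Storing n offsets instead of the suffix strings keeps the structure linear in the text length. Writing out every suffix would take n(n+1)/2 characters.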
(1/3)

T[i]:
   1    2         3   ...        n

                            SA

            SA[i]


             10
(2/3)
RedBull            !!

1. RedBull                             *2
    RedBull        SA[i]
          2.
    RedBull


     1         2        3   ...    n
                            11
(3/3)
3.
     RedBull


      1          2      3        ...          n

4.

(     1, 3), (       2, 4), (          3, 2),...,(   n, 2)

                            12
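Slides 10 to 12 appear to walk from a single text T to per-document hit counts such as (1, 3), (2, 4), ..., (n, 2). The sketch below assumes that reading: documents are concatenated into one text, each hit position is mapped back to its document, and hits are tallied per document. The separator character, the 1-based document numbering, and the helper names are all assumptions made for illustration. A naive sort stands in for a real construction algorithm.

```python
from bisect import bisect_right
from collections import Counter

SEP = "\x00"  # hypothetical document separator, assumed absent from documents and queries


def build_index(docs: list[str]) -> tuple[str, list[int], list[int]]:
    """Concatenate documents into one text T and record each document's start offset."""
    starts, parts, pos = [], [], 0
    for d in docs:
        starts.append(pos)
        parts.append(d + SEP)
        pos += len(d) + 1
    text = "".join(parts)
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # toy construction
    return text, sa, starts


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Start positions of `pattern` via lower/upper-bound binary searches over `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-char prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-char prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return list(sa[first:lo])


def count_per_document(docs: list[str], pattern: str) -> list[tuple[int, int]]:
    """Return (document number, hit count) pairs, numbering documents from 1."""
    text, sa, starts = build_index(docs)
    counts = Counter()
    for pos in find_occurrences(text, sa, pattern):
        # Number of document starts <= pos, i.e. the 1-based document number.
        counts[bisect_right(starts, pos)] += 1
    return sorted(counts.items())


if __name__ == "__main__":
    docs = [
        "RedBull RedBull RedBull",
        "RedBull and more RedBull, RedBull, RedBull",
        "nothing here",
        "RedBull then RedBull",
    ]
    print(count_per_document(docs, "RedBull"))  # [(1, 3), (2, 4), (4, 2)]
```

Because the pattern never contains the separator, a match cannot span two documents. Documents with no hits simply do not appear in the result.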
• SA
  • +
  • /n-gram
• SA
               13
SA
• (n-gram )
• n-gram
    • THIS IS IT
  • proximity

                                   14
SA
  • HDD
  • ( )
                  15
• ( )
  • SAIS
• HDD
  • (dc3, dc7)
• Sedue Haskell C++
  • @tanakh++


                         16
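The slide names SAIS (SA-IS, a linear-time induced-sorting algorithm) and dc3/dc7 (difference-cover algorithms) as construction methods. SA-IS is too intricate for a short example, so this sketch swaps in a simpler technique, prefix doubling. With Python's sort it runs in O(n log² n), and it is not the algorithm Sedue used. The name `suffix_array_doubling` is hypothetical.

```python
def suffix_array_doubling(text: str) -> list[int]:
    """Build a suffix array by prefix doubling (not SA-IS).

    Invariant: after the round with step k, rank[i] orders suffix i by its
    first 2k characters. Sorting on (rank[i], rank[i + k]) doubles that length.
    """
    n = len(text)
    if n == 0:
        return []
    rank = [ord(c) for c in text]  # before round 1: order by first character
    sa = list(range(n))
    k = 1
    while True:
        def key(i: int) -> tuple[int, int]:
            # -1 marks "suffix ended", which sorts before any real rank.
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            # Same key as the previous suffix -> same rank; otherwise one higher.
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j - 1]) < key(sa[j]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: the order is final
            return sa
        k *= 2


if __name__ == "__main__":
    print(suffix_array_doubling("mississippi"))  # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```

For "mississippi" the order becomes final after three rounds (k = 1, 2, 4), because no two suffixes share an 8-character prefix.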
• ( )
    • 1 100GB/day
• Sedue
  • SA n-gram
  • n-gram
  • SA n-gram

                                   17
HDD
• HDD
• OK
• SSD
  • SSD
• Sedue 20 (80MB)
  • SA[i]


                           18
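The HDD concern can be made concrete by counting reads. A lower-bound binary search over n suffixes takes at most `n.bit_length()` steps (that is, ⌊log₂ n⌋ + 1). Each step reads SA[mid] and then the text at that offset, which means two dependent random accesses if neither is cached. This sketch counts SA reads on the mississippi example. `CountingSA`, `lower_bound` and `max_steps` are hypothetical names. The 10 ms seek time used in the note that follows is an assumed round number, not a figure from the slide.

```python
class CountingSA:
    """Wraps a suffix array and counts element reads (a stand-in for disk reads)."""

    def __init__(self, entries: list[int]):
        self.entries = entries
        self.reads = 0

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, i: int) -> int:
        self.reads += 1
        return self.entries[i]


def lower_bound(text: str, sa: CountingSA, pattern: str) -> int:
    """First rank whose suffix prefix is >= pattern; one SA read per step."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        p = sa[mid]                  # random read of the suffix array
        if text[p:p + m] < pattern:  # random read of the text at offset p
            lo = mid + 1
        else:
            hi = mid
    return lo


def max_steps(n: int) -> int:
    """Worst-case iterations of lower_bound over n entries (range at least halves each step)."""
    return n.bit_length()


if __name__ == "__main__":
    t = "mississippi"
    sa = CountingSA([10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2])
    print(lower_bound(t, sa, "si"), sa.reads)  # 7 4
    print(max_steps(10**11))                   # 37
```

For a text of 10^11 characters, one search takes up to 37 steps, or up to 74 dependent random reads. At an assumed 10 ms per HDD seek, that comes to roughly 0.74 s for a single lookup. An SSD, whose random reads are typically far faster than a disk seek, shrinks the cost of the same access pattern accordingly.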
VS
1. SA
2.
    • SSD+ 500
3.
    • O(N) CPU
4.
• malloc



                           19
• Sedue 1 56
  • : 40
  • : 16 (UTF-16)
  • 2 3
  • =
    • SSD

                                        20
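Suppose the 56, 40 and 16 on this slide are bits per character. This is an assumption, since the Japanese labels did not survive. The breakdown would then be a 40-bit suffix array entry plus one 16-bit UTF-16 code unit of text. The sketch below shows what 40-bit entries could look like, with each offset packed into 5 bytes. `pack_sa`, `read_entry` and `index_bytes` are hypothetical names, not Sedue's format.

```python
ENTRY_BYTES = 5                            # assumed 40-bit suffix array entries
MAX_OFFSET = (1 << (8 * ENTRY_BYTES)) - 1  # 2**40 - 1


def pack_sa(sa: list[int]) -> bytes:
    """Pack offsets as fixed-width 5-byte little-endian integers."""
    out = bytearray()
    for p in sa:
        # int.to_bytes raises OverflowError if p does not fit in 40 bits.
        out += p.to_bytes(ENTRY_BYTES, "little")
    return bytes(out)


def read_entry(buf: bytes, i: int) -> int:
    """Decode only SA[i]; fixed width puts it at byte offset i * ENTRY_BYTES."""
    start = i * ENTRY_BYTES
    return int.from_bytes(buf[start:start + ENTRY_BYTES], "little")


def index_bytes(num_chars: int, sa_bits: int = 40, text_bits: int = 16) -> int:
    """Bytes for text plus suffix array if each character costs sa_bits + text_bits bits."""
    return num_chars * (sa_bits + text_bits) // 8


if __name__ == "__main__":
    buf = pack_sa([10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2])
    print(len(buf), read_entry(buf, 6))  # 55 8
    print(index_bytes(10**9))            # 7000000000
```

Under that assumption, one billion characters would take about 7 GB (56 bits = 7 bytes per character). A 40-bit entry can address up to 2^40, roughly 1.1 × 10^12, positions. Fixed-width entries also keep random access to SA[i] down to a single offset computation.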
SA
  • 4(+1)
    • 2-gram
    • % OK
                             21
        22
: groonga

• Sedue groonga
• Sedue groonga!!


                    23
:
• (http://jubat.us/)
  • http://github.com/jubatus
  • @JubatusOfficial
• with NTT PF

                               24
: Fluentd
• Ruby
• Treasure Data, Inc.
  • @frsyuki, @kzk_mover
• Solr
• gem install fluentd
• Visit http://fluentd.org/doc/ now!!
                      25


    26

