Full-text search engine
mroonga
1
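What is full-text search in the first place?
Full-text search is, in computing, searching multiple documents (files) for a specific string.
Unlike "file name search" or "string search within a single file", the term means "a search that spans multiple documents and covers the full text they contain".
(From Wikipedia, "Full-text search")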
2
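Full-text search methods
grep type
Sequential search, typified by the UNIX grep command
No index has to be built in advance, but searching a large number of documents is slow
Windows XP search (when no index has been built)
Slow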
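As a rough illustration of the grep-type approach, the Python sketch below reads every document on every query. The document names and contents are made up for the example, and the function is only a stand-in for the idea of sequential search, not a description of how grep itself works.

```python
def sequential_search(documents, needle):
    """Return names of documents containing needle by scanning each one in full."""
    hits = []
    for name, text in documents.items():
        # No index exists, so every document is read again on every query.
        if needle in text:
            hits.append(name)
    return hits


# Hypothetical documents used only for this illustration.
documents = {
    "a.txt": "The worst is not, So long as we can say,",
    "b.txt": "full text search with grep",
    "c.txt": "an index makes search fast",
}

print(sequential_search(documents, "search"))
```

The cost of each query grows with the total amount of text, which is why this approach slows down as the number of documents increases.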
3
Full-text search methods
Index type
Builds an index of the data in advance to achieve fast searches
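The index-type approach can be sketched as a small inverted index. This is a minimal sketch that assumes space-separated text; the document names, the punctuation handling, and the function names `build_index` and `lookup` are illustrative choices, not taken from any particular engine.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for raw in text.split():
            word = raw.strip(".,!?").lower()
            if word:
                index[word].add(name)
    return index


def lookup(index, word):
    """Answer a query from the index alone, without rereading any document."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical documents used only for this illustration.
documents = {
    "a.txt": "The worst is not, So long as we can say,",
    "b.txt": "full text search with grep",
    "c.txt": "an index makes search fast",
}

index = build_index(documents)
print(lookup(index, "search"))
```

The work of reading the documents is paid once at indexing time, so each query becomes a dictionary lookup.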
4
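Term | File location | File update date | Occurrence frequency, etc.
Differences between Japanese and English
For English
The worst is not, So long as we can say,
Words are delimited by spaces, so the boundaries are easy to find
InnoDB Full Text Search, introduced in MySQL 5.6, also assumes that data is inserted with words separated by spaces
For Japanese (and Korean, Chinese)
隣の客はよく柿食う客だ
The text has to be segmented into words by morphological analysis or broken into characters with N-gram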
5
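Japanese processing 1: morphological analysis
Morphological analysis is word segmentation that uses a dictionary
隣の客はよく柿食う客だ
With a suitable dictionary, the text is parsed properly.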
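The IPA dictionary uses newspaper articles and similar text as its corpus, which affects how well other kinds of text are parsed.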
# mecab
隣の客はよく柿食う客だ
隣 名詞,一般,*,*,*,*,隣,トナリ,トナリ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
客 名詞,一般,*,*,*,*,客,キャク,キャク
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
よく 副詞,一般,*,*,*,*,よく,ヨク,ヨク
柿 名詞,一般,*,*,*,*,柿,カキ,カキ
食う 動詞,自立,*,*,五段・ワ行促音便,基本形,食う,クウ,クウ
客 名詞,一般,*,*,*,*,客,キャク,キャク
だ 助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
EOS
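To make the role of the dictionary concrete, here is a toy segmenter in Python. It uses greedy longest-match lookup, which is a deliberate simplification and not how MeCab chooses its segmentation; the dictionary `toy_dictionary` is hypothetical and contains only the words needed for the example sentence.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation against a word dictionary."""
    longest = max(len(word) for word in dictionary)
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single character not in the dictionary becomes its own token.
            if candidate in dictionary or size == 1:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Hypothetical dictionary covering only this sentence.
toy_dictionary = {"隣", "の", "客", "は", "よく", "柿", "食う", "だ"}

print(segment("隣の客はよく柿食う客だ", toy_dictionary))
```

Words that are missing from the dictionary fall apart into single characters in this sketch, which hints at why dictionary coverage matters for the quality of the segmentation.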
6
mecab+IPADicǥƥ`Zȡ
# mecab
ǰƤ뤥Ĥʣ
 ^~,~ӾA,*,*,*,*,,䥹,䥹
 ~,һ,*,*,*,*,*
ǰ ~,һ,*,*,*,*,ǰ,󥻥,󥻥
 ~,һ,*,*,*,*,*
 ~,*,*,*,Z?,ԽӾA,,,
 ~,һ,*,*,*,*,*
 ~,β,݄~Z,*,*,*,,ƥ,ƥ
 ~,*,*,*,?,ԽӾA,,,
 ӛ,һ,*,*,*,*,,,
EOS
ǰƤ뤥Ĥʣ?
(ԥ20095º Ҋ)

7
Japanese processing 2: N-gram
Splits text into units of characters rather than units of words
隣の客はよく柿食う客だ
N=1 uni-gram
N=2 bi-gram
N=3 tri-gram
- Noise in search results
- The index grows larger than with morphological analysis
With N=2 (bi-gram):
隣の / の客 / 客は / はよ / よく / く柿 / 柿食 / 食う / う客 / 客だ
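The character-based splitting described on this slide can be sketched in a few lines of Python. The function name `ngrams` is an illustrative choice, and the sentence is assumed to be the tongue twister used as the running example; real tokenizers may treat alphanumerics and symbols differently.

```python
def ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


sentence = "隣の客はよく柿食う客だ"

for n in (1, 2, 3):
    print(n, ngrams(sentence, n))
```

No dictionary is involved, so every substring of length N becomes searchable, at the price of storing many more tokens than a word-based split would.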
8
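Well-known full-text search engines

Engines that respond in XML or JSON
- Elasticsearch
- Apache Solr
- AWS CloudSearch
Advantages
- Easy to scale
- Active plugin development
Disadvantages
- Hard to integrate into systems with an RDBMS back end

MySQL-based engines
- InnoDB FTS (Full Text Search)
- mroonga
Advantages
- Easy to integrate into systems with an RDBMS back end
Disadvantages
- Works only with MySQL
- Hard to scale
9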
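MySQL LIKE search
A LIKE search uses an index only for prefix matches
column LIKE 'keyword%'    prefix match   OK
column LIKE '%keyword%'   infix match    NG
column LIKE '%keyword'    suffix match   NG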
Full-text search built into MySQL means MyISAM (5.5 and earlier) or InnoDB FTS (5.6 and later)
An index cannot be put on TEXT-type columns
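Why only the prefix form can use an ordinary index can be illustrated with a sorted list standing in for the index. This is a simplified model written in Python, not MySQL's actual B-tree code; the sample values and the function names are made up for the example.

```python
import bisect


def prefix_search(sorted_values, prefix):
    """Find values starting with prefix by seeking into a sorted list."""
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        # Matching values are contiguous, so stop at the first non-match.
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches


def infix_search(values, needle):
    """Find values containing needle; ordering cannot help, so scan everything."""
    return [value for value in values if needle in value]


# Hypothetical column values, kept sorted like an ordered index.
values = sorted(["apple pie", "apple tart", "green apple", "pineapple juice"])

print(prefix_search(values, "apple"))
print(infix_search(values, "apple"))
```

With a leading wildcard the matching values are scattered throughout the sorted order, so there is no single position to seek to and every entry has to be examined.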
10
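Things to know before using MySQL-based full-text search
Watch the MySQL major version
Storage engines
InnoDB
- Transactions are available
MyISAM
- Transactions are not available
- Updates take a table lock
- = SELECTs cannot run in parallel
5.0
5.1
5.5
5.6
5.7 (upcoming version)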
᥸`Щ`Ƥ֤ˤ
InnoDBǥեȤĴɤΈϱ
11
Nȫė󥸥؏
12
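[Diagram: the various full-text search engines usable with MySQL. Engines: InnoDB FTS (Full Text Search), Tritonn, mroonga, MySQL-ftppc. Versions: MySQL 5.0, MySQL 5.1, MySQL 5.5, MySQL 5.6. The note "(MyISAM only)" appears twice.]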
13
Tritonn mroonga MySQL-ftppc InnoDB FTS
InnoDB  
MyISAM   
MeCab   
N-gram   
MySQL5.0 
MySQL5.1  
MySQL5.5  
MySQL5.6   
Part 6: [Practice] Migration guide from the MySQL full-text search engine "Tritonn" to "mroonga" (1)
http://gihyo.jp/dev/clip/01/groonga/0006?page=1
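So:
- Tritonn, which supports only MySQL 5.0, is excluded
- MySQL-ftppc and Tritonn, which support only MyISAM, are excluded
(packages are available only for MySQL 5.1 and later)
- Transactions are not available
- Updates take a table lock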
14
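Overview of mroonga
- The MySQL binding of groonga
- The latest version supports MySQL 5.6
- InnoDB can be used as the storage engine
- Transactions are available
- Updates take row locks rather than table locks
- For N-gram, uni-gram (n=1), bigram (n=2), and tri-gram (n=3) are all available
- Wrapper mode and storage mode
  - Wrapper mode
    - Wraps an existing storage engine
  - Storage mode
    - Transactions are not available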
15
Overview of mroonga
In practice, there is no need to segment the text into words when registering data
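-- Wrapper mode: mroonga provides the full-text index, InnoDB stores the rows.
CREATE TABLE dtb_products (
  id INT PRIMARY KEY AUTO_INCREMENT,
  keyword VARCHAR(255),
  -- Tokenize the keyword column as uni-grams.
  FULLTEXT INDEX (keyword) COMMENT 'parser "TokenUnigram"'
) ENGINE=mroonga COMMENT = 'engine "innodb"';

-- The text is inserted as-is, with no manual word segmentation.
INSERT INTO dtb_products VALUES (0, '...');

-- A leading + in boolean mode marks a term that must be present.
SELECT * FROM dtb_products WHERE MATCH(keyword) AGAINST('+...' IN BOOLEAN MODE);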
16
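Tables used
The benchmark ran SELECT and UPDATE 10,000 times each against the following tables
Products (mroonga)
- Storage engine: mroonga
- FULLTEXT INDEX added on the keyword column
Products (InnoDB FTS)
- Storage engine: InnoDB
- FULLTEXT INDEX added on the keyword column
Products (LIKE search)
- Storage engine: InnoDB
- Ordinary index added on the keyword column
Row counts shown in the diagram: 1,000,000 (5,000 products × 200 shops); 40,000 (200 categories × 200 shops); 1,000,000 (5,000 products × 200 shops), three times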
17
Environment used
Server
AWS m1.medium
- CPU: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz, 1 core
- Memory: 3.75GB
Middleware
CentOS 6.4 x86_64
- MySQL 5.6.12
- mroonga 4.0.0
18
SELECT
Ʒ
,

FROM
ƥ`
LEFT JOIN
Ʒƥ`(~)
ON
ƥ`.ƥ`ID=Ʒƥ`.ƥ`ID
LEFT JOIN
Ʒ USING(ƷID)
WHERE
Ʒ.ե饰=0 AND Ʒ.Ʃ`=1
AND Ʒ.`` LIKE %Z

LIMIT 15 OFFSET 0
SELECT statement (LIKE search)
A unique word is used for each query
19
SELECT
Ʒ
,

FROM
ƥ`
LEFT JOIN
Ʒƥ`(~)
ON
ƥ`.ƥ`ID=Ʒƥ`.ƥ`ID
LEFT JOIN
Ʒ USING(ƷID)
WHERE
Ʒ.ե饰=0 AND Ʒ.Ʃ`=1
AND MATCH(Ʒ.``) AGAINST( Z )

LIMIT 15 OFFSET 0
SELECT statement (mroonga / InnoDB FTS)
A unique word is used for each query
20
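[Chart: SELECT execution time in milliseconds, mean and median; axis ticks as extracted: 10, 100. Series as extracted: LIKE infix match, InnoDB FTS, mroonga (mecab), mroonga (Unigram), mroonga (Bigram), mroonga (Trigram). Data labels as extracted: 5, 16, 27, 9, 19, 3,000 / 14, 21, 39, 17, 31, 2,000]
SELECT was executed 10,000 times against a table loaded with 1,000,000 rows
Both mroonga and InnoDB FTS are around 100 times faster than LIKE infix-match search.
With mroonga's N-gram, the larger N is, the faster the search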
21
UPDATE
Ʒ
SET
keyword= 
WHERE
Ʒ.ƷID=ID
UPDATE statement
A unique product ID is used for each query
22
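[Chart: UPDATE execution time in milliseconds, mean and median; axis ticks as extracted: 0, 2.5, 5, 7.5, 10. Series as extracted: LIKE infix match, InnoDB FTS, mroonga (mecab), mroonga (Unigram), mroonga (Bigram), mroonga (Trigram). Data labels as extracted: 7, 7, 8, 7, 6, 6 / 7, 7, 9, 8, 7, 7]
UPDATE was executed 10,000 times against a table loaded with 1,000,000 rows
Whichever one is used, there is not much difference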
23
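[Chart: index update time for 1,000,000 rows of data; axis ticks as extracted: 10, 100, 1000, 10000. Series as extracted: LIKE infix match, InnoDB FTS, mroonga (mecab), mroonga (Unigram), mroonga (Bigram), mroonga (Trigram). Data labels as extracted: 291.45, 210.82, 208.69, 201.01, 6,106.13, 75.8]
InnoDB FTS is slow
With mroonga's N-gram, the larger N is, the longer the update takes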
24
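Points of concern
Tables have the following limits (groonga is the underlying engine)
Maximum size of one key: 4096Bytes
Maximum total size of all keys: 4GBytes
In practice, other constraints may prevent the stated values from being reached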
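A quick way to get a feel for the per-key figure is to measure a value's encoded length. The Python sketch below assumes that the limit is counted in bytes of the encoded value and that the text is stored as UTF-8; both are assumptions made for illustration, and the helper name `fits_key_limit` is hypothetical rather than part of mroonga.

```python
MAX_KEY_BYTES = 4096  # per-key figure quoted on the slide


def fits_key_limit(value, encoding="utf-8"):
    """Check an encoded value against the assumed per-key byte limit."""
    return len(value.encode(encoding)) <= MAX_KEY_BYTES


# Each of these kanji takes 3 bytes in UTF-8.
print(fits_key_limit("柿" * 1365))  # 1365 * 3 = 4095 bytes
print(fits_key_limit("柿" * 1366))  # 1366 * 3 = 4098 bytes
```

Under these assumptions a key made only of 3-byte characters runs out of room at roughly 1,365 characters, well before 4,096.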
25
5. Limitations - Mroonga v4.01 documentation
http://mroonga.org/ja/docs/reference/limitations.html

Full-text search engine Mroonga (engineers' study session, 2014-04-18)