
Elasticsearch
Jongmin Kim
Email: jongmin.kim@elastic.co
Blog: http://kimjmin.net

Elasticsearch
• http://elastic.co
• Open Source - https://github.com/elastic/elasticsearch
• Written in Java
• Built on Apache Lucene
• RESTful
• JSON Document Based
• Real-time Search
• Full-text Search

Use Cases
• GitHub, SourceForge…

Data Storage: Relational DB
PK      Text
Doc 1   blue sky green land red sun
Doc 2   blue ocean green land
Doc 3   red flower blue sky
• Rows are searched in order, by PK, index, or column.

Data Storage: Inverted Index
Term     Documents containing the term
blue     Doc 1, Doc 2, Doc 3
sky      Doc 1, Doc 3
green    Doc 1, Doc 2
land     Doc 1, Doc 2
red      Doc 1, Doc 3
ocean    Doc 2
flower   Doc 3
sun      Doc 1
• Terms are extracted from the text first; a search then looks up the term to find the documents that contain it.
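The term-to-document table can be reproduced with a few lines of Python. This is a minimal sketch of the idea, not how Lucene stores its index. It splits on whitespace only, and the names `docs` and `build_inverted_index` are illustrative.

```python
from collections import defaultdict

# The three example documents from the relational-table slide.
docs = {
    1: "blue sky green land red sun",
    2: "blue ocean green land",
    3: "red flower blue sky",
}

def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

index = build_inverted_index(docs)
print(index["blue"])   # [1, 2, 3]
print(index["ocean"])  # [2]
```

A lookup is then a single dictionary access on the term, instead of a scan over every row.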

Data Storage Process
• Indexing: source text → text analysis → added to the index
• Searching: query creation → lookup in the index → result output

Relational DB vs. Elasticsearch
HTTP     CRUD     SQL
GET      Read     Select
PUT      Update   Update
POST     Create   Insert
DELETE   Delete   Delete

Relational DB    Elasticsearch
Database         Index
Table            Type
Row              Document
Column           Field
Schema           Mapping

RESTful API
• Each resource is accessed through a single URL
• The HTTP method determines the operation on the resource
• Not REST
  • Create : http://site.com/user.jsp?cmd=add&id=user1&name=kim
  • Read : http://site.com/user.jsp?id=user1
  • Update : http://site.com/user.jsp?cmd=modify&id=user1&name=lee
  • Delete : http://site.com/user.jsp?cmd=delete&id=user1
• REST
  • Create : POST http://site.com/user/user1 {name:kim}
  • Read : GET http://site.com/user/user1
  • Update : PUT http://site.com/user/user1 {name:lee}
  • Delete : DELETE http://site.com/user/user1
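The REST style can be sketched as a dispatcher in which the URL names only the resource and the HTTP method selects the operation. This is an in-memory illustration, not a real HTTP server. The `users` store and `handle` function are hypothetical names.

```python
# In-memory stand-in for the /user/{id} resource; no network involved.
users = {}

def handle(method, path, body=None):
    """Dispatch on the HTTP method; the path only identifies the resource."""
    user_id = path.rsplit("/", 1)[-1]
    if method == "POST":      # create
        users[user_id] = dict(body)
        return users[user_id]
    if method == "GET":       # read
        return users.get(user_id)
    if method == "PUT":       # update
        users.setdefault(user_id, {}).update(body)
        return users[user_id]
    if method == "DELETE":    # delete
        return users.pop(user_id, None)
    raise ValueError(f"unsupported method: {method}")

handle("POST", "/user/user1", {"name": "kim"})
print(handle("GET", "/user/user1"))   # {'name': 'kim'}
handle("PUT", "/user/user1", {"name": "lee"})
print(handle("GET", "/user/user1"))   # {'name': 'lee'}
handle("DELETE", "/user/user1")
print(handle("GET", "/user/user1"))   # None
```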

Elasticsearch REST API
• http://host:port/{index}/{type}/{document id}
• curl -X{METHOD} http://host:port/{index}/{type}/{document id} -d '{data}'
$ curl -XPUT http://localhost:9200/books/book/1 -d '
{
"title" : "Elasticsearch Guide",
"author" : "Kim",
"date" : "2014-05-01",
"pages" : 250
}
'
{"_index":"books","_type":"book","_id":"1","_version":1,"created":true}

Elasticsearch REST API
$ curl -XGET http://localhost:9200/books/book/1
{"_index":"books","_type":"book","_id":"1","_version":1,"found":true, "_source" :
{
"title" : "Elasticsearch Guide",
"author" : "Kim",
"date" : "2014-05-01",
"pages" : 250
}
}
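The same PUT and GET calls can be prepared from Python with the standard library. The sketch below only builds the requests and does not send them, so it runs without a server. `BASE` and `build_request` are hypothetical names, and the host and port are assumed to match the curl examples.

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # assumed local node, as in the curl examples

def build_request(method, index, doc_type, doc_id, document=None):
    """Build (but do not send) a request for /{index}/{type}/{id}."""
    url = f"{BASE}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8") if document is not None else None
    return urllib.request.Request(url, data=data, method=method)

book = {
    "title": "Elasticsearch Guide",
    "author": "Kim",
    "date": "2014-05-01",
    "pages": 250,
}
put_req = build_request("PUT", "books", "book", 1, book)
get_req = build_request("GET", "books", "book", 1)
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Against a running node, `urllib.request.urlopen(put_req)` would send the request.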

Cluster
• The largest unit of an Elasticsearch system
• A cluster consists of multiple nodes
• A single cluster can span multiple servers, and conversely a single server can run multiple clusters
config/elasticsearch.yml
cluster.name: elasticsearch
$ bin/elasticsearch --cluster.name=elasticsearch

Node
• A single process that forms one unit of an Elasticsearch cluster
• Holds multiple shards
• Nodes with the same cluster name are joined together automatically
node.name: "Node1"
$ bin/elasticsearch --node.name=Node1

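Node
• HTTP port: starts at 9200 and increases by one for each additional node
• Inter-node data exchange port: starts at 9300 and increases by one for each additional node
Cluster: elasticsearch
  Node1: REST API on 9200, inter-node port 9300
  Node2: REST API on 9201, inter-node port 9301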

head Plugin
$ bin/plugin --install mobz/elasticsearch-head

Cluster: elasticsearch
  Node1: REST API on 9200, inter-node port 9300
  Node2: REST API on 9201, inter-node port 9301
Cluster: elasticsearch2
  Node3: REST API on 9202, inter-node port 9302
$ bin/elasticsearch --cluster.name=elasticsearch --node.name=Node1
$ bin/elasticsearch --cluster.name=elasticsearch --node.name=Node2
$ bin/elasticsearch --cluster.name=elasticsearch2 --node.name=Node3

Master Node & Data Node
• Master node: manages the cluster state
• Data node: handles data input/output and executes searches
node.master: true
node.data: true
$ bin/elasticsearch --node.master=true --node.data=true

$ bin/elasticsearch --node.name=Node1 --node.master=true --node.data=false
$ bin/elasticsearch --node.name=Node2 --node.master=false --node.data=true

Shard & Replica
• Shard: the unit instance on which data is searched
• Replica: a copy of a shard
index.number_of_shards: 5
index.number_of_replicas: 1
$ curl -XPUT localhost:9200/books -d '
{
"settings" : {
"number_of_shards" : 5,
"number_of_replicas" : 1
}
}'
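As a quick check of what these two settings imply, each primary shard gets `number_of_replicas` copies. The total number of shard copies for the index is therefore primaries × (1 + replicas). The helper name below is illustrative.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primary shards plus one full set of copies per replica."""
    return number_of_shards * (1 + number_of_replicas)

print(total_shard_copies(5, 1))  # 10
```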

Search
$ curl -XPUT localhost:9200/books/book/1 -d '
{ "title": "Romeo and Juliet", "author": "William Shakespeare", "category": "Tragedies", "written": "1562-12-01T20:40:00", "pages": 125 }'
{ "title": "The Prince and the Pauper", "author": "Mark Twain", "category": "Children literature", "written": "1881-08-01T10:34:00", "pages": 79 }'
{ "title": "Hamlet", "author": "William Shakespeare", "category": "Tragedies", "written": "1599-06-01T12:34:00", "pages": 172 }'

Search
$ curl 'localhost:9200/books/_search?pretty=true'
{ … (omitted) …
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
… (omitted) …
"_source":
{ "title": "Romeo and Juliet", "author": "William Shakespeare", "category":"Tragedies", "written": "1562-12-01T20:40:00", "pages" : 125 }
},
… (omitted) …

Search - URI Search
$ curl 'http://localhost:9200/books/_search?q=william&pretty=true'
$ curl 'http://localhost:9200/books/_search?q=author:william&pretty=true'
$ curl 'http://localhost:9200/books/_search?q=author:william&fields=title,author,pages&pretty=true'

Search - Request Body Search
$ curl 'http://localhost:9200/books/_search?pretty=true' -d '
{
"query" : {
"match" : {
"author" : "William"
}
}
}'

Checking Terms - facet
$ curl 'localhost:9200/books/_search?pretty' -d '
{
"facets" : {
"author_terms" : {
"terms" : { "field" : "author" }
}
}
}'
"facets" : {
"author_terms" : {
… (omitted) …
"terms" : [ {
"term" : "william",
"count" : 2
}, {
"term" : "shakespeare",
"count" : 2
}, {
"term" : "twain",
"count" : 1
}, {
"term" : "mark",
"count" : 1
} ]

Terms
Term          Documents containing the term
william       books/book/1, books/book/2
shakespeare   books/book/1, books/book/2
twain         books/book/3
mark          books/book/3

{
"query" : {
"match" : {
}
}
}'
{
"query" : {
"term" : {
}
}
}'

Mapping
$ curl -XPUT localhost:9200/books -d '
{
"mappings" : {
"book" : {
"properties" : {
"author" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}'

{ "title": "Romeo and Juliet", "author": "William Shakespeare", "category": "Tragedies", "written": "1562-12-01T20:40:00", "pages": 125 }'
{ "title": "The Prince and the Pauper", "author": "Mark Twain", "category": "Children literature", "written": "1881-08-01T10:34:00", "pages": 79 }'
{ "title": "Hamlet", "author": "William Shakespeare", "category": "Tragedies", "written": "1599-06-01T12:34:00", "pages": 172 }'

Checking Terms - facet
$ curl 'localhost:9200/books/_search?pretty' -d '
{
"facets" : {
"author_terms" : {
"terms" : { "field" : "author" }
}
}
}'
"facets" : {
"author_terms" : {
"_type" : "terms",
"missing" : 0,
"total" : 3,
"other" : 0,
"terms" : [ {
"term" : "William Shakespeare",
"count" : 2
}, {
"term" : "Mark Twain",
"count" : 1
} ]
}
}

{
"query" : {
"term" : {
"author" : "William Shakespeare"
}
}
}'
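The toy model below shows why a term query for "William Shakespeare" finds the documents only when the author field is not analyzed. It is a sketch under simplifying assumptions. `analyze` merely splits on whitespace and lowercases, standing in for the default analysis. Documents are keyed by title rather than by id, and all function names are illustrative.

```python
def analyze(text):
    """Toy analyzer (assumption): whitespace split plus lowercasing."""
    return [token.lower() for token in text.split()]

authors = {
    "Romeo and Juliet": "William Shakespeare",
    "The Prince and the Pauper": "Mark Twain",
    "Hamlet": "William Shakespeare",
}

def build_index(field_values, analyzed):
    """Index each value as analyzed terms, or as one whole term if not analyzed."""
    index = {}
    for doc, value in field_values.items():
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc)
    return index

def term_query(index, value):
    """Look the value up exactly as given, with no analysis."""
    return sorted(index.get(value, set()))

def match_query(index, text):
    """Analyze the query text first, then look up each resulting term."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return sorted(hits)

analyzed_index = build_index(authors, analyzed=True)
raw_index = build_index(authors, analyzed=False)

print(term_query(analyzed_index, "William Shakespeare"))  # []
print(match_query(analyzed_index, "William"))             # ['Hamlet', 'Romeo and Juliet']
print(term_query(raw_index, "William Shakespeare"))       # ['Hamlet', 'Romeo and Juliet']
```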

Analysis (Analyze)
• The process of breaking an input sentence into terms using an analyzer. An analyzer consists of:
• 1 tokenizer
• 0 to n token filters

Tokenizer
• Whitespace tokenizer: splits a sentence on spaces, tabs, newlines, and similar characters.
Around the World in Eighty Days → [Whitespace] → Around, the, World, in, Eighty, Days

Token Filter
• Lowercase token filter: converts tokens to lowercase
Around, the, World, in, Eighty, Days → [Lowercase] → around, the, world, in, eighty, days

Token Filter
• Stop token filter: removes stopwords
around, the, world, in, eighty, days → [Stop] → around, world, eighty, days

Analyzer API - _analyze
$ curl -XPOST 'http://localhost:9200/books/_analyze?tokenizer=whitespace&filters=lowercase,stop&pretty' -d 'Around the World in Eighty Days'
{
"tokens" : [ {
"token" : "around", "start_offset" : 0, "end_offset" : 6, "type" : "word", "position" : 1
}, {
"token" : "world", "start_offset" : 11, "end_offset" : 16, "type" : "word", "position" : 3
}, {
"token" : "eighty", "start_offset" : 20, "end_offset" : 26, "type" : "word", "position" : 5
}, {
"token" : "days", "start_offset" : 27, "end_offset" : 31, "type" : "word", "position" : 6
} ]
}
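The tokenizer-then-filters pipeline can be mimicked in plain Python to see why only four tokens remain. This is an illustrative sketch, not Elasticsearch's implementation. `STOPWORDS` is an assumed two-word subset of an English stopword list, and the function names are hypothetical.

```python
STOPWORDS = {"the", "in"}  # assumption: tiny subset of an English stopword list

def whitespace_tokenizer(text):
    """Split on runs of spaces, tabs, and newlines."""
    return text.split()

def lowercase_filter(tokens):
    return [token.lower() for token in tokens]

def stop_filter(tokens):
    return [token for token in tokens if token not in STOPWORDS]

def analyze(text, tokenizer, filters):
    """Run exactly one tokenizer, then each token filter in the given order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "Around the World in Eighty Days",
    whitespace_tokenizer,
    [lowercase_filter, stop_filter],
)
print(tokens)  # ['around', 'world', 'eighty', 'days']
```

Filter order matters. With the stop filter placed before the lowercase filter, a capitalized "The" would not match the lowercase stopword set and would pass through.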

Custom Analyzer
$ curl -XPUT 'http://localhost:9200/books' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "whitespace",
"filter" : [ "lowercase", "stop" ]
}
}
}
}
}'
$ curl -XPOST 'localhost:9200/books/_analyze?analyzer=my_analyzer&pretty' -d 'Around the World in Eighty Days'

Ngram Tokenizer
Around → [Ngram] → A, Ar, Aro, Arou, Aroun, Around
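The tokens listed for "Around" are its successive prefixes. A minimal Python sketch of that prefix expansion follows. The function name is illustrative, and `min_gram`/`max_gram` are plain parameters of this sketch.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    limit = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, limit + 1)]

print(edge_ngrams("Around"))
# ['A', 'Ar', 'Aro', 'Arou', 'Aroun', 'Around']
```

Indexing these prefixes lets a partial input such as "Aro" match the full word as an exact term.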

Korean Morphological Analyzer
동해물과 백두산이 → [Korean morphological analysis] → 동해, 동해물, 동해물과, 백두, 백두산, 백두산이

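Considerations When Using Elasticsearch
• Design the shape of the stored data and of the search results
• Design the data mapping structure and the analyzers
• Validate the data to be stored
• Retain the original data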

Logstash
• Pipeline: input → filter → output
• Inputs: standard input, file, Syslog, …
• Outputs: standard output, file, Elasticsearch, …
• Other endpoints: Network (TCP/UDP), Email, Twitter

• Standard input → standard output
$ bin/logstash -f standard.conf
standard.conf
input {
  stdin { }
}
output {
  stdout { }
}
Input:
Hello World
Output:
2015-06-01T07:19:27.594Z Jongminui-MacBook-Pro.local Hello World

• Standard input → standard output {codec => json}
standard.conf
input {
  stdin { }
}
output {
  stdout { codec => json }
}
Input:
Hello World
Output:
{"message":"Hello World","@version":"1","@timestamp":"2015-06-01T07:21:33.876Z","host":"Jongminui-MacBook-Pro.local"}

• Standard input {codec => json} → standard output {codec => json}
standard.conf
input {
  stdin { codec => json }
}
output {
  stdout { codec => json }
}
Input:
{ "name":"Jongmin Kim", "age":35 }
Output:
{"name":"Jongmin Kim","age":35,"@version":"1","@timestamp":"2015-06-01T07:25:52.784Z","host":"Jongminui-MacBook-Pro.local"}
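The sample output suggests that the JSON input is decoded into an event and that `@version`, `@timestamp`, and `host` are then added. The Python sketch below imitates that shape only. It is not Logstash code, and `to_event` is a hypothetical helper. The timestamp and host are passed in so the result matches the sample.

```python
import json
import socket
from datetime import datetime, timezone

def to_event(line, host=None, now=None):
    """Decode a JSON line and add the metadata fields seen in the sample output."""
    event = json.loads(line)
    now = now or datetime.now(timezone.utc)
    event["@version"] = "1"
    # ISO-8601 UTC timestamp with millisecond precision.
    event["@timestamp"] = (
        now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
    )
    event["host"] = host or socket.gethostname()
    return event

event = to_event(
    '{ "name":"Jongmin Kim", "age":35 }',
    host="Jongminui-MacBook-Pro.local",
    now=datetime(2015, 6, 1, 7, 25, 52, 784000, tzinfo=timezone.utc),
)
serialized = json.dumps(event, separators=(",", ":"))
print(serialized)
```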

• Elasticsearch output
elasticsearch.conf
output {
  elasticsearch {
    cluster => "elasticsearch"
    node_name => "node-logstash"
    index => "tests"
    document_type => "test-%{+YYYY.MM.dd}"
    id => "%{id}"
  }
}

Input:
{ "id":"kimjmin", "name":"Jongmin Kim", "age":35 }
$ curl 'localhost:9200/tests/_search?pretty'
…
"hits" : [ {
"_index" : "tests",
"_type" : "test-2015.06.02",
"_id" : "kimjmin",
"_score" : 1.0,
"_source":{"id":"kimjmin","name":"Jongmin Kim","age":35,"@version":"1","@timestamp":"2015-06-02T08:44:47.877Z","host":"Jongminui-MacBook-Pro.local"}
} ]
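The `_type` value `test-2015.06.02` is consistent with the `%{+YYYY.MM.dd}` pattern being filled in from the event's `@timestamp`. The Python sketch below reproduces that formatting. It assumes the timestamp is UTC in the format shown, and `document_type_for` is a hypothetical helper.

```python
from datetime import datetime, timezone

def document_type_for(timestamp):
    """Render "test-%{+YYYY.MM.dd}" for an ISO-8601 UTC timestamp ending in Z."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc).strftime("test-%Y.%m.%d")

print(document_type_for("2015-06-02T08:44:47.877Z"))  # test-2015.06.02
```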

• File input
standard.conf
input {
file {
codec => json
path => "/Users/kimjmin/git/elastic-demo/data/*.log"
}
}

Logstash - Filter
• Input data is parsed, added to, deleted from, or otherwise transformed before being sent to the output
• grok, mutate, date …
• Filters are applied in the order they are written, from top to bottom

• Common options
  • add_field => { "comment" => "My name is %{name}" }
  • remove_field => [ "name", "age" ]
• grok
  • match => { "message" => "Duration: %{NUMBER:duration}" }
• mutate
  • convert => { "age" => "integer" }
  • lowercase => [ "name" ]
  • split => { "fieldname" => "," }
• grok
filter.conf
filter {
  grok {
    match => {
      "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
    }
  }
}
Input:
55.3.244.1 GET /index.html 15824 0.043
Output:
{"message":"55.3.244.1 GET /index.html 15824 0.043","@version":"1","@timestamp":"2015-06-03T05:25:30.529Z","host":"Jongminui-MacBook-Pro.local","client":"55.3.244.1","method":"GET","request":"/index.html","bytes":"15824","duration":"0.043"}
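Conceptually, each `%{PATTERN:field}` placeholder behaves like a named capture group. The Python sketch below imitates that with regular expressions. The entries in `PATTERNS` are simplified stand-ins written for this example, not the real grok pattern definitions. `grok_to_regex` is a hypothetical helper.

```python
import re

# Simplified stand-ins (assumptions); the real grok patterns are stricter.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"-?\d+(?:\.\d+)?",
}

def grok_to_regex(expression):
    """Turn %{PATTERN:field} placeholders into named regex groups."""
    def replace(match):
        pattern, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[pattern]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", replace, expression))

regex = grok_to_regex(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
    "%{NUMBER:bytes} %{NUMBER:duration}"
)
fields = regex.match("55.3.244.1 GET /index.html 15824 0.043").groupdict()
print(fields)
```

As in the Logstash output, every captured value is a string. A separate conversion step such as mutate's `convert` would be needed to turn `bytes` or `duration` into numbers.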

Kibana 3
• HTML and JavaScript (AngularJS) only
• Runs in the client (browser)
• Requires port 9200 to be open, which is a security weakness
• Requires a separate web server, such as Tomcat or Nginx
• Facet based

• Requires Elasticsearch 1.4.4 or later.
• Runs on a Node.js server, port 5601
• Aggregation based.
$ bin/kibana

• Set the base index and the time field.

• Discover: set the Time Filter.

• Build a dashboard in the Dashboard tab from visualizations saved beforehand in the Visualize tab.
