The material contained in this documentation is proprietary and confidential to PIXNET. Copies are made available on the basis that use is limited to the sole purpose of
evaluating PIXNET’s capabilities. It is not permissible to use, modify, copy or disclose any information contained in this presentation document for any other purpose
without the express written permission of PIXNET. If you are not the intended recipient of this material you are requested to take immediate steps to destroy it.
Copyright © 2018 PIXNET. All rights reserved.
2019.05.15
PIXNET 曾祺元 (小明)
How to Greatly Improve Elasticsearch Performance in the Cloud
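Taiwan's Largest Social Networking Site
PIXNET was founded in 2003, incorporated as 優像數位媒體科技股份有限公司 in 2006, and joined 城邦媒體控股集團 in 2007. We are a technology company
with community at its core. Our main services are 痞客邦, PIXgoods, PIXmarketing, and PIXinsight. Through innovative data applications and diverse
community services, we pursue our core corporate value, "Guide to SMART Life". In 2018 PIXNET launched the "all-new 痞客邦" to help people with shared
interests gather and interact, and it continues to form alliances across industries to steadily build its vision of a "community co-prosperity circle".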
Agenda
● Background / goals
● Introduction to Elasticsearch
● PIXNET search engine service architecture
● Cloud disks / performance / cost
● Test method and results
● Q & A
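Background / goals
● Apache Solr had gone unmaintained for years
● The original operating system was outdated
● On-premises hardware was hard to expand
● The old architecture was hard to change
● The company had new projects
● Full-text search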
Introduction to Elasticsearch
● Search engine based on the Lucene library
● Full-text search engine
● Schema free
● HTTP interface with JSON documents
● Supported client languages: Java, PHP, Python, Go
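PIXNET search engine service architecture
Part covered in this talk
ES Cluster
Redis HA
API Server
Per-node ES specification
● vCPU x2
● vRAM 15G
● HDD x7
● SSD x1
Related session: using cloud technology to bring AI services online (Raix, 15:10, 101A)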
Cloud disks / performance / cost
GCE disk IOPS / throughput vs. disk capacity

HDD 1000G            Read               Write
IOPS                 750                1,500
Throughput (MB/s)    120                120
Cost                 USD 40 / month (TWD 1,240 / month)

SSD 1000G            Read               Write
IOPS                 15,000 ~ 30,000    15,000 ~ 30,000
Throughput (MB/s)    240 ~ 480          240 ~ 400
Cost                 USD 170 / month (TWD 5,270 / month)

In short
To get more disk I/O, the only option is to pay for more capacity.
Is there a way to raise disk I/O without raising the cost?
Cache / hybrid / fusion?
Why not try ZFS?
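The point that disk I/O grows only with purchased capacity can be illustrated with a small estimate. This is a minimal sketch that assumes IOPS and price scale linearly from the 1000G HDD figures in the table. The linearity and the function name `estimate_hdd` are assumptions made for illustration, and provider-side caps are not modelled.

```python
# Per-GB rates derived from the 1000G HDD row of the table (assumed linear).
HDD_READ_IOPS_PER_GB = 750 / 1000
HDD_WRITE_IOPS_PER_GB = 1500 / 1000
HDD_USD_PER_GB_MONTH = 40 / 1000


def estimate_hdd(size_gb):
    """Estimate IOPS and monthly cost for an HDD of size_gb gigabytes."""
    return {
        "read_iops": size_gb * HDD_READ_IOPS_PER_GB,
        "write_iops": size_gb * HDD_WRITE_IOPS_PER_GB,
        "usd_per_month": size_gb * HDD_USD_PER_GB_MONTH,
    }


if __name__ == "__main__":
    # Doubling the capacity doubles both the estimated IOPS and the bill.
    for size in (100, 500, 1000, 2000):
        print(size, estimate_hdd(size))
```

Under this assumption, a small disk is cheap but slow, which is the motivation for looking at a caching layer instead of buying capacity.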
What is ZFS?
http://wiki.lustre.org/ZFS_OSD_Hardware_Considerations
https://docs.oracle.com/cd/E26505_01/html/E37384/zfsover-2.html#scrolltoc
● ZFS is a combined file system and logical volume manager designed by Sun
Microsystems.
● The ZFS file system is a file system that fundamentally changes the way file
systems are administered, with features and benefits not found in other file
systems available today. ZFS is robust, scalable, and easy to administer.
Platform
● Solaris / OpenSolaris
● macOS / FreeBSD
● FreeNAS / NAS4free / pfsense
● Software RAID - recommend HBA card
● 128 bit filesystem
● no fsck - scrub / resilvering
● RAID-Z / mirror
● Snapshots
https://en.wikipedia.org/wiki/ZFS
https://zfsonlinux.org/
Elasticsearch system planning
ZFS Storage Pool
● HDDs: RAID 0
● SSD: ZIL 30G, cache (L2ARC) 90G
● ARC: zfs zfs_arc_max 3221225472 (3G)
● Elasticsearch service runs on top of the pool
Memory planning
● Reserved for the system: 500M ~ 1000M
● Elasticsearch JVM: 35% ~ 50% of total memory
● ZFS ARC: 35% ~ 50% of total memory
Memory allocation on a 15G node
/etc/elasticsearch/jvm.options
-Xms5632m
-Xmx5632m
/etc/modprobe.d/zfs.conf
# 3G
options zfs zfs_arc_max=3221225472
ZIL / SLOG
● With a dedicated (SSD) log device, the ZIL is placed on a SLOG (Separate Intent Log)
● Without a dedicated device, all vdevs (virtual devices) share the ZIL role
http://wiki.lustre.org/ZFS_OSD_Hardware_Considerations
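The memory split above can be checked and turned into config lines with a short helper. This is a sketch, not the author's tooling: the function name `plan_memory` and the default 1000M system reserve are assumptions, while the 15G node, the 5632m heap, and the 3G ARC come from the slide.

```python
MIB = 1024 ** 2


def plan_memory(total_mib, heap_mib, arc_mib, reserved_mib=1000):
    """Validate a JVM heap / ZFS ARC split and render the config lines.

    Raises ValueError if heap + ARC + system reserve exceed total memory.
    """
    if heap_mib + arc_mib + reserved_mib > total_mib:
        raise ValueError("JVM heap + ARC + system reserve exceed total memory")
    return {
        # Lines for /etc/elasticsearch/jvm.options
        "jvm_options": [f"-Xms{heap_mib}m", f"-Xmx{heap_mib}m"],
        # Line for /etc/modprobe.d/zfs.conf (zfs_arc_max is given in bytes)
        "zfs_conf": f"options zfs zfs_arc_max={arc_mib * MIB}",
        # Shares of total memory, for comparison with the planning ranges
        "heap_share": heap_mib / total_mib,
        "arc_share": arc_mib / total_mib,
    }


if __name__ == "__main__":
    plan = plan_memory(total_mib=15 * 1024, heap_mib=5632, arc_mib=3 * 1024)
    for line in plan["jvm_options"] + [plan["zfs_conf"]]:
        print(line)
```

Keeping the heap and ARC sizes as explicit inputs makes it easy to see when a planned split leaves too little room for the operating system.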
Compare L2ARC on and L2ARC off
PIXNET search engine service architecture
ES Cluster
Redis Server
API Server
Search engine cache architecture
Diagram layers: PIXNET backend, service cache, cache server (Redis), ES cache, ZFS ARC, L2ARC, ZIL, HDD
Creating the ZFS Storage Pool
# Stripe the seven HDDs (sdb..sdh) into the pool "storage"; sdi1 is the log (ZIL) device, sdi2 the cache (L2ARC) device
zpool create -f -O canmount=off -m none storage /dev/sd[bcdefgh] log /dev/sdi1 cache /dev/sdi2
zfs set atime=off storage ; zfs set checksum=fletcher4 storage
# Dataset for Elasticsearch data
zfs create -o mountpoint=/storage/esdata storage/esdata
zfs set setuid=off storage/esdata ; zfs set exec=off storage/esdata
zfs set devices=off storage/esdata
# Dataset for Elasticsearch logs
zfs create -o mountpoint=/storage/eslog storage/eslog
zfs set setuid=off storage/eslog ; zfs set exec=off storage/eslog ; zfs set devices=off storage/eslog
/etc/elasticsearch/elasticsearch.yml
path.data: /storage/esdata # directory holding Elasticsearch shards / documents
path.logs: /storage/eslog
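When the number of data disks changes between runs, the `zpool create` line above can be assembled programmatically. The helper below is a hypothetical sketch (the name `zpool_create_argv` is not from the slides); it only builds the argument list and does not run anything.

```python
import shlex


def zpool_create_argv(pool, data_devs, log_dev=None, cache_dev=None):
    """Build the argv for a striped pool with optional log and cache devices."""
    if not data_devs:
        raise ValueError("at least one data device is required")
    # Same options as the command on the slide.
    argv = ["zpool", "create", "-f", "-O", "canmount=off", "-m", "none", pool]
    argv += list(data_devs)
    if log_dev:
        argv += ["log", log_dev]      # separate ZIL device (SLOG)
    if cache_dev:
        argv += ["cache", cache_dev]  # L2ARC device
    return argv


if __name__ == "__main__":
    devs = [f"/dev/sd{c}" for c in "bcdefgh"]
    print(shlex.join(zpool_create_argv("storage", devs, "/dev/sdi1", "/dev/sdi2")))
```

Printing the command first, rather than executing it, is a deliberate choice here because `zpool create -f` is destructive.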
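Test method
ZFS
● Disks: 100G x14, starting from 100G x2 (199G) and adding one disk at a time, up to 100G x14 (1.36T)
Single disk
● One disk, starting at 200G and growing by 100G at a time, up to 1400G
● Formatted as XFS
Test tool: fio / randrw
fio --ioengine=libaio --direct=0 --name=test --filename=/storage/esdata/test --bs=4k --size=256m \
    --readwrite=randrw --rwmixread=50 --gtod_reduce=1
Linux software RAID (MD)
● Disks: 100G x14, starting from 100G x2 (199G) and adding one disk at a time, up to 100G x14 (1.36T)
● Formatted as XFS
Data collected
● 10 runs for each capacity
● 13 capacities
● 3 storage pool types
● 390 data points in total (10 x 13 x 3)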
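Collecting ten runs per configuration is easier to automate if fio emits JSON. The sketch below assumes fio's `--output-format=json` option is added to the command from the slide and that each reported figure is the mean of the ten runs; both are assumptions, since the slides do not say how the numbers were aggregated. The function names are hypothetical.

```python
import json
import statistics
import subprocess

FIO_ARGS = [
    "fio", "--ioengine=libaio", "--direct=0", "--name=test",
    "--filename=/storage/esdata/test", "--bs=4k", "--size=256m",
    "--readwrite=randrw", "--rwmixread=50", "--gtod_reduce=1",
    "--output-format=json",  # not on the slide; added so the result can be parsed
]


def parse_iops(fio_json):
    """Return (read_iops, write_iops) for the first job in fio's JSON output."""
    job = json.loads(fio_json)["jobs"][0]
    return job["read"]["iops"], job["write"]["iops"]


def run_once():
    """Run fio once and return (read_iops, write_iops)."""
    out = subprocess.run(FIO_ARGS, check=True, capture_output=True, text=True)
    return parse_iops(out.stdout)


def average_runs(results):
    """Average a list of (read_iops, write_iops) tuples."""
    reads, writes = zip(*results)
    return statistics.mean(reads), statistics.mean(writes)
```

On a machine with fio installed and the pool mounted, `average_runs([run_once() for _ in range(10)])` would give one row of the results table for the current disk layout.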
Test results
read IOPS
Capacity   ZFS      md      single disk
200G       2246.0   385.2   198.4
300G       2243.7   383.6   168.1
400G       2249.0   429.8   330.6
500G       2246.6   503.1   313.1
600G       2246.2   248.4   318.6
700G       2246.2   168.1   347.2
800G       2248.2   159.6   332.4
900G       2246.7   404.3   337.6
1000G      2245.3   357.8   366.1
1100G      2248.7   306.5   384.3
1200G      2248.8   288.6   165.4
1300G      2244.7   329.7   174.0
1400G      2251.1   263.2   301.3
write IOPS
Capacity   ZFS      md      single disk
200G       2256.3   387.2   196.1
300G       2253.9   385.2   168.7
400G       2259.2   431.6   332.1
500G       2257.0   505.6   314.5
600G       2256.6   249.4   319.8
700G       2256.7   169.0   348.7
800G       2258.4   160.1   324.8
900G       2257.1   383.0   339.1
1000G      2256.1   345.6   364.0
1100G      2259.3   321.6   386.0
1200G      2259.0   289.8   166.1
1300G      2255.3   331.2   175.0
1400G      2261.8   264.4   302.8
About 6.5x
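A speed-up figure can be recomputed from the read IOPS column. The sketch below uses the ratio of the means across all 13 capacities, which is only one possible definition and may not be how the multiplier on the slides was derived; the function name `speedup` is an assumption.

```python
from statistics import mean

# Read IOPS per capacity (200G .. 1400G), copied from the results table.
ZFS = [2246.0, 2243.7, 2249.0, 2246.6, 2246.2, 2246.2, 2248.2,
       2246.7, 2245.3, 2248.7, 2248.8, 2244.7, 2251.1]
MD = [385.2, 383.6, 429.8, 503.1, 248.4, 168.1, 159.6,
      404.3, 357.8, 306.5, 288.6, 329.7, 263.2]
SINGLE = [198.4, 168.1, 330.6, 313.1, 318.6, 347.2, 332.4,
          337.6, 366.1, 384.3, 165.4, 174.0, 301.3]


def speedup(baseline, candidate):
    """Ratio of mean candidate IOPS to mean baseline IOPS."""
    return mean(candidate) / mean(baseline)


if __name__ == "__main__":
    print(f"ZFS vs md:          x{speedup(MD, ZFS):.2f}")
    print(f"ZFS vs single disk: x{speedup(SINGLE, ZFS):.2f}")
```

The same function applies unchanged to the write IOPS columns.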
ZFS vs single disk & ZFS vs Linux software RAID - read IOPS
~ x6.0
~ x4.5
ZFS vs single disk & ZFS vs Linux software RAID - write IOPS
~ x6.0
~ x4.5
Locust Stress Test
CPU: 16 vCPU / RAM: 14.4 GB / JVM: 300MB
Elasticsearch caches disabled for the test:
PUT /index/_settings
{ "index.requests.cache.enable": false }
elasticsearch.yml
indices.requests.cache.size: 0%
indices.fielddata.cache.size: 0%
Test data
index: 1
documents: 18,275,688
storage size: 13.1 GB
Experimental group: 10.254.250.18
ZFS: 10G x7 + 100G SSD
ZFS ARC: 7G / L2ARC: 10G
Control group: 10.254.250.17
/dev/sdb: 70G, XFS
Locust Stress Test - prod. environment
CPU: 16 vCPU / RAM: 60 GB
JVM: ~25G / ARC: 30G / L2ARC: 30G

Users   QPS     QPS (avg.)
4000    428.5   438.23
5000    514.8   451.1
6000    535.5   444.36
6500    469.9   444.68

Users   Requests   Fails   Fails %
4000    203951     153     0%
5000    309347     1520    0%
6000    308907     3068    1%
6500    323085     3210    1%
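The failure percentages in the table are rounded to whole numbers, which hides the difference between the 4000-user and 5000-user runs. A minimal sketch that recomputes them from the request and failure counts follows; the function name `fail_percent` is an assumption.

```python
# (users, requests, fails) rows copied from the table.
RESULTS = [
    (4000, 203951, 153),
    (5000, 309347, 1520),
    (6000, 308907, 3068),
    (6500, 323085, 3210),
]


def fail_percent(requests, fails):
    """Failure rate as a percentage of all requests."""
    return 100.0 * fails / requests


if __name__ == "__main__":
    for users, requests, fails in RESULTS:
        print(f"{users:>5} users: {fail_percent(requests, fails):.3f}% failed")
```

Printing three decimals makes the growth in failures with load visible even where the rounded column shows the same value.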
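On-premises service / cloud service
Conclusion
● Multiple cache layers give multiple layers of acceleration
● Put the ZFS ZIL on its own SSD device
● Sensible ZFS / JVM memory planning helps keep the system stable
● 16 vCPU + 60G RAM + ZFS (SSD 120G) handles 5000+ RPS
If the Redis cache hit rate is close to 95%, the application can serve about 100k RPS
● Trust the cloud provider: RAID 0 can be used for speed,
but back up frequently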
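The 100k RPS figure follows from simple arithmetic if only cache misses reach Elasticsearch. The sketch below assumes that model (every Redis hit is served without touching the backend, and Redis itself is not the bottleneck); the function name `application_rps` is an assumption.

```python
def application_rps(backend_rps, cache_hit_rate):
    """Requests per second the application can accept when only cache
    misses are forwarded to a backend limited to backend_rps."""
    if not 0.0 <= cache_hit_rate < 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    miss_rate = 1.0 - cache_hit_rate
    return backend_rps / miss_rate


if __name__ == "__main__":
    # 5000 RPS at the backend with a 95% hit rate: 5000 / 0.05
    print(round(application_rps(5000, 0.95)))
```

The estimate is sensitive to the hit rate: at 90% the same backend supports about half as much application traffic.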
Q & A
The 6th PIXNET Hackathon
https://www.facebook.com/pixnetbot/
Please follow the PIXNET Hackathon Facebook fan page
THANK YOU
