False Sharing
A performance trap hiding in multicore systems
G7
Agenda
● What is false sharing
● How to avoid it
● How to use it to improve performance
What is false sharing?
Which one is faster?
type MyTest struct {
	param1 uint64
	param2 uint64
}

var addTimes = 100000000
var wg sync.WaitGroup

func Inc(num *uint64) {
	for i := 0; i < addTimes; i++ {
		atomic.AddUint64(num, 1)
	}
	wg.Done()
}

func BenchmarkTestProcessNum1(b *testing.B) {
	runtime.GOMAXPROCS(1)
	myTest := &MyTest{}
	wg.Add(2)
	go Inc(&myTest.param1)
	go Inc(&myTest.param2)
	wg.Wait()
}
type MyTest struct {
	param1 uint64
	param2 uint64
}

var addTimes = 100000000
var wg sync.WaitGroup

func Inc(num *uint64) {
	for i := 0; i < addTimes; i++ {
		atomic.AddUint64(num, 1)
	}
	wg.Done()
}

func BenchmarkTestProcessNum2(b *testing.B) {
	runtime.GOMAXPROCS(2)
	myTest := &MyTest{}
	wg.Add(2)
	go Inc(&myTest.param1)
	go Inc(&myTest.param2)
	wg.Wait()
}
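The two slide snippets above can be assembled into one runnable file. A minimal sketch, where the `run`/`main` timing harness is our own glue (not from the slides) and `addTimes` is reduced from the slides' 100000000 so the demo finishes quickly:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// MyTest, addTimes, and Inc follow the slides; addTimes is reduced
// here (the slides use 100000000) so the demo runs fast.
type MyTest struct {
	param1 uint64
	param2 uint64
}

var addTimes = 10000000
var wg sync.WaitGroup

func Inc(num *uint64) {
	for i := 0; i < addTimes; i++ {
		atomic.AddUint64(num, 1)
	}
	wg.Done()
}

// run times two goroutines incrementing the two adjacent counters.
func run(procs int) time.Duration {
	runtime.GOMAXPROCS(procs)
	myTest := &MyTest{}
	wg.Add(2)
	start := time.Now()
	go Inc(&myTest.param1) // both fields sit on the same 64-byte cache line
	go Inc(&myTest.param2)
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("GOMAXPROCS(1):", run(1))
	fmt.Println("GOMAXPROCS(2):", run(2))
}
```

On a multicore machine the second line is typically slower, because the two cores keep invalidating each other's copy of the shared cache line.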
Trace Result
seems better
Benchmark result
The single-core run is roughly 180% faster than the dual-core run
Two independent jobs run faster on a single core than on two cores. Why?
False sharing
CPU Cache
reference: https://chrisadkin.io/2015/01/20/large-memory-pages-how-they-work-and-the-logcache_access-spinlock/
Two independent jobs run faster on a single core than on two cores. Why?
False sharing forces the CPU to fall back on slower memory accesses to keep its caches coherent
How to avoid it:
cache padding
Cache padding
type MyTest struct {
	param1 uint64
	param2 uint64
}

type MyTest struct {
	param1 uint64
	_p1    [8]int64
	param2 uint64
	_p2    [8]int64
}
N.B. the cache line size of today's mainstream CPUs is 64 bytes
Benchmark result after padding
How to use it to improve performance
Lock-free ring buffer
type RingBuffer struct {
	head    uint64
	tail    uint64
	mask    uint64
	ringbuf []*entity
}

func (rb *RingBuffer) Put(item interface{}) error {
	// fetch the latest head position
	// store the item at that position
}

func (rb *RingBuffer) Get() (interface{}, error) {
	// fetch the latest tail position
	// read the item out of that position
}
Benchmark: channel vs. ring buffer
Who uses a lock-free ring buffer?
● LMAX Disruptor
● So You Wanna Go Fast?
example code: https://github.com/genchilu/falseSharingPresentation
Q&A
