False Sharing
A performance trap hiding in multicore systems
G7
Agenda
● What is false sharing
● How to avoid it
● How to use it to improve performance
What is false sharing?
Which one is faster?
type MyTest struct {
	param1 uint64
	param2 uint64
}

var addTimes = 100000000
var wg sync.WaitGroup

func Inc(num *uint64) {
	for i := 0; i < addTimes; i++ {
		atomic.AddUint64(num, 1)
	}
	wg.Done()
}

func BenchmarkTestProcessNum1(b *testing.B) {
	runtime.GOMAXPROCS(1)
	myTest := &MyTest{}
	wg.Add(2)
	go Inc(&myTest.param1)
	go Inc(&myTest.param2)
	wg.Wait()
}
type MyTest struct {
	param1 uint64
	param2 uint64
}

var addTimes = 100000000
var wg sync.WaitGroup

func Inc(num *uint64) {
	for i := 0; i < addTimes; i++ {
		atomic.AddUint64(num, 1)
	}
	wg.Done()
}

func BenchmarkTestProcessNum2(b *testing.B) {
	runtime.GOMAXPROCS(2)
	myTest := &MyTest{}
	wg.Add(2)
	go Inc(&myTest.param1)
	go Inc(&myTest.param2)
	wg.Wait()
}
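The two slide snippets above can be assembled into one runnable file. A minimal sketch, where the `run`/`main` timing harness is our own glue (not from the slides) and `addTimes` is reduced from the slides' 100000000 so the demo finishes quickly:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// MyTest, addTimes, and Inc follow the slides; addTimes is reduced
// here (the slides use 100000000) so the demo runs fast.
type MyTest struct {
	param1 uint64
	param2 uint64
}

var addTimes = 10000000
var wg sync.WaitGroup

func Inc(num *uint64) {
	for i := 0; i < addTimes; i++ {
		atomic.AddUint64(num, 1)
	}
	wg.Done()
}

// run times two goroutines incrementing the two adjacent counters.
func run(procs int) time.Duration {
	runtime.GOMAXPROCS(procs)
	myTest := &MyTest{}
	wg.Add(2)
	start := time.Now()
	go Inc(&myTest.param1) // both fields sit on the same 64-byte cache line
	go Inc(&myTest.param2)
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("GOMAXPROCS(1):", run(1))
	fmt.Println("GOMAXPROCS(2):", run(2))
}
```

On a multicore machine the second line is typically slower, because the two cores keep invalidating each other's copy of the shared cache line.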
Trace Result
seems better
Benchmark result
The single-core run is roughly 180% faster than the dual-core run
Two independent jobs run faster on a single core than on two cores. Why?
False sharing
CPU Cache
reference: https://chrisadkin.io/2015/01/20/large-memory-pages-how-they-work-and-the-logcache_access-spinlock/
Two independent jobs run faster on a single core than on two cores. Why?
False sharing forces the CPU to fall back on slower memory accesses to keep its caches coherent
How to avoid it:
cache padding
Cache padding
type MyTest struct {
	param1 uint64
	param2 uint64
}

type MyTest struct {
	param1 uint64
	_p1    [8]int64
	param2 uint64
	_p2    [8]int64
}
N.B. the cache line size of today's mainstream CPUs is 64 bytes
Benchmark result after padding
How to use it to improve performance
Lock-free ring buffer
type RingBuffer struct {
	head    uint64
	tail    uint64
	mask    uint64
	ringbuf []*entity
}

func (rb *RingBuffer) Put(item interface{}) error {
	// fetch the latest head position
	// store the item at that position
}

func (rb *RingBuffer) Get() (interface{}, error) {
	// fetch the latest tail position
	// read the item out of that position
}
Benchmark: channel vs. ring buffer
Who uses a lock-free ring buffer?
● LMAX Disruptor
● So You Wanna Go Fast?
example code: https://github.com/genchilu/falseSharingPresentation
Q&A
