狠狠撸

Chapter 8: Efficient collection processing
Effective otlin 讀書會
范聖佑
JetBrains
Developer Advocate
2022/12/08 導讀

第八章：有效率的 Collection 操作
—
? Collection 幾乎無所不在
? Collection 操作是 Functional Programming 裡最重要的功能
? Collection 操作最佳化在?型且在意效能的系統裡特別重要
? 好消息！Collection 操作最佳化並不難，只需掌握幾個原則即可！

討論主題
—
? 主題 51：傾向使? Sequences 來取代巨量且有多次操作?為的 Collection
? 主題 52：考慮使? Map 来储存关联的元素
? 主題 53：考慮使? groupingBy 來取代 groupBy
? 主題 54：限制操作的數量
? 主題 55：(有效能考量時) 使? Primitive Array 來操作資料
? 主題 56：考慮使? Mutable Collection

主題 51
傾向使? Sequences 來取代巨量且有多次
操作?為的 Collection

Iterable 與 Sequence 的差異
—
? Iterable
- 每?步都回傳?個新的 Collection
- 積極的 (Eager)，每?步都會觸發運算
? Sequence
- 每?步都回傳?個新的 Sequence
- 懶惰的 (Lazy)，除非有結束型操作才會開始運算

[Effective Kotlin 讀書會] 第八章 Efficient collection processing 導讀

Sequence 的惰性帶來的好處
—
? 操作時依?然順序執?
? 只做最少量的操作
? 可以無限?
? 不需要在每?步都產?新的 Collection

操作時依?然順序執?
—
? Sequence 的運算順序符合?然邏輯

預測?下兩者間的
操作順序？
—
sequenceOf(1, 2, 3)
.filter { print("F$it, "); it % 2 !" 1 }
.map { print("M$it, "); it * 2 }
.forEach { print("E$it, ") }
listOf(1, 2, 3)

sequenceOf(1, 2, 3)
!" Prints: F1, M1, E2, F2, F3, M3, E6,
listOf(1, 2, 3)
!" Prints: F1, F2, F3, M1, M3, E2, E6,
預測?下兩者間的
操作順序？
—

不? Collection 時
—
for (e in listOf(1, 2, 3)) {
print("F$e, ")
if (e % 2 !" 1) {
print("M$e, ")
val mapped = e * 2
print ("E$mapped, ")
}
}
!" Prints: F1, M1, E2, F2, F3, M3, E6,

只做最少量的操作
—
? 不做多餘事 (懶惰?上)

誰的動作比較少？
—
(1!#10)
.find { it > 5 }
(1!#10).asSequence()
.find { it > 5 }

誰的動作比較少？
—
(1!#10)
.find { it > 5 }
!" Prints: F1, F2, F3, F4, F5, F6, F7, F8,
!" F9, F10, M1, M3, M5, M7, M9,
(1!#10).asSequence()
.find { it > 5 }
!" Prints: F1, M1, F2, F3, M3,

官網 Iterable 範例
—
val words = "The quick brown fox jumps over the lazy dog"
.split(" ")
val lengthsList =
words.filter {
it.length > 3
}.map {
it.length
}.take(4)
println("Lengths of first 4 words longer than 3 chars:")
println(lengthsList)

官網 Sequence 範例
—
val words = "The quick brown fox jumps over the lazy dog"
.split(" ")
val wordsSequence = words.asSequence()
val lengthsSequence =
wordsSequence.filter {
it.length > 3
}.map {
it.length
}.take(4)
println("Lengths of first 4 words longer than 3 chars")
println(lengthsSequence.toList())

可以無限?
—
? 可? generateSequence() 或 sequence() 產?無限?
的 Sequence
? ?定要?結束型操作 (如 take()、first()、find()、
indexOf()…等) 才可取回
? 注意！? any()、all()、none() 前?定要有結束型操
作，以免進入無窮迴圈

產? Sequence
opt.1
—
generateSequence(1) { it + 1 }
.map { it * 2 }
.take(10)
.forEach { print("$it, ") }
!" Prints: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,

產? Sequence
opt.2
—
val fibonacci: Sequence<BigDecimal> = sequence {
var current = 1.toBigDecimal()
var prev = 1.toBigDecimal()
yield(prev)
while (true) {
yield(current)
val temp = prev
prev = current
current += temp
}
}
print(fibonacci.take(10).toList())
!" Prints: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

不需要在每?步都產?新的 Collection
—
? ?量的時候感覺不出來，處理?型檔案時就很有感

什麼時候會產?
新的 Collection？
—
numbers
.filter { it % 10 !" 0 } !" 1 collection here
.map { it * 2 } !" 1 collection here
.sum()
!" In total, 2 collections created under the hood
numbers
.asSequence()
.filter { it % 10 !" 0 }
.map { it * 2 }
.sum()
!" No collections created

?檔案比較
—
File("!!$").readLines()
.drop(1) !" Drop descriptions of the columns
.mapNotNull { it.split(",").getOrNull(6) }
!" Find description
.filter { "CANNABIS" in it }
.count()
.let(!%println)
File("!!$").useLines { lines: Sequence<String> !"
lines.drop(1) !" Drop descriptions of the columns
.mapNotNull { it.split(",").getOrNull(6) }
!" Find description
.filter { "CANNABIS" in it }
.count()
.let { println(it) }
}

「傾向使? Sequences 來取代巨量且有多次操作?為的 Collection 」
這句話到底是什麼意思？
—
? 多?叫巨量？
- 上萬筆以上的資料
- 元素本?就好幾 MB
? 多少次叫多次？
- 超過?次
- 依據數量級的不同才會有感

什麼時候 Sequence 會不快？
—
? 當使? sorted() 的時候
- 還是得把整個 Collection 跑?遍 (底層有做?些轉換)
- 無法?在無限?的 Sequence 上
- 但在?部份的情況下還是可能比 Collection 快?點

怎麼看 Java Stream？
—
? Kotlin 的函式比較多且語法比較好?
? Kotlin 的 Sequence 可以多平台使?
? Java 的 Stream 可以啟動 Parallel 模式，可以有很?幅度的
效能強化，但有些陷阱要注意
? 不使? Parallel 模式時，很難說 Stream 的效能比
Sequence 好

主題 52
考慮使? Map 来储存关联的元素

需要在 Collection 裡找東?的情境
—
? 從?個或多個檔案載入設定值的 Class
? 儲存下載資料的網路 Repository
? (在測試裡很常?的) 記憶體 Repository

從 Collection 裡取值時的考量
—
? 透過唯?值 (identifier、name、unique key) 在
Collection 裡搜尋取值
? 需要對每?個元素做比對
? 需考慮線性複雜度，愈?愈慢

Map 是個好解?
—
? Kotlin 的 Map 預設是? LinkedHashMap 實作
? 從 HashMap 裡找東?快多了
? 但只有找東?快，假如是修改或是迭代元素的話，List 和
Map 的差異不?

? List 實作
—
class InMemoryUserRepoWithList : UserRepo {
private val users: MutableList<User> = mutableListOf()
override fun getUser(id: Int): User? = users
.firstOrNull { it.id !" id }
fun addUser(user: User) {
users.add(user)
}
}

? Map 實作
—
class InMemoryUserRepoWithMap : UserRepo {
private val users: MutableMap<Int, User> = mutableMapOf()
override fun getUser(id: Int): User? = users[id]
fun addUser(user: User) {
users[user.id] = user
}
}

轉換 List 成 Map
—
val users = listOf(
User(1, "Michal"),
User(2, "Mark"),
User(3, "Mark"),
)
val byId: Map<Int, User> = users.associateBy { it.id }
println(byId)
!" { 1=User(id=1, name=Michal),
!" 2=User(id=2, name=Mark),
!" 3=User(id=3, name=Mark) }
val byName: Map<String, User> = users.associateBy { it.name }
println(byName)
!" { Michal=User(id=1, name=Michal),
!" Mark=User(id=3, name=Mark) }
?注意 Key 要唯?，不
然會被覆蓋

再從 Map 轉回 List
—
val originalList = byId.values
println(originalList)
!" [ User(id=1, name=Michal),
!" User(id=2, name=Mark),
!" User(id=3, name=Mark) ]

實務範例 pt.1
—
class ConfigurationsRepository(
configurations: List<Configuration>,
) {
private val configurations: Map<String, Configuration> =
configurations.associateBy { it.name }
fun getByName(name: String) = configurations[name]
}

實務範例 pt.2
—
class NetworkUserRepo(
private val userService: UserService,
) : UserRepo {
private var users: Map<Int, User>? = null
suspend fun loadUsers() {
users = userService.getUsers().associateBy { it.id }
}
override fun getUser(id: Int): User? = users!&get(id)
}

使?時機
—
? 很常需要從 Collection 裡找東?時
? ?般來說後端 (Server-Side) 比較常?，前端 (Mobile) 比較少?

主題 53
考慮使? groupingBy 來取代 groupBy

模擬情境
—
? 在?個使?者清單裡，依照其所在城市來計算?數
? 在?個球員清單裡，依照其團隊來計算總分
? 在?个选项清单裡，依照其分类来找出最好选择

情境實作
—
val usersCount: Map<City, Int> =
users.groupBy { it.city }
.mapValues { (_, users) !" users.size }
val pointsPerTeam: Map<Team, Int> =
players.groupBy { it.team }
.mapValues { (_, players) !"
players.sumOf { it.points }
}
val bestFormatPerQuality: Map<Quality, Resolution> =
formats.groupBy { it.quality }
.mapValues { (_, list) !"
list.maxOfOrNull { it.resolution }!'
}

groupBy 和 groupingBy
—
? 使? groupBy()
- 中間多了幾層操作
- ?便、好懂
? 使? groupingBy()
- 去掉中間的操作，但要搭配 eachCount()、fold()、
reduce()、aggregate() 使?
- 不好理解

圖解 groupingBy() + eachCount()
—
"
"
"
#
# ? ? ?
#
# ?
? ? " "
"
groupingBy()
eachCount()
= 2,
# ? = 3, " = 3

情境實作 pt.1
—
val usersCount: Map<City, Int> =
users.groupingBy { it.city }
.eachCount()

情境實作 pt.2-1
—
val pointsPerTeam: Map<Team, Int> =
players.groupingBy { it.team }
.fold(0) { accumulator, player !"
accumulator + player.points
}

情境實作 pt.2-2
—
val pointsPerTeamWithExtension: Map<Team, Int> =
players.groupingBy { it.team }
.eachSumBy{ it.points }
!( Extension Function
fun <T, K> Grouping<T, K>.eachSumBy(
selector: (T) !) Int
): Map<K, Int> = this.fold(0) { accumulator, element !"
accumulator + selector(element)
}

主題 54
限制操作的數量

什麼時候會有效能考量？
—
? 使? Collection 的時候
- 對元素有額外的迭代
- 在迭代時有額外的 Collection 被建立
? 使? Sequence 的時候
- 有額外的物件包裏整個 Sequence 時
- 建立 Lambda 表達式時

總之就是
愈少操作愈好
— !" Works
fun List<User>.getNames(): List<String> =
this.map { it.name }
.filter { it !* null }
.map { it!' }
!" Better
this.map { it.name }
.filterNotNull()
!" Best
this.mapNotNull { it.name }

要怎麼才能辦到？
—
? 依賴 IDE 提?
? 把表格背起來

主題 55
(有效能考量時) 使? Primitive Array 來操作資料

Primitive Type 的優勢
—
? 輕量 (多?層物件就多?層重量)
? 快 (透過 accessor 取值會有額外的成本)

Kotlin 的天性
—
? 在底層封裝了 Primitive Type (?動最佳化)
? Collection 裡放的是 Generic
? 但還是有?法可以? Primitive Type…

差異比較 pt.1
—
val ints: List<Int> = List(1_000_000) { it }
val array: Array<Int> = ints.toTypedArray()
val intArray: IntArray = ints.toIntArray()
println(getObjectSize(ints)) !" 20 000 040
println(getObjectSize(array)) !" 20 000 016
println(getObjectSize(intArray)) !" 4 000 016

差異比較 pt.2
—
open class PrimitiveArrayBenchmark {
lateinit var list: List<Int>
lateinit var array: IntArray
@Setup
fun init() {
list = List(1_000_000) { it }
array = IntArray (1_000_000) { it }
}
@Benchmark
!" On average 1 260 593 ns
fun averageOnIntList(): Double {
return list.average()
}
@Benchmark
!" On average 868 509 ns
fun averageOnIntArray(): Double {
return array.average()
}
}

後話
—
? 對比於 List，使? Primitive 在?多數的情境下並沒有?夠明顯
的效能優勢
? ?使? List 是更為直覺、?便的選擇
? 總之，應優先使? List，有效能考量時再改? Primitive

主題 56
考慮使? Mutable Collection

為什麼要? Mutable Collection？
—
? 使? Immutable 是為了安全，給我們更多控制
? 使? Mutable 是為了效能
? 取個平衡：
- 只在 Local Processing 的時候? Mutable Collection
- 像是在寫 Utils 的時候，常常修改 Collection 內容時

底層實作
—
public operator fun <T> Iterable<T>.plus(
elements: Array<out T>
): List<T> {
if (this is Collection) return this.plus(elements)
val result = ArrayList<T>()
result.addAll(this)
result.addAll(elements)
return result
}

Kotlin Collection
全?位解析攻略
—
collection.kotlin.tips
? 詳解 200+ Collection ?法
? 解析標準函式庫原始碼
? 實務範例
? 免費下載速查地圖

內容三?組成
—
技法 ?法實戰
快速充實腦中的
Collection ?具箱
追溯 Collection 原始碼
了解其語法設計與應?
透過情境解題
將 Collection 應?於?常任務

與本章有關的章節
—
? 主題 51：傾向使? Sequences 來取代 Collection
- 2-3-3 提升處理?量資料時的效能 (p.2-78)
- 4-4 關於效能與效率的反思 (p.4-6)
? 主題 52：考慮使? Map 来储存关联的元素
- 1-9-6 Associate 系列?法 (p.1-164)

—
? 主題 53：考慮使? groupingBy 來取代 groupBy
- 1-10-5 With-grouping 系列?法 (p.1-199)
- 3-2 資料統計運算 (p.3-18)
? 主題 54：限制操作的數量
- 4-3 條條?路通羅? (p.4-3)

—
? 主題 55：(有效能考量時) 使? Primitive Array 來操作資料
- 1-11-1 toArray 系列?法 (p.1-203)
? 主題 56：考慮使? Mutable Collection
-

關注粉絲?及頻道
—
Coding 職?塾
Kraftsman

狠狠撸

[Effective Kotlin 讀書會] 第八章 Efficient collection processing 導讀

More Related Content

[Effective Kotlin 讀書會] 第八章 Efficient collection processing 導讀