The document discusses Ext4 journaling and the write barrier feature. It notes that the write barrier forces a flush-to-disk call after the journal is written, ensuring the journal reaches non-volatile storage. However, this causes sluggishness after OTA updates, when the SSHD NAND cache is full and every fsync flushes all dirty cached data to platter. Disabling the write barrier allows the disk to reorder cache-to-disk writes, reducing latency and improving performance, though it introduces a small risk of filesystem corruption in the event of a power failure. Tests showed that disabling the barrier reduced fsync latency and improved SQLite transactions per second on HDD and EMMC storage.
2. Current Ext4 Journaling
On File System (FS) failure, disk contents can be corrupted
Journals keep data consistent during failure by writing data twice
Write (journal) -> commit -> write (final location)
After a failure, data either exists consistently or not at all
Ordered mode journals only metadata, but ensures data is written to disk first
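The write (journal) -> commit -> write sequence can be sketched with a toy model. This is only an illustration of the all-or-nothing property, not Ext4's real on-disk format; the class and method names (`ToyJournaledDisk`, `journal_write`, `commit`, `checkpoint`, `recover`) are hypothetical.

```python
class ToyJournaledDisk:
    """Toy model of write-ahead journaling (not Ext4's actual format)."""

    def __init__(self):
        self.blocks = {}   # final on-disk locations: block number -> contents
        self.journal = []  # list of journaled transactions

    def journal_write(self, updates):
        """Step 1: record the intended updates in the journal, uncommitted."""
        self.journal.append({"updates": dict(updates), "committed": False})
        return len(self.journal) - 1

    def commit(self, txn_id):
        """Step 2: mark the journaled transaction as committed."""
        self.journal[txn_id]["committed"] = True

    def checkpoint(self, txn_id, crash_after=None):
        """Step 3: copy the updates to their final locations.

        crash_after simulates a failure after that many block writes.
        """
        updates = self.journal[txn_id]["updates"]
        for i, (block, contents) in enumerate(updates.items()):
            if crash_after is not None and i >= crash_after:
                return  # simulated crash: remaining blocks never reach disk
            self.blocks[block] = contents

    def recover(self):
        """After a failure: replay committed transactions, drop the rest."""
        for txn in self.journal:
            if txn["committed"]:
                self.blocks.update(txn["updates"])
        self.journal.clear()


# Crash in the middle of step 3: the committed journal entry is replayed.
disk = ToyJournaledDisk()
txn = disk.journal_write({1: "inode", 2: "dir entry"})
disk.commit(txn)
disk.checkpoint(txn, crash_after=1)
disk.recover()

# Crash before commit: the update is discarded entirely.
other = ToyJournaledDisk()
other.journal_write({1: "inode", 2: "dir entry"})
other.recover()
```

In the first case both blocks end up on disk, in the second neither does, which is the "consistently or not at all" guarantee in miniature.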
3. Current Ext4 Journaling cont
Sometimes we do not need to journal the data, only the metadata:
i.e. data corruption is OK, breaking the directory tree is not OK
Ordered mode (data=ordered) is the default; it reduces the amount of double writing, but allows data corruption.
Data mode (data=journal) is very slow
Unordered mode (data=writeback) exists, but is much more dangerous
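To make the "double writing" cost concrete, here is a rough back-of-the-envelope model. It assumes that journaled bytes are written twice (once to the journal, once to the final location) and ignores journal bookkeeping blocks, so the numbers are illustrative only. The function name `bytes_written` and the example sizes are hypothetical.

```python
def bytes_written(mode, data_bytes, metadata_bytes):
    """Approximate bytes sent to disk for one update under each journal mode.

    Simplifying assumption: anything that is journaled is written twice.
    """
    if mode == "journal":
        # Data mode: both data and metadata go through the journal.
        return 2 * (data_bytes + metadata_bytes)
    if mode in ("ordered", "writeback"):
        # Only metadata is journaled; data is written once.
        # The two modes differ in write ordering, not in volume.
        return data_bytes + 2 * metadata_bytes
    raise ValueError(f"unknown journal mode: {mode}")


# Example: a 1 MiB data write accompanied by 4 KiB of metadata.
for mode in ("journal", "ordered", "writeback"):
    print(mode, bytes_written(mode, 1024 * 1024, 4096))
```

Under this model, data mode roughly doubles the write volume, while ordered and unordered modes add only the small metadata overhead.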
4. Current Ext4 Journaling cont
The fsync system call explicitly flushes a file's in-memory (OS-cached) data to disk through Ext4's journaling mechanism
Write barriers then force a flush-to-disk call after the journal is sent to disk
This ensures the journal is on the non-volatile disk area (instead of in volatile disk caches)
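From an application's point of view, the fsync path looks like the minimal sketch below. The helper name `write_durably` is hypothetical; `os.fsync` is the standard Python wrapper around the fsync system call. Whether the drive's volatile cache is also flushed at that point is decided by the filesystem and its barrier setting, not by this code.

```python
import os


def write_durably(path, payload):
    """Write payload to path and ask the OS to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # Python's buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> disk (blocks until done)
    return len(payload)
```

The call to `os.fsync` does not return until the kernel reports the data as written, which is why a slow flush shows up directly as application latency.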
5. PROBLEM
After an OTA update, the SSHD NAND cache is filled with OTA data
Dex2oat does ahead-of-time compilation for Android apps
Dex2oat calls fdatasync (similar to fsync) at regular intervals, causing disk flushes
Since the NAND is full, every fsync causes all dirty data in the SSHD cache (up to 64 MB) to be flushed to platter
Fsync therefore causes a synchronous I/O block, preempting any other disk reads and writes
This causes a huge amount of sluggishness in the user experience
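The access pattern described above (write some output, sync, repeat) can be reproduced with a small sketch that also times each sync. This is not Dex2oat's code; the function name `write_with_periodic_sync`, the chunk sizes, and the sync interval are all hypothetical, and the code falls back to `os.fsync` on platforms without `os.fdatasync`.

```python
import os
import time


def write_with_periodic_sync(path, chunks, sync_every):
    """Write chunks to path, syncing after every `sync_every` chunks.

    Returns the wall-clock latency of each sync call in seconds.
    """
    sync = getattr(os, "fdatasync", os.fsync)  # fdatasync is not on every OS
    latencies = []
    with open(path, "wb") as f:
        for i, chunk in enumerate(chunks, start=1):
            f.write(chunk)
            if i % sync_every == 0:
                f.flush()
                start = time.perf_counter()
                sync(f.fileno())  # blocks until the device reports completion
                latencies.append(time.perf_counter() - start)
    return latencies
```

Running a loop like this on a device whose cache is already full is one way to observe the per-sync stall that the slide describes.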
6. Disabling write barrier
Allows the disk to reorder cache-to-disk writes
Does not block disk reads while writes are queued to disk
Risks:
On power failure we can no longer ensure the journal is consistent, as the volatile cache will be lost
Since only metadata is journaled, we can potentially introduce filesystem corruption
However:
Filesystem metadata is rarely written compared to data
The disk drive uses a timeout system for cache-to-disk writes
Power failures are uncommon for a set-top box device
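When experimenting with this setting, it helps to confirm which way a filesystem is actually mounted. The sketch below parses a mount-options string for the Ext4 barrier options (`barrier`, `barrier=1`, `barrier=0`, `nobarrier`) and treats barriers as enabled when none is present, which is the Ext4 default. The helper names and the sample `/proc/mounts`-style line are hypothetical, and the exact option text a given kernel reports may differ.

```python
def barriers_enabled(mount_options):
    """Return True if the comma-separated options leave write barriers on."""
    enabled = True  # Ext4 enables barriers unless told otherwise
    for opt in mount_options.split(","):
        opt = opt.strip()
        if opt in ("nobarrier", "barrier=0"):
            enabled = False
        elif opt in ("barrier", "barrier=1"):
            enabled = True
    return enabled


def options_for(mounts_text, mount_point):
    """Find the options field for mount_point in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3]
    return None


# Hypothetical sample line in /proc/mounts format.
sample = "/dev/sda1 /data ext4 rw,noatime,nobarrier,data=ordered 0 0"
print(barriers_enabled(options_for(sample, "/data")))
```

On a real device the text would come from reading `/proc/mounts` rather than from a hard-coded sample.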
7. Dex2oat Fsync latency
HDD mounted with barrier: 300 ms latency
HDD mounted with nobarrier: 105 ms latency
8. Androbench SQLite
[Charts: Transactions Per Second (TPS), HDD mounted with barrier vs. HDD mounted with nobarrier]
9. SQLite Fsync latency
EMMC mounted with barrier: 860 µs latency
EMMC mounted with nobarrier: 361 µs latency
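The two latency measurements in this deck (Dex2oat on HDD: 300 ms vs. 105 ms; SQLite on EMMC: 860 µs vs. 361 µs) can be put on a common footing with a little arithmetic. The helper name `improvement` is hypothetical; the input figures are the ones reported on the slides.

```python
def improvement(with_barrier, without_barrier):
    """Return (latency reduction in percent, speedup factor)."""
    reduction_pct = (with_barrier - without_barrier) / with_barrier * 100
    speedup = with_barrier / without_barrier
    return reduction_pct, speedup


hdd_pct, hdd_speedup = improvement(300, 105)    # milliseconds
emmc_pct, emmc_speedup = improvement(860, 361)  # microseconds

print(f"HDD:  {hdd_pct:.1f}% lower latency, {hdd_speedup:.2f}x faster")
print(f"EMMC: {emmc_pct:.1f}% lower latency, {emmc_speedup:.2f}x faster")
# HDD:  65.0% lower latency, 2.86x faster
# EMMC: 58.0% lower latency, 2.38x faster
```

Both storage types show a similar relative gain, even though the absolute latencies differ by orders of magnitude.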
10. Androbench SQLite
[Charts: Transactions Per Second (TPS), EMMC mounted with barrier vs. EMMC mounted with nobarrier]