A flag aligned word (FAW) based integers or inverted index compression technique. FAW does not require extra memory space to store the encoded data. It stores encoded words within the same memory that holds input integers.
1 of 11
More Related Content
Flag aligned word (faw) based compression
1. Flag Aligned Word (FAW)-based Integers Compression
Presented by
Md Mostofa Kamal Rasel
2. Outline
Introduction
Related Works
Motivation
Problem Statement
Proposed Technique
Mostofa Kamal Rasel, and Young-Koo Lee, "An efficient subgraphs compression-based technique for reducing the I/O cost of join-based graph mining
algorithms, 7th International Conference on Emerging Databases Technologies, Applications, and Theory (EDB), Busan, South Korea, 2017.
Mostofa Kamal Rasel and Young-Koo Lee, On-the-fly Output Compression for Join-based Graph Mining Algorithms , IEEE Access, IEEE (SCIE, IF: 4.098).
3. Introduction
Sources of big graphs are very common
now-a-days
Social networks
Web graphs
Biological/ecological networks
Big graphs are represented using bitmap,
adjacency list or edge list
Requires enormous space to store
RDBMS also uses bitmap or inverted list to
represent indices, which are large
Need to Compress these large data for
Reducing memory footprint
Efficient computation
4. Related Works
Word Aligned Hybrid (WAH) Compression [1]
Represents a list of integers in bitmaps
Divides bitmap into 31-bits groups
Encode the groups into 0-Fill, 1-Fill, and literal words
Limitation
Sometimes size of encoded data exceeds originals !!
Run-length based compression, so random access not possible
Roaring Bitmap Compression [2]
Divides a list of integers into a range of 216
Each group is identified by most significant 16-bits
Remove the most significant 16-bits from the member of each group
Two type of groups
Array container: cardinality <= 2048
Represents using 16-bit integers
Bitmap container: cardinality > 2048
Represents using 216
bits long bitmap
Limitation
Data type are not same before and after compression
1. Kesheng Wu, E. J. Otoo and A. Shoshani, "Compressing bitmap indexes for faster search operations," Proceedings 14th International Conference on Scientific and Statistical Database
Management, Edinburgh, Scotland, UK, 2002.
2. S. Chambi, D. Lemire, O. Kaser, and R. Godin, Better bitmap performance with roaring bitmaps, Software: Practice and Experience, vol. 46, pp. 709719, 2016.
5. Motivation
Reuse the memory of input data to store encoded data
Allocated memory
to holds array of
integers
Allocated
memory to
holds encoded
data
Traditional Encoding
Algorithm
New memory allocation for holding
encoded data comes with cost
Allocated memory
to holds array of
integers
Encoding
Algorithm
Put an encoded word in-place of an input word,
which is already considered for encoding
Access related encoded data only
= [37, 45, 46, 65, 112, 117, 201, 207, 211, 225, 228] O( )
37, 45, 46, 65, 112, 117, 201, 207, 211, 225, 228
Encoded Word, 0 Encoded Word, 1 Encoded Word, 2 Encoded Word, 3
Binary search to search an item
O( 4)
6. Problem Statement
Encode a list of integers, so that
Final output requires less space into memory
Never exceeds the size of original data
Does not require additional memory space to store the output
Allows random access on encoded data
Allows partial decoding
7. Proposed Technique
S-word: represents a single integer
M-word: represents multiple integers, where each high bit represents a single integer
R-word: represents the run of consecutive integers,
where ≠ > word length
FM-word: represents a single integer and indicates that next word is a M-word.
First bit high is set to 1
FR-word: represents a single integer and indicates that next word is a R-word.
First two bits are set to 1
37, 45, 46, 65, 112, 117, 201, 207, 211, 225, 228
Encoded Word, 0 Encoded Word, 1 Encoded Word, 2 Encoded Word, 3
Encode multiple integers into a single word
Introduce a flag word in front of an encoded word
Use an integer (from data) as a flag word to avoid overflow
How presentation will benefit audience: Adult learners are more interested in a subject if they know how or why it is important to them.
Presenters level of expertise in the subject: Briefly state your credentials in this area, or explain why participants should listen to you.