The document presents a proposed compression scheme called EXCS for multidimensional data warehouses. EXCS uses an extendible array to store multidimensional data and compresses each subarray individually using a technique similar to compressed row storage. Performance is evaluated based on compression ratio and space savings for EXCS compared to other schemes like bitmap, header, offset compression and compressed row storage under varying data densities and dimensional sizes. EXCS achieves higher space savings than other techniques in most cases due to its ability to dynamically compress subarrays of an extendible multidimensional array.
Data Compression for Multi-dimensional Data Warehouses
1. 1
Data Compression for Large Multidimensional Data Warehouses
Supervisor: Dr. K.M. Azharul Hasan, Associate Professor, Head of the Department, Department of CSE, KUET
Presented by: Abdullah Al Mahmud (Roll: 0507006) and Md. Mushfiqur Rahman (Roll: 0507029)
This slide is prepared by Muhammad Mushfiqur Rahman & Abdullah Al Mahmud for the presentation of Thesis
3. 3
Objectives
• Data compression technology reduces the effective price of logical data storage capacity and improves query performance.
• Multidimensional arrays are widely used in a large number of scientific research fields.
• Efficient compression of multidimensional arrays makes it possible to handle the large multidimensional data sets of data warehouses.
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
5. 5
Existing Compression Schemes (2/ 3)
(a) A sparse array. (b) The CRS scheme
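As a rough illustration of the CRS (compressed row storage) idea named in the caption, the sketch below compresses a small sparse array into three vectors, called RO, CO and VL as elsewhere in these slides. The example array, the function names `crs_compress` and `crs_get`, and the 0-based offset convention for RO are assumptions made for this sketch, not details taken from the slides.

```python
def crs_compress(matrix):
    """Compress a 2-D list into CRS vectors (RO, CO, VL).

    RO holds cumulative non-zero counts (ro[i + 1] - ro[i] is the number
    of non-zeros in row i), CO holds column indices, VL holds the values.
    The 0-based convention is an assumption of this sketch.
    """
    ro, co, vl = [0], [], []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def crs_get(ro, co, vl, i, j):
    """Look up element (i, j) directly in the compressed form."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0  # not stored, so it is a zero element


# Hypothetical sparse array used only for illustration.
sparse = [
    [0, 0, 7, 0],
    [0, 0, 0, 0],
    [3, 0, 0, 9],
]
ro, co, vl = crs_compress(sparse)
print(ro, co, vl)
```

Only the three non-zero values and their column positions are stored, plus one offset per row boundary.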
6. 6
Existing Compression Schemes (3/ 3)
• Classical methods cannot support updates without completely readjusting runs.
• Schemes for compressing sparse arrays do not support extendibility.
7. 7
Traditional Extendible Array
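• TEA supports dynamic extension of dimension size.
History table (dimension 2): 0 1 3 5
Address table (dimension 2): 0 1 4 9
History | Address | Cell addresses (one row per index of dimension 1)
0 | 0 | 0 1 4 9
2 | 2 | 2 3 5 10
4 | 6 | 6 7 8 11
History counter values: 0 to 5
Position <1,3>: H1[1] < H2[3], so address of cell = Address2[3] + 1 = 9 + 1 = 10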
Figure 1: TEA Construction And Access
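The construction and access rule in Figure 1 can be sketched in a few lines. This is a minimal two-dimensional sketch under stated assumptions: the class name `ExtendibleArray2D`, its method names, and the 0-based dimension numbering (dimension 0 for the first subscript, dimension 1 for the second) are all hypothetical. The extension order is chosen so that one dimension ends up with history values 0, 1, 3, 5 and addresses 0, 1, 4, 9, as in the figure.

```python
class ExtendibleArray2D:
    """Two-dimensional extendible array addressed via history and address tables."""

    def __init__(self):
        self.counter = 0           # history counter
        self.history = [[0], [0]]  # history[d][k]: counter value when index k of dimension d was created
        self.address = [[0], [0]]  # address[d][k]: first address of the sub-array for that index
        self.size = [1, 1]         # current length of each dimension
        self.cells = 1             # total number of allocated cells

    def extend(self, dim):
        """Grow dimension `dim` (0 or 1) by one by appending a new sub-array."""
        other = 1 - dim
        self.counter += 1
        self.history[dim].append(self.counter)
        self.address[dim].append(self.cells)
        self.cells += self.size[other]  # the new sub-array spans the other dimension
        self.size[dim] += 1

    def locate(self, i, j):
        """Return the linear address of cell <i, j>.

        The cell lives in whichever sub-array was created later,
        which the history values decide.
        """
        if self.history[0][i] > self.history[1][j]:
            return self.address[0][i] + j
        return self.address[1][j] + i


tea = ExtendibleArray2D()
for dim in (1, 0, 1, 0, 1):  # alternate extensions, starting with the second dimension
    tea.extend(dim)

print(tea.history[1], tea.address[1])
print(tea.locate(1, 3))
```

Existing cells never move when the array grows, because each extension only appends a new sub-array at the end of the address space.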
8. 8
Proposed Compression Scheme
• Multidimensional arrays are important for sparse array operations.
• Extendibility of multidimensional arrays is also needed.
• This calls for a compression technique that works on a multidimensional extendible array.
• Our proposed compression scheme is EXCS (Extendible array based Compression Scheme).
9. 9
Extendible array based
Compression Scheme (EXCS) 1/3
• We implemented the multidimensional extendible array in secondary memory.
• We considered dimension = 3 in our experiments.
• The sub-arrays are distinguished so that each can be stored individually in secondary memory.
10. 10
Extendible array based
Compression Scheme (EXCS) 2/3
• The sub-arrays have n-1 (= 2) dimensions.
• A large number of sub-arrays are generated for compression.
• Sub-arrays are taken as input dynamically.
• Only the maximum number of sub-arrays has to be given.
11. 11
Extendible array based
Compression Scheme (EXCS) 3/3
• Each sub-array is compressed individually.
• The compression technique used is similar to CRS.
• The compressed elements are written to secondary memory as the RO, CO and VL of subarray_1, subarray_2, …, subarray_N.
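The per-sub-array step can be pictured with the sketch below. It is only an illustration under assumptions: the slides say the technique is "similar to CRS" without giving details, so a plain CRS-style routine is swapped in here. The helper names, the sample sub-arrays, and the use of printed records in place of writes to secondary memory are all hypothetical.

```python
def compress_subarray(sub):
    """CRS-style compression of one 2-D sub-array into (RO, CO, VL).

    Assumption: RO holds cumulative non-zero counts per row (0-based).
    """
    ro, co, vl = [0], [], []
    for row in sub:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def compress_all(subarrays):
    """Compress each sub-array on its own and return one record per sub-array."""
    return [compress_subarray(sub) for sub in subarrays]


# Hypothetical 2-D sub-arrays of a 3-D extendible array. Their sizes differ
# here because sub-arrays added later span the array as it has grown so far.
subarrays = [
    [[0, 5], [0, 0]],
    [[0, 0, 0], [1, 0, 2]],
    [[0, 0, 0], [0, 0, 0], [0, 4, 0]],
]

records = compress_all(subarrays)
for n, (ro, co, vl) in enumerate(records, start=1):
    # Printing stands in for writing the record to secondary memory.
    print(f"subarray_{n}: RO={ro} CO={co} VL={vl}")
```

Because each record is self-contained, appending a new sub-array does not require touching the records already written, which fits the slides' point about supporting extension.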
12. 12
Performance Measurement
Performance is measured against two key factors of the compression schemes:
• Data density
• Length of dimension / number of data
The metrics are:
• compression ratio = compressed data / original data
• space savings = 1 - compression ratio
• Space savings are reported in percent.
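The two metrics are simple to compute. The sketch below applies them to a CRS-style layout under assumptions that are not from the slides: a two-dimensional 8 x 8 array, equal-sized entries, and a cost of one RO entry per row plus one, with one CO and one VL entry per non-zero. The function names are hypothetical.

```python
def compression_ratio(compressed_size, original_size):
    """compression ratio = compressed data / original data"""
    return compressed_size / original_size


def space_savings_percent(compressed_size, original_size):
    """space savings = 1 - compression ratio, expressed in percent"""
    return (1 - compression_ratio(compressed_size, original_size)) * 100


def crs_size(rows, cols, density):
    """Entry count of a CRS-style layout (assumed cost model, equal-sized entries).

    RO needs rows + 1 entries; CO and VL need one entry per non-zero.
    """
    nonzeros = round(rows * cols * density)
    return (rows + 1) + 2 * nonzeros


original = 8 * 8
for density in (0.10, 0.25, 0.50):
    compressed = crs_size(8, 8, density)
    savings = space_savings_percent(compressed, original)
    print(f"density={density:.2f} compressed={compressed} savings={savings:.1f}%")
```

Under this cost model, at 50% density the compressed form needs 73 entries against 64 in the original, so the space savings are negative. This may be one reason a curve can fall below zero for small or dense arrays.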
13. 13
Comparative Analysis (1/4)
[Chart: space savings (y-axis, -40 to 100) against no. of data (x-axis: 64, 729, 4096, 15625, 46656) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed density = 20%
14. 14
Comparative Analysis (2/4)
[Chart: space savings (y-axis, -40 to 80) against no. of data (x-axis: 64, 729, 4096, 15625, 46656) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed density = 25%
15. 15
Comparative Analysis (3/4)
[Chart: compression ratio (y-axis, -60 to 100) against density of data (x-axis: 10, 20, 30, 40, 50) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed no. of data=64
16. 16
Comparative Analysis (4/4)
[Chart: compression ratio (y-axis, -60 to 100) against density of data (x-axis: 10, 20, 30, 40, 50) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed no. of data=4096
17. 17
Performance Measurement
• Extendibility of arrays
• Use of multidimensional arrays
• Extendibility toward any dimension
• EXCS allows dynamic extension of arrays.
• In analysis, the data can be extended up to n dimensions.
• Performance is good for a large number of data.
18. 18
Conclusion
• Our proposed compression scheme has been tested experimentally on data of up to 3 dimensions.
• In future, the experiments can be extended to compressing n-dimensional data.
• EXCS is effective for large multidimensional data warehouses.