

      Data Compression for Large
      Multidimensional Data
      Warehouses



      Supervisor:
      Dr. K.M. Azharul Hasan
      Associate Professor,
      Head of the Department,
      Department of CSE, KUET

      Presented by:
      Abdullah Al Mahmud, Roll : 0507006
      Md. Mushfiqur Rahman, Roll : 0507029

These slides were prepared by Muhammad Mushfiqur Rahman & Abdullah Al Mahmud for the thesis presentation.


Presentation Layout

 • Objectives
 • Existing Compression Schemes
 • Traditional Extendible Array
 • Proposed Compression Scheme
 • EXCS (Extendible Array Based Compression Scheme)
 • Comparative Analysis
 • Conclusion

   Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh



Objectives
Data compression technology:
 • reduces the effective price of logical data storage capacity
 • improves query performance

 • Multidimensional arrays are widely used in a large number of scientific research applications.
 • Efficient compression of multidimensional arrays makes it possible to handle the large multidimensional data sets of data warehouses.




Existing Compression Schemes (1/3)

 • Bitmap compression
 • Run Length Encoding
 • Header compression
 • Compressed Column Storage
 • Compressed Row Storage







Existing Compression Schemes (2/3)

[Figure: (a) A sparse array. (b) The CRS scheme.]
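
As an illustration of the CRS scheme named in the figure, here is a minimal Python sketch. The function names `crs_compress` and `crs_get` and the sample array are hypothetical and do not come from the slides. RO, CO and VL are assumed to hold row offsets, column indices and non-zero values respectively, following the usual CRS convention.

```python
def crs_compress(matrix):
    """Compress a 2-D list of numbers into CRS form (RO, CO, VL)."""
    ro = [0]  # ro[i] is the position in co/vl where row i starts
    co = []   # column index of each non-zero element
    vl = []   # value of each non-zero element
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)
                vl.append(value)
        ro.append(len(vl))  # end of this row, start of the next one
    return ro, co, vl


def crs_get(ro, co, vl, i, j):
    """Return element (i, j) by scanning only the non-zeros of row i."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0  # not stored, so the element is zero


# Hypothetical 4 x 4 sparse array used only for illustration.
sparse = [
    [0, 0, 3, 0],
    [5, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 7, 0, 9],
]
ro, co, vl = crs_compress(sparse)
print(ro, co, vl)
```

Only the four non-zero values are kept in VL. The empty third row costs a single repeated offset in RO.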






Existing Compression Schemes (3/3)

 • Classical methods cannot support updates without completely readjusting runs.

 • They focus on compressing sparse arrays.

 • They do not support extendibility.






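  Traditional Extendible Array

 • TEA supports dynamic extension of dimension size.

                                 Dimension 2
   History table (H2):             0    1    3    5
   Address table (Address2):       0    1    4    9

   Dimension 1
   History (H1)   Address1       Cell addresses
        0             0            0    1    4    9
        2             2            2    3    5   10
        4             6            6    7    8   11

   History counter = 5 after the last extension

 • Position <1,3>:
   H1[1] = 2 < H2[3] = 5, so the cell belongs to the sub-array added when dimension 2 was extended to index 3.
   Address of cell = Address2[3] + 1 = 9 + 1 = 10

                                     Figure 1: TEA Construction And Access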
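
The construction and lookup in Figure 1 can be sketched in Python as follows. This is a minimal two-dimensional illustration, not the authors' implementation. The class name `ExtendibleArray2D` and its method names are hypothetical, dimensions are numbered from 0 (rows are dimension 0, columns are dimension 1), and the order of extensions is inferred from the history values in the figure.

```python
class ExtendibleArray2D:
    """Two-dimensional extendible array with history and address tables."""

    def __init__(self):
        self.sizes = [1, 1]         # current length of each dimension
        self.history = [[0], [0]]   # history value per index, per dimension
        self.address = [[0], [0]]   # first address of each sub-array
        self.counter = 0            # history counter
        self.total = 1              # number of cells allocated so far

    def extend(self, dim):
        """Extend dimension `dim` by one, appending a new sub-array."""
        other = 1 - dim
        self.counter += 1
        self.history[dim].append(self.counter)
        self.address[dim].append(self.total)  # sub-array starts at the end
        self.total += self.sizes[other]       # it spans the other dimension
        self.sizes[dim] += 1

    def cell_address(self, i, j):
        """Address of cell <i, j>: the later-created sub-array owns it."""
        if self.history[0][i] > self.history[1][j]:
            return self.address[0][i] + j
        return self.address[1][j] + i


tea = ExtendibleArray2D()
for dim in (1, 0, 1, 0, 1):  # extension order assumed from Figure 1
    tea.extend(dim)
print(tea.cell_address(1, 3))
```

Because a new sub-array is always appended after the existing cells, extending either dimension never moves data that is already stored.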



Proposed Compression Scheme
 • Multidimensional arrays are important for sparse array operations.

 • Extendibility of multidimensional arrays

 • A compression technique that can work on a multidimensional extendible array

 • Our proposed compression scheme is EXCS (Extendible array based Compression Scheme).


Extendible array based Compression Scheme (EXCS) (1/3)

 • We implemented the multidimensional extendible array in secondary memory.

 • We considered dimension = 3 in our experiments.

 • The sub-arrays are identified separately so that each one can be stored individually in secondary memory.



Extendible array based Compression Scheme (EXCS) (2/3)

 • Each sub-array has n-1 (= 2) dimensions.

 • A large number of sub-arrays are generated for compression.

 • Sub-arrays are taken as input dynamically.

 • Only the maximum number of sub-arrays has to be given.


Extendible array based Compression Scheme (EXCS) (3/3)

 • Each sub-array is compressed individually.

 • The compression technique used is similar to CRS.

 • The compressed elements are written to secondary memory as the RO, CO and VL arrays of
   subarray_1, subarray_2, … … subarray_N
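
The storage step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The 3-dimensional array is simply sliced along its first axis into 2-dimensional sub-arrays, each slice is compressed with a CRS-style routine, and the three arrays are written record by record to a binary stream. The record layout (two little-endian 32-bit counts followed by RO, CO and VL), the helper names and the integer-valued cells are all hypothetical.

```python
import io
import struct


def crs_compress(matrix):
    """CRS-style compression of one 2-D sub-array into (RO, CO, VL)."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def write_subarrays(stream, cube):
    """Compress each 2-D slice of `cube` and append it to `stream`."""
    for sub in cube:
        ro, co, vl = crs_compress(sub)
        # Assumed header: length of RO and number of non-zero elements.
        stream.write(struct.pack("<II", len(ro), len(vl)))
        for arr in (ro, co, vl):
            stream.write(struct.pack(f"<{len(arr)}i", *arr))


def read_subarrays(stream):
    """Read back every (RO, CO, VL) record until the stream ends."""
    records = []
    while True:
        header = stream.read(8)
        if not header:
            break
        n_ro, n_nz = struct.unpack("<II", header)
        ro = list(struct.unpack(f"<{n_ro}i", stream.read(4 * n_ro)))
        co = list(struct.unpack(f"<{n_nz}i", stream.read(4 * n_nz)))
        vl = list(struct.unpack(f"<{n_nz}i", stream.read(4 * n_nz)))
        records.append((ro, co, vl))
    return records


# Hypothetical 2 x 2 x 3 array, i.e. two sub-arrays of 2 x 3 cells each.
cube = [
    [[0, 4, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 0, 2]],
]
buffer = io.BytesIO()  # stands in for a file in secondary memory
write_subarrays(buffer, cube)
buffer.seek(0)
records = read_subarrays(buffer)
print(records)
```

Because each sub-array forms its own record, a newly generated sub-array can be appended without rewriting the records already stored.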


Performance Measurement
Performance is measured against two key factors of the compression schemes:
 • Data density
 • Length of dimension / number of data

 • compression ratio = compressed data / original data
 • space savings = 1 - compression ratio

 • We report space savings in percent.
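
The two measures defined above can be written as a short Python sketch. The function names and the byte counts in the example are hypothetical and are not results from the thesis.

```python
def compression_ratio(compressed_size, original_size):
    """Compression ratio = compressed data / original data."""
    return compressed_size / original_size


def space_savings_percent(compressed_size, original_size):
    """Space savings = 1 - compression ratio, expressed in percent."""
    return (1 - compression_ratio(compressed_size, original_size)) * 100


# Hypothetical sizes in bytes, chosen only to illustrate the formulas.
original = 256
good = space_savings_percent(160, original)  # compressed form is smaller
bad = space_savings_percent(320, original)   # compressed form is larger
print(good, bad)
```

When the compressed form is larger than the original, the ratio exceeds 1 and the space savings become negative.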


Comparative Analysis (1/4)

[Chart: space savings versus no. of data (64, 729, 4096, 15625, 46656) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed density = 20%


Comparative Analysis (2/4)

[Chart: space savings versus no. of data (64, 729, 4096, 15625, 46656) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed density = 25%


Comparative Analysis (3/4)

[Chart: compression ratio versus density of data (10, 20, 30, 40, 50) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed no. of data = 64


Comparative Analysis (4/4)

[Chart: compression ratio versus density of data (10, 20, 30, 40, 50) for the Header, Bitmap, CRS, EACRS and Offset schemes]
Figure: Comparison with fixed no. of data = 4096



Performance Measurement

 • Extendibility of arrays
 • Use of multidimensional arrays
 • Extendibility toward any dimension
 • EXCS allows dynamic extension of arrays.
 • In the analysis, data can be extended up to n dimensions.
 • Performance is good for a large number of data.





Conclusion
 • Our proposed compression scheme has been tested experimentally on data of up to 3 dimensions.

 • In future, the experiments can be extended to compress n-dimensional data.

 • EXCS is effective for large multidimensional data warehouses.

