Address generation unit for multimedia applications
on application-specific instruction-set processors

Marc Moreno Berengue, Guillermo Talavera Velilla, Aitor Rodriguez Alsina, Jordi Carrabina
Universitat Autònoma de Barcelona (Spain)

IECON 2010
7–10 November – Phoenix, AZ, USA

Motivation

•   Design a custom Address Generation Unit (AGU)
       •   Connected to an ASIP data path

•   Benefits of custom AGU design
       •   Previous software optimizations
       •   Multimedia applications

Structure
•   Introduction
•   Design
•   Work Flow
•   Results
•   Conclusions

Multimedia applications features
•   Multimedia applications
        •   Complex index manipulation
        •   Large number of data accesses
•   Require
        •   High performance
        •   Low energy consumption

    It is crucial to reduce these data accesses and the related address
    computations in an effective way.

SW optimizations
The Data Transfer and Storage Exploration (DTSE)* methodology
is oriented to:
 •   Reduce data transfers between memories and processor
 •   Improve the energy efficiency
 •   Reduce the execution time

     These SW transformations create high overhead in the address
     generation and control flow.

                      *Methodology developed at the IMEC research center

SW optimizations
Original code:

    ...
    for (x=1; x<=N-2; ++x)
     for (y=1; y<=N-2; ++y)
      for (k=-1; k<=1; ++k){
        A[x][y] += B[x+k][y]*C[abs(k)];
        A[x][y] /=tot;
    }
    ...

Code after the SW optimizations (needs to be optimized):

    ...
    for (y=0; y<=M+2; ++y){
      for (x=0; x<=N+2; ++x) {
        if (x>=0&&x<N &&y>=1&&y<=M-2)
          D[x%3] = B[(y*N+x)%8704+(y*N+x)/8704*16384+7680] ;
        if (x-1>=1&&x-1<=N-2&&y>=1&&y<=M-2) {
           for (k=-1; k<=1; ++k)
             acc += D[(x-1+k)%3]*C[abs(k)];
        }
        acc /= tot;}
    }
    ...

Address Generation Unit
 The Address Generation Unit (AGU) is a co-processor which uses
 the address equation (AE) to generate the address sequence (AS).

                             &X[AE] = AS

 Example:
 B[(y*N+x)%8704+(y*N+x)/8704*16384+7680]
 AE = (y*N+x) % 8704 + (y*N+x) / 8704*16384+7680
 AS = 7680, 7681, 7682, 7683, ...

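The relation between the AE and the AS in the example can be checked in software. The sketch below evaluates the AE for consecutive (x, y) pairs and stores the resulting addresses. It is only an illustration: the function names, the row-major traversal starting at x = 0, y = 0, and the row width passed as `n` are assumptions, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/* Address equation (AE) taken from the example:
 * (y*N+x) % 8704 + (y*N+x) / 8704 * 16384 + 7680 */
static unsigned address_equation(unsigned x, unsigned y, unsigned n)
{
    unsigned linear = y * n + x;
    return linear % 8704u + linear / 8704u * 16384u + 7680u;
}

/* Hypothetical helper: fills as[0..count-1] with the address sequence (AS)
 * obtained by walking x = 0..n-1 inside y = 0, 1, 2, ...
 * The traversal order and the row width n are assumptions. */
static void generate_as(unsigned n, unsigned *as, size_t count)
{
    size_t i = 0;
    assert(n > 0u); /* a zero row width would never advance */
    for (unsigned y = 0; i < count; ++y)
        for (unsigned x = 0; x < n && i < count; ++x)
            as[i++] = address_equation(x, y, n);
}
```

Under these assumptions the sequence starts at 7680 and increases by one per access, as on the slide, until the linear index y*N+x reaches 8704, where the division term adds 16384 and the modulo term wraps to zero.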
Application-specific instruction-set processor
An application-specific instruction-set processor (ASIP):
     •   Allows its instruction set to be extended
     •   Offers a fast interface to read/write data from/to specific
           hardware
              •   1 instruction
              •   1 cycle

AGU design

•   An AGU attached to the ASIP data path saves execution time
        ●   1 instruction
        ●   1 cycle

AGU skeleton
The AGU has one control unit,
one process unit and one FIFO

  •   CI (custom instruction) unit
      •   AE configuration & read FIFO

  •   CO (co-processor) unit
      •   Calculates the AE to generate the
          AS and stores all values in the
          FIFO

[Figure: AGU block diagram. Custom Instruction interface; CI unit (change AE values, read AS values); CO unit (AS generation)]

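A small software model can make the division of work between the two units concrete. The sketch below is not the authors' hardware: the struct, the function names, the FIFO depth of 8 and the simple "start, step, modulo" form of the AE are all assumptions chosen for illustration. In hardware the CO unit would refill the FIFO in parallel with the processor; here the refill is done on demand.

```c
#include <assert.h>
#include <stddef.h>

#define AGU_FIFO_DEPTH 8 /* assumed depth, not given on the slides */

/* Hypothetical model of the skeleton: AE parameters plus a FIFO of AS values. */
typedef struct {
    unsigned next, step, modulo;   /* assumed AE: addr = (addr + step) % modulo */
    unsigned fifo[AGU_FIFO_DEPTH];
    size_t head, count;
} agu_model;

/* CI unit, "change AE values": load a new AE and flush the FIFO. */
static void agu_configure(agu_model *agu, unsigned start, unsigned step,
                          unsigned modulo)
{
    assert(modulo > 0u);
    agu->next = start % modulo;
    agu->step = step % modulo;
    agu->modulo = modulo;
    agu->head = 0;
    agu->count = 0;
}

/* CO unit, "AS generation": evaluate the AE and store values until the FIFO is full. */
static void agu_co_fill(agu_model *agu)
{
    while (agu->count < AGU_FIFO_DEPTH) {
        size_t tail = (agu->head + agu->count) % AGU_FIFO_DEPTH;
        agu->fifo[tail] = agu->next;
        agu->next = (agu->next + agu->step) % agu->modulo;
        agu->count++;
    }
}

/* CI unit, "read AS values": pop the oldest address from the FIFO. */
static unsigned agu_read(agu_model *agu)
{
    if (agu->count == 0)
        agu_co_fill(agu); /* software stand-in for the parallel CO unit */
    unsigned addr = agu->fifo[agu->head];
    agu->head = (agu->head + 1) % AGU_FIFO_DEPTH;
    agu->count--;
    return addr;
}
```

Configured with start 0, step 1 and modulo 7, successive reads of this model return 0, 1, ..., 6, 0, 1, ..., which is the kind of index sequence a modulo-addressed buffer needs.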
AGU Creator

Web-based application

Work Flow
Init.c:

    int A[70],B[70],C=0;
    ...
    for (i=7; i<70; i++)
    {
    B[i]=A[i-7]+B[i-7];
    A[i]=i;
    C+=B[i];
    }
    ...

Opt.c (after SW Opt. (DTSE)):

    int A[7],B[7],C=0;
    ...
    for (i=7; i<70; i++)
    {
    B[i%7]=A[(i-7)%7]+B[(i-7)%7];
    A[i%7]=i;
    C+=B[i%7];
    }
    ...

CI_code.c (with AGUs):

    int A[7],B[7],C=0,ix,x;
    initAGU(); initAGU2();
    ...
    for (i=7; i<70; i++)
    {
    x=readAGU();
    ix=readAGU2();
    B[x]=A[ix]+B[ix];
    A[x]=i;
    C+=B[x];
    }
    ...

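The three versions of the loop should compute the same result, which can be checked with a small self-contained program. The sketch below is an illustration only: the initial contents of A and B (hidden behind "..." on the slide) are assumed, and `soft_agu`, `init_agu` and `read_agu` are hypothetical software stand-ins for the custom instructions `initAGU()`/`readAGU()`, whose real interface is not shown in the slides.

```c
#include <assert.h>

/* Init.c: original loop. Initial values of A[0..6], B[0..6] are assumed. */
static int run_init(void)
{
    int A[70], B[70], C = 0;
    for (int i = 0; i < 7; i++) { A[i] = i; B[i] = 1; }
    for (int i = 7; i < 70; i++) {
        B[i] = A[i - 7] + B[i - 7];
        A[i] = i;
        C += B[i];
    }
    return C;
}

/* Opt.c: arrays reduced to 7 elements, indices wrapped with % 7. */
static int run_opt(void)
{
    int A[7], B[7], C = 0;
    for (int i = 0; i < 7; i++) { A[i] = i; B[i] = 1; }
    for (int i = 7; i < 70; i++) {
        B[i % 7] = A[(i - 7) % 7] + B[(i - 7) % 7];
        A[i % 7] = i;
        C += B[i % 7];
    }
    return C;
}

/* Hypothetical software stand-in for one AGU: yields (i - offset) % 7. */
typedef struct { int i; int offset; } soft_agu;

static void init_agu(soft_agu *agu, int first_i, int offset)
{
    agu->i = first_i;
    agu->offset = offset;
}

static int read_agu(soft_agu *agu)
{
    int addr = (agu->i - agu->offset) % 7;
    agu->i++;
    return addr;
}

/* CI_code.c: the index arithmetic is replaced by AGU reads. */
static int run_agu(void)
{
    int A[7], B[7], C = 0, ix, x;
    soft_agu agu1, agu2;
    for (int i = 0; i < 7; i++) { A[i] = i; B[i] = 1; }
    init_agu(&agu1, 7, 0); /* AE: i % 7 */
    init_agu(&agu2, 7, 7); /* AE: (i - 7) % 7 */
    for (int i = 7; i < 70; i++) {
        x = read_agu(&agu1);
        ix = read_agu(&agu2);
        B[x] = A[ix] + B[ix];
        A[x] = i;
        C += B[x];
    }
    return C;
}
```

Calling `run_init()`, `run_opt()` and `run_agu()` and comparing the returned values of C is a quick way to confirm that the DTSE transformation and the AGU substitution preserve the computation under the assumed initial data.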
Test environment
•   NIOS II soft-core processor (Altera)
    ●   32-bit RISC processor
    ●   Harvard memory architecture
    ●   Data/instruction caches
    ●   256 custom instructions (fast data path interface)

•   Cyclone II EP2C35 Altera FPGA

Test Applications

•   Cavity Detector
    Medical imaging application to detect cavities on tomography scans

•   Quad-tree Structured Difference Pulse Code Modulation
    (QSDPCM)
    An inter-frame compression technique for video imaging.

Speedup
[Figure: bar charts "Speedup ( Cavity )" and "Speedup ( QSDPCM )", vertical axis from 0 to 1.4, bars labeled Init, DTSE and (HW) AGU inclusion]

      Cavity speedup: 1.26                 QSDPCM speedup: 1.19

Energy improvements
[Figure: bar charts "Energy ( Cavity )" and "Energy ( QSDPCM )", vertical axis from 0 to 1, bars labeled Init, DTSE and (HW) AGU inclusion]

      Cavity energy reduction: 27%         QSDPCM energy reduction: 21%

Area penalties

                     Cavity (LEs)   QSDPCM (LEs)

NIOS-F                  2644            2644

NIOS-F + AGU            3596            3592

  Including the AGU in the NIOS II architecture uses
     2.9% of the total FPGA resources (33216 LEs)

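The 2.9% figure is consistent with the LE counts in the table: the AGU adds 3596 - 2644 = 952 LEs for Cavity and 3592 - 2644 = 948 LEs for QSDPCM, out of 33216 LEs. A minimal sketch of that arithmetic follows; the function name is hypothetical.

```c
#include <assert.h>

/* Extra logic elements (LEs) used by the AGU, as a percentage of the device.
 * Hypothetical helper, only restating the arithmetic behind the table. */
static double area_overhead_percent(unsigned base_les, unsigned with_agu_les,
                                    unsigned total_les)
{
    assert(total_les > 0u && with_agu_les >= base_les);
    return 100.0 * (double)(with_agu_les - base_les) / (double)total_les;
}
```

With the table values, `area_overhead_percent(2644, 3596, 33216)` and `area_overhead_percent(2644, 3592, 33216)` both round to 2.9.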
Conclusions
•   Extending an ASIP with AGUs is an efficient way to meet the
    performance/energy requirements of multimedia applications
    after some SW optimizations

•   The innovation of connecting the AGU to the processor data
    path and letting it work in parallel with the main processor allows
    a wide range of values to be calculated before the processor needs them

•   Using an AGU skeleton and a wizard decreases the design and
    implementation time.

Future Work
•   Improve the AGU wizard in order to:

    ●   Automatically detect AEs and show relevant information
        about each AE for a given C file
    ●   Generate the appropriate AGU for a specific set of AEs
    ●   Generate AGUs for more than one ASIP

•   Extend the set of applications used in this work

Thank you!!


Questions?
