MATRIX, TSFPGA & DPGA
By Bhavana Sawant &
Pranoti Bachhav
Matrix concepts
• MATRIX is designed to maintain flexibility in instruction control.
• MATRIX is based on a uniform array of primitive elements and
interconnect which can serve instruction, control, and data
functions.
• The key to providing this flexibility is a multilevel configuration
scheme which allows the device to control the way it delivers
configuration information.
Matrix architecture
• The MATRIX microarchitecture is based around an array of identical
8-bit primitive datapath elements overlaid with a configurable
network.
Basic functional unit (BFU)
1) 256×8-bit memory: functions either as a single 256-byte memory or as a
dual-ported 128×8-bit register file. In register-file mode the memory
supports two reads and one write operation on each cycle.
2) 8-bit ALU: a set of arithmetic and logic functions.
3) Control logic, composed of:
   a) a local pattern matcher for generating local control from the ALU
   output;
   b) a reduction network for generating local control;
   c) a 20-input, 8-output NOR block which can serve as half of a PLA.
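The memory modes and the 8-bit ALU described above can be sketched as a small Python model. This is a minimal illustration, not the MATRIX implementation: the class `BFUMemory`, the function `alu`, the chosen operation set, and the rule that reads see the value from before the same cycle's write are all assumptions made for the example.

```python
from typing import Optional, Tuple


class BFUMemory:
    """Hypothetical model of a BFU's 256 x 8-bit memory used in
    register-file mode (a dual-ported 128 x 8-bit register file)."""

    REGFILE_ENTRIES = 128

    def __init__(self) -> None:
        self.cells = [0] * 256  # 256 locations, one byte each

    def register_file_cycle(self, read_a: int, read_b: int,
                            write_addr: Optional[int] = None,
                            write_value: int = 0) -> Tuple[int, int]:
        """One cycle: two reads and at most one write.

        Assumption: both reads return the contents from before this
        cycle's write."""
        addrs = [read_a, read_b] + ([write_addr] if write_addr is not None else [])
        for addr in addrs:
            if not 0 <= addr < self.REGFILE_ENTRIES:
                raise ValueError(f"address {addr} outside the 128-entry register file")
        a, b = self.cells[read_a], self.cells[read_b]
        if write_addr is not None:
            self.cells[write_addr] = write_value & 0xFF  # store 8 bits only
        return a, b


# Illustrative subset of arithmetic and logic operations on 8-bit values.
ALU_OPS = {
    "add": lambda a, b: (a + b) & 0xFF,  # wraps modulo 256
    "sub": lambda a, b: (a - b) & 0xFF,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def alu(op: str, a: int, b: int) -> int:
    """Apply one 8-bit ALU operation to the low bytes of a and b."""
    return ALU_OPS[op](a & 0xFF, b & 0xFF)
```

For example, `alu("add", 200, 100)` returns 44, because the 8-bit result wraps modulo 256.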
MATRIX operation
• MATRIX operation is pipelined at the BFU level, with a pipeline register at
each BFU input port.
• The pipeline stages are:
I. Memory read
II. ALU operation
III. Memory write and local interconnect traversal (these two operations
proceed in parallel)
BFU roles
- Instruction store
- Data memory
- ALU function
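To see how the three stages overlap, the following sketch lists which operation occupies which stage in each cycle. It assumes, purely for illustration, that one new operation enters the BFU pipeline every cycle and that no stalls occur; `pipeline_schedule` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

STAGES = ("memory read", "ALU operation", "memory write + local interconnect")


def pipeline_schedule(num_ops: int) -> Dict[int, List[Tuple[int, str]]]:
    """Map each clock cycle to the (operation index, stage) pairs active in it,
    assuming one operation is issued per cycle with no stalls."""
    schedule: Dict[int, List[Tuple[int, str]]] = {}
    for op in range(num_ops):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(op + offset, []).append((op, stage))
    return schedule


if __name__ == "__main__":
    for cycle, active in sorted(pipeline_schedule(4).items()):
        print(cycle, active)
```

With four operations the schedule spans six cycles (n + 2), and in cycles 2 and 3 all three stages are busy at once.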
Matrix network
• A collection of 8-bit buses
• Dynamically switched network connections
1. Nearest-neighbor connections: direct connections between BFUs within
two Manhattan grid squares
2. Length-four bypass connections: each BFU supports two connections into
the level-two network, which allows corner turns, local fanout,
medium-distance interconnect, data shifting, and retiming
3. Global lines: every row and column supports four interconnect lines
which span the entire row or column.
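The nearest-neighbor reach can be enumerated directly. This sketch assumes that "within two grid squares" means a Manhattan distance of at most two and that the BFU array is a rectangular grid; `neighbor_offsets` and `neighbors` are hypothetical helpers, and the bypass and global-line levels are not modeled.

```python
from typing import List, Tuple


def neighbor_offsets(radius: int = 2) -> List[Tuple[int, int]]:
    """Grid offsets (dx, dy) at Manhattan distance 1..radius from a BFU."""
    return [(dx, dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if 0 < abs(dx) + abs(dy) <= radius]


def neighbors(x: int, y: int, width: int, height: int,
              radius: int = 2) -> List[Tuple[int, int]]:
    """BFU coordinates reachable by a direct nearest-neighbor connection,
    clipped to a width x height array."""
    return [(x + dx, y + dy) for dx, dy in neighbor_offsets(radius)
            if 0 <= x + dx < width and 0 <= y + dy < height]
```

Under this reading an interior BFU has 12 direct neighbors, while a corner BFU has 5.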
MATRIX example
• Finite impulse response (FIR) filter
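As a reference for what the FIR example computes, the direct-form filter is y[n] = Σ h[k]·x[n - k]. The sketch below evaluates that sum in plain Python; it shows the arithmetic only and makes no claim about how the slide maps taps onto BFUs. `fir_filter` is a hypothetical name, and samples before the start of the input are assumed to be zero.

```python
from typing import List, Sequence


def fir_filter(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before index 0 are treated as zero. Each tap is one
    multiply-accumulate; on an 8-bit datapath, wider accumulations would
    need extra precision, which this sketch ignores."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out
```

For example, `fir_filter([1, 2, 3, 4], [1, 1])` returns `[1, 3, 5, 7]`, the sum of each sample and its predecessor.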
Dynamically Programmable Gate Arrays with Input Registers
• With a register only on each LUT output, we must hold the value on the
output and tie up the switches and wires between the producer and the
consumer until the final consumer has used the value.
• Switches and wires are therefore forced to sit idle holding values for much
longer than the time needed to transport them. The alternative is to move
the value registers to the inputs of the computational elements.
• These input registers allow us to store values which need to traverse LUT
evaluation levels in memories, rather than having them consume active
resources during the period of time in which they are being retimed.
Input Registers
Place four flip-flops on the inputs of each 4-LUT rather than one flip-flop
on its output. This modification allows us to move the data from the
producer to the consumer in the minimum transit time, a time independent
of when the consumer will actually use the data.
Conceptually, the key idea here is that signal transport and retiming are
two different functions:
• Spatial transport moves data in space: it routes data from source to
destination.
• Temporal transport (retiming) moves data in time: it makes data available
at some later time when it is actually required.
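The split between spatial and temporal transport can be made concrete by counting how long a value occupies interconnect under each register placement. This is a simplified model under stated assumptions: a value is produced at cycle `produced`, needs `transit` cycles to reach its consumer, and is used at cycle `consumed`; with an output register the wire is held for the whole interval, and with input registers it is held only for the transit time. `Signal`, `wire_busy_cycles`, and `retime_cycles` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    produced: int  # cycle at which the producer's LUT output is ready
    transit: int   # cycles of spatial transport from producer to consumer
    consumed: int  # cycle at which the consumer actually uses the value

    def __post_init__(self) -> None:
        if self.consumed < self.produced + self.transit:
            raise ValueError("value cannot be used before it arrives")


def wire_busy_cycles(sig: Signal, input_registers: bool) -> int:
    """Cycles the switches and wires are tied up carrying this value."""
    if input_registers:
        return sig.transit                  # only spatial transport uses wires
    return sig.consumed - sig.produced      # output register: held until use


def retime_cycles(sig: Signal) -> int:
    """Cycles the value waits in an input register (temporal transport)."""
    return sig.consumed - sig.produced - sig.transit
```

For a value produced at cycle 0, one cycle away from its consumer, and used at cycle 5, the output-register scheme ties up the wire for 5 cycles, whereas with input registers the wire is busy for 1 cycle and the value waits 4 cycles in the input register.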
TSFPGA
TSFPGA history
TSFPGA was developed jointly by Derrick Chen and Andre DeHon. Derrick
worked out the VLSI implementation and layout issues, while Andre
developed the architecture and mapping tools.
Why TSFPGA?
• If all retiming can be done in input registers, only a single wire is
strictly needed to successfully route the task.
• TSFPGA extends the temporal range on the inputs without a linear increase
in input retiming size.
• The trick we employ here is to have each logical input load its value
from the active interconnect at just the right time.
• If we broadcast the current timestep, each input can simply load its
value when its programmed load time matches the current timestep.
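The broadcast-timestep mechanism can be sketched as follows: a single shared wire carries a different value in each timestep, and each logical input register latches the wire only when its programmed load time equals the broadcast timestep. `TimeSwitchedInput` and `run_timesteps` are hypothetical names, and the model ignores clocking and physical timing.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TimeSwitchedInput:
    load_time: int                # programmed timestep at which to load
    value: Optional[int] = None   # register contents; None until loaded

    def clock(self, timestep: int, wire_value: int) -> None:
        # Load enable: compare the broadcast timestep with the load time.
        if timestep == self.load_time:
            self.value = wire_value


def run_timesteps(wire_trace: Sequence[int],
                  inputs: Sequence[TimeSwitchedInput]) -> List[Optional[int]]:
    """Broadcast each timestep and the wire's value to every input register."""
    for timestep, wire_value in enumerate(wire_trace):
        for inp in inputs:
            inp.clock(timestep, wire_value)
    return [inp.value for inp in inputs]
```

With a wire that carries 10, 20, 30, 40 in timesteps 0 to 3, inputs programmed with load times 2, 0, and 3 capture 30, 10, and 40 respectively, so one wire serves all three inputs.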
Architecture of TSFPGA
Building elements:
• The basic TSFPGA building block is the subarray tile, which contains a
collection of LUTs and a central switching crossbar.
- Array elements
- Crossbar
- Switching elements
Array Element
• The TSFPGA array element is made up of a number of LUTs which share the
same crossbar outputs and inputs.
• The LUT input values are stored in time-switched input registers. The
inputs to the array element are run to all LUT input registers. When the
current timestep matches the programmed load time, the input register is
enabled to load the value on the array-element input.
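An array element can be sketched the same way: several 4-LUTs see the same array-element inputs, and each LUT input register is programmed with which input to sample and at which timestep. This is an illustrative model; `LUT4`, `ArrayElement`, the `(input_index, load_time)` configuration format, and the truth-table encoding (bit i of a 16-bit integer gives the output for input pattern i) are assumptions.

```python
from typing import List, Sequence, Tuple


class LUT4:
    """4-input LUT with four time-switched input registers (hypothetical model)."""

    def __init__(self, truth_table: int,
                 inputs_cfg: Sequence[Tuple[int, int]]) -> None:
        # truth_table: bit i is the output for input pattern i (0..15).
        # inputs_cfg: one (array-element input index, load timestep) per register.
        if len(inputs_cfg) != 4:
            raise ValueError("a 4-LUT needs exactly four input registers")
        self.truth_table = truth_table & 0xFFFF
        self.inputs_cfg = list(inputs_cfg)
        self.regs = [0, 0, 0, 0]

    def clock(self, timestep: int, element_inputs: Sequence[int]) -> None:
        for i, (src, load_time) in enumerate(self.inputs_cfg):
            if load_time == timestep:  # load enable
                self.regs[i] = element_inputs[src] & 1

    def evaluate(self) -> int:
        # Register i supplies bit i of the truth-table index.
        index = sum(bit << i for i, bit in enumerate(self.regs))
        return (self.truth_table >> index) & 1


class ArrayElement:
    """A group of LUTs that all see the same array-element inputs."""

    def __init__(self, luts: List[LUT4]) -> None:
        self.luts = luts

    def clock(self, timestep: int, element_inputs: Sequence[int]) -> None:
        for lut in self.luts:
            lut.clock(timestep, element_inputs)
```

For instance, a LUT programmed with truth table `0x8000` computes a 4-input AND of whatever its registers captured, regardless of which timesteps those bits arrived in.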
Crossbar
• Each crossbar input is selected from a collection of subarray network
inputs and subarray LUT outputs via a pre-crossbar multiplexor.
• Subarray inputs are registered prior to the pre-crossbar multiplexor, and
outputs are registered immediately after the crossbar, either on the LUT
inputs or before traversing network wires.
• This pipelining makes LUT evaluation and crossbar traversal a single
pipeline stage.
• Each registered crossbar output is routed in several directions to
provide connections to other subarrays or chip I/O.
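The two-level selection described above, pre-crossbar multiplexors feeding a crossbar, can be modeled for a single timestep as two indexing steps. In this sketch `crossbar_step` and its select vectors are hypothetical stand-ins for the configuration; the registers before and after the crossbar are represented only by the caller latching the returned values.

```python
from typing import List, Sequence


def crossbar_step(network_inputs: Sequence[int],
                  lut_outputs: Sequence[int],
                  premux_select: Sequence[int],
                  xbar_select: Sequence[int]) -> List[int]:
    """One timestep of subarray switching (illustrative).

    premux_select[j] picks crossbar input j from the candidates
    network_inputs + lut_outputs; xbar_select[k] picks which crossbar
    input drives crossbar output k. The caller would register the result."""
    candidates = list(network_inputs) + list(lut_outputs)
    xbar_inputs = [candidates[s] for s in premux_select]  # pre-crossbar muxes
    return [xbar_inputs[s] for s in xbar_select]          # crossbar
```

For example, with network inputs `[5, 6]`, LUT outputs `[7, 8]`, `premux_select=[3, 0]`, and `xbar_select=[1, 1, 0]`, the outputs are `[5, 5, 8]`; the same crossbar input can fan out to several outputs.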
Crossbar
• The single subarray crossbar performs all major switching roles:
- Output crossbar: routes data from LUT outputs to destinations or
intermediate switching crossbars
- Routing crossbar: routes data through the network between source and
destination subarrays
- Input crossbar: receives data from the network and routes it to the
appropriate destination LUT input
Intra-Subarray Switching
• Communication within the subarray is simple and takes one clock cycle per
LUT evaluation and interconnect traversal.
• Once a LUT has all of its inputs loaded, the LUT output can be selected as
an input to the crossbar, and the LUT's consumers within the subarray may be
selected as crossbar outputs.
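Because each LUT evaluation plus its crossbar traversal takes one clock cycle, the earliest cycle at which each LUT in a subarray can fire follows from its depth in the netlist. The sketch computes that as-soon-as-possible schedule; it assumes an acyclic netlist, treats names that are not LUTs as subarray inputs available at cycle 0, and ignores crossbar capacity limits. `asap_schedule` is a hypothetical helper.

```python
from typing import Dict, List


def asap_schedule(netlist: Dict[str, List[str]]) -> Dict[str, int]:
    """Earliest evaluation cycle for each LUT.

    netlist maps each LUT name to the names of the signals feeding it.
    Names that are not keys are treated as inputs already loaded at cycle 0.
    A LUT evaluates one cycle after its latest-arriving input."""
    ready: Dict[str, int] = {}

    def cycle_of(name: str) -> int:
        if name not in netlist:
            return 0
        if name not in ready:
            ready[name] = 1 + max((cycle_of(src) for src in netlist[name]), default=0)
        return ready[name]

    for lut in netlist:
        cycle_of(lut)
    return ready
```

For the netlist `{"a": ["x", "y"], "b": ["a", "z"], "c": ["a", "b"]}`, LUTs `a`, `b`, and `c` evaluate in cycles 1, 2, and 3.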
Inter-Subarray Switching
• A number of subarray outputs are run to each subarray in the same row and
column.
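If every subarray's outputs reach all subarrays in its row and column, any source can reach any destination with at most one intermediate subarray: travel along the row, then the column. The sketch below assumes a full row/column reach and ignores the limited number of outputs and any contention; `route` is a hypothetical helper that returns the subarrays visited as (row, column) pairs.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, column) of a subarray


def route(src: Coord, dst: Coord) -> List[Coord]:
    """Subarrays visited from src to dst using row/column connections only."""
    (src_row, src_col), (dst_row, dst_col) = src, dst
    if src == dst:
        return [src]
    if src_row == dst_row or src_col == dst_col:
        return [src, dst]                    # direct: same row or column
    return [src, (src_row, dst_col), dst]    # turn at the row/column corner
```

For example, `route((0, 0), (2, 3))` returns `[(0, 0), (0, 3), (2, 3)]`, a two-hop path.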

 THANK YOU
