## CprE / ComS 583 Reconfigurable Computing

Prof. Joseph Zambreno, Department of Electrical and Computer Engineering, Iowa State University

Lecture #4 - FPGA Technology Mapping

### Quick Points

- Lectures are viewable for students via WebCT
  - Quality is higher
  - Use discussion forums
- Class e-mail list created: cpre583@iastate.edu
- Less focus on interconnect theory
  - More on interconnects in actual devices
  - Read [AggLew94], [ChaWon96A], [Deh96A] for more details

August 30, 2007

## Recap

- Various FPGA programming technologies: anti-fuse, (E)EPROM, Flash, SRAM
- [Figure: anti-fuse cross-section, with labels polysilicon, N+ diffusion, ONO dielectric, field oxide, oxide]
- SRAM is the most popular

## Outline

- Recap
- General Routing Architectures
- FPGA Architectural Issues
- Early Commercial FPGAs
  - Xilinx XC3000
  - Xilinx XC4000
- Technology Mapping using LUTs

## **LUT Computational Limits**

- A k-LUT can implement any of the $2^{2^k}$ functions of k inputs
- Given *n* such *k*-LUTs, can implement at most $(2^{2^k})^n$ functions
- Since 4-LUTs are efficient, want to find n such that $(2^{2^4})^n \ge 2^{2^M}$
- Example: implementing a 7-LUT with 4-LUTs
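The counting argument above can be checked numerically. The sketch below is illustrative only: the function names (`num_functions`, `enumerate_truth_tables`, `min_luts_by_counting`) are hypothetical helpers, not part of the lecture. It counts truth tables for a small k and searches for the smallest n that satisfies the inequality for the 7-LUT example.

```python
from itertools import product


def num_functions(k: int) -> int:
    """Number of distinct Boolean functions of k inputs (one per truth table)."""
    return 2 ** (2 ** k)


def enumerate_truth_tables(k: int):
    """Yield every k-input truth table as a tuple of 2**k output bits."""
    return product((0, 1), repeat=2 ** k)


def min_luts_by_counting(m: int, k: int = 4) -> int:
    """Smallest n with (2**(2**k))**n >= 2**(2**m), found by direct search."""
    n = 1
    while num_functions(k) ** n < num_functions(m):
        n += 1
    return n


if __name__ == "__main__":
    # Brute-force check for k = 2: 16 truth tables, matching 2**(2**2).
    assert sum(1 for _ in enumerate_truth_tables(2)) == num_functions(2) == 16
    print(num_functions(4))         # 65536
    print(min_luts_by_counting(7))  # 8
```

The search only shows how many 4-LUTs are needed to have enough configurations; it says nothing about whether that many LUTs can actually be wired to realize every 7-input function.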

## LUT Computational Limits (cont.)

- How much computation can be performed in a table lookup?
- Upper bound (from previous): $n \le 2^{M-3}$
- Need n 4-LUTs to cover an M-LUT:

$$
\begin{aligned}
(2^{2^4})^n &\ge 2^{2^M} \\
n \log(2^{2^4}) &\ge \log(2^{2^M}) \\
n \, 2^4 \log(2) &\ge 2^M \log(2) \\
n \, 2^4 &\ge 2^M \\
n &\ge 2^{M-4}
\end{aligned}
$$

- Adding the upper bound: $2^{M-4} \le n \le 2^{M-3}$
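One construction that lands between these bounds is Shannon decomposition: split the M-input truth table on its top input until each piece has four inputs, then recombine the pieces with 2:1 multiplexers, each of which fits in a single 4-LUT. The slides do not say this is how the upper bound was derived, so treat the sketch below as an assumption; the node encoding and function names are hypothetical.

```python
import random

K = 4  # LUT input count assumed throughout this sketch


def decompose(table):
    """Build a network of K-LUTs for `table`, a list of 2**m output bits.

    Index bit i of the table corresponds to input i. Nodes are either
    ("lut", bits) for a leaf over the low inputs, or
    ("mux", select_input, low_net, high_net) for a 2:1 mux (one K-LUT).
    """
    m = len(table).bit_length() - 1
    if m <= K:
        return ("lut", tuple(table))
    half = len(table) // 2
    # Lower half is the cofactor with input m-1 = 0, upper half with it = 1.
    return ("mux", m - 1, decompose(table[:half]), decompose(table[half:]))


def count_luts(net):
    """Count leaf LUTs plus one LUT per multiplexer."""
    if net[0] == "lut":
        return 1
    return 1 + count_luts(net[2]) + count_luts(net[3])


def evaluate(net, x):
    """Evaluate the network on input word x (bit i of x is input i)."""
    if net[0] == "lut":
        bits = net[1]
        return bits[x & (len(bits) - 1)]
    _, sel, low, high = net
    return evaluate(high if (x >> sel) & 1 else low, x)


if __name__ == "__main__":
    M = 7
    rng = random.Random(0)
    table = [rng.randint(0, 1) for _ in range(2 ** M)]  # arbitrary 7-input function
    net = decompose(table)
    n = count_luts(net)
    assert all(evaluate(net, x) == table[x] for x in range(2 ** M))
    assert 2 ** (M - 4) <= n <= 2 ** (M - 3)
    print(n)  # 15: 8 leaf LUTs plus 7 mux LUTs
```

For M = 7 this uses 15 LUTs, one fewer than the upper bound of 16 and well above the counting lower bound of 8.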

## **LUTs Versus Memories**

- Can also implement $(2^{2^k})^w$ functions as a single large memory with k inputs and w outputs
- Large memory advantage: no need for interconnect, and only one input decoder required
- Consider a 32K x 8-bit memory (170M λ², 21 ns latency)
  - w = 8
  - k = 16 (or two 8-bit inputs to address $2^{16}$ locations)
  - Can implement an 8-bit addition or subtraction
- Xilinx XC3042: 288 4-LUTs (180M λ², 13 ns CLB delay)
- 15-bit parity calculation:
  - 5 4-LUTs (<2% of XC3042, 3.125M λ²)
  - Entire SRAM: 170M λ²
- 7-bit addition:
  - 14 4-LUTs (<5% of XC3042, 8.75M λ²)
  - Entire SRAM: 170M λ²
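The LUT counts and areas in this comparison can be reproduced with a short sketch. The area constants come from the figures quoted above for the 288-LUT device and the SRAM. The structures are assumptions: a tree of 4-input XOR LUTs for parity, and a ripple-carry adder with one sum LUT and one carry LUT per bit, chosen because they match the quoted counts of 5 and 14. All names are hypothetical.

```python
DEVICE_AREA = 180_000_000  # lambda^2 for the 288-LUT device quoted above
DEVICE_LUTS = 288
SRAM_AREA = 170_000_000    # lambda^2 for the memory quoted above

# Truth tables for one adder bit, indexed by a + 2*b + 4*carry_in.
SUM_LUT = [a ^ b ^ c for c in (0, 1) for b in (0, 1) for a in (0, 1)]
CARRY_LUT = [(a & b) | (c & (a ^ b)) for c in (0, 1) for b in (0, 1) for a in (0, 1)]


def lut_area(n_luts):
    """Area of n_luts LUTs, pro-rated from the whole device."""
    return n_luts * DEVICE_AREA // DEVICE_LUTS


def parity_via_luts(bits, k=4):
    """XOR groups of up to k signals until one remains; return (parity, LUTs used)."""
    signals, luts = list(bits), 0
    while len(signals) > 1:
        group, signals = signals[:k], signals[k:]
        signals.append(sum(group) % 2)  # one k-input XOR LUT
        luts += 1
    return signals[0], luts


def add_via_luts(x, y, n_bits=7):
    """Ripple-carry add using a sum LUT and a carry LUT per bit; return (sum, LUTs used)."""
    carry, total, luts = 0, 0, 0
    for i in range(n_bits):
        idx = ((x >> i) & 1) | (((y >> i) & 1) << 1) | (carry << 2)
        total |= SUM_LUT[idx] << i
        carry = CARRY_LUT[idx]
        luts += 2
    return total | (carry << n_bits), luts


if __name__ == "__main__":
    _, parity_luts = parity_via_luts([1, 0, 1] * 5)  # 15 input bits
    _, adder_luts = add_via_luts(100, 27)
    for name, n in (("15-bit parity", parity_luts), ("7-bit add", adder_luts)):
        print(name, n, "LUTs,", lut_area(n), "lambda^2 vs SRAM", SRAM_AREA)
```

Under these assumptions the parity tree uses 5 LUTs (3.125M λ²) and the adder 14 LUTs (8.75M λ²), against 170M λ² for the whole memory in either case.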

## LUT Technology Mapping

- Task: map netlist to LUTs, minimizing area and/or delay
  - Similar to technology mapping for traditional designs
  - Library approach not feasible: $O(2^{2^k} / k!)$ elements in library
  - In general it is NP-hard
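To make the mapping task concrete, here is a minimal structural sketch based on k-feasible cut enumeration. This technique is brought in for illustration and is not taken from the slides. It tracks only connectivity, not gate functions. The netlist format (a dict from node name to fan-in names, in topological order) and all function names are assumptions, and no claim is made about area optimality.

```python
from itertools import product


def enumerate_cuts(netlist, k=4):
    """Return dict node -> set of cuts (frozensets of at most k nodes).

    `netlist` maps each node to a tuple of fan-in names and must be in
    topological order; primary inputs map to an empty tuple.
    """
    cuts = {}
    for node, fanins in netlist.items():
        node_cuts = {frozenset([node])}  # trivial cut: the node itself
        if fanins:
            # Merge one cut from each fan-in; keep merges that fit in a k-LUT.
            for combo in product(*(cuts[f] for f in fanins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


def map_to_luts(netlist, outputs, k=4):
    """Pick, per node, the cut with the lowest input depth, then cover from outputs.

    Returns (luts, depth) where luts maps each LUT root to its input set.
    """
    cuts = enumerate_cuts(netlist, k)
    depth, best = {}, {}
    for node, fanins in netlist.items():
        if not fanins:
            depth[node] = 0
            continue
        options = [c for c in cuts[node] if c != frozenset([node])]
        best[node] = min(
            options, key=lambda c: (max(depth[n] for n in c), len(c), sorted(c))
        )
        depth[node] = 1 + max(depth[n] for n in best[node])
    luts, stack = {}, list(outputs)
    while stack:
        node = stack.pop()
        if node in luts or not netlist[node]:
            continue  # already covered, or a primary input
        luts[node] = best[node]
        stack.extend(best[node])
    return luts, max(depth[o] for o in outputs)


if __name__ == "__main__":
    # Hypothetical netlist: g4 = ((a op b) op (c op d)) op e, built from 2-input gates.
    netlist = {
        "a": (), "b": (), "c": (), "d": (), "e": (),
        "g1": ("a", "b"), "g2": ("c", "d"),
        "g3": ("g1", "g2"), "g4": ("g3", "e"),
    }
    luts, depth = map_to_luts(netlist, ["g4"])
    print(len(luts), depth)  # 2 2
```

In this example the four gates collapse into two 4-LUTs: one rooted at `g3` with inputs a, b, c, d, and one rooted at `g4` with inputs `g3` and e.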

## Summary

- FPGA design issues involve number of logic blocks per cluster, number of inputs per logic block, routing architecture, and k-LUT size
- Can build an *M*-LUT with *n* 4-LUTs where $2^{M-4} \le n \le 2^{M-3}$
- Large LUTs generally inefficient
- Technology mapping is simplified because of 4-LUT properties
  - Techniques: decomposition, replication, reconvergence, dynamic programming
  - Area- or delay-optimal mapping still NP-hard