# **Routing Architectures for Hierarchical Field Programmable Gate Arrays**

Aditya A. Aggarwal and David M. Lewis

University of Toronto Department of Electrical and Computer Engineering Toronto, Ontario, Canada

# Abstract

This paper evaluates an architecture that implements a hierarchical routing structure for FPGAs, called a hierarchical FPGA (HFPGA). A set of new tools has been used to place and route several circuits on this architecture, with the goal of comparing the cost of HFPGAs to conventional symmetrical FPGAs. The results show that HFPGAs can implement circuits with fewer routing switches, and fewer switches in total, compared to symmetrical FPGAs, although they have the potential disadvantage that they may require more logic blocks due to coarser granularity.

# 1. Introduction

Field programmable gate array architecture has been the subject of several studies that attempt to evaluate various logic blocks and routing architectures, with the goals of reducing circuit area and increasing circuit speed [1-8]. Work in routing architectures in particular has focused on topics such as the number of programmable switches and the length of routing wires. Most of the studies in routing architectures have been performed on FPGA architectures that have evolved from standard MPGA architectures, and contain a set of routing channels and possibly switch boxes. In this paper we refer to FPGAs with a symmetrical grid of logic blocks and routing channels on all four sides of the logic blocks as symmetrical FPGAs. Less work has been done for hierarchical FPGAs (HFPGAs), which contain a hierarchy of logic blocks and routing resources. Commercially available HFPGAs have switch patterns that comprise channels that are fully populated with switches [13]. As a result, HFPGAs offer lower density than other FPGAs, but have the advantage that their routing delays are both lower and more predictable.

This paper explores the architecture of HFPGAs with regard to the particular issue of how the switch patterns of HFPGAs can be partly depopulated, while still maintaining 100% routability. Such architectures could offer higher density and higher speed than fully populated routing resources. Furthermore, by comparing HFPGA architecture to that of a symmetrical FPGA, and holding all other features constant, we hope to clarify the impact of HFPGA architecture in isolation.

The remainder of this paper studies the effect of routing architectures in HFPGAs as an independent architectural feature. It first defines an architecture for HFPGAs with partially populated switch patterns. Experiments are then presented comparing the number of routing switches required to the number required by a symmetrical FPGA. Results for the total number of switches and tracks are also given.

# 2. Architecture and Area Models for HFPGAs

Figs. 1(a) and 1(b) illustrate the components in an HFPGA. An HFPGA consists of two types of primitive blocks, or level-0 blocks: logic blocks and I/O blocks. While any logic block can be used for the level-0 logic block, this paper uses 4-input lookup tables because they are known to be a good choice, and a direct comparison to symmetrical FPGAs is easily performed.





Our HFPGA architecture defines a level-1 logic block as a collection of level-0 blocks and a routing channel, containing  $N_0$  logic blocks and  $M_0$  I/O blocks equally distributed on both sides of the channel. Some of the tracks are local to the level-0 block, and can only route signals within that block, while other tracks are global, and can make connections outside the level-0 block. The global tracks of the level-1 logic block correspond to its pins at level 2. Figs. 1(a) and 1(b) show the two types of level-0 blocks, and Fig. 1(c) shows a level-1 block constructed with  $N_0 = 2$  and  $M_0 = 2$ .

1063-6404/94 \$4.00 © 1994 IEEE

This work was supported by Micronet.

This architecture is extended to an arbitrary number of levels of hierarchy by defining a level-(i+1) block as containing  $N_i$  level-*i* blocks. I/O blocks occur only within level-1 blocks. The configuration of logic blocks in an *n* level HFPGA is described by the expression  $N_{n-1} \times \cdots \times N_1 \times (N_0, M_0)$ . Fig. 2 shows an example of a 4×2×(2,2) architecture. The channel at the highest level, i.e. level-*n*, is called the global channel. HFPGAs are potentially faster than symmetrical FPGAs because the maximum number of switches in series in any net in an HFPGA with N logic blocks is O(log(N)), while it is  $O(N^{1/2})$  in a symmetrical FPGA.
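As a rough illustration of the notation, the sketch below counts the level-0 blocks implied by a configuration such as 4×2×(2,2) and tabulates how $\log_2 N$ and $N^{1/2}$ grow with the number of logic blocks. The function name, its argument layout, and the choice of a base-2 logarithm are assumptions made here for illustration. The asymptotic bounds carry unstated constant factors, so the printed values indicate growth trends only, not actual switch counts.

```python
import math

def hfpga_block_counts(upper_levels, n0, m0):
    """Count level-0 blocks in an HFPGA.

    upper_levels lists N_{n-1}, ..., N_1 (hypothetical argument layout);
    n0 and m0 are the logic and I/O blocks inside each level-1 block.
    """
    level1_blocks = math.prod(upper_levels)  # number of level-1 blocks
    return level1_blocks * n0, level1_blocks * m0

# The 4x2x(2,2) example of Fig. 2.
logic_blocks, io_blocks = hfpga_block_counts([4, 2], n0=2, m0=2)
print(logic_blocks, io_blocks)

# Growth of the two bounds, ignoring constant factors.
for n in (64, 1024, 16384):
    print(n, math.log2(n), math.sqrt(n))
```

Because I/O blocks occur only within level-1 blocks, both counts scale with the product of the upper-level branching factors.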



Figure 2. Example HFPGA with  $4 \times 2 \times (2,2)$  architecture

#### 2.1 Switch Patterns

We have investigated two switch patterns, one uniform and one non-uniform, shown in Fig. 3, but describe only the latter, which requires fewer switches. The average number of switches per track at level *i* that are available for connecting to each level-(*i*-1) logic block is designated  $F_i$ . Each channel is divided into four groups of tracks, nearly equal in size. Each track has a minimum of two switches per logic block, since this was observed to be the minimum necessary for reasonable routability. The switches remaining after allocating two to each track are distributed among the tracks exponentially, so that a track in the *j*th group has twice as many extra switches as a track in the (*j*+1)th group. Our studies show that this topology uses about 5% fewer switches than a uniform distribution.
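The allocation rule for the non-uniform pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool used in the study: the function name and parameters are hypothetical, and the extra switches are left as fractional values because the text does not say how they are rounded to whole switches.

```python
def nonuniform_switches(width, f_avg, groups=4, base=2):
    """Per-track switch counts (per logic block) for one channel.

    width : number of tracks in the channel
    f_avg : target average switches per track per logic block (F_i)
    Every track receives `base` switches; the remainder is shared so that
    a track in group j receives twice the extra of a track in group j+1.
    Values are fractional (assumption: rounding is not specified).
    """
    if f_avg < base:
        raise ValueError("average must be at least the per-track minimum")
    # Nearly equal group sizes, e.g. width=10 -> [3, 3, 2, 2].
    sizes = [width // groups + (1 if g < width % groups else 0)
             for g in range(groups)]
    # Group 0 has the largest weight; each later group halves it.
    weights = [2 ** (groups - 1 - g) for g in range(groups)]
    extra_total = (f_avg - base) * width
    denom = sum(s * w for s, w in zip(sizes, weights))
    unit = extra_total / denom if denom else 0.0
    per_track = []
    for size, weight in zip(sizes, weights):
        per_track.extend([base + unit * weight] * size)
    return per_track

tracks = nonuniform_switches(width=8, f_avg=3)
print(tracks)
```

The average over all tracks equals `f_avg` by construction, so the sketch preserves the total switch budget while skewing it toward the first groups.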



Figure 3. Uniform and Non-Uniform Switch Patterns

#### 2.2 HFPGA Cost Model

We have studied the area of HFPGAs using both a simple cost model based on routing switch counts, and a more refined model that more accurately estimates circuit area. This paper summarizes results including only switch counts. The results differ slightly using more detailed area models, but will not be discussed in this paper.

# 3. Experimental Study

This section presents a comparison between HFPGA and symmetrical FPGA switch counts using a set of 12 MCNC benchmarks. New software tools for placement (HPlacement), global routing (HGR), and detailed routing (HDR) were written to conduct this study. The experimental procedure is similar to previous architectural studies in FPGAs, using the following steps:

- 1. logic optimization using SIS [11]
- 2. technology mapping using chortle-crf [10]
- 3. placement using HPlacement for HFPGAs, and LocusRoute for symmetrical FPGAs [12]
- 4. global routing using HGR for HFPGAs, and PGARoute for symmetrical FPGAs [12]
- 5. detailed routing using HDR for HFPGAs, and CGE for symmetrical FPGAs [12]

In all experiments, a three-level architecture was used, since the size of the benchmarks made this necessary. Given a benchmark of some fixed size  $N_c$ , and performing an experiment on an HFPGA with specified  $N_1$  and  $N_0$ , the size of the smallest HFPGA that can contain the circuit is  $2 \times N_1 \times N_0 \times \left\lceil \frac{N_c}{2 \times N_1 \times N_0} \right\rceil$ . This is similar to previous experiments that choose FPGA sizes to contain the specific benchmark under investigation, but HFPGAs usually have coarser granularity, and are penalized as a result.
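The sizing rule can be checked with a short calculation. The sketch below reads the bracket in the expression as a ceiling, since the resulting HFPGA must hold at least $N_c$ logic blocks; the function name and the example circuit size are hypothetical.

```python
def smallest_hfpga_size(n_c, n1, n0):
    """Smallest logic-block capacity that is a multiple of 2*N1*N0 and >= n_c."""
    unit = 2 * n1 * n0          # granularity of the three-level HFPGA
    blocks = -(-n_c // unit)    # integer ceiling of n_c / unit
    return unit * blocks

# A hypothetical 100-block circuit on a 4x4 architecture.
size = smallest_hfpga_size(100, n1=4, n0=4)
print(size, size - 100)  # capacity and unused logic blocks
```

In this example the granularity is 32 logic blocks, so the 100-block circuit occupies a 128-block HFPGA and leaves 28 blocks unused, which is the penalty for coarser granularity noted above.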

Routing with the channel width set equal to channel density has been observed to require a large number of switches. Our experiments investigate the effect of adding  $W_x$  extra tracks at each level, which increases track count, but may reduce total area because not as many switches are required for routing completion.

### 3.1 Number of Routing Switches Required

The first experiment compares switch counts for HFPGAs and symmetrical FPGAs. The channel width at each level is set to the maximum channel density encountered at that level plus 3 (i.e.  $W_x = 3$ ), which was thought to be reasonable based on early investigations. Each circuit was then routed on all possible architectures, considering all possible values of  $F_i$ . Table 1 compares the HFPGA architecture that minimizes switch counts to those obtained for symmetrical FPGAs. CGE does not consider routing switches required to connect to I/O pins, so this table shows data for routing switches both including and excluding those switches. In every case there is a clear advantage for HFPGAs.

**TABLE 1.** Comparison of Switch Count for HFPGA and Symmetrical FPGA

| cct.   | HFPGA w/ sw. | HFPGA w/o sw. | HFPGA $N_1 \times N_0$ | SFPGA |
|--------|--------|---------|------------------|-----|
| 9symm1 | 65     | 72      | 4×4              | 94  |
| c499   | 67     | 83      | 8×4              | 117 |
| c880   | 83     | 103     | 8×4              | 122 |
| i5     | 43     | 83      | 4×4              | 172 |
| x4     | 82     | 112     | 8×4              | 138 |
| ex2    | 85     | 118     | 8×4              | 143 |
| x1     | 94     | 112     | 8×4              | 111 |
| c1355  | 71     | 89      | 8×4              | 105 |
| alu2   | 90     | 96      | 4×4              | 116 |
| i2     | 64     | 108     | 4×4              | 105 |
| apex7  | 59     | 73      | 4×4              | 116 |
| c432   | 70     | 80      | 8×4              | 116 |
| avg.   | 73     | 94      |                  | 121 |
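The averages in the last row of Table 1 can be reproduced from the per-circuit entries. The sketch below uses only the values printed in the table; the variable names and the ratio it prints are introduced here for illustration, and the two HFPGA columns are labelled exactly as in the table headers.

```python
# Rows of Table 1: (circuit, HFPGA "w/ sw.", HFPGA "w/o sw.", SFPGA).
TABLE_1 = [
    ("9symm1", 65, 72, 94), ("c499", 67, 83, 117), ("c880", 83, 103, 122),
    ("i5", 43, 83, 172), ("x4", 82, 112, 138), ("ex2", 85, 118, 143),
    ("x1", 94, 112, 111), ("c1355", 71, 89, 105), ("alu2", 90, 96, 116),
    ("i2", 64, 108, 105), ("apex7", 59, 73, 116), ("c432", 70, 80, 116),
]

def column_mean(rows, index):
    """Arithmetic mean of one numeric column."""
    return sum(row[index] for row in rows) / len(rows)

mean_w_sw = column_mean(TABLE_1, 1)
mean_wo_sw = column_mean(TABLE_1, 2)
mean_sfpga = column_mean(TABLE_1, 3)

print(round(mean_w_sw), round(mean_wo_sw), round(mean_sfpga))
print(mean_w_sw / mean_sfpga, mean_wo_sw / mean_sfpga)
```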

### 3.2 Evaluation of Hierarchical Logic Block Sizes

Table 1 shows that the optimal architectures, when chosen for each benchmark individually, tend to have relatively small low-level blocks (levels 1 and 2). To design a series of HFPGAs it would be preferable to choose a single value for these sizes ( $N_0$  and  $N_1$ ). We investigated total area for all circuits using  $4\times 2$ ,  $4\times 4$ ,  $4\times 8$ ,  $4\times 16$ ,  $8\times 4$ ,  $8\times 8$ , and  $16\times 4$  architectures, with  $M_0$  being identical to  $N_0$  in each case. In this experiment we were interested in isolating the lower level logic block architecture from the granularity issue. To do so, we chose an FPGA size for each benchmark that was large enough to be an even multiple of the largest level-2 block for all architectures. This exhibits the same amount of under-utilization of logic blocks across all of the architectures, and thus eliminates the impact of granularity on the optimal architecture.
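One way to picture this size selection is sketched below. It assumes that "even multiple" means a whole-number multiple and that the common size must be divisible by every level-2 block size ($N_1 \times N_0$) among the seven architectures; both readings, the function name, and the example circuit size are assumptions made here rather than details given in the text.

```python
import math
from functools import reduce

# (N_1, N_0) pairs studied in this section.
ARCHITECTURES = [(4, 2), (4, 4), (4, 8), (4, 16), (8, 4), (8, 8), (16, 4)]

def common_fpga_size(n_c, architectures=ARCHITECTURES):
    """Smallest logic-block count >= n_c divisible by every level-2 block size."""
    sizes = [n1 * n0 for n1, n0 in architectures]
    # Least common multiple of all level-2 block sizes.
    step = reduce(lambda a, b: a * b // math.gcd(a, b), sizes)
    return step * math.ceil(n_c / step)

# A hypothetical 100-block circuit.
print(common_fpga_size(100))
```

For these seven architectures every level-2 block size divides the largest one (64 logic blocks), so the common size is simply a multiple of 64 and each architecture sees the same number of unused logic blocks.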

Table 2 shows the minimum value of  $\alpha$  required for each architecture. Architectures with small low-level blocks, such as  $4\times 2$  and  $4\times 4$ , require the fewest switches.

**TABLE 2.** Minimum  $\alpha$  for Each HFPGA Combination of  $N_0$  and  $N_1$  (routing switch count).

| cct.   | 4×2 | 4×4 | 4×8 | 4×16 | 8×4 | 8×8 | 16×4 |
|--------|-----|-----|-----|------|-----|-----|------|
| 9symm1 | 78  | 80  | 94  | 119  | 73  | 91  | 86   |
| c499   | 86  | 80  | 98  | 165  | 83  | 117 | 82   |
| c880   | 90  | 100 | 131 | 172  | 103 | 128 | 100  |
| i5     | 40  | 48  | 66  | 88   | 41  | 53  | 39   |
| x4     | 57  | 67  | 101 | 114  | 70  | 90  | 63   |
| ex2    | 72  | 92  | 77  | 119  | 83  | 88  | 79   |
| x1     | 104 | 109 | 134 | 184  | 112 | 141 | 118  |
| c1355  | 78  | 78  | 108 | 141  | 89  | 103 | 89   |
| alu2   | 98  | 88  | 95  | 143  | 80  | 101 | 91   |
| i2     | 43  | 56  | 65  | 84   | 47  | 72  | 67   |
| apex7  | 63  | 79  | 79  | 110  | 75  | 83  | 78   |
| c432   | 95  | 111 | 111 | 131  | 93  | 104 | 92   |
| avg.   | 75  | 82  | 97  | 131  | 79  | 98  | 82   |

The HFPGA architecture allows tradeoffs in the number of routing switches per track at each level and the number of excess tracks. Using the 12 benchmarks and 7 different combinations of level-1 and level-2 logic blocks, we explored the effect of routing switch counts by routing the total of 84 different circuit and architecture combinations. The parameters  $F_1$ ,  $F_2$ ,  $F_3$ , and  $W_x$  were varied from 1 to 4, 1 to 7, 1 to 8, and 1 to 7 respectively. Because this implies a total of 131712 experiments, we simplified the process by performing the experiment for each parameter using only a few combinations of the other parameters.
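The figure of 131712 experiments follows directly from the stated parameter ranges, as the short check below shows. The dictionary layout and names are chosen here for illustration only.

```python
from math import prod

benchmarks = 12
architectures = 7

# Inclusive ranges given in the text for each swept parameter.
parameter_ranges = {
    "F1": range(1, 5),   # 1 to 4
    "F2": range(1, 8),   # 1 to 7
    "F3": range(1, 9),   # 1 to 8
    "Wx": range(1, 8),   # 1 to 7
}

combinations = benchmarks * architectures
settings = prod(len(r) for r in parameter_ranges.values())
total_experiments = combinations * settings
print(combinations, settings, total_experiments)
```

An exhaustive sweep would therefore route each of the 84 circuit and architecture combinations under 1568 parameter settings, which motivates varying one parameter at a time.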

Results for  $F_1$ ,  $F_2$ , and  $F_3$  are presented in Fig. 4, which shows the total number of combinations successfully routed. In each case, one switch produces poor results, two switches produce a dramatic increase, and three switches lead to nearly 100% completion. While higher levels tend to need slightly more switches, this has minimal impact on total switch count. Fig. 5 shows that  $W_x$  of at least four is desirable for good routability. This shows that a few switches at each level are adequate if some extra tracks are provided.

The general conclusion from these studies is that HFPGAs with two to three routing switches at each level, and at least four extra tracks are desirable. Adding extra tracks or routing switches is less expensive at the higher levels, since there are fewer total tracks at the higher levels.

# 4. Conclusions

This paper has described an architecture for hierarchical FPGAs, and studied values of architectural parameters that lead to minimum area circuits. The conclusions are that HFPGAs have reduced switch counts, and consequently reduced costs compared to symmetrical FPGAs.

The detailed choice of routing architecture parameters was also studied. This shows that small low-level blocks are preferable, and that two to four switches per track, depending on the level of the channel, together with four excess tracks can lead to nearly 100% routing success.

Figure 4. Routing Completion Versus Switch Counts in Connection Boxes; panel (c): Routing Completion versus  $F_3$

Figure 5. Routing Completion versus  $W_x$

# 5. References

- [1] S. Singh, J. Rose, D. Lewis, K. Chung and P. Chow, "Optimization of Field-Programmable Gate Array Logic Block Architecture for Speed", Proceedings of the 1991 Custom Integrated Circuits Conference (CICC-91), May 1991, pp. 6.1.1-6.1.6

- [2] J. Rose, R. J. Francis, D. Lewis, P. Chow, "Architecture of Field-Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency", *IEEE Journal of Solid State Circuits* (JSSC), Vol. 25, No. 5, Oct. 1990, pp. 1217-1225
- [3] H. Hsieh, W. Carter, J. Y. Ja, E. Cheung, S. Schreifels, C. Erickson, P. Freidin and L. Tinkey, "Third-Generation Architecture Boosts Speed and Density of Field-Programmable Gate Arrays", Proceedings of the 1990 Custom Integrated Circuits Conference (CICC-90), pp. 31.2.1-31.2.7
- [4] K. Chung, S. Singh, J. Rose and P. Chow, "Using Hierarchical Logic Blocks to Improve the Speed of Field-Programmable Gate Arrays", Proceedings of the First International Workshop on Field Programmable Logic and Applications, Oxford, Sept. 1991, pp. 103-113
- [5] J. Rose and S. Brown, "Flexibility of Interconnection Structures in Field Programmable Gate Arrays", *IEEE Journal of Solid State Circuits (JSSC)*, Vol. 26, No. 3, March 1991, pp. 277-282
- [6] S. D. Brown, "Routing Algorithms and Architectures for Field Programmable Gate Arrays", Ph.D. Thesis, University of Toronto, Toronto, Canada
- [7] J. Kouloheris and A. El Gamal, "FPGA Performance vs. Cell Granularity", Proceedings of the 1991 Custom Integrated Circuits Conference (CICC-91), May 1991, pp. 6.2.1-6.2.4
- [8] J. Kouloheris and A. El Gamal, "FPGA Area vs. Cell Granularity-Lookup tables and PLA Cells", ACM/SIGDA Workshop on Field-Programmable Gate Arrays, FPGA-92, Berkeley, CA, February 1992, pp. 9-14.
- [9] J. Kouloheris and A. El Gamal, "PLA-based FPGA Area vs. Cell Granularity", Proceedings of the 1992 Custom Integrated Circuits Conference (CICC-92), May 1992, pp. 4.3.1-4.3.4
- [10] R. J. Francis, J. Rose and Z. Vranesic, "Chortle-crf: Fast Technology Mapping for Lookup Table-Based FPGAs", Proceedings of the 28th Design Automation Conference (DAC-28), June 1991, pp. 227-233
- [11] R. Brayton, R. Rudell, A. Sangiovanni Vincentelli and A. Wang, "MIS: a Multiple-Level Logic Optimization System", *IEEE Transactions on CAD (TCAD)*, Vol. CAD-6, No. 6, Nov. 1987, pp. 1062-1081
- [12] J. Rose, Z. Vranesic and W. M. Snelgrove, "ALTOR: An automatic standard cell layout program", in *Proceedings of the Canadian Conference on VLSI*, Nov. 1985, pp. 168-173
- [13] Altera Corp., "The Maximalist Handbook", Altera, 1990