# Universal Switch-Module Design for Symmetric-Array-Based FPGAs \*

Yao-Wen Chang<sup>‡</sup>, D. F. Wong<sup>‡</sup>, and C. K. Wong<sup>§†</sup>

<sup>‡</sup>Department of Computer Sciences, University of Texas at Austin, Austin, Texas 78712 <sup>§</sup>Department of Computer Science, Chinese University of Hong Kong, Hong Kong

### Abstract

A switch module M with W terminals on each side is said to be universal if every set of nets satisfying the dimensional constraint (the number of nets on each side of M is at most W) is simultaneously routable through M. In this paper, we present a class of universal switch modules. Each of our switch modules has 6W switches and switch-module flexibility three  $(F_S = 3)$ . We prove that no switch module with fewer than 6W switches can be universal. We also compare our switch modules with those used in the Xilinx XC4000 family FPGA's and the anti-symmetric switch modules (with  $F_S = 3$ <sup>1</sup>) suggested by [15]. Although these two kinds of switch modules also have  $F_S = 3$  and 6W switches, we show that they are not universal. Based on combinatorial counting techniques, we show that each of our universal switch modules can accommodate up to 25% more routing instances than the XC4000-type one of the same size. Experimental results demonstrate that our universal switch modules improve routability at the chip level. Finally, our work also provides a theoretical insight into the important observation by Rose and Brown [15] that  $F_S = 3$  is often sufficient to provide high routability.

### 1 Introduction

As a relatively new technology, FPGA's are still undergoing significant change in their architectures [3, 19]. This paper addresses the FPGA architecture design problem. A typical FPGA consists of a symmetric array of logic modules which can be connected by general routing resources. Figure 1(a) shows the symmetric-array FPGA model. The logic modules contain circuits that implement logic functions. The routing resources comprise segments of wires and two kinds of modules, switch modules and connection modules, which contain user-programmable switches. The intersection of a horizontal and a vertical channel is referred to as a switch module; the switch module serves to connect wire segments through the programmable switches inside it. Figure 1(b) illustrates a switch module, in which the programmable switches between terminal 1 and the other terminals are shown as dashed lines. The *flexibility* of a switch module, denoted by  $F_S$ , is defined as the number of programmable switches between a terminal and the others [15]; for example, the switch module in Figure 1(b) has  $F_S = 6$ . Connection modules are used to connect logic-module pins to wire segments. The connection-module flexibility, denoted by  $F_C$ , is the number of tracks to which a logic-module pin can connect [15]; see Figure 2 for an illustration.



Figure 1: (a) The symmetric-array FPGA model. (b) A switch module.



Figure 2: Connection-module flexibility  $(F_C = 2)$ .

Architectural studies for the symmetric-array FPGA have been widely reported in the literature. Logic-module architectures were studied by [13, 16, 18], and connection-module ones by [9, 15]. Researchers have shown that the feasibility of FPGA design is constrained more by routing resources than by logic resources [2, 20]. Thus it is important to facilitate routing in the FPGA design. Switch modules are a crucial component for FPGA routing [1, 4, 6, 12, 14, 17]. Intuitively, a switch module with a larger routing capacity<sup>2</sup> would have better area performance in FPGA routing. To verify this intuition, we perform experiments and show that switch modules with larger routing capacities result in better routing solutions. The work of [6, 15, 17] has revealed the same fact. The following crucial factors contribute to this phenomenon:

- Switch modules with larger routing capacity increase the connectivity of routing components, and thus improve the overall routability of an FPGA.
- Most logic-module pins are logically equivalent [19]<sup>3</sup>; the pin permutations combined with highly routable switch modules pave the way for optimizing routing.
- For practical applications, most nets are short. For example, about 60% (90%) of nets in the CGE [4] and SEGA [12] benchmark circuits route through no more

<sup>\*</sup>This work was partially supported by the Texas Advanced Research Program under Grant No. 003658459.

<sup>†</sup>On leave from IBM Corporation, T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598.

<sup>1</sup>More general  $F_S$ 's were considered in [15].

<sup>2</sup>Number of routing instances that can route on a switch module. Section 2 gives a formal definition.

<sup>3</sup>For example, the lookup-table and control inputs in a logic module are logically equivalent.



Figure 3: (a) (b) Two anti-symmetric switch modules. (c) A Xilinx XC4000-type switch module and (d) its switch-module model.

than two (five) switch modules, independent of FPGA sizes. Thus the routability of a single switch module plays an important role in overall FPGA routing.

Hence increasing the routing capacity of a switch module also improves the area performance of a router. Therefore, it is of particular importance to consider the switch-module architecture of an FPGA.

The main consideration in FPGA switch-module design is the trade-off between the routing capacity and the area of a switch module. The programmable switches usually occupy large areas, and hence the number of switches that can be placed in a switch module is usually limited. On the other hand, fewer switches in a switch module would reduce its routing capacity. A switch module M with W terminals on each side is said to be *universal* if every set of nets satisfying the dimensional constraint, i.e., the number of nets on each side of M is at most W, is simultaneously routable through M.<sup>4</sup> A universal switch module has the maximum routing capacity, and thus it is desirable to design such a switch module using the minimum number of switches.

Switch-module architectures for symmetric-array FPGA's have recently been studied by [7, 11, 15, 17, 21, 22, 24]. Two well-known kinds of switch-module architectures were used by Rose and Brown [15] and Xilinx, Inc. [11, 23]. Figures 3(a)(b) show two anti-symmetric architectures used in [15]  $(F_S = 3)$ , in which W = 3 and W = 4, respectively. Figure 3(c) depicts a switch module of W = 3 used in the Xilinx XC4000 family FPGA's [11, 23], and Figure 3(d) illustrates its switch-module model; the XC4000-type switch modules also have  $F_S = 3$ . The effects of switch-module architectures on routing in symmetric-array FPGA's were first studied experimentally by [15]. An important observation by [15] is that 100% detailed-routing completion is often achieved for  $F_S = 3$  combined with high  $F_C$ . This provides an empirical way to choose a switch-module architecture. The switch modules used in the Xilinx XC4000 family FPGA's are currently regarded as one of the best architectures among those with  $F_S = 3$  [22]. However, we will show later that there exist universal switch modules whose sets of routable vectors properly contain those of the anti-symmetric and the XC4000-type ones of the same size and with the same number of switches; that is, neither of these two kinds of well-known switch modules is universal.

In this paper, we present a class of universal switch modules. Each of our switch modules has 6W switches and  $F_S = 3$ . We prove that no switch module with fewer than 6W switches can be universal. We also compare our switch modules with the XC4000-type and the anti-symmetric switch modules (with  $F_S = 3$ ). Although these two kinds of switch modules also have  $F_S = 3$  and 6W switches, we show that they are not universal. Based on combinatorial counting techniques, we prove that each of our universal switch modules can accommodate up to 25% more routing instances than the Xilinx XC4000-type one of the same size. Experimental results demonstrate that our universal switch modules improve routability at the chip level. Finally, our work also provides a theoretical insight into the important observation by Rose



Figure 4: Six types of connections.

and Brown [15] that  $F_S = 3$  combined with high  $F_C$  is often sufficient to provide high routability.

We shall focus on switch modules with  $F_S = 3$  in this paper. As mentioned earlier, the reasons are threefold:

- It will be clear later that it suffices to use  $F_S = 3$  to construct a universal switch module.
- As shown in [3, 5, 15], 100% detailed-routing completion is often achieved for  $F_S = 3$ .
- The switch modules used in the Xilinx XC4000 family FPGA's have  $F_S = 3$ .

### 2 Preliminaries

A switch module is a  $W \times W$  square block, where W is the number of terminals on each side of the switch module. Some pairs of terminals, on different sides of the module, may have programmable switches and thus can be connected by programming the switches to be "ON." Moreover, these switches are electrically *non-interacting*, unless they share a terminal. We represent a switch module by M(T,S), where T is the set of terminals, and S the set of switches. Label the terminals  $t_1, t_2, \ldots, t_{4W}$  starting from the bottommost terminal on the left side and proceeding clockwise. Let  $T_L = \{t_1, \ldots, t_W\}$  (*left terminals*),  $T_T = \{t_{W+1}, \ldots, t_{2W}\}$  (*top*),  $T_R = \{t_{2W+1}, \ldots, t_{3W}\}$  (*right*), and  $T_B = \{t_{3W+1}, \ldots, t_{4W}\}$  (*bottom*). Therefore,  $S = \{(t_i, t_j) \mid \text{there exists a programmable switch between terminals } t_i \text{ and } t_j\}$ , and  $T = \bigcup_{i \in \{L,T,R,B\}} T_i$ . For convenience, we often refer to a switch module M(T, S) simply as M when T and S are clear from the context or not of concern.

A net can be routed through a switch module by programming some switch to be "ON." To characterize such a local route, we say a *connection* is established in the switch module between two terminals  $t_i$  and  $t_j$ , on different sides of the switch module, if the switch  $(t_i, t_j)$  is programmed to be "ON." There are six types of connections. Each type is characterized by two sides of a switch module. Figure 4 shows the classification. The connection labeled  $i, 1 \leq i \leq 6$ , in Figure 4, is said to be of *Type-i*. For instance, Type-3 connections connect terminals on the left and the top sides of a switch module.

A routing requirement vector (RRV)  $\vec{n}$  is a six-tuple  $(n_1, n_2, \ldots, n_6)$  where  $0 \le n_i \le W$ ,  $1 \le i \le 6$ . A routing for an RRV on a given switch module is a set of connections such that there are  $n_i$  Type-*i* connections, for  $i \in \{1, \ldots, 6\}$ , and those connections are electrically non-interacting. An RRV  $\vec{n}$  is said to be routable on a switch module M if there exists a routing for  $\vec{n}$  on M. The routing capacity of a switch module M is defined as the number of distinct routable vectors on M; i.e., the routing capacity of M is the cardinality  $|\{\vec{n} \mid \vec{n} \text{ is routable on } M\}|$ . The universal switch module is defined as follows:

**Definition 1** A switch module M of size W is called universal if the following set of inequalities is a necessary and sufficient condition for an RRV  $\vec{n} = (n_1, \ldots, n_6)$  to be routable on M:

$$\begin{cases} n_1 + n_3 + n_6 \leq W \\ n_2 + n_3 + n_4 \leq W \\ n_1 + n_4 + n_5 \leq W \\ n_2 + n_5 + n_6 \leq W. \end{cases}$$

<sup>4</sup>A precise definition of the universal switch module will be given in Definition 1, Section 2.



Figure 5: Switch modules with different topologies (W = 2). (a)  $M_1$ . (b)  $M_2$ . (c)  $M_3$ . (d)  $M_4$ . (e)  $M_5$ . (f)  $M_6$ .

Note that the number of nets routing through each side of M cannot exceed W; this dimensional constraint is characterized by the above four inequalities, one for each side. Therefore, a universal switch module has the maximum routing capacity. This paper addresses the problem of designing universal switch modules using the minimum number of programmable switches.
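As a minimal sketch (not part of the original text), the dimensional constraint of Definition 1 can be checked by direct enumeration. The function names are hypothetical, and the side attached to each inequality is inferred from the connection types (for instance, Type-3 joins the left and top sides), so the side comments should be read as an assumption.

```python
from itertools import product

def satisfies_dimensional_constraint(n, W):
    """Check the four side inequalities of Definition 1 for an RRV n."""
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= W        # left side (inferred label)
            and n2 + n3 + n4 <= W    # top side (inferred label)
            and n1 + n4 + n5 <= W    # right side (inferred label)
            and n2 + n5 + n6 <= W)   # bottom side (inferred label)

def dimension_feasible_rrvs(W):
    """List every RRV with entries in 0..W that meets the dimensional constraint."""
    return [n for n in product(range(W + 1), repeat=6)
            if satisfies_dimensional_constraint(n, W)]

print(len(dimension_feasible_rrvs(2)))  # 56 vectors for W = 2
```

The count for W = 2 is the upper bound on the routing capacity of any switch module of that size; a universal module attains it.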

### 3 Universal Switch Modules

Consider the six switch modules depicted in Figure 5. Each contains 12 switches and has size W = 2 and flexibility  $F_S = 3$ . However, only three out of the seven RRV's listed in Table 1 are routable on each of  $M_3$ ,  $M_4$ ,  $M_5$ , and  $M_6$ , while all seven RRV's are routable on  $M_1$  and  $M_2$ . (In Table 1, a  $\bigcirc$  indicates that the RRV listed in the same row is routable on the corresponding switch module, and a  $\times$  denotes that it is unroutable.) This shows the effects of switch-module topologies on routing. It is obvious that  $M_1$  and  $M_2$  have the largest routing capacity among these six switch modules. We refer to the topology of  $M_1$  as the symmetric topology. See Algorithm Symmetric\_Switch\_Module (Figure 6) for the construction of symmetric switch modules. Note that the switch module  $M_3$  corresponds to those used in the Xilinx XC4000 family FPGA's (see Figures 3(c)(d)).

| RRV | $M_1$ | $M_2$ | $M_3$ | $M_4$ | $M_5$ | $M_6$ |
|---|---|---|---|---|---|---|
| $(1,1,1,0,1,0)$ | ○ | ○ | ○ | × | × | × |
| $(1,1,0,1,0,1)$ | ○ | ○ | ○ | × | ○ | ○ |
| $(1,0,1,1,0,0)$ | ○ | ○ | × | ○ | × | ○ |
| $(1,0,0,0,1,1)$ | ○ | ○ | × | ○ | ○ | × |
| $(0,1,1,0,0,1)$ | ○ | ○ | × | × | × | ○ |
| $(0,1,0,1,1,0)$ | ○ | ○ | × | × | ○ | × |
| $(0,0,1,1,1,1)$ | ○ | ○ | ○ | ○ | × | × |
| # Other routable RRV's | 49 | 49 | 49 | 49 | 49 | 49 |
| Routing capacity | 56 | 56 | 52 | 52 | 52 | 52 |

Table 1: Effects of switch-module topologies on routing (○: routable; ×: unroutable).

As mentioned earlier, we intend to identify not just a single universal switch module but a whole class of them. We first borrow the term *isomorphism* from graph theory (and algebra). It will be used to identify a class of switch modules with the same routing capacity. Its definition follows.

**Definition 2** Two switch modules M(T, S) and M'(T', S') are isomorphic if there exists a bijection  $f: T \to T'$  such that  $(t_i, t_j) \in S$  if and only if  $(f(t_i), f(t_j)) \in S'$  and, for any two

**Algorithm:** Symmetric\_Switch\_Module($W$)

**Input:** $W$, the size of the switch module.

**Output:** $M(T, S)$, the symmetric switch module of size $W$; $T$: set of terminals; $S$: set of switches.

$$\begin{array}{l}
T \leftarrow \bigcup_{i \in \{L,T,R,B\}} T_i; \quad /* \text{ See Section 2 for labeling } */ \\
S \leftarrow \emptyset; \\
\textbf{for } i \leftarrow 1 \textbf{ to } W \textbf{ do } \{ \\
\quad S \leftarrow S \cup \{(t_i, t_{3W-i+1})\}; \quad /* \text{ Type-1 connections } */ \\
\quad S \leftarrow S \cup \{(t_{W+i}, t_{4W-i+1})\}; \quad /* \text{ Type-2 connections } */ \\
\quad S \leftarrow S \cup \{(t_i, t_{2W-i+1})\}; \quad /* \text{ Type-3 connections } */ \\
\quad S \leftarrow S \cup \{(t_{W+i}, t_{3W-i+1})\}; \quad /* \text{ Type-4 connections } */ \\
\quad S \leftarrow S \cup \{(t_{2W+i}, t_{4W-i+1})\}; \quad /* \text{ Type-5 connections } */ \\
\quad S \leftarrow S \cup \{(t_i, t_{4W-i+1})\}; \quad /* \text{ Type-6 connections } */ \\
\} \\
\text{Output } M(T, S);
\end{array}$$

Figure 6: The Symmetric\_Switch\_Module algorithm.



Figure 7: (a) A switch module and the original type definition. (b) An isomorphic switch module of (a) and its new type definition.

terminals  $t_i$  and  $t_j$ ,  $t_i$ ,  $t_j \in T_p$  if and only if  $f(t_i)$ ,  $f(t_j) \in T_q$ ,  $p, q \in \{L, T, R, B\}.$ 

In other words, M(T,S) and M'(T',S') are isomorphic if we can relabel the terminals of M to be the terminals of M', maintaining the corresponding switches in M and M'; and, for terminals on the same side of M, their corresponding terminals are also on the same side of M'. For instance, the switch modules shown in Figure 7 are isomorphic, and their corresponding terminals are indicated by the same number. For any two isomorphic switch modules, we have the following theorem.

**Theorem 1** Any two isomorphic switch modules have the same routing capacity.

**Corollary 1.1** For any two isomorphic switch modules M(T, S) and M'(T', S'), M(T, S) is universal if and only if M'(T', S') is universal.

By Corollary 1.1, we can identify a whole class of universal switch modules by performing isomorphism operations on a given universal switch module. The following theorem gives a way to find such a "base" universal switch module.

**Theorem 2** The switch modules constructed by Algorithm Symmetric\_Switch\_Module are universal.

**Proof:** By Definition 1, we shall show that, for a switch module M of size W constructed by Algorithm Symmetric\_Switch\_Module,  $\vec{n}$  is routable on M if and only if the following inequalities are simultaneously satisfied:

$$\begin{cases} n_1 + n_3 + n_6 \leq W \\ n_2 + n_3 + n_4 \leq W \\ n_1 + n_4 + n_5 \leq W \\ n_2 + n_5 + n_6 \leq W. \end{cases}$$



Figure 8: Two universal switch modules and their submodules. (a) W = 4. (b) The two submodules of the switch module in (a). (c) W = 3. (d) The two submodules of the switch module in (c).

For the switch modules constructed by the algorithm, we have the following key observations (see Figure 8). For a switch module with an even W, we can partition it into W/2 non-interacting submodules (shown in Figure 8(b)); each submodule has the same topology as that of  $M_1$  in Figure 5(a). As mentioned earlier, the 56 RRV's satisfying the dimensional constraint for W = 2 are all routable on  $M_1$  (see Table 1); that is,  $M_1$  is universal. The reason is that, for W = 2, the three terminals, say terminals b, c and d, which connect to a terminal, say a, do not share any switch (see Figure 8(b)); thus the connections associated with them are non-interacting, except those associated with a. For a switch module with an odd W, we can partition it into  $\lceil W/2 \rceil$  non-interacting submodules, with each of  $\lfloor W/2 \rfloor$  submodules identical to  $M_1$  and one submodule formed by the four terminals in the middle of each side of the switch module (see Figure 8(d)). Since terminals in different submodules are non-interacting, each submodule can be considered independently. Therefore, each symmetric switch module consists of  $\lfloor W/2 \rfloor$  independent universal switch submodules of size two, and one of size one if W is odd.

(If) If the constraints  $n_1 + n_3 + n_6 \leq W$  and  $n_1 + n_4 + n_5 \leq W$  ( $n_2 + n_3 + n_4 \leq W$  and  $n_2 + n_5 + n_6 \leq W$ ) are satisfied, by the above observations, it is always possible to place up to  $W - n_1$  ( $W - n_2$ ) Type-3 and -6, and Type-4 and -5 (Type-3 and -4, and Type-5 and -6) connections after  $n_1$  Type-1 ( $n_2$  Type-2) connections are placed. Hence, if all the four inequalities are satisfied, there must exist a feasible routing for  $\vec{n}$ ; that is,  $\vec{n}$  is routable on M.

(Only If) The total number of connections routing through each side of M cannot exceed W. Thus, if  $\vec{n}$  is routable on M, the inequalities must be satisfied.
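For small W, the claim of Theorem 2 can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, and the Type-3 pairing $(t_i, t_{2W-i+1})$ is an assumption, chosen so that Type-3 switches join the left and top sides and every terminal has three switches. It builds the switch set, enumerates every set of switches in which no two share a terminal, and compares the resulting RRV's with those satisfying the four inequalities.

```python
from itertools import product

def symmetric_switch_module(W):
    """Return the switches as (type, a, b) triples over terminals t_1..t_4W."""
    switches = []
    for i in range(1, W + 1):
        switches += [
            (1, i, 3 * W - i + 1),          # Type-1: left-right
            (2, W + i, 4 * W - i + 1),      # Type-2: top-bottom
            (3, i, 2 * W - i + 1),          # Type-3: left-top (assumed pairing)
            (4, W + i, 3 * W - i + 1),      # Type-4: top-right
            (5, 2 * W + i, 4 * W - i + 1),  # Type-5: right-bottom
            (6, i, 4 * W - i + 1),          # Type-6: left-bottom
        ]
    return switches

def routable_rrvs(switches):
    """Collect the RRV of every set of switches in which no two share a terminal."""
    found = set()

    def extend(k, used, counts):
        if k == len(switches):
            found.add(tuple(counts))
            return
        extend(k + 1, used, counts)          # switch k stays OFF
        kind, a, b = switches[k]
        if a not in used and b not in used:  # switch k can be turned ON
            counts[kind - 1] += 1
            extend(k + 1, used | {a, b}, counts)
            counts[kind - 1] -= 1

    extend(0, frozenset(), [0] * 6)
    return found

def dimension_feasible_rrvs(W):
    """All RRVs satisfying the four inequalities of Definition 1."""
    return {
        n for n in product(range(W + 1), repeat=6)
        if n[0] + n[2] + n[5] <= W and n[1] + n[2] + n[3] <= W
        and n[0] + n[3] + n[4] <= W and n[1] + n[4] + n[5] <= W
    }

for W in (1, 2, 3):
    routable = routable_rrvs(symmetric_switch_module(W))
    print(W, len(routable), routable == dimension_feasible_rrvs(W))
```

Under the stated assumption, the two sets coincide for W = 1, 2, 3, with 10, 56, and 214 routable vectors respectively. The enumeration is exponential in the number of switches, so it is only meant as a sanity check for small modules.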

By Corollary 1.1 and Theorem 2, we can perform isomorphism operations on a switch module constructed by Algorithm Symmetric\_Switch\_Module to obtain a whole family of universal switch modules. Note that there are  $2 \times F_S \times W$  switches in a switch module [15]. Since the switch modules constructed by the algorithm have  $F_S = 3$ , we have the following corollary.

**Corollary 2.1** Only 6W switches are needed to construct a universal switch module.

In particular, 6W switches are also the minimum requirement for constructing a universal switch module.

**Theorem 3** No switch module with fewer than 6W switches can be universal.



Figure 9: (a) A Xilinx XC4000-type switch module (W = 3) and its interconnect points, and (b) its corresponding switch-module model  $(F_S = 3)$ . (c) The three submodules of the switch module in (b).

By Corollary 2.1 and Theorem 3, our universal switch modules do have the minimum number of switches.

### 4 Two Well-Known Switch Modules

We now consider the XC4000-type switch modules. Their switch-module architectures are illustrated in Figure 9. As mentioned in the preceding section, the switch module  $M_3$  of size W = 2 shown in Figure 5(c) is XC4000-type. The four RRV's (1,0,1,1,0,0), (1,0,0,0,1,1), (0,1,1,0,0,1), and (0,1,0,1,1,0) listed in Table 1 satisfy the dimensional constraint but are not routable on the XC4000-type switch module. Hence the XC4000-type switch modules are not universal. Chang, Wong, and Wong [6] gave the feasibility condition for the XC4000-type switch modules as follows.

**Theorem** [**CWW**] For a Xilinx XC4000-type switch module M of size W,  $\vec{n}$  is routable on M if and only if  $\max\{n_1, n_2\} + \max\{n_3, n_5\} + \max\{n_4, n_6\} \leq W$ .
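The condition of Theorem [CWW] is easy to evaluate. The following sketch (hypothetical function names, written only as an illustration) applies it to the seven RRV's of Table 1 for W = 2 and reports, for each, whether it meets the dimensional constraint and whether it is routable on an XC4000-type module.

```python
def routable_on_xc4000(n, W):
    """Feasibility test of Theorem [CWW] for an XC4000-type module of size W."""
    n1, n2, n3, n4, n5, n6 = n
    return max(n1, n2) + max(n3, n5) + max(n4, n6) <= W

def satisfies_dimensional_constraint(n, W):
    """The four side inequalities of Definition 1."""
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= W and n2 + n3 + n4 <= W
            and n1 + n4 + n5 <= W and n2 + n5 + n6 <= W)

# The seven RRVs listed in Table 1 (W = 2).
table1_rrvs = [
    (1, 1, 1, 0, 1, 0), (1, 1, 0, 1, 0, 1), (1, 0, 1, 1, 0, 0),
    (1, 0, 0, 0, 1, 1), (0, 1, 1, 0, 0, 1), (0, 1, 0, 1, 1, 0),
    (0, 0, 1, 1, 1, 1),
]

for n in table1_rrvs:
    print(n, satisfies_dimensional_constraint(n, 2), routable_on_xc4000(n, 2))
```

All seven vectors satisfy the dimensional constraint, while the four vectors named in the text fail the max-sum test, which is exactly why the XC4000-type module is not universal.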

Figures 10(a) and (b) illustrate two different-sized anti-symmetric switch modules generated by the program used in [15] (with  $F_S = 3$ ). It is simple to verify that the RRV (2, 2, 1, 0, 1, 0), which satisfies the dimensional constraint, is not routable on the anti-symmetric switch module of W = 3; see Figure 10(c) for an illustration. For different-sized anti-symmetric switch modules with  $F_S = 3$ , the switch-connection configurations are not uniform. Thus we will not explore their individual feasibility conditions; however, we note that the anti-symmetric switch modules are not universal.

### 5 Routing-Capacity Analysis

The preceding two sections give the feasibility conditions of the universal and the XC4000-type switch modules. In this section we analyze their routing capacities. Let  $M_{U,W}$  and  $M_{X,W}$  be a universal and a Xilinx XC4000-type switch module of size W, respectively. Let  $U_W$  be the feasible set for  $M_{U,W}$ ; that is,  $U_W = \{\vec{n} \mid \vec{n} \text{ is routable on } M_{U,W} \}$ .  $X_W$  is similarly defined. We have the following lemma.

**Lemma 1**  $X_W \subseteq U_W$ .

Let  $|U_W|$  ( $|X_W|$ ) be the cardinality of  $U_W$  ( $X_W$ ). By enumerating the feasible routing instances, we can compute the ratio  $|U_W|/|X_W|$ . It will be shown that  $|U_W|/|X_W| \rightarrow 1.25$  as  $W \rightarrow \infty$ ; in other words, for the two kinds of switch modules of the same size, the universal switch modules have up to 25% larger routing capacities than the XC4000-type ones. To obtain the ratio  $|U_W|/|X_W|$ , we first find the closed forms for  $|X_W|$  and  $|U_W|$ .



Figure 10: (a) (b) Two anti-symmetric switch modules with  $F_S = 3$ . (c) An unroutable vector (2, 2, 1, 0, 1, 0) for the anti-symmetric switch module of W = 3.


**Lemma 2**

$$\begin{array}{l} (1)\ |X_W| = \binom{W+6}{6} + 3\binom{W+5}{6} + 3\binom{W+4}{6} + \binom{W+3}{6}; \\ (2)\ |U_W| = \left\lfloor \frac{1}{6!} \left(10W^{6} + 120W^{5} + 595W^{4} + 1560W^{3} + 2320W^{2} + 1920W + 720\right) \right\rfloor. \end{array}$$

### **Proof**:

(1) By Theorem [CWW],  $X_W$  is the set of RRV's  $\vec{n}$  satisfying the following inequality:

$$\max\{n_1, n_2\} + \max\{n_3, n_5\} + \max\{n_4, n_6\} \leq W.$$

Hence, we have

$$\begin{cases} n_4 \leq W - \max\{n_1, n_2\} - \max\{n_3, n_5\} \\ n_6 \leq W - \max\{n_1, n_2\} - \max\{n_3, n_5\} \end{cases}$$

and

$$\begin{cases} n_3 + n_4 \leq W - \max\{n_1, n_2\} \\ n_4 + n_5 \leq W - \max\{n_1, n_2\} \\ n_5 + n_6 \leq W - \max\{n_1, n_2\} \\ n_3 + n_6 \leq W - \max\{n_1, n_2\}. \end{cases}$$

Consider the following two sets:

$$\begin{aligned} X_{W,p,1} &= \{(n_1, n_2) \mid \max\{n_1, n_2\} = p\}, \\ X_{W,p,2} &= \{(n_3, n_4, n_5, n_6) \mid n_3 + n_4 \le W - p,\ n_4 + n_5 \le W - p, \\ &\qquad\ n_5 + n_6 \le W - p,\ n_3 + n_6 \le W - p\}, \end{aligned}$$

where  $0 \leq p \leq W$ . We have

$$|X_W| = \sum_{p=0}^{W} |X_{W,p,1}| |X_{W,p,2}|$$
  
$$|X_{W,p,1}| = 2p + 1.$$

To compute  $|X_{W,p,2}|$ , we define the following two sets

$$\begin{aligned} X_{W-p,q,3} &= \{(n_3, n_5) \mid \max\{n_3, n_5\} = q\}, \\ X_{W-p,q,4} &= \{(n_4, n_6) \mid n_4 \le W - p - q,\ n_6 \le W - p - q\}, \end{aligned}$$

where  $0 \le q \le W - p$ . We have

$$\begin{aligned} |X_{W-p,q,3}| &= 2q + 1, \quad 0 \le q \le W - p, \\ |X_{W-p,q,4}| &= (W - p - q + 1)^2 = \binom{W-p-q+2}{2} + \binom{W-p-q+1}{2}. \end{aligned}$$

Hence,

$$\begin{aligned} |X_{W,p,2}| &= |\{(n_3, n_4, n_5, n_6) \mid n_3 + n_4 \le W - p,\ n_4 + n_5 \le W - p,\ n_5 + n_6 \le W - p,\ n_3 + n_6 \le W - p\}| \\ &= \sum_{q=0}^{W-p} |X_{W-p,q,3}|\,|X_{W-p,q,4}| \\ &= \sum_{q=0}^{W-p} (2q+1) \binom{W-p-q+2}{2} + \sum_{q=0}^{W-p} (2q+1) \binom{W-p-q+1}{2} \\ &= \sum_{q=0}^{W-p+3} \binom{q}{1} \binom{W-p+3-q}{2} + 2 \sum_{q=0}^{W-p+2} \binom{q}{1} \binom{W-p+2-q}{2} + \sum_{q=0}^{W-p+1} \binom{q}{1} \binom{W-p+1-q}{2} \\ &= \binom{W-p+4}{4} + 2 \binom{W-p+3}{4} + \binom{W-p+2}{4}. \end{aligned}$$
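Since  $|X_{W,p,2}|$  depends only on  $m = W - p$ , the count can be sanity-checked by brute force for small  $m$ . The sketch below is ours (all function names are illustrative). It compares direct enumeration, the sum over  $q$  with  $(2q+1)(m-q+1)^2$  as the summand, and the binomial form  $\binom{m+4}{4} + 2\binom{m+3}{4} + \binom{m+2}{4}$  that the sums evaluate to under the convolution identity quoted next.

```python
from itertools import product
from math import comb


def count_x2(m):
    """Count (n3, n4, n5, n6) >= 0 with n3+n4, n4+n5, n5+n6, n3+n6 all <= m."""
    return sum(
        1
        for n3, n4, n5, n6 in product(range(m + 1), repeat=4)
        if max(n3 + n4, n4 + n5, n5 + n6, n3 + n6) <= m
    )


def sum_over_q(m):
    """Split by q = max{n3, n5}: (2q + 1) choices times (m - q + 1)^2 choices."""
    return sum((2 * q + 1) * (m - q + 1) ** 2 for q in range(m + 1))


def binomial_form(m):
    return comb(m + 4, 4) + 2 * comb(m + 3, 4) + comb(m + 2, 4)


for m in range(6):
    assert count_x2(m) == sum_over_q(m) == binomial_form(m)
print([binomial_form(m) for m in range(4)])  # [1, 7, 26, 70]
```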

Note that the identity

$$\sum_{k=0}^{l} \binom{l-k}{m} \binom{q+k}{n} = \binom{l+q+1}{m+n+1},$$

where  $n \ge q \ge 0$  and  $l, m, n, q \in \mathbb{Z}^+ \cup \{0\}$ , is an extension of *Vandermonde's convolution* [10]. As a result,
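The identity, with the sum running over  $k = 0, \ldots, l$ , can be spot-checked numerically. The following is a small sketch with names of our own choosing; it tests a finite range of parameters under the stated condition  $n \ge q \ge 0$  and is not a proof.

```python
from math import comb


def lhs(l, m, q, n):
    """Left-hand side: sum over k = 0..l of C(l-k, m) * C(q+k, n)."""
    return sum(comb(l - k, m) * comb(q + k, n) for k in range(l + 1))


def rhs(l, m, q, n):
    """Right-hand side: C(l+q+1, m+n+1)."""
    return comb(l + q + 1, m + n + 1)


checked = 0
for l in range(8):
    for m in range(5):
        for n in range(5):
            for q in range(n + 1):  # enforce n >= q >= 0
                assert lhs(l, m, q, n) == rhs(l, m, q, n)
                checked += 1
print(checked, "parameter combinations agree")
```

The condition on  $q$  matters: for  $l = m = n = 0$  and  $q = 1$  the left-hand side is 1 while the right-hand side is 2.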

$$\begin{aligned} |X_W| &= |\{\vec{n} \mid \max\{n_1, n_2\} + \max\{n_3, n_5\} + \max\{n_4, n_6\} \le W\}| \\ &= \sum_{p=0}^{W} |X_{W,p,1}|\,|X_{W,p,2}| \\ &= \binom{W+6}{6} + 3\binom{W+5}{6} + 3\binom{W+4}{6} + \binom{W+3}{6} \\ &= \frac{1}{6!} \vec{X} \cdot \vec{\omega}, \end{aligned}$$

where  $\vec{X} = (8, 96, 500, 1440, 2372, 2064, 720)$  and  $\vec{\omega} = (W^6, W^5, W^4, W^3, W^2, W, 1).$ 
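As a cross-check of part (1), the three expressions for  $|X_W|$  (the definition via Theorem [CWW], the binomial closed form, and the polynomial  $\frac{1}{6!} \vec{X} \cdot \vec{\omega}$ ) can be compared directly. This is a sketch with illustrative function names; brute-force enumeration over  $\{0, \ldots, W\}^6$  is only practical for small  $W$ .

```python
from itertools import product
from math import comb

X_COEFFS = (8, 96, 500, 1440, 2372, 2064, 720)  # the vector X, highest power first


def x_brute(W):
    """Enumerate all candidate RRV's and apply the condition of Theorem [CWW]."""
    return sum(
        1
        for n1, n2, n3, n4, n5, n6 in product(range(W + 1), repeat=6)
        if max(n1, n2) + max(n3, n5) + max(n4, n6) <= W
    )


def x_binomial(W):
    return comb(W + 6, 6) + 3 * comb(W + 5, 6) + 3 * comb(W + 4, 6) + comb(W + 3, 6)


def x_polynomial_times_720(W):
    """6! * |X_W| according to the polynomial form."""
    return sum(c * W ** (6 - i) for i, c in enumerate(X_COEFFS))


for W in range(5):
    assert x_brute(W) == x_binomial(W)
for W in range(41):
    assert x_polynomial_times_720(W) == 720 * x_binomial(W)
print([x_binomial(W) for W in (1, 2, 3)])  # [10, 52, 190]
```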

(2) Applying similar techniques, we get

$$|U_W| = \left\lfloor \frac{1}{6!} \vec{U} \cdot \vec{\omega} \right\rfloor,$$

where  $\vec{U} = (10, 120, 595, 1560, 2320, 1920, 720).$ 

$\Box$
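Part (2) of Lemma 2 can also be checked by enumeration for small  $W$ . Because a universal switch module routes exactly the RRV's that satisfy the dimensional constraint,  $|U_W|$  is the number of such RRV's. The sketch below rests on an assumption that this section does not state: we take the four per-side sums to be  $n_1 + n_3 + n_6$ ,  $n_1 + n_4 + n_5$ ,  $n_2 + n_3 + n_4$ , and  $n_2 + n_5 + n_6$ . This grouping is our inference from the pairings in Theorem [CWW] and the four pairwise inequalities in the proof of part (1), and it should be checked against the definition of the dimensional constraint given earlier in the paper.

```python
from itertools import product

# ASSUMPTION (ours, not stated in this section): 0-based index triples of the
# net types that use each of the four sides of the switch module.
SIDES = ((0, 2, 5), (0, 3, 4), (1, 2, 3), (1, 4, 5))

U_COEFFS = (10, 120, 595, 1560, 2320, 1920, 720)  # the vector U, highest power first


def u_brute(W):
    """Count RRV's whose per-side net counts are all at most W."""
    return sum(
        1
        for n in product(range(W + 1), repeat=6)
        if all(n[i] + n[j] + n[k] <= W for i, j, k in SIDES)
    )


def u_formula(W):
    """Floor of (1/6!) * U . omega, as in Lemma 2(2)."""
    return sum(c * W ** (6 - i) for i, c in enumerate(U_COEFFS)) // 720


for W in range(5):
    assert u_brute(W) == u_formula(W)
print([u_formula(W) for W in (1, 2, 3)])  # [10, 56, 214]
```

Under the assumed grouping, the enumeration agrees with the closed form for  $W \le 4$  and reproduces the first rows of Table 2.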

**Theorem 4** (1)  $|U_W|/|X_W|$  is a strictly increasing function of  $W, W \ge 1$ ; (2)  $\lim_{W\to\infty} |U_W|/|X_W| = 1.25$ .
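Part (2) can be read off from the leading coefficients of the two closed forms in Lemma 2. The following is a short worked step, not the authors' proof; it uses only the coefficient vectors  $\vec{U}$  and  $\vec{X}$  and the fact that the floor changes  $|U_W|$  by less than 1, which is negligible against  $|X_W|$  as  $W$  grows.

$$\lim_{W \to \infty} \frac{|U_W|}{|X_W|} = \lim_{W \to \infty} \frac{10W^6 + 120W^5 + \cdots + 720}{8W^6 + 96W^5 + \cdots + 720} = \frac{10}{8} = 1.25.$$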

| $W$ | Universal S. M., $\lvert U_W \rvert$ | XC4000-type S. M., $\lvert X_W \rvert$ | Capacity ratio, $\lvert U_W \rvert / \lvert X_W \rvert$ |
|----|------------|------------|-------|
| 1  | 10         | 10         | 1.000 |
| 2  | 56         | 52         | 1.077 |
| 3  | 214        | 190        | 1.126 |
| 5  | 1,620      | 1,372      | 1.181 |
| 10 | 41,336     | 33,748     | 1.225 |
| 20 | 1,573,121  | 1,266,265  | 1.242 |
| 30 | 14,905,856 | 11,959,552 | 1.246 |
| 40 | 76,215,041 | 61,075,609 | 1.248 |

Table 2: Routing-capacity comparison of the universal and the XC4000-type switch modules.

Hence, a universal switch module has up to 25% larger routing capacity than the XC4000-type one of the same size. For current commercially available FPGA's, the sizes of switch modules are usually small, say  $W \leq 40$ . Thus the ratios for these small W's are of particular interest; Table 2 lists their corresponding routing-capacity ratios. It shows that the universal switch modules have about 22.5%, 24.2%, and 24.6% larger routing capacities than the XC4000-type ones for W = 10, 20, and 30, respectively.
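The entries of Table 2 follow directly from the closed forms in Lemma 2. The sketch below (function names are ours) recomputes them and checks numerically that the ratio is strictly increasing over  $1 \le W \le 40$ , in line with Theorem 4(1); exact fractions are used for the comparison to avoid rounding effects.

```python
from fractions import Fraction
from math import comb

U_COEFFS = (10, 120, 595, 1560, 2320, 1920, 720)  # the vector U, highest power first


def x_size(W):
    """|X_W| from the binomial closed form of Lemma 2(1)."""
    return comb(W + 6, 6) + 3 * comb(W + 5, 6) + 3 * comb(W + 4, 6) + comb(W + 3, 6)


def u_size(W):
    """|U_W| from the floored polynomial of Lemma 2(2)."""
    return sum(c * W ** (6 - i) for i, c in enumerate(U_COEFFS)) // 720


for W in (1, 2, 3, 5, 10, 20, 30, 40):
    print(f"{W:>2} {u_size(W):>12,} {x_size(W):>12,} {u_size(W) / x_size(W):.3f}")

# Numerical check of Theorem 4(1) on the range of practical interest.
ratios = [Fraction(u_size(W), x_size(W)) for W in range(1, 41)]
assert all(a < b for a, b in zip(ratios, ratios[1:]))
```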

#### 6 Experimental Results

To explore the effects of switch-module architectures on routing, we first modified the code of the CGE router [15] to consider various switch-module architectures, and then tested the area performance of the router based on the benchmark circuits used in [15]. The connection-module switches were automatically determined by the CGE package once an  $F_C$ value was specified. The switch-module architectures used were the universal, the Xilinx XC4000-type [11, 23], and the anti-symmetric [15] ones. The flexibilities of these switch modules are all three ( $F_S = 3$ ); thus each of them contains 6W switches. Note that the universal switch modules used in these experiments were constructed by Algorithm Symmetric\_Switch-Modules.

The quality of a switch module was evaluated by the area performance of the CGE detailed router. Table 3 shows the results. For the results listed in this table, we first determined the minimum number of tracks W and then the smallest connection-module flexibility  $F_C$  required for 100% routing completion for each circuit, using the three kinds of switch modules. We then obtained the minimum W's needed for 100% routing completion for each circuit using the three kinds of switch modules based on the previously determined  $F_C$ . The results based on this "minimal"  $F_C$  are then reported in the table. Note that, for each circuit, the detailed-routing results associated with different kinds of switch modules are all based on the same global routes. Our results show that, among the three kinds of switch modules, the universal switch modules needed the minimum W's and  $F_C$ 's for 100% routing completion for all five circuits. Figure 12 shows the detailed-routing solution for the circuit BNRE with the parameters W = 12 and  $F_C = 12$ , using the symmetric switch module  $(F_S = 3)$ .

Our experimental results show that, among the three kinds of switch modules, the universal switch modules usually achieve the best area performance, and the XC4000-type ones often have the worst performance. Though not presented in Table 3, the results based on various  $F_C$ 's are highly consistent with this phenomenon. Note that the architectures of the universal and the anti-symmetric switch modules are alike (see Figures 8 and 10); however, as mentioned earlier, the anti-symmetric ones are still not universal. This explains why the experimental performance of the anti-symmetric switch modules is worse than but close to that of the universal ones.

| Circuit | $F_C$ | # Tracks (Universal) | # Tracks (XC4000-type) | # Tracks (Anti-symmetric) |
|---------|-------|----------------------|------------------------|---------------------------|
| BUSC    | 9     | 10                   | 11                     | 10                        |
| DMA     | 10    | 11                   | 14                     | 11                        |
| DFSM    | 10    | 11                   | 15                     | 11                        |
| BNRE    | 12    | 12                   | 14                     | 14                        |
| Z03     | 14    | 14                   | 15                     | 14                        |
| Total   | -     | 58                   | 69                     | 60                        |

Table 3: Number of tracks required for 100% CGE detailed routing using the three types of switch modules  $(F_S = 3)$ .
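The totals in Table 3 and the relative saving in tracks can be recomputed from the per-circuit entries. The snippet below is only an arithmetic check on the tabulated numbers; the variable names are ours, and the percentage is derived from the table rather than reported in the experiments.

```python
# Tracks per circuit, in the column order of Table 3:
# (universal, XC4000-type, anti-symmetric)
tracks = {
    "BUSC": (10, 11, 10),
    "DMA": (11, 14, 11),
    "DFSM": (11, 15, 11),
    "BNRE": (12, 14, 14),
    "Z03": (14, 15, 14),
}

universal, xc4000, anti_symmetric = (sum(col) for col in zip(*tracks.values()))
print(universal, xc4000, anti_symmetric)  # 58 69 60

# Fraction of tracks saved by the universal modules relative to XC4000-type.
print(f"{(xc4000 - universal) / xc4000:.1%}")  # 15.9%
```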



Figure 11: Relationship of the feasible sets for the three kinds of switch modules.

## 7 Concluding Remarks

We have presented a class of universal switch modules and shown, both theoretically and experimentally, that they perform better in routing than the two kinds of well-known switch modules used in [11, 23] and [15]. The feasible sets of the three kinds of switch modules discussed in the paper bear the relationship illustrated in Figure 11<sup>5</sup>. Experiments with the three kinds of switch modules have shown that switch modules with larger routing capacities often result in better routing solutions. Our work lays a scientific foundation for FPGA switch-module design and for the exploration of the effects of switch-module architectures on FPGA routing. Finally, our research also provides a theoretical insight into the important observation by Rose and Brown [15] that  $F_S = 3$  combined with high  $F_C$  is often sufficient to achieve high routability.

### 8 Acknowledgments

We would like to thank Professor Stephen Brown and the authors of [4] for providing us with the CGE package.

#### References

- [1] M. Alexander and G. Robins, "New performance-driven FPGA routing algorithms," Proc. DAC, pp. 562-567, 1995.
- [2] N. Bhat and D. Hill, "Routable technology mapping for LUT FPGAs," Proc. ICCD, pp. 95-98, 1992.
- [3] S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic, *Field-Programmable Gate Arrays*, Kluwer Academic Pub., Boston, 1992.
- [4] S. Brown, J. Rose, and Z. G. Vranesic, "A detailed router for field-programmable gate arrays," *IEEE Trans. CAD*, pp. 620-627, 1992.
- [5] S. Brown, J. Rose, and Z. G. Vranesic, "A stochastic model to predict the routability of field-programmable gate arrays," *IEEE Trans. CAD*, pp. 1827–1838, 1993.
- [6] Y.-W. Chang, D. F. Wong, and C. K. Wong, "FPGA global routing based on a new congestion metric," *Proc. ICCD*, pp. 372-378, 1995.

<sup>5</sup>Both subsets P and Q in Figure 11 are non-empty. For instance, for W = 3, (2, 1, 1, 1, 0, 0) is routable on the anti-symmetric switch module (see Figure 3(a)), but not on the XC4000-type one (see Figure 3(d)); on the contrary, (2, 2, 1, 0, 1, 0) is routable on the XC4000-type switch module, but not on the anti-symmetric one. However, both RRV's are routable on the universal switch module (see Figure 8(c)).



Figure 12: The detailed routing solution for the circuit BNRE with the parameters W = 12,  $F_S = 3$ , and  $F_C = 12$  using the symmetrical universal switch module.

- [7] Y.-W. Chang, D. F. Wong, and C. K. Wong, "Design and analysis of FPGA/FPIC switch modules," Proc. ICCD, pp. 394-401, 1995.
- [8] C.-D. Chen, Y.-S. Lee, C.-H. Wu, and Y.-L. Lin, "TRACER-fpga: a router for RAM-based FPGA's," *IEEE Trans. CAD*, March, 1995.
- [9] K. Fujiyoshi, Y. Kajitani, and H. Niitsu, "Design of optimum totally-perfect connection-blocks of FPGA," *Proc. ISCAS*, pp. 221-224, 1994.
- [10] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics, Addison-Wesley Pub. Co., 1989.
- [11] H.C. Hsieh, et al., "Third-generation architecture boosts speed and density of field-programmable gate arrays," Proc. CICC, pp. 31.2.1-31.2.7, 1990.
- [12] G. Lemieux and S. Brown, "A detailed routing algorithm for allocating wire segments in field-programmable gate arrays," *Proc. Physical Design Workshop*, 1993.
- [13] C. Lin, M. Marek-Sadowska, and D. Gatlin, "Universal logic gate for FPGA design," Proc. ICCAD, pp. 164-168, 1994.
- [14] M. Palczewski, "Plane parallel A\* maze router and it applications," Proc. DAC, pp. 691-697, 1992.
- [15] J. Rose and S. Brown, "Flexibility of interconnection structures for field-programmable gate arrays," *IEEE JSSC*, pp. 277-282, 1991.

- [16] J. Rose, R. Francis, D. Lewis, and P. Chow, "Architecture of programmable gate arrays: The effect of logic block functionality on area efficiency," *IEEE JSSC*, pp. 1217-1225, 1990.
- [17] Y. Sun, T.-C. Wang, C. K. Wong, and C. L. Liu, "Routing for symmetric FPGAs and FPICs," Proc. ICCAD, pp. 486-490, 1993.
- [18] S. Thakur and D. F. Wong, "On designing ULM-based FPGA logic modules," Proc. FPGA '95, pp. 3-9, 1995.
- [19] S. Trimberger (editor), Field-Programmable Gate Array Technology, Kluwer Academic Pub., 1994.
- [20] S. Trimberger and M. Chene, "Placement-based partitioning for lookup-table-based FPGA's," Proc. ICCD, pp. 91-94, 1992.
- [21] Y.-L. Wu and D. Chang, "On the NP-completeness of regular 2-D FPGA routing architectures and a novel solution," *Proc. ICCAD*, pp. 362-366, 1994.
- [22] Y.-L. Wu, S. Tsukiyama, and M. Marek-Sadowska, "Computational complexity of 2-D FPGA routing for arbitrary switch box topologies," Proc. FPGA'94, 1994.
- [23] Xilinx Inc., The Programmable Logic Data Book, 1994.
- [24] K. Zhu, D.F. Wong and Y.-W. Chang, "Switch module design with application to two-dimensional segmentation design," *Proc. ICCAD*, pp. 481-486, 1993.