## Unit 5E: Channel, Clock, and Power/Ground Routing

- Course contents
- Channel routing
- Clock routing
- Power/ground routing
- Readings
- Chapters 9.3 and 9.4


Detailed routing

switchbox routing

## Order of Routing Regions and L-Channels

(a) No conflicts in case of routing in the order of 1, 2, and 3.
(b) No ordering is possible to avoid conflicts.
(c) The situation of (b) can be resolved by using Lchannels.
(d) An L-channel can be decomposed into a channel and a switchbox.

(a)

(b)

(c)

(d)

## Routing Considerations

- Number of terminals (two-terminal vs. multi-terminal nets)
- Net widths (power and ground vs. signal nets)
- Via restrictions (stacked vs. conventional vias)
- Boundary types (regular vs. irregular)
- Number of layers (two vs. three, more layers?)
- Net types (critical vs. non-critical nets)


## Routing Models

- Grid-based model:
- A grid is super-imposed on the routing region.
- Wires follow paths along the grid lines.
- Pitch: distance between two grid lines.
- Gridless model:
- Any model that does not follow this "gridded" approach.

grid-based

gridless


## Models for Multi-Layer Routing

- Unreserved layer model: Any net segment is allowed to be placed in any layer.
- Reserved layer model: Certain type of segments are restricted to particular layer(s).
- Two-layer: HV (horizontal-Vertical), VH
- Three-layer: HVH, VHV


$$
3 \text { types of 3-layer models }
$$

## Terminology for Channel Routing



- Local density at column $i, d(i)$ : total \# of nets that crosses column $i$.
- Channel density: maximum local density
- \# of horizontal tracks required $\geq$ channel density.


## Channel Routing Problem

- Assignments of horizontal segments of nets to tracks.
- Assignments of vertical segments to connect.
- horizontal segments of the same net in different tracks, and
- the terminals of the net to horizontal segments of the net.
- Horizontal and vertical constraints must not be violated.
- Horizontal constraints between two nets: the horizontal span of two nets overlaps each other.
- Vertical constraints between two nets: there exists a column such that the terminal on top of the column belongs to one net and the terminal on bottom of the column belongs to another net.
- Objective: Channel height is minimized (i.e., channel area is minimized).


## Horizontal Constraint Graph (HCG)

- HCG $G=(V, E)$ is undirected graph where
- $V=\left\{v_{i} \mid v_{i}\right.$ represents a net $\left.n_{i}\right\}$
- $E=\left\{\left(v_{i}, v_{j}\right) \mid\right.$ a horizontal constraint exists between $n_{i}$ and $n_{j}$.
- For graph $G$ : vertices $\Leftrightarrow$ nets; edge $(i, j) \Leftrightarrow$ net $i$ overlaps net $j$.


A routing problem and its HCG.


## Vertical Constraint Graph (VCG)

- VCG $G=(V, E)$ is directed graph where
- $V=\left\{v_{i} \mid v_{i}\right.$ represents a net $\left.n_{i}\right\}$
- $E=\left\{\left(v_{i}, v_{j}\right) \mid\right.$ a vertical constraint exists between $n_{i}$ and $\left.n_{j}\right\}$.
- For graph $G$ : vertices $\Leftrightarrow$ nets; edge $i \rightarrow j \Leftrightarrow$ net $i$ must be above net $j$.


A routing problem and its VCG.


## 2-L Channel Routing: Basic Left-Edge Algorithm

- Hashimoto \& Stevens, "Wire routing by optimizing channel assignment within large apertures," DAC-71.
- No vertical constraint.
- HV-layer model is used.
- Doglegs are not allowed.
- Treat each net as an interval.
- Intervals are sorted according to their left-end $x$ coordinates.
- Intervals (nets) are routed one-by-one according to the order.
- For a net, tracks are scanned from top to bottom, and the first track that can accommodate the net is assigned to the net.
- Optimality: produces a routing solution with the minimum \# of tracks (if no vertical constraint).


## Basic Left-Edge Algorithm

```
Algorithm: Basic_Left-Edge(U, track[]])
\(U\) : set of unassigned intervals (nets) \(I_{1}, \ldots, I_{n}\);
\(I_{j}=\left[s_{j}, e_{j}\right]\) : interval \(j\) with left-end \(x\)-coordinate \(s_{j}\) and right-end \(e_{j}\);
track[]: track to which net \(j\) is assigned.
1 begin
\(2 U \leftarrow\left\{I_{1}, I_{2}, \ldots, I_{n}\right\}\);
\(3 t \leftarrow 0\);
4 while \((U \neq \varnothing)\) do
\(5 \quad t \leftarrow t+1\);
6 watermark \(\leftarrow 0\);
7 while (there is an \(I_{j} \in U\) s.t. \(s_{j}>\) watermark) do
\(8 \quad\) Pick the interval \(I_{j} \in U\) with \(s_{j}>\) watermark,
    nearest watermark;
\(9 \quad \operatorname{track}[] \leftarrow t\);
10 watermark \(\leftarrow e_{j}\);
\(11 U \leftarrow U-\left\{l_{j} ;\right.\);
12 end
```


## Basic Left-Edge Example

- $U=\left\{I_{1}, I_{2}, \ldots, I_{6}\right\} ; I_{1}=[1,3], I_{2}=[2,6], I_{3}=[4,8], I_{4}=[5,10], I_{5}=[7$, 11], $I_{6}=[9,12]$.
- $t=1$ :
- Route $I_{1}$ : watermark $=3$;
- Route $I_{3}$ : watermark = 8;
- Route $I_{6}$ : watermark = 12;
- $t=2$ :
- Route $I_{2}$ : watermark $=6$;
- Route $I_{5}$ : watermark = 11;
- $t=3$ : Route $I_{4}$



## Basic Left-Edge Algorithm

- If there is no vertical constraint, the basic left-edge algorithm is optimal.
- If there is any vertical constraint, the algorithm no longer guarantees optimal solution.

result from basic left-edge algorithm 3 tracks

optimal routing: 2 tracks



## Constrained Left-Edge Algorithm

```
Algorithm: Constrained_Left-Edge( \(U\), track[J])
\(U\) : set of unassigned intervals (nets) \(I_{1}, \ldots, I_{n}\);
\(I_{j}=\left[s_{j}, e_{j}\right]\) : interval \(j\) with left-end \(x\)-coordinate \(s_{j}\) and right-end \(e_{j}\);
track[]: track to which net \(j\) is assigned.
1 begin
\(2 U \leftarrow\left\{I_{1}, I_{2}, \ldots, I_{n}\right\}\);
\(3 t \leftarrow 0\);
4 while ( \(U \neq \varnothing\) ) do
\(5 \quad t \leftarrow t+1\);
6 watermark \(\leftarrow 0\);
7 while (there is an unconstrained \(I_{j} \in U\) s.t. \(s_{j}>\)
    watermark) do
8 Pick the interval \(I_{j} \in U\) that is unconstrained,
        with \(s_{j}>\) watermark, nearest watermark;
\(9 \operatorname{track}[]] \leftarrow t\);
10 watermark \(\leftarrow e_{j}\);
\(11 U \leftarrow U-\left\{I_{j}\right\} ;\)
12 end
```


## Constrained Left-Edge Example

- $I_{1}=[1,3], I_{2}=[1,5], I_{3}=[6,8], I_{4}=[10,11], I_{5}=[2,6], I_{6}=[7$, 9].
- Track 1: Route $I_{1}$ (cannot route $I_{3}$ ); Route $I_{6}$; Route $I_{4}$.
- Track 2: Route $I_{2}$; cannot route $I_{3}$.
- Track 3: Route $I_{5}$.
- Track 4: Route $I_{3}$.


track 3 track 4



## Dogleg Channel Router

- Deutch, "A dogleg channel router," 13rd DAC, 1976.
- Drawback of Left-Edge: cannot handle the cases with constraint cycles.
- Doglegs are used to resolve constraint cycle.

- Drawback of Left-Edge: the entire net is on a single track.
- Doglegs are used to place parts of a net on different tracks to minimize channel height.
- Might incur penalty for additional vias.



## Dogleg Channel Router

- Each multi-terminal net is broken into a set of 2terminal nets.
- Two parameters are used to control routing:
- Range: Determine the \# of consecutive 2-terminal subnets of the same net that can be placed on the same track.
- Routing sequence: Specifies the starting position and the direction of routing along the channel.
- Modified Left-Edge Algorithm is applied to each subnet.



## Restricted vs. Unrestricted Doglegging

- Unrestricted doglegging: Allow a dogleg even at a position where there is no pin.
- Restricted doglegging: Allow a dogleg only at a position where there is a pin belonging to that net.
- The dogleg channel router does not allow unrestricted doglegging.

dogleg channel router will fail!
Solution exists!

restricted doglegging dogleg splits a net into subnets.


## Robust Channel Router

- Yoeli, "A robust channel router," IEEE TCAD, 1991.
- Alternates between top and bottom tracks until the center is reached.
- The working side is called the current side.
- Net weights are used to guide the assignment of segments in a track, which
- favor nets that contribute to the channel density;
- favor nets with terminals at the current side;
- penalize nets whose routing at the current side would cause vertical constraint violations.
- Allows unrestricted doglegs by rip-up and re-route.


## Robust Channel Router

- Select the set of nets for the current side by solving the maximum weighted independent set problem for interval graphs.
- NP-complete for general graphs, but can be solved efficiently for interval graphs using dynamic programming.
- Main ideas:
- The interval for net $i$ is denoted by [ $x_{i_{\text {min }}}, x_{i_{\text {max }}}$ ]; its weight is $w_{i}$.
- Process channel from left to right column; the optimal cost for position $c$ is denoted by total[c];
- A net $n$ with a rightmost terminal at position $c$ is taken into the solution if total $[c-1]<w_{n}+\operatorname{total}\left[x_{n_{\text {min }}}-1\right]$.
- Can apply maze routers to fix local congestion or to postprocess the results. (Why not apply maze routers to channel routing directly??)


## Interval Graphs

- There is a vertex for each interval.
- Vertices corresponding to overlapping intervals are connected by an edge.
- Solving the track assignment problem is equivalent to finding a minimal vertex coloring of the graph.



## Weight Computation



$$
\begin{aligned}
& d(1)=1 \\
& d(2)=2 \\
& d(3)=2 \\
& d(4)=3(\text { nets } 2,3,4) \\
& d(5)=2
\end{aligned}
$$

- Computation of the weight $w_{i}$ for net $i$ :

1. favor nets that contribute to the channel density: add a large $B$ to $w_{i}$.
2. favor nets with current side terminals at column $x$ : add $\mathrm{d}(x)$ to $w_{\mathrm{i}}$.
3. penalize nets whose routing at the current side would cause vertical constraint violations: subtract $K d(x)$ from $w_{i}, K=5 \sim 10$.

- Assume $B=1000$ and $K=5$ in the $1^{\text {st }}$ iteration (top side):
- $\mathrm{w}_{1}=(0)+(1)+(-5$ * 2$)=-9$
- Net 1 does not contribute to the channel density
- One net 1 terminal on the top
- Routing net 1 causes a vertical constraint from net 2 at column 2 whose density is 2


## Weight Computation (cont'd)



$$
\begin{aligned}
& d(1)=1 \\
& d(2)=2 \\
& d(3)=2 \\
& d(4)=3(\text { nets } 2,3,4) \\
& d(5)=2
\end{aligned}
$$

- Computation of the weight $w_{\mathrm{i}}$ for net $i$ :

1. favor nets that contribute to the channel density: add a large $B$ to $w_{i}$.
2. favor nets with current side terminals at column $x$ : add $\mathrm{d}(x)$ to $w_{\mathrm{i}}$.
3. penalize nets whose routing at the current side would cause vertical constraint violations: subtract $K d(x)$ from $w_{i}, K=5 \sim 10$.

- Assume $B=1000$ and $K=5$ in the $1^{\text {st }}$ iteration (top side):
- $\mathrm{w}_{1}=(0)+(1)+(-5$ * 2$)=-9$
- $\mathrm{w}_{2}=(1000)+(2)+(-5$ * 3$)=987$
- $\mathrm{w}_{3}=(1000)+(2+2)+(0)=1004$
- $\mathrm{w}_{4}=(1000)+(3)+(-5$ * 2$)=993$


## Top-Row Net Selection



- $\mathrm{w}_{1}=-9, \mathrm{w}_{2}=987, \mathrm{w}_{3}=1004, \mathrm{w}_{4}=993$.
- A net $n$ with a rightmost terminal at position $c$ is taken into the solution if: total $[c-1]<w_{n}+\operatorname{total}\left[x_{n_{\text {min }}}-1\right]$.

| total[1] $=0$ | selected_net[1] $=0$ |
| :--- | :--- |
| total[2] $=\max (0,0-9)=0$ | selected_net[2] $=0$ |
| total[3] $=0$ | selected_net[3] $=0$ |
| total[4] $=\max \left(0, w_{2}+\right.$ total[1] $)=987$ | selected_net[4] $=2$ |
| total[5] $=\max (987,0+1004,0+993)=1004$ | selected_net[5] $=3$ |

- Select nets backwards from right to left and with no horizontal constraints: Only net 3 is selected for the top row. (Net 2 is not selected since it overlaps with net 3.)


## Bottom-Row Net Selection



- $2^{\text {nd }}$ iteration: bottom-row selection

$$
\begin{aligned}
& -w_{1}=(1000)+(2)+(0)=1002 \\
& -w_{2}=(1000)+(2)+(-5 * 2)=992 \\
& -w_{4}=(1000)+(1)+(-5 * 2)=991
\end{aligned}
$$

| total[1] $=0$ | selected_net[1] $=0$ |
| :--- | :--- |
| total[2] $=\max (0,0+1002)=1002$ | selected_net[2] $=1$ |
| total[3] $=1002$ | selected_net[3] $=0$ |
| total[4] $=\max (1002,0+992)=1002$ | selected_net[4] $=0$ |
| total[5] $=\max (1002,1002+991)=1993$ | selected_net[5] $=4$ |

- Nets 4 and 1are selected for the bottom row.


## Maze Routing + Rip-up \& Re-route



- 3rd iteration
- Routing net 2 in the middle row leads to an infeasible solution.
- Apply maze routing and rip-up and re-route nets 2 and 4 to fix the solution.


## Robust Channel Router

```
robust_router (struct netlist M)
{
sat of int cow;
struct solution S;
int total[channel_width + 1], selected net[channel_width .
int top, height, c, r, i;
top - 1;
height \leftarrow density(N);
for (r & 1; r < height; r ~r + 1) {
        for all "nets i in netlist &"
            u
        total[0] - 0;
        for (c-1;c\geqchannel_width; c\leftarrowc+1) {
            selected_ret [c] - 0;
            total[c] - total[c-1];
            if ("xome net }n\mathrm{ hasat top terminal at position c")
```



```
                total[c] - w w + total [ [x mmin
                selected_net[c] \leftarrow n;
            j
            if "some net }n\mathrm{ has a bottom terminal at position c")
```




```
                selected_net[c] [ m;
            y* if */
        }/* for*/
```

```
cow - D;
```

cow - D;
c - channel_width;
c - channel_width;
while (c>0)
while (c>0)
if {selected_net[c]) {
if {selected_net[c]) {
n - selected_net[c];
n - selected_net[c];
cow - cow U {ri;
cow - cow U {ri;
c\leftarrow, x mma
c\leftarrow, x mma
j
j
Hky
Hky
c-c-1;
c-c-1;
solution \& solution U {row};
top - !top;
M~"N
l/* for */
"apply maze couting to eliminate possible ventoal constraint wiolations"
f

```

\section*{The Clock Routing Problem (CRP)}
- Digital systems
- Synchronous systems: Highly precised clock achieves communication and timing.
- Asynchronous systems: Handshake protocol achieves the timing requirements of the system.
- Clock skew is defined as the difference in the minimum and the maximum arrival time of the clock.

- CRP: Routing clock nets such that
1. clock signals arrive simultaneously
2. clock delay is minimized
- Other issues: total wirelength, power consumption, etc

\section*{Clock Routing Problem}
- Given the routing plane and a set of points \(P=\left\{p_{1}\right.\), \(\left.p_{2}, \ldots, p_{n}\right\}\) within the plane and clock entry point \(p_{0}\) on the boundary of the plane, the Clock Routing Problem (CRP) is to interconnect each \(p_{i} \in P\) such that \(\max _{i, j \in P}|t(0, i)-t(0, j)|\) and \(\max _{i \in P} t(0, i)\) are both minimized.
- Pathlength-based approaches
1. H-tree: Dhar, Franklin, Wang, ICCD-84; Fisher \& Kung, 1982. Geometric matching: Cong, Kahng, Robins, DAC-91.
- RC-delay based approaches:
1. Exact zero skew: Tasy, ICCAD-91.
2. Lagrangian relaxation: Chen, Chang, Wong, DAC-96.

\section*{H-Tree Based Algorithm}
- H-tree: Dhar, Franklin, Wang, "Reduction of clock delays in VLSI structure," ICCD-84.


H-tree over 4 points


H-tree over 16 points

\section*{The Geometric Matching Algorithm}
- Cong, Kahng, Robins, "Matching based models for highperformance clock routing," IEEE TCAD, 1993.
- Clock pins are represented as \(n\) nodes in the clock tree ( \(n=2^{k}\) ).
- Each node is a tree itself with clock entry point being node itself.
- The minimum cost matching on \(n\) points yields \(n / 2\) segments.
- The clock entry point in each subtree of two nodes is the point on the segment such that length of both sides is same.
- Above steps are repeated for each segment.
- Apply \(H\)-flipping to further reduce clock skew (and to handle edges intersection).
- Time complexity: \(\mathrm{O}\left(n^{2} \log n\right)\).


\section*{Elmore Delay: Nonlinear Delay Model}
- Parasitic resistance and capacitance start to dominate delay in deep submicron wires.
- Resistor \(r_{i}\) must charge all downstream capacitors.
- Elmore delay: Delay can be approximated as sum of sections: resistance \(\mathbf{X}\) downstream capacitance.
\[
\delta=\sum_{i=1}^{n}\left(r_{i} \sum_{k=i}^{n} c_{k}\right)=\sum_{i=1}^{n} r(n-i+1) c=\frac{n(n+1)}{2} r c .
\]

- Delay grows as square of wire length.

\section*{Wire Models}
- Lumped circuit approximations for distributed RC lines: \(\pi\) model (most popular), \(T\)-model, \(L\)-model.

- \(\pi\)-model: If no capacitive loads for \(C\) and \(D\),
\(A\) to \(B\) : \(\delta_{A B}=r_{1}\left(c_{1} / 2+c_{2}+c_{3}\right)\);
\(B\) to \(C\) : \(\delta_{B C}=r_{2}\left(c_{2} / 2\right)\);
\(B\) to \(D: \delta_{B D}=r_{3}\left(c_{3} / 2\right)\).


\section*{Example Elmore Delay Computation}
- \(0.18 \mu \mathrm{~m}\) technology.: unit resistance \(\hat{\boldsymbol{r}}=0.075 \Omega / \mu \mathrm{m}\); unit capacitance \(\hat{c}=0.118 \mathrm{fF} / \mu \mathrm{m}\).
- Assume \(C_{C}=2 f F, C_{D}=4 f F\).
\(-\delta_{B C}=r_{B C}\left(c_{B C} / 2+C_{C}\right)=0.075 \times 150(17.7 / 2+2)=120 \mathrm{fs}\)
\(-\delta_{B D}=r_{B D}\left(c_{B D} / 2+C_{D}\right)=0.075 \times 200(23.6 / 2+4)=240\) fs
\(-\delta_{A B}=r_{A B}\left(c_{A B} / 2+C_{B}\right)=0.075 \times 100(11.8 / 2+17.7+2+23.6\) \(+4)=400 \mathrm{fs}\)
- Critical path delay: \(\delta_{A B}+\delta_{B D}=640\) fs.


\section*{Delay Calculation for a Clock Tree}
- Let \(T\) be an RC tree with points \(P=\left\{p_{1}, p_{2}, \ldots, p_{n}\right\}, c_{i}\) the capacitance of \(p_{i}, r_{i}\) the resistance of the edge between \(p_{i}\) and its immediate predecessor.
- The subtree capacitance at node \(i\) is given as \(C_{i}=c_{i}+\sum_{j \in S_{i}} C_{j}\), where \(S_{i}\) is the set of all the immediate successors of \(p_{i}\).
- Let \(\delta(i, j)\) be the path between \(p_{i}\) and \(p_{j}\), excluding \(p_{i}\) and including \(p_{j}\).
- The delay between two nodes \(i\) and \({ }_{j}\) is \(t_{i j}=\sum_{j \in \delta(i, j)} r_{j} C_{j}\),
- \(t_{03}=r_{0}\left(c_{1}+c_{2}+c_{3}+c_{4}+c_{1}{ }^{s}+c_{2}{ }^{s}+c_{3}{ }^{s}\right)+r_{2}\left(c_{2} / 2+c_{3}+c_{4}+\right.\) \(\left.c_{2}{ }^{s}+c_{3}{ }^{s}\right)+r_{4}\left(c_{4} / 2+c_{3}{ }^{s}\right)\).

clock tree

delay model


\section*{Exact Zero Skew Algorithm}
- Tasy, "Exact zero skew algorithm," ICCAD-91.
- To ensure the delay from the tapping point to leaf nodes of subtrees \(T_{1}\) and \(T_{2}\) being equal, it requires that
\[
r_{1}\left(c_{1} / 2+C_{1}\right)+t_{1}=r_{2}\left(c_{2} / 2+C_{2}\right)+t_{2}
\]
- Solving the above equation, we have
\[
x=\frac{\left(t_{2}-t_{1}\right)+\alpha l\left(C_{2}+\frac{\beta l}{2}\right)}{\alpha l\left(\beta l+C_{1}+C_{2}\right)},
\]
where \(\alpha\) and \(\beta\) are the per unit values of resistance and capacitance, \(I\) the length of the interconnecting wire, \(r_{1}=\alpha x I, c_{1}=\) \(\beta x I, r_{2}=\alpha(1-x) I, c_{2}=\beta(1-x) /\).



\section*{Zero-Skew Computation}
- Balance delays: \(r_{1}\left(c_{1} / 2+C_{1}\right)+t_{1}=r_{2}\left(c_{2} / 2+C_{2}\right)+t_{2}\).
- Compute tapping points \(x=\frac{\left(t_{2}-t_{1}\right)+\alpha l\left(C_{2}+\frac{\beta l}{2}\right)}{\alpha l\left(\beta l+C_{1}+C_{2}\right)}, \chi(\beta)\) : per unit values of resistance (capacitance); \(l\) : length of the wire;
\[
r_{1}=\beta x I, c_{1}=\beta x I ; r_{2}=\alpha(1-x) I, c_{2}=\beta(1-x) I .
\]
- If \(x \notin[0,1]\), we need snaking to find the tapping point.
- Exp: \(\alpha=0.1 \Omega /\) unit, \(\beta=0.2 F /\) unit. (Find tapping points \(E\) for \(A\) and \(B, F\) for \(C\) and \(D\), and \(G\) for \(E\) and \(F\).)

(

\section*{Simultaneous Retiming and Clock Skew Scheduling}
- Liu, Papaefthymiou, Friedman: Simultaneous retiming and useful clock skew scheduling can further reduce clock period, DAC-99.
- Case 1
- Zero clock skew: clock period \(\phi=23 \tau\).
- Schedule e \(4 \tau\) earlier than \(f\) : clock period \(\phi=19 \tau\).

case 1


\section*{Retiming and Clock Skew Scheduling}
- Case 1
- Zero clock skew: clock period \(\phi=23 \tau\).
- Schedule e \(4 \tau\) earlier than \(f\) : clock period \(\phi=19 \tau\).
- Case 2
- Zero clock skew: clock period \(\phi=16 \tau\).
- Schedule \(f 1 \tau\) earlier than \(e\) : clock period \(\phi=15 \tau\).
- Case 3: optimal effective clock period? No!! Optimal case?
- Zero clock skew: clock period \(\phi=18 \tau\).
- Schedule e \(4 \tau\) earlier than \(f\) : clock period \(\phi=14 \tau\).

case 1


Chang, Huang, Li, Lin, Liu

case 3

\section*{Power/Ground Routing}
- Are usually laid out entirely on metal layers for smaller parasitics.
- Two steps:
1. Construction of interconnection topology: non-crossing power, ground trees.
2. Determination of wire widths: prevent metal migration, keep voltage (IR) drop small, widen wires for more powerconsuming modules and higher density current ( 1.5 mA per \(\mu\) \(m\) width for AI). (So area metric?)

grids

interdigitated trees```

