# Outline

1. Fundamental limitation of EDA software
2. Realization of VHDL operator
3. Realization of VHDL data type
4. VHDL synthesis flow
5. Timing consideration

RTL Hardware Design, Chapter 6


# 1. Fundamental limitation of EDA software

Synthesis Of VHDL Code

- Can "C-to-hardware" be done?
- EDA tools:
  - Core: optimization algorithms
  - Shell: wrapping
- What does theoretical computer science say?
  - Computability
  - Computation complexity


# Computability

- A problem is computable if an algorithm exists that solves it.
- E.g., the "halting problem":
  - Can we develop a program that takes any program and its input, and determines whether the computation of that program will eventually halt?
  - No such program can exist: the halting problem is uncomputable.
- In general, any attempt to examine the "meaning" of a program is uncomputable.


# Computation complexity

- How fast can an algorithm run (or how good is an algorithm)?
- "Interferences" in measuring execution time:
  - type of CPU, speed of CPU, compiler, etc.


# **Big-O** notation

- $f(n)$ is $O(g(n))$ if constants $n_0$ and $c$ can be found such that
   $f(n) < c\,g(n)$ for any $n$, $n > n_0$
- $g(n)$ is a simple function: $1$, $n$, $\log_2 n$, $n^2$, $n^3$, $2^n$
- The following are $O(n^2)$:
  - $0.1n^2$
  - $n^2 + 5n + 9$
  - $500n^2 + 1000000$
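
To see why the last two expressions qualify, here is one possible choice of the constants $c$ and $n_0$. The particular values are only an illustration; many other choices satisfy the definition equally well.

$$
n^2 + 5n + 9 \le n^2 + 5n^2 + 9n^2 = 15n^2 < 16n^2 \quad \text{for all integers } n \ge 1, \text{ so } c = 16,\ n_0 = 0
$$

$$
500n^2 + 1000000 < 500n^2 + n^2 = 501n^2 \quad \text{for all } n > 1000, \text{ so } c = 501,\ n_0 = 1000
$$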


# Interpretation of Big-O

- Filter out the "interference": constants and less important terms
- n is the input size of an algorithm
- The "scaling factor" of an algorithm: What happens if the input size increases

E.g., execution time for each Big-$O$ function:

| input size $n$ | $n$ | $\log_2 n$ | $n \log_2 n$ | $n^2$ | $n^3$ | $2^n$ |
|----------------|-----|------------|--------------|-------|-------|-------|
| 2  | 2 µs  | 1 µs   | 2 µs   | 4 µs   | 8 µs   | 4 µs          |
| 4  | 4 µs  | 2 µs   | 8 µs   | 16 µs  | 64 µs  | 16 µs         |
| 8  | 8 µs  | 3 µs   | 24 µs  | 64 µs  | 512 µs | 256 µs        |
| 16 | 16 µs | 4 µs   | 64 µs  | 256 µs | 4 ms   | 66 ms         |
| 32 | 32 µs | 5 µs   | 160 µs | 1 ms   | 33 ms  | 71 min        |
| 48 | 48 µs | 5.5 µs | 268 µs | 2 ms   | 111 ms | 9 years       |
| 64 | 64 µs | 6 µs   | 384 µs | 4 ms   | 262 ms | 600,000 years |
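
The entries of the $2^n$ column can be checked by hand. The arithmetic below assumes one unit operation takes 1 µs, which is what the $n$ column of the table implies:

$$
2^{32}\ \mu s \approx 4.29 \times 10^{9}\ \mu s \approx 4295\ s \approx 71.6\ \text{min}
$$

$$
2^{64}\ \mu s \approx 1.84 \times 10^{19}\ \mu s \approx 1.84 \times 10^{13}\ s \approx 5.8 \times 10^{5}\ \text{years}
$$

Doubling the input size from 32 to 64 squares the execution time, while for the $n^2$ function it only quadruples it.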


- Intractable problems:
  - Algorithms with $O(2^n)$ complexity
  - Not realistic for a larger $n$
  - Tractable algorithms that find a suboptimal solution frequently exist
- Many problems encountered in synthesis are intractable


- What is the fuss about:
  - "hardware-software" co-design?
  - SystemC, HardwareC, SpecC, etc.?

### Theoretical limitation

- Synthesis software does not know your intention
- Synthesis software cannot obtain the optimal solution
- Synthesis should be treated as transformation and a "local search" in the "design space"
- Good VHDL code provides a good starting point for the local search


# 2. Realization of VHDL operator

- Logic operator
  - Simple, direct mapping
- Relational operator
  - `=`, `/=`: fast, simple implementation exists
  - `>`, `<`, etc.: more complex implementation, larger delay
- Addition operator
- Other arithmetic operators: support varies


- Operator with two constant operands:
  - Simplified in preprocessing
  - No hardware inferred
  - Good for documentation
  - E.g.,

```
constant OFFSET: integer := 8;
signal boundary: unsigned(8 downto 0);
signal overflow: std_logic;

overflow <= '1' when boundary > (2**OFFSET-1) else
            '0';
```



#### An example 0.55 µm standard-cell CMOS implementation

Area (gate count) by VHDL operator and operand width. The subscripts *a* and *d* appear to denote area-optimized and delay-optimized versions of the same operator.

| width | nand | xor | $>_a$ | $>_d$ | =   | $+1_a$ | $+1_d$ | $+_a$ | $+_d$ | mux |
|-------|------|-----|-------|-------|-----|--------|--------|-------|-------|-----|
| 8     | 8    | 22  | 25    | 68    | 26  | 27     | 33     | 51    | 118   | 21  |
| 16    | 16   | 44  | 52    | 102   | 51  | 55     | 73     | 101   | 265   | 42  |
| 32    | 32   | 85  | 105   | 211   | 102 | 113    | 153    | 203   | 437   | 85  |
| 64    | 64   | 171 | 212   | 398   | 204 | 227    | 313    | 405   | 755   | 171 |

Delay (ns) by VHDL operator and operand width:

| width | nand | xor | $>_a$ | $>_d$ | =   | $+1_a$ | $+1_d$ | $+_a$ | $+_d$ | mux |
|-------|------|-----|-------|-------|-----|--------|--------|-------|-------|-----|
| 8     | 0.1  | 0.4 | 4.0   | 1.9   | 1.0 | 2.4    | 1.5    | 4.2   | 3.2   | 0.3 |
| 16    | 0.1  | 0.4 | 8.6   | 3.7   | 1.7 | 5.5    | 3.3    | 8.2   | 5.5   | 0.3 |
| 32    | 0.1  | 0.4 | 17.6  | 6.7   | 1.8 | 11.6   | 7.5    | 16.2  | 11.1  | 0.3 |
| 64    | 0.1  | 0.4 | 35.7  | 14.3  | 2.2 | 24.0   | 15.7   | 32.2  | 22.9  | 0.3 |

# 3. Realization of VHDL data type


- Use and synthesis of 'Z'
- Use of '-'


### Use and synthesis of 'Z'

- Tri-state buffer:
  - Output with "high-impedance"
  - Not a value in Boolean algebra
  - Need special output circuitry (tri-state buffer)
- Major applications:
  - Bi-directional I/O pins
  - Tri-state bus
- VHDL description: `y <= a_in when oe='1' else 'Z';`
- 'Z' cannot be used as an input or manipulated. E.g., the following have no hardware counterpart:
  - `f <= 'Z' and a;`
  - `y <= data_a when in_bus='Z' else data_b;`
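
As a minimal sketch, the tri-state buffer description can be wrapped in a complete design unit. The entity name `tri_buf` is hypothetical, and the output enable `oe` is assumed to be active high:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity; the name tri_buf is illustrative only
entity tri_buf is
   port(
      a_in: in std_logic;   -- data to drive onto the output
      oe:   in std_logic;   -- output enable, assumed active high
      y:    out std_logic   -- tri-state output
   );
end tri_buf;

architecture rtl of tri_buf is
begin
   -- drive a_in when enabled, otherwise release the line
   y <= a_in when oe='1' else 'Z';
end rtl;
```

Note that 'Z' appears only as the value assigned to the output, never as an operand or in a condition.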

- Separate the tri-state buffer from regular code
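
One possible way to keep the tri-state buffer apart from the regular logic is sketched below. The entity, its ports, and the internal signal `tmp` are hypothetical names chosen for illustration. The regular logic is written without any 'Z', and a single separate statement describes the buffer:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example; entity and signal names are illustrative
entity tri_sep_demo is
   port(
      a, b: in std_logic;
      sel:  in std_logic_vector(1 downto 0);
      y:    out std_logic
   );
end tri_sep_demo;

architecture rtl of tri_sep_demo is
   signal tmp: std_logic;  -- result of the regular logic
begin
   -- regular combinational logic, free of 'Z'
   with sel select
      tmp <= a       when "01",
             b       when "10",
             a xor b when others;
   -- the tri-state buffer is described in its own statement
   y <= tmp when sel/="00" else 'Z';
end rtl;
```

With this split, the logic that produces `tmp` can be optimized like any other combinational circuit, and the buffer is easy to locate in the code.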



### Bi-directional I/O pins

```
entity bi_demo is
   port(bi: inout std_logic;
   . . .
begin
   sig_out <= output_expression;
   . . .
   . . . <= expression_with_sig_in;
   . . .
   bi <= sig_out when dir='1' else 'Z';
   sig_in <= bi;
   . . .
```

The input path can also be gated by `dir`:

```
sig_in <= bi when dir='0' else 'Z';
```








- Problems with tri-state bus:
  - Difficult to optimize, verify and test
  - Somewhat difficult to design: "parking", "fighting"
- Alternative to tri-state bus: mux

```
with src_select select
   data_bus <= i0 when "00",
               i1 when "01",
               i2 when "10",
               i3 when others; -- "11"
```
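
For comparison with the mux version, a tri-state bus carrying the same four sources might be described as follows. This is a sketch under assumptions: the entity name, the 8-bit width, and the port declarations are hypothetical; only the names `i0` to `i3`, `src_select` and `data_bus` come from the mux code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example; entity name and bus width are illustrative
entity tri_bus_demo is
   port(
      i0, i1, i2, i3: in std_logic_vector(7 downto 0);
      src_select:     in std_logic_vector(1 downto 0);
      data_bus:       out std_logic_vector(7 downto 0)
   );
end tri_bus_demo;

architecture tri_arch of tri_bus_demo is
begin
   -- each source drives the bus through its own tri-state buffer;
   -- at most one enable condition is true at a time, so the
   -- drivers never "fight"
   data_bus <= i0 when src_select="00" else (others=>'Z');
   data_bus <= i1 when src_select="01" else (others=>'Z');
   data_bus <= i2 when src_select="10" else (others=>'Z');
   data_bus <= i3 when src_select="11" else (others=>'Z');
end tri_arch;
```

The four assignments create four drivers on the same resolved signal, which is exactly what makes the structure harder to optimize, verify and test than a single mux.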


### Use of '-'

- In conventional logic design:
  - '-' as input value: shorthand to make the table compact
  - E.g.,

Full truth table:

| input req | output code |
|-----------|-------------|
| 1 0 0     | 10          |
| 1 0 1     | 10          |
| 1 1 0     | 10          |
| 1 1 1     | 10          |
| 0 1 0     | 01          |
| 0 1 1     | 01          |
| 0 0 1     | 00          |
| 0 0 0     | 00          |

Compact truth table using '-':

| input req | output code |
|-----------|-------------|
| 1 - -     | 10          |
| 0 1 -     | 01          |
| 0 0 1     | 00          |
| 0 0 0     | 00          |

- '-' as output value: helps simplification
- E.g.,
  - '-' assigned to 1: a + b
  - '-' assigned to 0: a'b + ab'

| input a b | output f |
|-----------|----------|
| 0 0       | 0        |
| 0 1       | 1        |
| 1 0       | 1        |
| 1 1       | -        |

### Use '-' in VHDL

- As input value (against our intuition), the following does not work, because `=` compares `req` against the literal value '-' instead of treating it as a wildcard:

```
y <= "10" when req="1--" else
     "01" when req="01-" else
     "00" when req="001" else
     "00";
```

#### • Fix #1:

```
y <= "10" when req(3)='1' else
     "01" when req(3 downto 2)="01" else
     "00" when req(3 downto 1)="001" else
     "00";
```

#### • Fix #2:

```
use ieee.numeric_std.all;
. . .
y <= "10" when std_match(req,"1--") else
    "01" when std_match(req,"01-") else
    "00" when std_match(req,"001") else
    "00";
```


- Wrong (choices containing '-' are matched literally):

```
with req select
   y <= "10" when "1--",
        "01" when "01-",
        "00" when "001",
        "00" when others;
```

- Fix:

```
with req select
   y <= "10" when "100"|"101"|"110"|"111",
        "01" when "010"|"011",
        "00" when others;
```




- '-' as an output value in VHDL
- May work with some software

```
sel <= a & b;
with sel select
   y <= '0' when "00",
        '1' when "01",
        '1' when "10",
        '-' when others;
```


# 4. VHDL synthesis flow

- Synthesis:
  - Realize VHDL code using logic cells from the device's library
  - A refinement process
- Main steps:
  - RT level synthesis
  - Logic synthesis
  - Technology mapping




# **RT** level synthesis

- Realize VHDL code using RT-level components
- Somewhat like the derivation of the conceptual diagram
- Limited optimization
- Generated netlist includes
  - "regular" logic: e.g., adder, comparator
  - "random" logic: e.g., truth table description

## Module generator

- "Regular" logic can be replaced by a predesigned module
  - Predesigned module is more efficient
  - Module can be generated in different levels of detail
  - Reduce the processing time

# Logic Synthesis

- Realize the circuit with the optimal number of "generic" gate level components
- Process the "random" logic
- Two categories:
  - Two-level synthesis: sum-of-product format
  - Multi-level synthesis

• E.g.,
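
As a small illustration of the two categories, consider a hypothetical function $f$ of four inputs (chosen here only for the example). In two-level sum-of-product form it needs three product terms feeding one OR:

$$
f = ab + ac + ad
$$

Factoring out the common input gives a multi-level form with one OR followed by one AND:

$$
f = a\,(b + c + d)
$$

Both expressions describe the same function. The multi-level form uses fewer gates, and it also shows why multi-level synthesis has more freedom: many different factorizations of the same function are possible.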



# Technology mapping

- Map "generic" gates to "device-dependent" logic cells
- The technology library is provided by the vendors who manufactured (in FPGA) or will manufacture (in ASIC) the device


### E.g., mapping in standard-cell ASIC





# E.g., mapping in FPGA

• With 5-input LUT (Look-Up-Table) cells



### Effective use of synthesis software

- Logic operators: software can do a good job
- Relational/arithmetic operators: manual intervention needed
- "Layout" and "routing structure":
  - Silicon chip is a 2-dimensional square
  - "Rectangular" or "tree-shaped" circuit is easier to optimize



# 5. Timing consideration


- Propagation delay
- Synthesis with timing constraint
- Hazards
- Delay-sensitive design

## Propagation delay

- Delay: time required to propagate a signal from an input port to an output port
- Cell level delay: most accurate
- Simplified model:

  $delay = d_{intrinsic} + r \cdot C_{load}$

- The impact of wire becomes more dominant
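
A worked instance of the simplified model may help. The numbers below are hypothetical and are not taken from any real cell library: assume $d_{intrinsic} = 0.1\ \text{ns}$, $r = 2\ \text{ns/pF}$ and $C_{load} = 0.05\ \text{pF}$.

$$
delay = 0.1\ \text{ns} + 2\ \text{ns/pF} \times 0.05\ \text{pF} = 0.2\ \text{ns}
$$

If extra fan-out or longer wires double the load to $0.10\ \text{pF}$, the same cell becomes slower:

$$
delay = 0.1\ \text{ns} + 2\ \text{ns/pF} \times 0.10\ \text{pF} = 0.3\ \text{ns}
$$

The intrinsic part stays fixed, so the load-dependent term accounts for the whole increase.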




# System delay

- The longest path (critical path) in the system
- The worst input to output delay
- "False path" may exist
- RT level delay estimation:
  - Difficult if the design is mainly "random" logic
  - Critical path can be identified if many complex operators (such as adders) are used in the design


## Synthesis with timing constraint

- Multi-level synthesis is flexible
- It is possible to reduce delay by adding extra logic
- Synthesis with timing constraint:
  1. Obtain the minimal-area implementation
  2. Identify the critical path
  3. Reduce the delay by adding extra logic
  4. Repeat 2 and 3 until the constraint is met


• E.g.,



• Improvement in "architectural" level design (better VHDL code to start with)
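
One way to read "better VHDL code to start with" is sketched below with a four-operand addition. The entity, its ports and the 8-bit width are hypothetical. The first output describes a cascading chain of adders, the second a tree:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example; entity and signal names are illustrative
entity add4_demo is
   port(
      a, b, c, d: in unsigned(7 downto 0);
      sum_chain:  out unsigned(7 downto 0);
      sum_tree:   out unsigned(7 downto 0)
   );
end add4_demo;

architecture rtl of add4_demo is
begin
   -- chain: three adders in series on the path from a or b
   sum_chain <= ((a + b) + c) + d;
   -- tree: (a + b) and (c + d) are independent, so the longest
   -- path passes through only two adders
   sum_tree  <= (a + b) + (c + d);
end rtl;
```

Both outputs compute the same sum with three adders. Whether synthesis software restructures the chain into the tree on its own depends on the tool, so writing the tree explicitly gives the local search a better starting point.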




#### Area-delay trade-off curve



# **Timing Hazards**

- Propagation delay: time to obtain a stable output
- Hazards: the fluctuation occurring during the transient period
  - Static hazard: glitch when the signal should be stable
  - Dynamic hazard: a glitch in transition
- Due to the multiple converging paths of an output port

• E.g., static-hazard (sh=ab'+bc; a=c=1)



• E.g., dynamic hazard (a=c=d=1)



### Dealing with hazards

· Some hazards can be eliminated in theory

• E.g.,
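
For the static-hazard example $sh = ab' + bc$ with $a = c = 1$, the usual textbook remedy is to add the consensus term $ac$. A sketch of the reasoning: with $a = c = 1$ the original expression depends on two paths from $b$ that switch at different times,

$$
sh = ab' + bc \;\;\xrightarrow{\;a = c = 1\;}\;\; sh = b' + b
$$

whereas the extended expression contains a term that does not depend on $b$ at all:

$$
sh = ab' + bc + ac \;\;\xrightarrow{\;a = c = 1\;}\;\; sh = b' + b + 1 = 1
$$

In Boolean algebra $ab' + bc + ac = ab' + bc$, so the added term is logically redundant. It matters only during the transient period.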




### Delay-sensitive design and its danger

- Boolean algebra
  - The theoretical model for digital design and most algorithms used in the synthesis process
  - Algebra deals with the stabilized signals
- Delay-sensitive design
  - Depends on the transient property (and delay) of the circuit
  - Difficult to design and analyze

- Eliminating glitches is very difficult in reality, and almost impossible for synthesis
- Multiple inputs can change simultaneously (e.g., 1111=>0000 in a counter)
- How to deal with it? Ignore glitches in the transient period and retrieve the data after the signal is stabilized
- E.g., hazard elimination circuit: the ac term is not needed (according to Boolean algebra)
- E.g., edge detection circuit (pulse = a · a')




- What can go wrong:
  - E.g., pulse <= a **and** (not a);
  - During logic synthesis, the logic expressions will be rearranged and optimized
  - During technology mapping, generic gates will be re-mapped
  - During placement and routing, wire delays may change
  - It is bad for testing and verification
- If delay-sensitive design is really needed, it should be done manually, not by synthesis
