In the age of deep submicron design, where 10+ million gates of logic have to fit on a single device running at 250+ MHz, traditional physical-design techniques cannot handle the resulting challenges. The problems with the traditional physical-design techniques can be summarized as follows:
Timing closure is either unachievable or takes too long to finish.
Too many iterations are needed between front end and back end for each design.
Designs are unroutable on the target die.
As the device geometries shrink to 0.11 micron and beyond, new tools, techniques, and methodologies are needed to
Here in this section, we cover two examples of modern physical-design techniques. The new
 Courtesy of Silicon Perspective, Inc. (A Cadence Company). Portions reprinted with permission.
In the mid-1990s, two important issues were converging to make physical chip designers' lives miserable. First, they were being asked to put more and more on a single chip. The integration challenge really hit home as chip design approached the million-gate
As a result of these two issues, chip physical implementation became a huge problem, especially as geometries shrank to 0.25 microns and below. Logic designers were told that their designs simply couldn't meet timing. Physical designers were dismayed at the lengthy run times of their
The existing methodology promoted an arm's-length working relationship between the front-end logic-design team and the back-end physical-layout team. Each
What did designers do? There seemed to be two choices. One was to resort to overly pessimistic designs with wider-than-necessary guardbands as a safety margin. However, this flew in the face of demands for ever-higher performance. The other choice was to suffer the pain of 20 or more iterations between synthesis and layout, if timing closure was ever reached at all.
What was needed was a tool/methodology combination that effectively bridged the gap between the front-end logic design and the back end, where the tools were falling far short. This new technique needed to provide fast
This section will discuss how First Encounter from Cadence's Silicon Perspective subsidiary can be used to create a silicon virtual prototype, helping designers to reduce iterations and finish large-chip designs much faster.
The silicon virtual prototype is the practical approach to dealing with increased complexity and the challenges of timing closure with deep submicron (DSM) silicon. The silicon virtual prototype creation is the first stage of the back-end design phase, often before all of the front-end design is complete. By creating a full-chip silicon virtual prototype, the design team can immediately validate the physical feasibility of the netlist, eliminating the back-end design iterations that were required to discover that a chip could not meet timing or some other constraint. Figure 4.9 illustrates the environment for physical prototyping.
The silicon virtual-prototyping stage compresses the physical-feasibility stage down to a few hours so chip designers can view the chip layout once or more each day. Now designers can evaluate many
The creation of the prototype also allows the designers to create realistic timing budgets for all sections of the chip. These timing
In this new methodology, designers may start building their silicon virtual prototype at a very early stage in a chip's development process. For a very large design, the prototype creation can begin when portions of the design are incomplete. Black boxes can be used to estimate those
Figure 4.10 shows the design flow for silicon virtual prototyping. Once the functionality of the chip is fully defined, the first step in this methodology is to perform a quick logic synthesis to create a gate-level netlist. It is assumed that the netlist is functionally clean but that timing is not accurate, so simple wire-load models (WLMs) can be used at this stage. The resulting gate-level netlist plus the timing constraints form the inputs to the creation of the silicon virtual prototype.
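A wire-load model estimates net parasitics from fanout alone, before any placement exists. The sketch below is a minimal illustration of that idea, assuming a hypothetical fanout-to-length table, extrapolation slope, and per-unit capacitance; all numbers are invented for the example and do not come from any real library.

```python
# Hypothetical wire-load model: fanout -> estimated wire length (arbitrary units).
# All values below are illustrative assumptions, not data from a real .lib file.
FANOUT_LENGTH = {1: 10.0, 2: 22.0, 3: 35.0, 4: 50.0}
SLOPE = 15.0            # extra length per fanout beyond the table (assumed)
CAP_PER_UNIT = 0.002    # pF per unit length (assumed)


def wlm_length(fanout: int) -> float:
    """Estimate wire length for a net from its fanout alone."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    max_fo = max(FANOUT_LENGTH)
    if fanout <= max_fo:
        return FANOUT_LENGTH[fanout]
    # Extrapolate linearly past the last table entry.
    return FANOUT_LENGTH[max_fo] + SLOPE * (fanout - max_fo)


def wlm_capacitance(fanout: int) -> float:
    """Estimated wire capacitance in pF for a net with the given fanout."""
    return wlm_length(fanout) * CAP_PER_UNIT
```

Because the estimate ignores where cells actually sit, two nets with the same fanout receive the same capacitance regardless of their real length, which is why timing based on WLMs is treated as inaccurate at this stage.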
The creation of the silicon virtual prototype begins with fundamental full-chip floorplanning activities such as I/O placement, macro placement, and power topology. Major design elements usually are manually placed based on the designer's knowledge of the chip architecture. The remaining elements can be automatically placed for maximum efficiency.
Conventional floorplanning tools start by creating block
For the first pass at floorplanning, a netlist, physical libraries, corresponding synthesis libraries (.lib), top-level constraints, and a technology file (process description) are created and imported. This data can be automatically loaded for succeeding
Conventional approaches to timing-driven layout typically rely on
In contrast, the Amoeba placement engine takes an "implicit" approach to timing control. Throughout the entire floorplanning and placement process, it
The Amoeba engine applies this hierarchical-locality-based approach to a unified floorplanning and placement task. At any given level of a typical hierarchical chip design, intramodule signals account for over 95 percent of all signal
The Amoeba technology needs to deal with only a drastically reduced number of signal paths at any given stage in the placement process because it applies the physical locality and intramodule/intermodule signal distribution hierarchically and incrementally. This greatly enhances its speed and allows it to explore the possible solution space more thoroughly to develop an optimum solution considering all the factors of timing, area, and power consumption.
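The intramodule/intermodule distinction can be illustrated with a small sketch. It is not the Amoeba algorithm, whose internals are not described here; it only shows, under assumed data structures (a hypothetical `nets` mapping from net name to hierarchical instance paths, with `/` as the hierarchy separator), how nets can be classified at a chosen hierarchy level so that only the boundary-crossing ones need attention at that level.

```python
from collections import Counter


def module_of(instance_path: str, level: int) -> str:
    """Return the enclosing module of an instance, truncated to `level` names."""
    parts = instance_path.split("/")[:-1]   # drop the leaf cell name
    return "/".join(parts[:level])


def classify_nets(nets: dict[str, list[str]], level: int) -> Counter:
    """Count nets whose pins all sit in one module versus nets that span modules."""
    counts = Counter()
    for pins in nets.values():
        modules = {module_of(p, level) for p in pins}
        counts["intramodule" if len(modules) == 1 else "intermodule"] += 1
    return counts


# Hypothetical toy netlist: instance paths are invented for illustration.
nets = {
    "n1": ["cpu/alu/u1", "cpu/alu/u2"],
    "n2": ["cpu/alu/u2", "cpu/regs/u7"],
    "n3": ["cpu/regs/u7", "mem/ctl/u3"],
}
```

Calling `classify_nets(nets, 1)` treats `cpu` and `mem` as the modules, so only `n3` crosses a boundary; at level 2 the finer modules make `n2` an intermodule net as well.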
The Amoeba technology uses an intelligent-fencing strategy to place circuit cells. Conventional placement approaches confine cells in a design module to a nonoverlapping "fenced" rectangular or rectilinear area. Such rigidity leads to
One of the big benefits of the Amoeba-based approach is that it helps designers make intelligent decisions about allocating timing budgets among different blocks. Designers can easily see if certain
To verify timing closure, designers compare the timing data produced by the prototype against the final
This new methodology
Hierarchical methodologies have been widely adopted in the front-end logical design world. However, designers have hesitated to embrace hierarchical methodologies for physical design because of the challenge of generating accurate timing budgets and pin placements for the blocks. First Encounter provides the intelligence designers need to allocate timing budgets among the blocks and to determine optimal pin placements. Figure 4.11 shows how the full-chip physical prototyping
The silicon virtual prototype is the starting point for creating a physical hierarchy in the design. During the import, all modules are flattened to create the prototype. The standard cells are placed flat at the top; the design is then routed and extracted and the timing is analyzed. This is when the partitioning is implemented to re-create the hierarchy.
The tool creates a directory of data for each partition, including the top level. Each directory contains a netlist, floorplan file, pin assignments, and timing constraints. In addition, the subdirectory for the top
Reaching the optimal block size in large designs often requires two levels of partitioning. The size of the sub-blocks is driven by the capacity of tools such as physical synthesis, which seem to perform best on blocks of about 100,000 gates. For example, a design of five million gates would be partitioned into 10 blocks of 500,000 gates. Those blocks would then be partitioned into sub-blocks of approximately 100,000 gates.
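The arithmetic of the example can be checked with a short sketch. The function name and the rounding-up behavior for sizes that do not divide evenly are assumptions made for illustration.

```python
def two_level_partition(total_gates: int, block_gates: int, sub_block_gates: int):
    """Return (number of blocks, sub-blocks per block) for a two-level split."""
    # Integer ceiling division so a partial remainder still gets its own block.
    blocks = (total_gates + block_gates - 1) // block_gates
    sub_blocks_per_block = (block_gates + sub_block_gates - 1) // sub_block_gates
    return blocks, sub_blocks_per_block


blocks, subs = two_level_partition(5_000_000, 500_000, 100_000)
print(blocks, subs, blocks * subs)   # 10 5 50
```

For the five-million-gate design this gives 10 blocks of 5 sub-blocks each, or 50 sub-blocks of roughly 100,000 gates in total.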
To be able to perform the second-level partitioning, the constraints that were created as a result of the partitioning must be combined with the multicycle and false-
First Encounter can produce an automatic, optimized assignment of pin locations on chip partitions, using both detailed logical and physical information. This eliminates a
Physical synthesis tools can be used on each block of the hierarchical design. Additionally, block-level place-and-route tools, such as Cadence Silicon Ensemble-PKS or Synopsys' Physical Compiler, are used at the block level. As each block is completed, it is placed back into the silicon virtual prototype to make sure the design is on target for timing, area, and power.
As the design team creates the physical blocks, they are
A key element of chip-level assembly is managing buffers between the design's partitions in order to achieve top-level timing closure. First Encounter works with popular commercial routers to provide a fast, flexible mechanism based on either rules or timing. First Encounter
In-place optimization (IPO) downsizes, upsizes, and
The next step is clock-tree synthesis. First Encounter's clock-tree synthesis option creates a complete clock tree at both the top level and the block level, leveraging the Amoeba-placement algorithm to minimize skew and insertion delay. Even very complex clock structures can be handled.
In hierarchical design flows, most design teams use automated tools to generate clock structures within the chip's partitions. However, top-level clock trees usually are created manually. First Encounter's clock-tree synthesis provides automation at both levels, leveraging the Amoeba
First Encounter's clock-tree synthesis supports gated clocks for use in power-sensitive applications and generates a clock-routing guide for final detailed routing.
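The two quantities that clock-tree synthesis tries to minimize can be made concrete with a small sketch. It assumes the clock arrival time at each sink is already known, takes insertion delay as the latest arrival from the clock root, and takes skew as the spread between latest and earliest arrivals; individual tools may report these figures differently, and the sink names and times are hypothetical.

```python
def clock_metrics(arrival_ns: dict[str, float]) -> tuple[float, float]:
    """Return (insertion delay, skew) from clock arrival times at each sink.

    Insertion delay is taken here as the latest arrival from the clock root;
    skew is the spread between the latest and earliest arrivals.
    """
    if not arrival_ns:
        raise ValueError("need at least one clock sink")
    latest = max(arrival_ns.values())
    earliest = min(arrival_ns.values())
    return latest, latest - earliest


# Hypothetical arrival times in ns at three flip-flop clock pins.
sinks = {"ff_a": 1.20, "ff_b": 1.35, "ff_c": 1.25}
insertion, skew = clock_metrics(sinks)
```

Balancing the tree reduces the skew term, while a shallower tree reduces the insertion delay; the two goals can pull against each other.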
Power distribution has become a significant concern in SOC designs. Most designers now routinely over-design their chip power grids to prevent IR-drop problems. However, over-design interferes with signal routing by adding congestion, and increasingly it still does not guarantee that IR-drop violations will be avoided.
The First Encounter Power Grid Designer option addresses this problem. It enables designers to lay out and analyze the chip's power and ground network early in the physical design cycle, with accurate correlation to the final layout. This detects potential IR-drop problems early in physical design, instead of at the end of the layout cycle. Power Grid Designer
One of the key benefits of developing a silicon virtual prototype is the turnaround speed of each iteration. In a traditional design flow, information about physical feasibility is available only after the completion of place and route followed by final verification. This process typically takes several days. A silicon virtual prototype tool must be able to validate the physical feasibility of a large SOC design in a few hours on a desktop workstation.
 Courtesy of AmmoCore Technology, Inc. Portions reprinted with permission.
Shrinking device size is driving increased gate count in ICs, but the design methodology is not keeping up with the pace. Most design
For designs over two million gates, there has been no alternative to the block-based approach, but block-based designs impose several rigid constraints on both timing and physical real estate in a given die.
As shown in Figure 4.12, the block-based design methodology imposes artificial boundary constraints. It is very difficult to meet the critical timing path requirements. While it is easier to meet the block-level timing constraints, it is very difficult to meet chip-level timing requirements. This causes endless iterations to close chip-level timing requirements. In most cases, the chip speed is sacrificed to meet the market window.
The challenge is how to
The new approach, as shown in Figure 4.13, is to break down these rigid boundaries of physical blocks and come up with faster, more fluid solutions. AmmoCore Technology, Inc., has come up with a unique approach that allows designers to minimize their floorplanning task, break their design into smaller groups, and process all these groups via parallel processes. This methodology allows designers to gather meaningful data within a few hours without
There are five major steps in the implementation for this new approach. The steps are floorplanning, partitioning, placement, assembly, and verification. Figure 4.14 shows the design flow for this new approach.
In this basic floorplanning step, designers place the hard macros (e.g., RAM, ROM, IP core) and implement power for such macros. There are no requirements for doing month-long floorplan optimizations based on wire-load models. Typically, this can be done within an
The partitioning step considers not only connectivity and timing but also the available space in the die. This step breaks the design into manageable pieces and is the key to meeting timing and routing requirements in fewer iterations. All these pieces are then implemented using parallel processing before proceeding with the top-level placement.
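The idea of implementing the partitioned pieces in parallel can be sketched as follows. This is only an illustration of the dispatch pattern, not the vendor's implementation: `implement_partition` is a hypothetical stand-in for the real per-piece work, the partition names and sizes are invented, and threads are used only to keep the example self-contained (CPU-bound jobs would more likely run as separate processes or on separate machines).

```python
from concurrent.futures import ThreadPoolExecutor


def implement_partition(name: str, gate_count: int) -> dict:
    """Stand-in for placing and timing one partition (hypothetical workload)."""
    # A real flow would invoke a placement tool here; this just returns a summary.
    return {"partition": name, "gates": gate_count, "status": "done"}


def implement_all(partitions: dict[str, int], workers: int = 4) -> list[dict]:
    """Run every partition independently, then collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(implement_partition, n, g) for n, g in partitions.items()]
        return [f.result() for f in futures]


# Hypothetical partition names and sizes.
results = implement_all({"p0": 80_000, "p1": 95_000, "p2": 60_000})
```

Because each piece is independent once partitioning has fixed its boundaries, the wall-clock time approaches that of the slowest piece rather than the sum of all pieces.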
Placing all the partitioned pieces at the chip level provides significant insight into the
Assembly happens in two steps, first on the small pieces and then at the chip level. Assembly time is significantly reduced because of the two-step nature of the process: Designers can debug any timing issues
Verifying a large design is always tricky. Many design houses struggle to come up with a streamlined process for timing and physical verification before submitting the design to the mask shop. Our approach is to divide the problem into more manageable pieces and then process those pieces in a massively parallel fashion. In this way, most of the issues are resolved before the design is sent for final verification using traditional tools, which takes days, if not weeks.
In summary, block-based design methodology has traditionally been used for large designs, but it imposes many constraints on the designer. It requires several iterations at the block level as well as at the chip level. In many cases, the rigid block boundaries in the die make it very difficult to achieve the performance goals.
The new approach breaks these boundaries and provides an innovative way to handle design of any gate count. Managing the data and processing it in a massively parallel way reduce debugging time to a ratio of 1:100. The two-step process also